date:20170518

Re: [PATCH v2 00/10] rt2x00: rt2x00: improve calling conventions for register accessors

2017-05-18 Thread Kalle Valo

Arnd Bergmann  writes:

> I've managed to split up my long patch into a series of reasonable
> steps now.
>
> The first two are required to fix a regression from commit 41977e86c984
> ("rt2x00: add support for MT7620"), the rest are just cleanups to
> have a consistent state across all the register access functions.

Can these all go to 4.13 or would you prefer me to push the first two
to 4.12? Or what?

-- 
Kalle Valo

Re: [PATCH v2 net-next 1/2] include: linux: Add helper function to check phy interface mode

2017-05-18 Thread Iyappan Subramanian

On Thu, May 18, 2017 at 3:19 PM, Florian Fainelli  wrote:
> On 05/18/2017 03:13 PM, Iyappan Subramanian wrote:
>> Added helper function that checks phy_mode is RGMII (all variants)
>> 'bool phy_interface_mode_is_rgmii(phy_interface_t mode)'
>>
>> Changed the following function, to use the above.
>> 'bool phy_interface_is_rgmii(struct phy_device *phydev)'
>>
>> Signed-off-by: Iyappan Subramanian 
>> Suggested-by: Florian Fainelli 
>> Suggested-by: Andrew Lunn 
>
> Not sure why you have chosen include: linux as the subject since all
> changes done to that file typically had the "phy: " prefix, but the code
> changes are fine, thanks!

Thanks Florian.  I'll keep that in mind for future header file patches.  :-)

For now, if David Miller requests a subject line change, I'll
re-post the patch.

>
> Reviewed-by: Florian Fainelli 
>
>> ---
>>  include/linux/phy.h | 14 ++++++++++++--
>>  1 file changed, 12 insertions(+), 2 deletions(-)
>>
>> diff --git a/include/linux/phy.h b/include/linux/phy.h
>> index 54ef458..5a808a2 100644
>> --- a/include/linux/phy.h
>> +++ b/include/linux/phy.h
>> @@ -716,14 +716,24 @@ static inline bool phy_is_internal(struct phy_device 
>> *phydev)
>>  }
>>
>>  /**
>> + * phy_interface_mode_is_rgmii - Convenience function for testing if a
>> + * PHY interface mode is RGMII (all variants)
>> + * @mode: the phy_interface_t enum
>> + */
>> +static inline bool phy_interface_mode_is_rgmii(phy_interface_t mode)
>> +{
>> + return mode >= PHY_INTERFACE_MODE_RGMII &&
>> + mode <= PHY_INTERFACE_MODE_RGMII_TXID;
>> +};
>> +
>> +/**
>>   * phy_interface_is_rgmii - Convenience function for testing if a PHY 
>> interface
>>   * is RGMII (all variants)
>>   * @phydev: the phy_device struct
>>   */
>>  static inline bool phy_interface_is_rgmii(struct phy_device *phydev)
>>  {
>> - return phydev->interface >= PHY_INTERFACE_MODE_RGMII &&
>> - phydev->interface <= PHY_INTERFACE_MODE_RGMII_TXID;
>> + return phy_interface_mode_is_rgmii(phydev->interface);
>>  };
>>
>>  /*
>>
>
>
> --
> Florian
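
For anyone skimming the thread, the point of the helper is that an equality test against PHY_INTERFACE_MODE_RGMII misses the -ID, -RXID and -TXID variants, while a range check over the contiguous enum entries catches all four. A minimal Python sketch of that idea follows; the PhyMode enum is an illustrative subset with made-up numeric values, and mode_is_rgmii is a stand-in name, not the kernel function.

```python
from enum import IntEnum


class PhyMode(IntEnum):
    # Illustrative subset. The numeric values are invented; the only
    # property that matters is that the four RGMII entries are contiguous.
    MII = 0
    GMII = 1
    SGMII = 2
    RGMII = 3
    RGMII_ID = 4
    RGMII_RXID = 5
    RGMII_TXID = 6
    XGMII = 7


def mode_is_rgmii(mode):
    """Range check in the spirit of phy_interface_mode_is_rgmii()."""
    return PhyMode.RGMII <= mode <= PhyMode.RGMII_TXID


def mode_equals_rgmii(mode):
    """The older style of test, which only matches the plain variant."""
    return mode == PhyMode.RGMII
```

With this model, mode_equals_rgmii(PhyMode.RGMII_ID) is false while mode_is_rgmii(PhyMode.RGMII_ID) is true, which is the case the xgene patch in this series is after.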

Re: [RFC] iproute: Add support for extended ack to rtnl_talk

2017-05-18 Thread David Ahern

On 5/18/17 3:02 AM, Daniel Borkmann wrote:
> So effectively this means libmnl has to be used for new stuff, no one
> has time to do the work to convert the existing tooling over (which
> by itself might be a challenge in testing everything to make sure
> there are no regressions) given there's not much activity around
> lib/libnetlink.c anyway, and existing users not using libmnl today
> won't see/notice new improvements on netlink side when they do an
> upgrade. So we'll be stuck with that dual library mess pretty much
> for a very long time. :(

lib/libnetlink.c with all of its duplicate functions weighs in at just
947 LOC -- a mere 12% of the code in lib/. From a total SLOC of iproute2
it is a negligible part of the code base.

Given that, there is very little gain -- but a lot of risk in
regressions -- in converting such a small, low level code base to libmnl
just for the sake of using a library - something Phil noted in his
cursory attempt at converting ip to libmnl. I.e., the level of effort
required vs. the benefit is just not worth it.

There are so many other parts of the ip code base that need work with a
much higher return on the time investment.

Re: [PATCH net 1/3] vlan: Fix tcp checksums offloads for Q-in-Q vlan.

2017-05-18 Thread Toshiaki Makita

On 2017/05/18 22:31, Vladislav Yasevich wrote:
> It appears that since commit 8cb65d000, Q-in-Q vlans have been
> broken.  The series that commit is part of enabled TSO and checksum
> offloading on Q-in-Q vlans.  However, most HW we support can't handle
> it.  To work around the issue, the above commit added a function that
> turns off offloads on Q-in-Q devices, but it left the checksum offload.
> That will cause issues with most older devices that support very basic
> checksum offload capabilities as well as some newer devices (we've
> reproduced the problem with both be2net and bnx).
> 
> To solve this for everyone, turn off checksum offloading feature
> by default when sending Q-in-Q traffic.  Devices that are proven to
> work can provide a corrected ndo_features_check implementation.
> 
> Fixes: 8cb65d000 ("net: Move check for multiple vlans to drivers")
> CC: Toshiaki Makita 
> Signed-off-by: Vladislav Yasevich 
> ---
>  include/linux/if_vlan.h | 1 -
>  1 file changed, 1 deletion(-)
> 
> diff --git a/include/linux/if_vlan.h b/include/linux/if_vlan.h
> index 8d5fcd6..ae537f0 100644
> --- a/include/linux/if_vlan.h
> +++ b/include/linux/if_vlan.h
> @@ -619,7 +619,6 @@ static inline netdev_features_t vlan_features_check(const 
> struct sk_buff *skb,
>NETIF_F_SG |
>NETIF_F_HIGHDMA |
>NETIF_F_FRAGLIST |
> -  NETIF_F_HW_CSUM |
>NETIF_F_HW_VLAN_CTAG_TX |
>NETIF_F_HW_VLAN_STAG_TX);
>  

I guess HW_CSUM theoretically can handle Q-in-Q packets and the problem
is IP_CSUM and IPV6_CSUM.
So wouldn't it be better to leave HW_CSUM and drop IP_CSUM/IPV6_CSUM,
i.e. change intersection into bitwise AND?

The intersection was introduced in db115037bb57 ("net: fix checksum
features handling in netif_skb_features()"), but I guess for this
particular check the intersection was not needed.

-- 
Toshiaki Makita
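
To make the difference between the two operations concrete, here is a minimal Python model. It assumes that netdev_intersect_features() works the way I remember it: when exactly one side has HW_CSUM set, that side is widened with IP_CSUM and IPV6_CSUM before the AND. The bit values and the names intersect_features and plain_and are made up for illustration, so please check include/linux/netdevice.h for the real helper.

```python
# Hypothetical bit positions; the real NETIF_F_* values differ.
NETIF_F_SG = 1 << 0
NETIF_F_IP_CSUM = 1 << 1
NETIF_F_HW_CSUM = 1 << 2
NETIF_F_IPV6_CSUM = 1 << 3


def intersect_features(f1, f2):
    """Model of the intersection: HW_CSUM on one side implies IP/IPV6_CSUM."""
    if (f1 ^ f2) & NETIF_F_HW_CSUM:
        if f1 & NETIF_F_HW_CSUM:
            f1 |= NETIF_F_IP_CSUM | NETIF_F_IPV6_CSUM
        else:
            f2 |= NETIF_F_IP_CSUM | NETIF_F_IPV6_CSUM
    return f1 & f2


def plain_and(f1, f2):
    """Plain bitwise AND: only features named on both sides survive."""
    return f1 & f2


# Feature mask applied to multi-tagged packets (subset, for illustration).
VLAN_MASK = NETIF_F_SG | NETIF_F_HW_CSUM

# A device that only does basic IPv4 checksum offload.
basic_dev = NETIF_F_SG | NETIF_F_IP_CSUM
# A device with generic checksum offload.
generic_dev = NETIF_F_SG | NETIF_F_HW_CSUM
```

Under these assumptions the intersection keeps IP_CSUM for basic_dev, whereas a plain AND drops it and still leaves HW_CSUM in place for generic_dev.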

Re: [PATCH 1/1] dt-binding: net: wireless: fix node name in the BCM43xx example

2017-05-18 Thread Rob Herring

On Mon, May 15, 2017 at 10:13:56PM +0200, Martin Blumenstingl wrote:
> The example in the BCM43xx documentation uses "brcmf" as node name.
> However, wireless devices should be named "wifi" instead. Fix this to
> make sure that .dts authors can simply use the documentation as
> reference (or simply copy the node from the documentation and then
> adjust only the board specific bits).
> 
> Signed-off-by: Martin Blumenstingl 
> ---
>  Documentation/devicetree/bindings/net/wireless/brcm,bcm43xx-fmac.txt | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

Applied.

Rob

Re: [PATCH v2 1/3] bpf: Use 1<<16 as ceiling for immediate alignment in verifier.

2017-05-18 Thread Alexei Starovoitov


On 5/18/17 9:38 AM, Edward Cree wrote:

On 18/05/17 15:49, Edward Cree wrote:

Here's one idea that seemed to work when I did a couple of experiments:
let A = (a;am), B = (b;bm) where the m are the masks
Σ = am + bm + a + b
χ = Σ ^ (a + b) /* unknown carries */
μ = χ | am | bm /* mask of result */
then A + B = ((a + b) & ~μ; μ)

The idea is that we find which bits change between the case "all x are
 1" and "all x are 0", and those become xs too.


> https://gist.github.com/ecree-solarflare/0665d5b46c2d8d08de2377fbd527de8d
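
For readers who want to poke at the addition scheme without opening the gist, a minimal Python sketch follows. The names tnum_add and values are mine, not taken from the gist, and Python's unbounded integers stand in for fixed-width registers, so there is no wraparound here.

```python
from itertools import product


def tnum_add(a, am, b, bm):
    """Add (a; am) and (b; bm), where each mask marks the unknown bits."""
    sigma = am + bm + a + b   # sum with every unknown bit set to 1
    chi = sigma ^ (a + b)     # bits that differ from the all-zero case
    mu = chi | am | bm        # mask of the result
    return (a + b) & ~mu, mu


def values(v, m):
    """Enumerate every concrete number that the pair (v; m) can stand for."""
    bits = [i for i in range(m.bit_length()) if (m >> i) & 1]
    for choice in product((0, 1), repeat=len(bits)):
        yield v | sum(c << i for c, i in zip(choice, bits))


# 110x + 0001: the unknown low bit may carry, so the result is 11xx.
result = tnum_add(0b1100, 0b0001, 0b0001, 0b0000)
```

Here result is (0b1100, 0b0011), and both concrete sums (13 and 14) fall inside 11xx. An exhaustive check over small inputs is a cheap way to gain confidence in the absence of a proof.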

I played with it quite a bit trying to break it and have to
agree that the above algorithm works.
At least for add and sub I think it's solid.
Still feels a bit magical, since it gave me better results
than I could envision for my test vectors.

In your .py I'd only change __str__(self) to print them as mask,value,
in the order they're passed into the constructor, to make it easier to read.
The bin(self) output is the most useful, of course.
We should carry it into the kernel too for debugging.


And now I've found a similar algorithm for subtraction, which (again) I
 can't prove but it seems to work.
α = a + am - b
β = a - b - bm
χ = α ^ β
μ = χ | am | bm
then A - B = ((a - b) & ~μ; μ)
Again we're effectively finding the max. and min. values, and XORing
 them to find unknown carries.
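
A minimal Python sketch of the subtraction scheme, for experimentation. Two things here are my own assumptions: the result mask is taken as χ | am | bm, mirroring the addition case, and a width parameter emulates fixed-width wraparound since Python integers are unbounded. The names tnum_sub and values are illustrative.

```python
from itertools import product


def tnum_sub(a, am, b, bm, width=64):
    """Subtract (b; bm) from (a; am) modulo 2**width."""
    full = (1 << width) - 1
    dv = (a - b) & full
    alpha = (dv + am) & full   # largest possible result
    beta = (dv - bm) & full    # smallest possible result
    chi = alpha ^ beta         # bits that differ between the two extremes
    mu = chi | am | bm         # assumption: same mask rule as for addition
    return dv & ~mu & full, mu


def values(v, m):
    """Enumerate every concrete number that the pair (v; m) can stand for."""
    bits = [i for i in range(m.bit_length()) if (m >> i) & 1]
    for choice in product((0, 1), repeat=len(bits)):
        yield v | sum(c << i for c, i in zip(choice, bits))


known = tnum_sub(5, 0, 2, 0)                          # two constants
partial = tnum_sub(0b1100, 0b0001, 0b0001, 0b0000)    # 110x - 0001
```

With two constants nothing becomes unknown, so known is (3, 0). For 110x - 0001 the borrow can ripple, and partial comes out as (0b1000, 0b0111), i.e. 1xxx, which covers both 11 and 12.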

Bitwise operations are easy, of course;
/* By assumption, a & am == b & bm == 0 */
A & B = (a & b; (a | am) & (b | bm) & ~(a & b))
A | B = (a | b; (am | bm) & ~(a | b))
/* It bothers me that & and | aren't symmetric, but I can't fix it */
A ^ B = (a ^ b; am | bm)

as are shifts by a constant (just shift 0s into both number and mask).
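
The bitwise rules and the constant shift can be sketched in a few lines of Python. The function names are illustrative. One deliberate deviation from the formulas as quoted: the XOR value is masked with ~μ so that value & mask stays zero, which the quoted form does not guarantee when a known 1 meets an unknown bit.

```python
from itertools import product


def tnum_and(a, am, b, bm):
    v = a & b
    # Unknown wherever both sides might be 1 but are not both known to be 1.
    return v, (a | am) & (b | bm) & ~v


def tnum_or(a, am, b, bm):
    v = a | b
    # Unknown bits stay unknown unless the other side forces a 1.
    return v, (am | bm) & ~v


def tnum_xor(a, am, b, bm):
    mu = am | bm
    # Assumption: clear mask bits from the value to keep value & mask == 0.
    return (a ^ b) & ~mu, mu


def tnum_lshift(a, am, k):
    # Shift zeros into both the number and the mask.
    return a << k, am << k


def values(v, m):
    """Enumerate every concrete number that the pair (v; m) can stand for."""
    bits = [i for i in range(m.bit_length()) if (m >> i) & 1]
    for choice in product((0, 1), repeat=len(bits)):
        yield v | sum(c << i for c, i in zip(choice, bits))
```

For example, tnum_and(0b100, 0b010, 0b110, 0) gives (0b100, 0b010), i.e. 1x0 & 110 = 1x0, and tnum_or(0b100, 0b010, 0b001, 0) gives (0b101, 0b010), i.e. 1x0 | 001 = 1x1.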

Multiplication by a constant can be done by decomposing into shifts
 and adds; but it can also be done directly; here we find (a;am) * k.
π = a * k
γ = am * k
then A * k = (π; 0) + (0; γ), for which we use our addition algo.

Multiplication of two unknown values is a nightmare, as unknown bits
 can propagate all over the place.  We can do a shift-add
 decomposition where the adds for unknown bits have all the 1s in
 the addend replaced with xs.  A few experiments suggest that this
 works, regardless of the order of operands.  For instance
 110x * x01 comes out as either
110x
+ xx0x
= 0x
or
 x0x
   x01
+ x01
= 0x
We can slightly optimise this by handling all the 1 bits in one go;
 that is, for (a;am) * (b;bm) we first find (a;am) * b using our
 multiplication-by-a-constant algo above, then for each bit in bm
 we find (a;am) * bit and force all its nonzero bits to unknown;
 finally we add all our components.


this mul algo I don't completely understand. It feels correct,
but I'm not sure we really need it for the kernel.
For all practical cases llvm will likely emit shifts or sequence
of adds and shifts, so multiplies by crazy non-optimizable constant
or variable are rare and likely the end result is going to be
outside of packet boundary, so it will be rejected anyway and
precise alignment tracking doesn't matter much.
What I love about the whole thing is that it works for access into
packet, access into map values and in the future for any other
variable length access.


Don't even ask about division; that scrambles bits so hard that the


yeah screw div and mod. We have an option to disable div/mod altogether
under some new 'prog_flags', since it has this ugly 'div by 0'
exception path. We don't even have 'signed division' instruction and
llvm errors like:
errs() << "Unsupport signed division for DAG: ";
errs() << "Please convert to unsigned div/mod.\n";
and no one complained. It just means that division is extremely rare.

Are you planning to work on the kernel patch for this algo?
Once we have it the verifier will be smarter regarding
alignment tracking than any compiler I know :)

Re: [PATCH net 1/3] vlan: Fix tcp checksums offloads for Q-in-Q vlan.

2017-05-18 Thread Toshiaki Makita

On 2017/05/18 22:31, Vladislav Yasevich wrote:
> It appears that since commit 8cb65d000, Q-in-Q vlans have been
> broken.  The series that commit is part of enabled TSO and checksum
> offloading on Q-in-Q vlans.  However, most HW we support can't handle
> it.  To work around the issue, the above commit added a function that
> turns off offloads on Q-in-Q devices, but it left the checksum offload.
> That will cause issues with most older devices that support very basic
> checksum offload capabilities as well as some newer devices (we've
> reproduced the problem with both be2net and bnx).
> 
> To solve this for everyone, turn off checksum offloading feature
> by default when sending Q-in-Q traffic.  Devices that are proven to
> work can provide a corrected ndo_features_check implementation.
> 
> Fixes: 8cb65d000 ("net: Move check for multiple vlans to drivers")
> CC: Toshiaki Makita 
> Signed-off-by: Vladislav Yasevich 

The patch looks ok, but why do you think 8cb65d000 is wrong?
The same check was there before my patch set.

kernel v4.0:
> netdev_features_t netif_skb_features(struct sk_buff *skb)
...
>   if (protocol == htons(ETH_P_8021Q) || protocol == htons(ETH_P_8021AD))
>   features = netdev_intersect_features(features,
>NETIF_F_SG |
>NETIF_F_HIGHDMA |
>NETIF_F_FRAGLIST |
>NETIF_F_GEN_CSUM |
>NETIF_F_HW_VLAN_CTAG_TX |
>NETIF_F_HW_VLAN_STAG_TX);

The commit just moved the check into another function.


Toshiaki Makita

Re: [[PATCH v1]] hdlcdrv: fix divide error bug if bitrate is 0

2017-05-18 Thread Andrey Konovalov

On Thu, May 18, 2017 at 6:02 AM, Firo Yang  wrote:
> The divisor s->par.bitrate will always be 0 until initialized by
> ndo_open() and hdlcdrv_open().
>
> In order to fix this divide-by-zero error, check whether the netdevice was
> opened by ndo_open() before performing the divide. We also check the
> value of bitrate in case it was set badly.
>
> Reported-by: Dmitry Vyukov 
> Signed-off-by: Firo Yang 

Hi Firo,

Please reply to the original report thread when you send a fix, so
other people won't start working on the same patch.

BTW, it was reported by me, but I don't think it's important.

Thanks!

> ---
> v0->v1:
> Reviewed by walter harms .
> Return ENODEV instead of EPERM if !netif_running(dev)
> Check if s->par.bitrate > 0.
>
>  drivers/net/hamradio/hdlcdrv.c | 4 ++++
>  1 file changed, 4 insertions(+)
>
> diff --git a/drivers/net/hamradio/hdlcdrv.c b/drivers/net/hamradio/hdlcdrv.c
> index 8c3633c..b0f417f 100644
> --- a/drivers/net/hamradio/hdlcdrv.c
> +++ b/drivers/net/hamradio/hdlcdrv.c
> @@ -576,6 +576,10 @@ static int hdlcdrv_ioctl(struct net_device *dev, struct 
> ifreq *ifr, int cmd)
> case HDLCDRVCTL_CALIBRATE:
> if(!capable(CAP_SYS_RAWIO))
> return -EPERM;
> +   if (!netif_running(dev))
> +   return -ENODEV;
> +   if (!(s->par.bitrate > 0))
> +   return -EINVAL;
> if (bi.data.calibrate > INT_MAX / s->par.bitrate)
> return -EINVAL;
> s->hdlctx.calibrate = bi.data.calibrate * s->par.bitrate / 16;
> --
> 2.7.4

Re: drivers/net/hamradio: divide error in hdlcdrv_ioctl

2017-05-18 Thread Andrey Konovalov

On Wed, May 17, 2017 at 10:07 PM, Alan Cox  wrote:
> On Tue, 16 May 2017 17:05:32 +0200
> Andrey Konovalov  wrote:
>
>> Hi,
>>
>> I've got the following error report while fuzzing the kernel with syzkaller.
>>
>> On commit 2ea659a9ef488125eb46da6eb571de5eae5c43f6 (4.12-rc1).
>>
>> A reproducer and .config are attached.
>
> This should fix it.

Hi Alan,

Someone else has already sent a couple of versions of a similar fix.

https://patchwork.ozlabs.org/patch/763832/

Thanks!

>
> commit 37b3fa4b617681f00cfa1f76d6d7716cc6d9f79a
> Author: Alan Cox 
> Date:   Wed May 17 21:04:27 2017 +0100
>
> hdlcdrv: Fix division by zero when bitrate is unset
>
> The code attempts to check for out of range calibration. What it forgets to do
> is check for the 0 bitrate case. As a result the range check itself oopses the
> kernel.
>
> Found by Andrey Konovalov using Syzkaller.
>
> Signed-off-by: Alan Cox 
>
> diff --git a/drivers/net/hamradio/hdlcdrv.c b/drivers/net/hamradio/hdlcdrv.c
> index 8c3633c..9f34a48 100644
> --- a/drivers/net/hamradio/hdlcdrv.c
> +++ b/drivers/net/hamradio/hdlcdrv.c
> @@ -576,7 +576,7 @@ static int hdlcdrv_ioctl(struct net_device *dev, struct 
> ifreq *ifr, int cmd)
> case HDLCDRVCTL_CALIBRATE:
> if(!capable(CAP_SYS_RAWIO))
> return -EPERM;
> -   if (bi.data.calibrate > INT_MAX / s->par.bitrate)
> +   if (!s->par.bitrate || bi.data.calibrate > INT_MAX / s->par.bitrate)
> return -EINVAL;
> s->hdlctx.calibrate = bi.data.calibrate * s->par.bitrate / 16;
> return 0;
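
Both proposed fixes come down to the same ordering rule: reject a zero bitrate before it is used as a divisor in the overflow check. A minimal Python sketch of that guard follows; calibrate_ticks is a made-up name, INT_MAX is assumed to be the 32-bit value, and the function returns a negative errno the way the ioctl does.

```python
import errno

INT_MAX = 2**31 - 1  # assumption: 32-bit int, as on the usual Linux targets


def calibrate_ticks(calibrate, bitrate):
    """Return calibrate * bitrate // 16, or a negative errno on bad input."""
    if bitrate <= 0:
        # Without this test the range check below divides by zero.
        return -errno.EINVAL
    if calibrate > INT_MAX // bitrate:
        # The product would not fit in an int.
        return -errno.EINVAL
    return calibrate * bitrate // 16
```

For instance, calibrate_ticks(10, 0) returns -EINVAL instead of faulting, and calibrate_ticks(10, 1200) returns 750.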

[net-intel-e1000e] question about value overwrite

2017-05-18 Thread Gustavo A. R. Silva



Hello everybody,

While looking into Coverity ID 1226905 I ran into the following piece  
of code at drivers/net/ethernet/intel/e1000e/ich8lan.c:2400


2400 /**
2401  *  e1000_hv_phy_workarounds_ich8lan - A series of Phy workarounds to be
2402  *  done after every PHY reset.
2403  **/
2404 static s32 e1000_hv_phy_workarounds_ich8lan(struct e1000_hw *hw)
2405 {
2406         s32 ret_val = 0;
2407         u16 phy_data;
2408
2409         if (hw->mac.type != e1000_pchlan)
2410                 return 0;
2411
2412         /* Set MDIO slow mode before any other MDIO access */
2413         if (hw->phy.type == e1000_phy_82577) {
2414                 ret_val = e1000_set_mdio_slow_mode_hv(hw);
2415                 if (ret_val)
2416                         return ret_val;
2417         }
2418
2419         if (((hw->phy.type == e1000_phy_82577) &&
2420              ((hw->phy.revision == 1) || (hw->phy.revision == 2))) ||
2421             ((hw->phy.type == e1000_phy_82578) && (hw->phy.revision == 1))) {
2422                 /* Disable generation of early preamble */
2423                 ret_val = e1e_wphy(hw, PHY_REG(769, 25), 0x4431);
2424                 if (ret_val)
2425                         return ret_val;
2426
2427                 /* Preamble tuning for SSC */
2428                 ret_val = e1e_wphy(hw, HV_KMRN_FIFO_CTRLSTA, 0xA204);
2429                 if (ret_val)
2430                         return ret_val;
2431         }
2432
2433         if (hw->phy.type == e1000_phy_82578) {
2434                 /* Return registers to default by doing a soft reset then
2435                  * writing 0x3140 to the control register.
2436                  */
2437                 if (hw->phy.revision < 2) {
2438                         e1000e_phy_sw_reset(hw);
2439                         ret_val = e1e_wphy(hw, MII_BMCR, 0x3140);
2440                 }
2441         }
2442
2443         /* Select page 0 */
2444         ret_val = hw->phy.ops.acquire(hw);
2445         if (ret_val)
2446                 return ret_val;
2447
2448         hw->phy.addr = 1;
2449         ret_val = e1000e_write_phy_reg_mdic(hw, IGP01E1000_PHY_PAGE_SELECT, 0);
2450         hw->phy.ops.release(hw);
2451         if (ret_val)
2452                 return ret_val;
2453
2454         /* Configure the K1 Si workaround during phy reset assuming there is
2455          * link so that it disables K1 if link is in 1Gbps.
2456          */
2457         ret_val = e1000_k1_gig_workaround_hv(hw, true);
2458         if (ret_val)
2459                 return ret_val;
2460
2461         /* Workaround for link disconnects on a busy hub in half duplex */
2462         ret_val = hw->phy.ops.acquire(hw);
2463         if (ret_val)
2464                 return ret_val;
2465         ret_val = e1e_rphy_locked(hw, BM_PORT_GEN_CFG, &phy_data);
2466         if (ret_val)
2467                 goto release;
2468         ret_val = e1e_wphy_locked(hw, BM_PORT_GEN_CFG, phy_data & 0x00FF);
2469         if (ret_val)
2470                 goto release;
2471
2472         /* set MSE higher to enable link to stay up when noise is high */
2473         ret_val = e1000_write_emi_reg_locked(hw, I82577_MSE_THRESHOLD, 0x0034);
2474 release:
2475         hw->phy.ops.release(hw);
2476
2477         return ret_val;
2478 }

The issue is that the value stored in variable _ret_val_ at line 2439  
is overwritten by the one stored at line 2444, before it can be used.


My question is if the original intention was to return this value  
immediately after the assignment at line 2439, something like in the  
following patch:


index 68ea8b4..d6d4ed7 100644
--- a/drivers/net/ethernet/intel/e1000e/ich8lan.c
+++ b/drivers/net/ethernet/intel/e1000e/ich8lan.c
@@ -2437,6 +2437,8 @@ static s32 e1000_hv_phy_workarounds_ich8lan(struct e1000_hw *hw)
                 if (hw->phy.revision < 2) {
                         e1000e_phy_sw_reset(hw);
                         ret_val = e1e_wphy(hw, MII_BMCR, 0x3140);
+                        if (ret_val)
+                                return ret_val;
                 }
         }

What do you think?

I'd really appreciate any comment on this.

Thank you!
--
Gustavo A. R. Silva
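
To illustrate the pattern in isolation, here is a small Python model, not driver code. Each step returns 0 on success or a negative error; the function names and the error value are invented for the example. The first runner mirrors what happens between lines 2439 and 2444, the second mirrors the proposed early return.

```python
def run_steps_lossy(steps):
    """Only the last status survives: earlier failures are overwritten."""
    ret = 0
    for step in steps:
        ret = step()
    return ret


def run_steps_checked(steps):
    """Return the first non-zero status, as the proposed patch would."""
    for step in steps:
        ret = step()
        if ret:
            return ret
    return 0


# Hypothetical sequence: the middle write fails, the later acquire succeeds.
steps = [lambda: 0, lambda: -5, lambda: 0]
```

With this sequence run_steps_lossy(steps) reports success (0) while run_steps_checked(steps) reports -5. Whether the driver should bail out at that point or deliberately carry on is exactly the question above.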

Re: [PATCH] of_mdio: Fix broken PHY IRQ in case of probe deferral

2017-05-18 Thread Florian Fainelli

On 05/18/2017 01:36 PM, Geert Uytterhoeven wrote:
> Hi Andrew,
> 
> On Thu, May 18, 2017 at 9:34 PM, Andrew Lunn  wrote:
>>>> This most certainly works fine in the simple case where you have one PHY
>>>> hanging off the MDIO bus, now what happens if you have several?
>>>>
>>>> Presumably, the first PHY that returns EPROBE_DEFER will make the entire
>>>> bus registration return EPROBE_DEFER as well, and so on, and so forth,
>>>> but I am not sure if we will be properly unwinding the successful
>>>> registration of PHYs that either don't have an interrupt, or did not
>>>> return EPROBE_DEFER.
>>>>
>>>> It should be possible to mimic this behavior by using the fixed PHY, and
>>>> possibly the dsa_loop.c driver which would create 4 ports, expecting 4
>>>> fixed PHYs to be present.
>>>
>>> mdiobus_unregister(), called from of_mdiobus_register() on failure,
>>> should do the unwinding, right?
>>>
>>> And when the driver is reprobed, all PHYs are reprobed, until they all
>>> succeed.
>>
>> That is the theory. I looked at that while reviewing the patch. But
>> this has probably not been tested in anger. It would be good to test
>> this properly, with not just the first PHY returning -EPROBE_DEFER, to
>> really test the unwind.
> 
> Unfortunately I don't have a board with multiple PHYs, so I cannot test
> that case.
> 
> Does unbinding/rebinding a network driver with multiple PHYs currently
> work? Or module unload/reload?

Usually there is a strict 1:1 mapping between a network device (not
driver) and a PHY device; switch drivers, however, would have multiple
PHYs (one per port, aka net_device).

NB: binding and unbinding of PHYs is pretty broken at the moment though,
because there is a complete disconnect between what the Ethernet MAC
expects, and the state in which the PHY is. I had some patches to fix
that, but this turned out to be playing whack-a-mole which I typically
suck at.
-- 
Florian

Re: [PATCH v2 net-next 1/2] include: linux: Add helper function to check phy interface mode

2017-05-18 Thread Florian Fainelli

On 05/18/2017 03:13 PM, Iyappan Subramanian wrote:
> Added helper function that checks phy_mode is RGMII (all variants)
> 'bool phy_interface_mode_is_rgmii(phy_interface_t mode)'
> 
> Changed the following function, to use the above.
> 'bool phy_interface_is_rgmii(struct phy_device *phydev)'
> 
> Signed-off-by: Iyappan Subramanian 
> Suggested-by: Florian Fainelli 
> Suggested-by: Andrew Lunn 

Not sure why you have chosen include: linux as the subject since all
changes done to that file typically had the "phy: " prefix, but the code
changes are fine, thanks!

Reviewed-by: Florian Fainelli 

> ---
>  include/linux/phy.h | 14 ++++++++++++--
>  1 file changed, 12 insertions(+), 2 deletions(-)
> 
> diff --git a/include/linux/phy.h b/include/linux/phy.h
> index 54ef458..5a808a2 100644
> --- a/include/linux/phy.h
> +++ b/include/linux/phy.h
> @@ -716,14 +716,24 @@ static inline bool phy_is_internal(struct phy_device 
> *phydev)
>  }
>  
>  /**
> + * phy_interface_mode_is_rgmii - Convenience function for testing if a
> + * PHY interface mode is RGMII (all variants)
> + * @mode: the phy_interface_t enum
> + */
> +static inline bool phy_interface_mode_is_rgmii(phy_interface_t mode)
> +{
> + return mode >= PHY_INTERFACE_MODE_RGMII &&
> + mode <= PHY_INTERFACE_MODE_RGMII_TXID;
> +};
> +
> +/**
>   * phy_interface_is_rgmii - Convenience function for testing if a PHY 
> interface
>   * is RGMII (all variants)
>   * @phydev: the phy_device struct
>   */
>  static inline bool phy_interface_is_rgmii(struct phy_device *phydev)
>  {
> - return phydev->interface >= PHY_INTERFACE_MODE_RGMII &&
> - phydev->interface <= PHY_INTERFACE_MODE_RGMII_TXID;
> + return phy_interface_mode_is_rgmii(phydev->interface);
>  };
>  
>  /*
> 


-- 
Florian

[PATCH v2 net-next 2/2] drivers: net: xgene: Check all RGMII phy mode variants

2017-05-18 Thread Iyappan Subramanian

This patch addresses the review comment from the previous patch set,
by using phy_interface_mode_is_rgmii() helper function to address
all RGMII phy mode variants.

Signed-off-by: Iyappan Subramanian 
Signed-off-by: Quan Nguyen 
---

Review comment reference:
http://www.spinics.net/lists/netdev/msg434649.html
---
 drivers/net/ethernet/apm/xgene/xgene_enet_ethtool.c |  6 +++---
 drivers/net/ethernet/apm/xgene/xgene_enet_hw.c      | 12 ++++++------
 drivers/net/ethernet/apm/xgene/xgene_enet_main.c    | 15 +++++++++------
 3 files changed, 18 insertions(+), 15 deletions(-)

diff --git a/drivers/net/ethernet/apm/xgene/xgene_enet_ethtool.c 
b/drivers/net/ethernet/apm/xgene/xgene_enet_ethtool.c
index 0fdec78..559963b 100644
--- a/drivers/net/ethernet/apm/xgene/xgene_enet_ethtool.c
+++ b/drivers/net/ethernet/apm/xgene/xgene_enet_ethtool.c
@@ -127,7 +127,7 @@ static int xgene_get_link_ksettings(struct net_device *ndev,
struct phy_device *phydev = ndev->phydev;
u32 supported;
 
-   if (pdata->phy_mode == PHY_INTERFACE_MODE_RGMII) {
+   if (phy_interface_mode_is_rgmii(pdata->phy_mode)) {
if (phydev == NULL)
return -ENODEV;
 
@@ -177,7 +177,7 @@ static int xgene_set_link_ksettings(struct net_device *ndev,
struct xgene_enet_pdata *pdata = netdev_priv(ndev);
struct phy_device *phydev = ndev->phydev;
 
-   if (pdata->phy_mode == PHY_INTERFACE_MODE_RGMII) {
+   if (phy_interface_mode_is_rgmii(pdata->phy_mode)) {
if (!phydev)
return -ENODEV;
 
@@ -304,7 +304,7 @@ static int xgene_set_pauseparam(struct net_device *ndev,
struct phy_device *phydev = ndev->phydev;
u32 oldadv, newadv;
 
-   if (pdata->phy_mode == PHY_INTERFACE_MODE_RGMII ||
+   if (phy_interface_mode_is_rgmii(pdata->phy_mode) ||
pdata->phy_mode == PHY_INTERFACE_MODE_SGMII) {
if (!phydev)
return -EINVAL;
diff --git a/drivers/net/ethernet/apm/xgene/xgene_enet_hw.c 
b/drivers/net/ethernet/apm/xgene/xgene_enet_hw.c
index 6ac27c7..e45b587 100644
--- a/drivers/net/ethernet/apm/xgene/xgene_enet_hw.c
+++ b/drivers/net/ethernet/apm/xgene/xgene_enet_hw.c
@@ -272,7 +272,7 @@ void xgene_enet_wr_mac(struct xgene_enet_pdata *pdata, u32 
wr_addr, u32 wr_data)
u32 done;
 
if (pdata->mdio_driver && ndev->phydev &&
-   pdata->phy_mode == PHY_INTERFACE_MODE_RGMII) {
+   phy_interface_mode_is_rgmii(pdata->phy_mode)) {
struct mii_bus *bus = ndev->phydev->mdio.bus;
 
return xgene_mdio_wr_mac(bus->priv, wr_addr, wr_data);
@@ -326,12 +326,13 @@ static void xgene_enet_rd_mcx_csr(struct xgene_enet_pdata 
*pdata,
 u32 xgene_enet_rd_mac(struct xgene_enet_pdata *pdata, u32 rd_addr)
 {
void __iomem *addr, *rd, *cmd, *cmd_done;
+   struct net_device *ndev = pdata->ndev;
u32 done, rd_data;
u8 wait = 10;
 
-   if (pdata->mdio_driver && pdata->ndev->phydev &&
-   pdata->phy_mode == PHY_INTERFACE_MODE_RGMII) {
-   struct mii_bus *bus = pdata->ndev->phydev->mdio.bus;
+   if (pdata->mdio_driver && ndev->phydev &&
+   phy_interface_mode_is_rgmii(pdata->phy_mode)) {
+   struct mii_bus *bus = ndev->phydev->mdio.bus;
 
return xgene_mdio_rd_mac(bus->priv, rd_addr);
}
@@ -349,8 +350,7 @@ u32 xgene_enet_rd_mac(struct xgene_enet_pdata *pdata, u32 
rd_addr)
udelay(1);
 
if (!done)
-   netdev_err(pdata->ndev, "mac read failed, addr: %04x\n",
-  rd_addr);
+   netdev_err(ndev, "mac read failed, addr: %04x\n", rd_addr);
 
rd_data = ioread32(rd);
iowrite32(0, cmd);
diff --git a/drivers/net/ethernet/apm/xgene/xgene_enet_main.c 
b/drivers/net/ethernet/apm/xgene/xgene_enet_main.c
index 21cd4ef..d3906f6 100644
--- a/drivers/net/ethernet/apm/xgene/xgene_enet_main.c
+++ b/drivers/net/ethernet/apm/xgene/xgene_enet_main.c
@@ -1634,7 +1634,7 @@ static int xgene_enet_get_irqs(struct xgene_enet_pdata 
*pdata)
struct device *dev = &pdev->dev;
int i, ret, max_irqs;
 
-   if (pdata->phy_mode == PHY_INTERFACE_MODE_RGMII)
+   if (phy_interface_mode_is_rgmii(pdata->phy_mode))
max_irqs = 1;
else if (pdata->phy_mode == PHY_INTERFACE_MODE_SGMII)
max_irqs = 2;
@@ -1760,7 +1760,7 @@ static int xgene_enet_get_resources(struct 
xgene_enet_pdata *pdata)
dev_err(dev, "Unable to get phy-connection-type\n");
return pdata->phy_mode;
}
-   if (pdata->phy_mode != PHY_INTERFACE_MODE_RGMII &&
+   if (!phy_interface_mode_is_rgmii(pdata->phy_mode) &&
pdata->phy_mode != PHY_INTERFACE_MODE_SGMII &&
pdata->phy_mode != PHY_INTERFACE_MODE_XGMII) {
dev_err(dev, "Incorrect phy-connection-type specified\n");
@@ -1805,7

[PATCH v2 net-next 1/2] include: linux: Add helper function to check phy interface mode

2017-05-18 Thread Iyappan Subramanian

Added helper function that checks phy_mode is RGMII (all variants)
'bool phy_interface_mode_is_rgmii(phy_interface_t mode)'

Changed the following function, to use the above.
'bool phy_interface_is_rgmii(struct phy_device *phydev)'

Signed-off-by: Iyappan Subramanian 
Suggested-by: Florian Fainelli 
Suggested-by: Andrew Lunn 
---
 include/linux/phy.h | 14 ++++++++++++--
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/include/linux/phy.h b/include/linux/phy.h
index 54ef458..5a808a2 100644
--- a/include/linux/phy.h
+++ b/include/linux/phy.h
@@ -716,14 +716,24 @@ static inline bool phy_is_internal(struct phy_device 
*phydev)
 }
 
 /**
+ * phy_interface_mode_is_rgmii - Convenience function for testing if a
+ * PHY interface mode is RGMII (all variants)
+ * @mode: the phy_interface_t enum
+ */
+static inline bool phy_interface_mode_is_rgmii(phy_interface_t mode)
+{
+   return mode >= PHY_INTERFACE_MODE_RGMII &&
+   mode <= PHY_INTERFACE_MODE_RGMII_TXID;
+};
+
+/**
  * phy_interface_is_rgmii - Convenience function for testing if a PHY interface
  * is RGMII (all variants)
  * @phydev: the phy_device struct
  */
 static inline bool phy_interface_is_rgmii(struct phy_device *phydev)
 {
-   return phydev->interface >= PHY_INTERFACE_MODE_RGMII &&
-   phydev->interface <= PHY_INTERFACE_MODE_RGMII_TXID;
+   return phy_interface_mode_is_rgmii(phydev->interface);
 };
 
 /*
-- 
1.9.1

[PATCH v2 net-next 0/2] Check all RGMII phy mode variants

2017-05-18 Thread Iyappan Subramanian

This patch set,
 - adds phy_interface_mode_is_rgmii() helper function
 - addresses review comment from previous patch set, by calling
   phy_interface_mode_is_rgmii() to address all RGMII variants

Signed-off-by: Iyappan Subramanian 
---
v2: Address review comments from v1
 - adds phy_interface_mode_is_rgmii() helper function
 - addresses review comment from previous patch set, by calling
   phy_interface_mode_is_rgmii() to address all RGMII variants
v1:
 - Initial version
---

Iyappan Subramanian (2):
  include: linux: Add helper function to check phy interface mode
  drivers: net: xgene: Check all RGMII phy mode variants

 drivers/net/ethernet/apm/xgene/xgene_enet_ethtool.c |  6 +++---
 drivers/net/ethernet/apm/xgene/xgene_enet_hw.c      | 12 ++++++------
 drivers/net/ethernet/apm/xgene/xgene_enet_main.c    | 15 +++++++++------
 include/linux/phy.h                                 | 14 ++++++++++++--
 4 files changed, 30 insertions(+), 17 deletions(-)

-- 
1.9.1

Re: [PATCH v2 2/4] arp: decompose is_garp logic into a separate function

2017-05-18 Thread Julian Anastasov


Hello,

On Thu, 18 May 2017, Ihar Hrachyshka wrote:

> The code is quite involved already, enough to earn a separate function for
> itself. If anything, it helps arp_process readability.
> 
> Signed-off-by: Ihar Hrachyshka 
> ---
>  net/ipv4/arp.c | 35 +++++++++++++++++++++++------------
>  1 file changed, 23 insertions(+), 12 deletions(-)
> 
> diff --git a/net/ipv4/arp.c b/net/ipv4/arp.c
> index 053492a..ca6e1e6 100644
> --- a/net/ipv4/arp.c
> +++ b/net/ipv4/arp.c
> @@ -641,6 +641,27 @@ void arp_xmit(struct sk_buff *skb)
>  }
>  EXPORT_SYMBOL(arp_xmit);
>  
> +static bool arp_is_garp(struct net_device *dev, int addr_type,
> + __be16 ar_op,
> + __be32 sip, __be32 tip,
> + unsigned char *sha, unsigned char *tha)
> +{
> + bool is_garp = tip == sip && addr_type == RTN_UNICAST;
> +
> + /* Gratuitous ARP _replies_ also require target hwaddr to be
> +  * the same as source.
> +  */
> + if (is_garp && ar_op == htons(ARPOP_REPLY))
> + is_garp =
> + /* IPv4 over IEEE 1394 doesn't provide target
> +  * hardware address field in its ARP payload.
> +  */
> + tha &&

	All 4 patches look ok to me, with only a small problem
which comes from a patch already included in the kernel. I see
that GARP replies cannot work for 1394: tha is NULL there, so
is_garp will be cleared. Maybe the 'tha' check should be moved
into the if expression, for example:

	if (is_garp && ar_op == htons(ARPOP_REPLY) && tha)
		is_garp = !memcmp(tha, sha, dev->addr_len);

> + !memcmp(tha, sha, dev->addr_len);
> +
> + return is_garp;
> +}

Regards

--
Julian Anastasov
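
The effect of moving the 'tha' test can be seen in a small Python model. This is a sketch under assumptions: hardware addresses are plain bytes objects, a missing target address is None, byte-order conversion is left out, and the two function names are made up to label the variants.

```python
ARPOP_REQUEST, ARPOP_REPLY = 1, 2  # ARP opcodes


def is_garp_as_posted(sip, tip, unicast, op, sha, tha):
    """Model of the posted patch: a reply without tha is never gratuitous."""
    is_garp = tip == sip and unicast
    if is_garp and op == ARPOP_REPLY:
        is_garp = tha is not None and tha == sha
    return is_garp


def is_garp_suggested(sip, tip, unicast, op, sha, tha):
    """Model of the suggestion: only compare hwaddrs when tha is present."""
    is_garp = tip == sip and unicast
    if is_garp and op == ARPOP_REPLY and tha is not None:
        is_garp = tha == sha
    return is_garp


ip = "192.0.2.1"
mac = bytes.fromhex("020000000001")
other = bytes.fromhex("020000000002")
```

A reply with no target hardware address (the 1394 case) is rejected by the first variant and accepted by the second; a reply whose target address differs from the source is rejected by both.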

Re: [PATCH] of_mdio: Fix broken PHY IRQ in case of probe deferral

2017-05-18 Thread Geert Uytterhoeven

Hi Andrew,

On Thu, May 18, 2017 at 9:34 PM, Andrew Lunn  wrote:
>> > This most certainly works fine in the simple case where you have one PHY
>> > hanging off the MDIO bus, now what happens if you have several?
>> >
>> > Presumably, the first PHY that returns EPROBE_DEFER will make the entire
>> > bus registration return EPROBE_DEFER as well, and so on, and so forth,
>> > but I am not sure if we will be properly unwinding the successful
>> > registration of PHYs that either don't have an interrupt, or did not
>> > return EPROBE_DEFER.
>> >
>> > It should be possible to mimic this behavior by using the fixed PHY, and
>> > possibly the dsa_loop.c driver which would create 4 ports, expecting 4
>> > fixed PHYs to be present.
>>
>> mdiobus_unregister(), called from of_mdiobus_register() on failure,
>> should do the unwinding, right?
>>
>> And when the driver is reprobed, all PHYs are reprobed, until they all
>> succeed.
>
> That is the theory. I looked at that while reviewing the patch. But
> this has probably not been tested in anger. It would be good to test
> this properly, with not just the first PHY returning -EPROBE_DEFER, to
> really test the unwind.

Unfortunately I don't have a board with multiple PHYs, so I cannot test
that case.

Does unbinding/rebinding a network driver with multiple PHYs currently
work? Or module unload/reload?

That should exercise a similar code path.

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds

Re: [PATCH v5 net-next 4/7] net: add new control message for incoming HW-timestamped packets

2017-05-18 Thread Willem de Bruijn

On Thu, May 18, 2017 at 10:07 AM, Miroslav Lichvar  wrote:
> Add SOF_TIMESTAMPING_OPT_PKTINFO option to request a new control message
> for incoming packets with hardware timestamps. It contains the index of
> the real interface which received the packet and the length of the
> packet at layer 2.
>
> The index is useful with bonding, bridges and other interfaces, where
> IP_PKTINFO doesn't allow applications to determine which PHC made the
> timestamp. With the L2 length (and link speed) it is possible to
> transpose preamble timestamps to trailer timestamps, which are used in
> the NTP protocol.
>
> While this information could be provided by two new socket options
> independently from timestamping, it doesn't look like they would be very
> useful. With this option any performance impact is limited to hardware
> timestamping.
>
> Use dev_get_by_napi_id() to get the device and its index. On kernels
> with disabled CONFIG_NET_RX_BUSY_POLL or drivers not using NAPI, a zero
> index will be returned in the control message.
>
> CC: Richard Cochran 
> CC: Willem de Bruijn 
> Signed-off-by: Miroslav Lichvar 

Acked-by: Willem de Bruijn 

> +SOF_TIMESTAMPING_OPT_PKTINFO:
> +
> +  Enable the SCM_TIMESTAMPING_PKTINFO control message for incoming
> +  packets with hardware timestamps. The message contains struct
> +  scm_ts_pktinfo, which supplies the index of the real interface which
> +  received the packet and its length at layer 2. A valid (non-zero)
> +  interface index will be returned only if CONFIG_NET_RX_BUSY_POLL is
> +  enabled and the driver is using NAPI.

It is probably good to explicitly call out that the remaining two fields
are reserved and undefined. To stress that applications cannot be
overly pedantic and start failing if these become non-zero.
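
The preamble-to-trailer transposition mentioned in the commit message is plain arithmetic on the L2 length and the link speed. A minimal sketch follows, assuming the length is in bytes and the speed in Mbit/s; the helper name is hypothetical and not a kernel or libc API, and whether FCS or preamble bytes need to be added depends on the hardware and is ignored here.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: move a timestamp taken at the start of the frame
 * to the end of the frame by adding the time the frame spends on the wire.
 */
static uint64_t preamble_to_trailer_ns(uint64_t preamble_ts_ns,
				       uint32_t l2_len_bytes,
				       uint32_t link_speed_mbps)
{
	uint64_t wire_time_ns;

	assert(link_speed_mbps != 0);

	/* bits / (Mbit/s) gives microseconds; scale by 1000 for nanoseconds */
	wire_time_ns = (uint64_t)l2_len_bytes * 8 * 1000 / link_speed_mbps;

	return preamble_ts_ns + wire_time_ns;
}
```

For example, a 1500-byte frame on a 1000 Mbit/s link occupies the wire for 12000 ns, so the trailer timestamp is the preamble timestamp plus 12 microseconds.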

Re: [PATCH v4 net-next 6/7] net: allow simultaneous SW and HW transmit timestamping

2017-05-18 Thread Willem de Bruijn

On Thu, May 18, 2017 at 9:06 AM, Miroslav Lichvar  wrote:
> Add SOF_TIMESTAMPING_OPT_TX_SWHW option to allow an outgoing packet to
> be looped to the socket's error queue with a software timestamp even
> when a hardware transmit timestamp is expected to be provided by the
> driver.
>
> Applications using this option will receive two separate messages from
> the error queue, one with a software timestamp and the other with a
> hardware timestamp. As the hardware timestamp is saved to the shared skb
> info, which may happen before the first message with software timestamp
> is received by the application, the hardware timestamp is copied to the
> SCM_TIMESTAMPING control message only when the skb has no software
> timestamp or it is an incoming packet.
>
> While changing sw_tx_timestamp(), inline it in skb_tx_timestamp() as
> there are no other users.
>
> CC: Richard Cochran 
> CC: Willem de Bruijn 
> Signed-off-by: Miroslav Lichvar 
> ---

> +/* On transmit, software and hardware timestamps are returned independently.
> + * As the two skb clones share the hardware timestamp, which may be updated
> + * before the software timestamp is received, a hardware TX timestamp may be
> + * returned only if there is no software TX timestamp. A false software
> + * timestamp made for SOCK_RCVTSTAMP when a real timestamp is missing must
> + * be ignored.

Please expand on why this case can be ignored. It is quite subtle. How about
something like

*
* A false software timestamp is one made inside the __sock_recv_timestamp
* call itself. These are generated whenever SO_TIMESTAMP(NS) is enabled
* on the socket, even when the timestamp reported is for another option, such
* as hardware tx timestamp.
*
* Ignore these when deciding whether a timestamp source is hw or sw.
*/

And perhaps move the comment to the branch itself.

> + */
> +static bool skb_is_swtx_tstamp(const struct sk_buff *skb,
> +  const struct sock *sk, int false_tstamp)
> +{
> +   if (false_tstamp && sk->sk_tsflags & SOF_TIMESTAMPING_OPT_TX_SWHW)

Also, why is it ignored only for the new mode?

> +   return 0;
> +
> +   return skb->tstamp && skb_is_err_queue(skb);
> +}

Re: [PATCH v1] samples/bpf: Add a .gitignore for binaries

2017-05-18 Thread David Ahern

On 5/17/17 1:18 AM, Alexander Alemayhu wrote:
> I have looked into this but found it to be not easy and all attempts to
> change the Makefile have resulted in obscure errors :/
> 
> Getting clang to output in a different directory was easy[0], but I guess
> this is not the right approach either. Have you tried making the change?

I spent an hour or so on this a few weeks back. It is not trivial, but
someone needs to find the time to fix it now.

perf is the example to use: you can build it from both top level kernel
directory (e.g, make -C tools/perf O=/tmp/perf) and the perf directory
(cd tools/perf; make O=/tmp/perf). Both are wanted for samples/bpf and
it would be nice to keep the O= option as well.

I don't have the time for the next few weeks. Perhaps mid-June I can
take a look.

[PATCH net-next] geneve: always fill CSUM6_RX configuration

2017-05-18 Thread Eric Garver

CSUM6_RX is relevant for collect_metadata as well. As such leave it
outside of the dev's IPv4/IPv6 checks.

Fixes: 9b4437a5b870 ("geneve: Unify LWT and netdev handling.")
Signed-off-by: Eric Garver 
---
 drivers/net/geneve.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/net/geneve.c b/drivers/net/geneve.c
index dec5d563ab19..f557d1dc3f9b 100644
--- a/drivers/net/geneve.c
+++ b/drivers/net/geneve.c
@@ -1311,13 +1311,13 @@ static int geneve_fill_info(struct sk_buff *skb, const struct net_device *dev)
if (nla_put_u8(skb, IFLA_GENEVE_UDP_ZERO_CSUM6_TX,
   !(info->key.tun_flags & TUNNEL_CSUM)))
goto nla_put_failure;
-
-   if (nla_put_u8(skb, IFLA_GENEVE_UDP_ZERO_CSUM6_RX,
-  !geneve->use_udp6_rx_checksums))
-   goto nla_put_failure;
 #endif
}
 
+   if (nla_put_u8(skb, IFLA_GENEVE_UDP_ZERO_CSUM6_RX,
+  !geneve->use_udp6_rx_checksums))
+   goto nla_put_failure;
+
if (nla_put_u8(skb, IFLA_GENEVE_TTL, info->key.ttl) ||
nla_put_u8(skb, IFLA_GENEVE_TOS, info->key.tos) ||
nla_put_be32(skb, IFLA_GENEVE_LABEL, info->key.label))
-- 
2.12.0

[PATCH v2 4/4] arp: always override existing neigh entries with gratuitous ARP

2017-05-18 Thread Ihar Hrachyshka

Currently, when arp_accept is 1, we always override existing neigh
entries with incoming gratuitous ARP replies. Otherwise, we override
them only if new replies satisfy _locktime_ conditional (packets arrive
not earlier than _locktime_ seconds since the last update to the neigh
entry).

The idea behind locktime is to pick the very first (=> close) reply
received in a unicast burst when ARP proxies are used. This helps to
avoid ARP thrashing where Linux would switch back and forth from one
proxy to another.

This logic has nothing to do with gratuitous ARP replies, which are
generally not aligned in time when multiple IP address carriers send
them into the network.

This patch enforces overriding of existing neigh entries by all incoming
gratuitous ARP packets, irrespective of their time of arrival. This will
make the kernel honour all incoming gratuitous ARP packets.

Signed-off-by: Ihar Hrachyshka 
---
 net/ipv4/arp.c | 9 +
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/net/ipv4/arp.c b/net/ipv4/arp.c
index c22103c..ae96e6f 100644
--- a/net/ipv4/arp.c
+++ b/net/ipv4/arp.c
@@ -863,16 +863,17 @@ static int arp_process(struct net *net, struct sock *sk, struct sk_buff *skb)
 
n = __neigh_lookup(&arp_tbl, &sip, dev, 0);
 
-   if (IN_DEV_ARP_ACCEPT(in_dev)) {
+   if (n || IN_DEV_ARP_ACCEPT(in_dev)) {
addr_type = -1;
+   is_garp = arp_is_garp(net, dev, &addr_type, arp->ar_op,
+ sip, tip, sha, tha);
+   }
 
+   if (IN_DEV_ARP_ACCEPT(in_dev)) {
/* Unsolicited ARP is not accepted by default.
   It is possible, that this option should be enabled for some
   devices (strip is candidate)
 */
-   is_garp = arp_is_garp(net, dev, &addr_type, arp->ar_op,
- sip, tip, sha, tha);
-
if (!n &&
(is_garp ||
 (arp->ar_op == htons(ARPOP_REPLY) &&
-- 
2.9.3
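
The override rule described in the commit message above can be written out as two small predicates. This is a sketch built only from the prose description, not from the kernel source: the function names are hypothetical, and time is modelled as whole seconds since the last neigh update rather than jiffies.

```c
#include <stdbool.h>

/* Behaviour described as "currently": a gratuitous ARP overrides an
 * existing entry unconditionally only when arp_accept is 1; otherwise
 * the locktime condition decides.
 */
static bool override_before(bool arp_accept, bool is_garp,
			    unsigned long secs_since_update,
			    unsigned long locktime_secs)
{
	if (arp_accept && is_garp)
		return true;
	return secs_since_update >= locktime_secs;
}

/* Behaviour after the patch: a gratuitous ARP always overrides,
 * irrespective of arp_accept and of its time of arrival.
 */
static bool override_after(bool is_garp,
			   unsigned long secs_since_update,
			   unsigned long locktime_secs)
{
	return is_garp || secs_since_update >= locktime_secs;
}
```

The only input for which the two predicates differ is a gratuitous ARP that arrives inside the locktime window while arp_accept is 0, which is the default case the patchset targets.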

[PATCH v2 2/4] arp: decompose is_garp logic into a separate function

2017-05-18 Thread Ihar Hrachyshka

The code is already involved enough to earn a separate function for
itself. If anything, it helps arp_process readability.

Signed-off-by: Ihar Hrachyshka 
---
 net/ipv4/arp.c | 35 +++
 1 file changed, 23 insertions(+), 12 deletions(-)

diff --git a/net/ipv4/arp.c b/net/ipv4/arp.c
index 053492a..ca6e1e6 100644
--- a/net/ipv4/arp.c
+++ b/net/ipv4/arp.c
@@ -641,6 +641,27 @@ void arp_xmit(struct sk_buff *skb)
 }
 EXPORT_SYMBOL(arp_xmit);
 
+static bool arp_is_garp(struct net_device *dev, int addr_type,
+   __be16 ar_op,
+   __be32 sip, __be32 tip,
+   unsigned char *sha, unsigned char *tha)
+{
+   bool is_garp = tip == sip && addr_type == RTN_UNICAST;
+
+   /* Gratuitous ARP _replies_ also require target hwaddr to be
+* the same as source.
+*/
+   if (is_garp && ar_op == htons(ARPOP_REPLY))
+   is_garp =
+   /* IPv4 over IEEE 1394 doesn't provide target
+* hardware address field in its ARP payload.
+*/
+   tha &&
+   !memcmp(tha, sha, dev->addr_len);
+
+   return is_garp;
+}
+
 /*
  * Process an arp request.
  */
@@ -844,18 +865,8 @@ static int arp_process(struct net *net, struct sock *sk, struct sk_buff *skb)
   It is possible, that this option should be enabled for some
   devices (strip is candidate)
 */
-   is_garp = tip == sip && addr_type == RTN_UNICAST;
-
-   /* Gratuitous ARP _replies_ also require target hwaddr to be
-* the same as source.
-*/
-   if (is_garp && arp->ar_op == htons(ARPOP_REPLY))
-   is_garp =
-   /* IPv4 over IEEE 1394 doesn't provide target
-* hardware address field in its ARP payload.
-*/
-   tha &&
-   !memcmp(tha, sha, dev->addr_len);
+   is_garp = arp_is_garp(dev, addr_type, arp->ar_op,
+ sip, tip, sha, tha);
 
if (!n &&
((arp->ar_op == htons(ARPOP_REPLY)  &&
-- 
2.9.3

[PATCH v2 3/4] arp: postpone addr_type calculation to as late as possible

2017-05-18 Thread Ihar Hrachyshka

The addr_type retrieval can be costly, so it's worth trying to avoid its
calculation as much as possible. This patch makes it calculated only
for gratuitous ARP packets. This is especially important since later we
may want to move is_garp calculation outside of arp_accept block, at
which point the costly operation will be executed for all setups.

The patch is the result of a discussion in net-dev:
http://marc.info/?l=linux-netdev&m=149506354216994

Suggested-by: Julian Anastasov 
Signed-off-by: Ihar Hrachyshka 
---
 net/ipv4/arp.c | 24 +---
 1 file changed, 17 insertions(+), 7 deletions(-)

diff --git a/net/ipv4/arp.c b/net/ipv4/arp.c
index ca6e1e6..c22103c 100644
--- a/net/ipv4/arp.c
+++ b/net/ipv4/arp.c
@@ -641,12 +641,12 @@ void arp_xmit(struct sk_buff *skb)
 }
 EXPORT_SYMBOL(arp_xmit);
 
-static bool arp_is_garp(struct net_device *dev, int addr_type,
-   __be16 ar_op,
+static bool arp_is_garp(struct net *net, struct net_device *dev,
+   int *addr_type, __be16 ar_op,
__be32 sip, __be32 tip,
unsigned char *sha, unsigned char *tha)
 {
-   bool is_garp = tip == sip && addr_type == RTN_UNICAST;
+   bool is_garp = tip == sip;
 
/* Gratuitous ARP _replies_ also require target hwaddr to be
 * the same as source.
@@ -659,6 +659,11 @@ static bool arp_is_garp(struct net_device *dev, int addr_type,
tha &&
!memcmp(tha, sha, dev->addr_len);
 
+   if (is_garp) {
+   *addr_type = inet_addr_type_dev_table(net, dev, sip);
+   if (*addr_type != RTN_UNICAST)
+   is_garp = false;
+   }
return is_garp;
 }
 
@@ -859,18 +864,23 @@ static int arp_process(struct net *net, struct sock *sk, struct sk_buff *skb)
n = __neigh_lookup(&arp_tbl, &sip, dev, 0);
 
if (IN_DEV_ARP_ACCEPT(in_dev)) {
-   unsigned int addr_type = inet_addr_type_dev_table(net, dev, sip);
+   addr_type = -1;
 
/* Unsolicited ARP is not accepted by default.
   It is possible, that this option should be enabled for some
   devices (strip is candidate)
 */
-   is_garp = arp_is_garp(dev, addr_type, arp->ar_op,
+   is_garp = arp_is_garp(net, dev, &addr_type, arp->ar_op,
  sip, tip, sha, tha);
 
if (!n &&
-   ((arp->ar_op == htons(ARPOP_REPLY)  &&
-   addr_type == RTN_UNICAST) || is_garp))
+   (is_garp ||
+(arp->ar_op == htons(ARPOP_REPLY) &&
+ (addr_type == RTN_UNICAST ||
+  (addr_type < 0 &&
+   /* postpone calculation to as late as possible */
+   inet_addr_type_dev_table(net, dev, sip) ==
+   RTN_UNICAST)
n = __neigh_lookup(&arp_tbl, &sip, dev, 1);
}
 
-- 
2.9.3

[PATCH v2 1/4] arp: fixed error in a comment

2017-05-18 Thread Ihar Hrachyshka

The is_garp code deals just with gratuitous ARP packets, not every
unsolicited packet.

This patch is a result of a discussion in netdev:
http://marc.info/?l=linux-netdev&m=149506354216994

Suggested-by: Julian Anastasov 
Signed-off-by: Ihar Hrachyshka 
---
 net/ipv4/arp.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/ipv4/arp.c b/net/ipv4/arp.c
index d54345a..053492a 100644
--- a/net/ipv4/arp.c
+++ b/net/ipv4/arp.c
@@ -846,7 +846,7 @@ static int arp_process(struct net *net, struct sock *sk, struct sk_buff *skb)
 */
is_garp = tip == sip && addr_type == RTN_UNICAST;
 
-   /* Unsolicited ARP _replies_ also require target hwaddr to be
+   /* Gratuitous ARP _replies_ also require target hwaddr to be
 * the same as source.
 */
if (is_garp && arp->ar_op == htons(ARPOP_REPLY))
-- 
2.9.3

[PATCH v2 0/4] arp: always override existing neigh entries with gratuitous ARP

2017-05-18 Thread Ihar Hrachyshka

This patchset is spurred by discussion started at
https://patchwork.ozlabs.org/patch/760372/ where we figured that there is no
real reason for enforcing override by gratuitous ARP packets only when
arp_accept is 1. Same should happen when it's 0 (the default value).

changelog v2: handled review comments by Julian Anastasov
- fixed a mistake in a comment;
- postponed addr_type calculation to as late as possible.

Ihar Hrachyshka (4):
  arp: fixed error in a comment
  arp: decompose is_garp logic into a separate function
  arp: postpone addr_type calculation to as late as possible
  arp: always override existing neigh entries with gratuitous ARP

 net/ipv4/arp.c | 56 +++-
 1 file changed, 39 insertions(+), 17 deletions(-)

-- 
2.9.3

Re: [PATCH v5 net-next 5/7] net: fix documentation of struct scm_timestamping

2017-05-18 Thread Willem de Bruijn

On Thu, May 18, 2017 at 10:07 AM, Miroslav Lichvar  wrote:
> The scm_timestamping struct may return multiple non-zero fields, e.g.
> when both software and hardware RX timestamping is enabled, or when the
> SO_TIMESTAMP(NS) option is combined with SCM_TIMESTAMPING and a false
> software timestamp is generated in the recvmsg() call in order to always
> return a SCM_TIMESTAMP(NS) message.
>
> CC: Richard Cochran 
> CC: Willem de Bruijn 
> Signed-off-by: Miroslav Lichvar 

Thanks for adding this!

> +Note that if the SO_TIMESTAMP or SO_TIMESTAMPNS option is enabled
> +together with SO_TIMESTAMPING using SOF_TIMESTAMPING_SOFTWARE, a false
> +software timestamp will be generated in the recvmsg() call and passed
> +in ts[0] when a real software timestamp is missing.

With receive software timestamping this is expected behavior? I would make
explicit that this happens even on tx timestamps.

> For this reason it
> +is not recommended to combine SO_TIMESTAMP(NS) with SO_TIMESTAMPING.

And I'd remove this. The extra timestamp is harmless, and we may be missing
other reasons why someone would want to enable both on the same socket.

Re: [PATCH] of_mdio: Fix broken PHY IRQ in case of probe deferral

2017-05-18 Thread Andrew Lunn

> > This most certainly works fine in the simple case where you have one PHY
> > hanging off the MDIO bus, now what happens if you have several?
> >
> > Presumably, the first PHY that returns EPROBE_DEFER will make the entire
> > bus registration return EPROBE_DEFER as well, and so on, and so forth,
> > but I am not sure if we will be properly unwinding the successful
> > registration of PHYs that either don't have an interrupt, or did not
> > return EPROBE_DEFER.
> >
> > It should be possible to mimic this behavior by using the fixed PHY, and
> > possibly the dsa_loop.c driver which would create 4 ports, expecting 4
> > fixed PHYs to be present.
> 
> mdiobus_unregister(), called from of_mdiobus_register() on failure,
> should do the unwinding, right?
> 
> And when the driver is reprobed, all PHYs are reprobed, until they all
> succeed.

That is the theory. I looked at that while reviewing the patch. But
this has probably not been tested in anger. It would be good to test
this properly, with not just the first PHY returning -EPROBE_DEFER, to
really test the unwind.

Andrew
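
The unwind scenario being asked about (a later PHY deferring after earlier ones registered successfully) can be modelled outside the kernel. The sketch below is a toy model only: struct toy_phy, toy_register_phy() and toy_register_bus() are hypothetical names, not the of_mdio or phylib API, and EPROBE_DEFER is defined locally with the kernel's numeric value.

```c
#include <stdbool.h>
#include <stddef.h>

#define EPROBE_DEFER 517	/* local copy of the kernel's errno value */

struct toy_phy {
	bool irq_ready;		/* is the interrupt controller available? */
	bool registered;
};

/* Register one PHY; defer if its interrupt controller is not ready yet. */
static int toy_register_phy(struct toy_phy *phy)
{
	if (!phy->irq_ready)
		return -EPROBE_DEFER;
	phy->registered = true;
	return 0;
}

/* Register every PHY on the bus. On the first failure, unwind all PHYs
 * that were registered so far and propagate the error, so a later
 * reprobe starts from a clean state.
 */
static int toy_register_bus(struct toy_phy *phys, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		int rc = toy_register_phy(&phys[i]);

		if (rc) {
			while (i--)
				phys[i].registered = false;
			return rc;
		}
	}
	return 0;
}
```

Making the third of three PHYs defer, rather than the first, exercises the unwind path: the first two must end up unregistered again, and a second call after the interrupt controller appears should register all three.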

Paper: A Comparison of TCP Implementations, Linux vs. lwIP

2017-05-18 Thread Richard Siegfried

Hello,

Some months ago I wrote a paper on a Comparison of TCP Implementations.
(Features, Code Quality, Data Structures, etc.)

https://github.com/richi235/A-Comparison-of-TCP-Implementations

It's finished and the corresponding exam successfully passed.
But I thought perhaps this could be interesting for some people here, too.

And since I'm still interested in and reading about TCP implementations,
I'm thankful for any feedback, corrections or opinions about the
conclusions I found.

Thanks,
-- Richard


Re: [PATCH net-next] net/mlx5e: Fix possible memory leak

2017-05-18 Thread David Miller

From: Wei Yongjun 
Date: Thu, 18 May 2017 15:34:41 +

> From: Wei Yongjun 
> 
> 'encap_header' is malloced and should be freed before leaving from
> the error handling cases, otherwise it will cause memory leak.
> 
> Fixes: 232c001398ae ("net/mlx5e: Add support to neighbour update flow")
> Signed-off-by: Wei Yongjun 

Applied.

Re: [PATCH net-next] ibmvnic: fix missing unlock on error in __ibmvnic_reset()

2017-05-18 Thread David Miller

From: Wei Yongjun 
Date: Thu, 18 May 2017 15:24:52 +

> From: Wei Yongjun 
> 
> Add the missing unlock before return from function __ibmvnic_reset()
> in the error handling case.
> 
> Fixes: ed651a10875f ("ibmvnic: Updated reset handling")
> Signed-off-by: Wei Yongjun 

Applied.

Re: [PATCH net-next] qed: Remove unused including

2017-05-18 Thread David Miller

From: Wei Yongjun 
Date: Thu, 18 May 2017 15:26:29 +

> From: Wei Yongjun 
> 
> Remove including  that is not needed.
> 
> Signed-off-by: Wei Yongjun 

Applied.

Re: [PATCH] of_mdio: Fix broken PHY IRQ in case of probe deferral

2017-05-18 Thread Geert Uytterhoeven

Hi Florian,

On Thu, May 18, 2017 at 8:25 PM, Florian Fainelli  wrote:
> On 05/18/2017 05:59 AM, Geert Uytterhoeven wrote:
>> If an Ethernet PHY is initialized before the interrupt controller it is
>> connected to, a message like the following is printed:
>>
>> irq: no irq domain found for /interrupt-controller@e61c !
>>
>> However, the actual error is ignored, leading to a non-functional (-1)
>> PHY interrupt later:
>>
>> Micrel KSZ8041RNLI ee70.ethernet-:01: attached PHY driver 
>> [Micrel KSZ8041RNLI] (mii_bus:phy_addr=ee70.ethernet-:01, irq=-1)
>>
>> Depending on whether the PHY driver will fall back to polling, Ethernet
>> may or may not work.
>>
>> To fix this:
>>   1. Switch of_mdiobus_register_phy() from irq_of_parse_and_map() to
>>  of_irq_get().
>>  Unlike the former, the latter returns -EPROBE_DEFER if the
>>  interrupt controller is not yet available, so this condition can be
>>  detected.
>>  Other errors are handled the same as before, i.e. use the passed
>>  mdio->irq[addr] as interrupt.
>>   2. Propagate and handle errors from of_mdiobus_register_phy() and
>>  of_mdiobus_register_device().
>
> This most certainly works fine in the simple case where you have one PHY
> hanging off the MDIO bus, now what happens if you have several?
>
> Presumably, the first PHY that returns EPROBE_DEFER will make the entire
> bus registration return EPROBE_DEFER as well, and so on, and so forth,
> but I am not sure if we will be properly unwinding the successful
> registration of PHYs that either don't have an interrupt, or did not
> return EPROBE_DEFER.
>
> It should be possible to mimic this behavior by using the fixed PHY, and
> possibly the dsa_loop.c driver which would create 4 ports, expecting 4
> fixed PHYs to be present.

mdiobus_unregister(), called from of_mdiobus_register() on failure,
should do the unwinding, right?

And when the driver is reprobed, all PHYs are reprobed, until they all
succeed.

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds

Re: [PATCH] net1080: Remove unused function nc_dump_ttl()

2017-05-18 Thread David Miller

From: Matthias Kaehlcke 
Date: Thu, 18 May 2017 10:57:19 -0700

> The function is not used, removing it fixes the following warning when
> building with clang:
> 
> drivers/net/usb/net1080.c:271:20: error: unused function
> 'nc_dump_ttl' [-Werror,-Wunused-function]
> 
> Also remove the definition of TTL_THIS, which is only used in
> nc_dump_ttl()
> 
> Signed-off-by: Matthias Kaehlcke 

Applied to net-next.

Re: [PATCH] r8152: Remove unused function usb_ocp_read()

2017-05-18 Thread David Miller

From: Matthias Kaehlcke 
Date: Thu, 18 May 2017 10:45:33 -0700

> The function is not used, removing it fixes the following warning when
> building with clang:
> 
> drivers/net/usb/r8152.c:825:5: error: unused function 'usb_ocp_read'
> [-Werror,-Wunused-function]
> 
> Signed-off-by: Matthias Kaehlcke 

Applied to net-next.

Re: [PATCH v2 1/3] bpf: Use 1<<16 as ceiling for immediate alignment in verifier.

2017-05-18 Thread Edward Cree

Implementations (still in Python for now) at
https://gist.github.com/ecree-solarflare/0665d5b46c2d8d08de2377fbd527de8d
(I left out division, because it's so weak.)

I still can't prove + and - are correct, but they've passed every test
 case I've come up with so far.  * seems pretty obviously correct as long
 as the + it uses is.  Bitwise ops and shifts are trivial to prove.

-Ed

Re: [PATCH net-next] xen/9pfs: p9_trans_xen_init and p9_trans_xen_exit can be static

2017-05-18 Thread Stefano Stabellini

On Thu, 18 May 2017, Wei Yongjun wrote:
> From: Wei Yongjun 
> 
> Fixes the following sparse warnings:
> 
> net/9p/trans_xen.c:528:5: warning:
>  symbol 'p9_trans_xen_init' was not declared. Should it be static?
> net/9p/trans_xen.c:540:6: warning:
>  symbol 'p9_trans_xen_exit' was not declared. Should it be static?
> 
> Signed-off-by: Wei Yongjun 

Reviewed-by: Stefano Stabellini 

If that's OK for everybody we'll queue this fix and
20170516142247.12301-1-weiyj...@gmail.com to the xentip tree.


> ---
>  net/9p/trans_xen.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/net/9p/trans_xen.c b/net/9p/trans_xen.c
> index 71e8564..3deb17f 100644
> --- a/net/9p/trans_xen.c
> +++ b/net/9p/trans_xen.c
> @@ -525,7 +525,7 @@ static struct xenbus_driver xen_9pfs_front_driver = {
>   .otherend_changed = xen_9pfs_front_changed,
>  };
>  
> -int p9_trans_xen_init(void)
> +static int p9_trans_xen_init(void)
>  {
>   if (!xen_domain())
>   return -ENODEV;
> @@ -537,7 +537,7 @@ int p9_trans_xen_init(void)
>  }
>  module_init(p9_trans_xen_init);
>  
> -void p9_trans_xen_exit(void)
> +static void p9_trans_xen_exit(void)
>  {
>   v9fs_unregister_trans(&p9_xen_trans);
>   return xenbus_unregister_driver(&xen_9pfs_front_driver);
>

Re: [PATCH] of_mdio: Fix broken PHY IRQ in case of probe deferral

2017-05-18 Thread Florian Fainelli

On 05/18/2017 05:59 AM, Geert Uytterhoeven wrote:
> If an Ethernet PHY is initialized before the interrupt controller it is
> connected to, a message like the following is printed:
> 
> irq: no irq domain found for /interrupt-controller@e61c !
> 
> However, the actual error is ignored, leading to a non-functional (-1)
> PHY interrupt later:
> 
> Micrel KSZ8041RNLI ee70.ethernet-:01: attached PHY driver 
> [Micrel KSZ8041RNLI] (mii_bus:phy_addr=ee70.ethernet-:01, irq=-1)
> 
> Depending on whether the PHY driver will fall back to polling, Ethernet
> may or may not work.
> 
> To fix this:
>   1. Switch of_mdiobus_register_phy() from irq_of_parse_and_map() to
>  of_irq_get().
>  Unlike the former, the latter returns -EPROBE_DEFER if the
>  interrupt controller is not yet available, so this condition can be
>  detected.
>  Other errors are handled the same as before, i.e. use the passed
>  mdio->irq[addr] as interrupt.
>   2. Propagate and handle errors from of_mdiobus_register_phy() and
>  of_mdiobus_register_device().

This most certainly works fine in the simple case where you have one PHY
hanging off the MDIO bus, now what happens if you have several?

Presumably, the first PHY that returns EPROBE_DEFER will make the entire
bus registration return EPROBE_DEFER as well, and so on, and so forth,
but I am not sure if we will be properly unwinding the successful
registration of PHYs that either don't have an interrupt, or did not
return EPROBE_DEFER.

It should be possible to mimic this behavior by using the fixed PHY, and
possibly the dsa_loop.c driver which would create 4 ports, expecting 4
fixed PHYs to be present.

> 
> Signed-off-by: Geert Uytterhoeven 
> ---
> Seen on r8a7791/koelsch when using the new CPG/MSSR clock driver.
> I assume it always happened on RZ/G1 in mainline.
> ---
>  drivers/of/of_mdio.c | 39 +++
>  1 file changed, 27 insertions(+), 12 deletions(-)
> 
> diff --git a/drivers/of/of_mdio.c b/drivers/of/of_mdio.c
> index 7e4c80f9b6cda0d3..f9ac2893f56184be 100644
> --- a/drivers/of/of_mdio.c
> +++ b/drivers/of/of_mdio.c
> @@ -44,7 +44,7 @@ static int of_get_phy_id(struct device_node *device, u32 
> *phy_id)
>   return -EINVAL;
>  }
>  
> -static void of_mdiobus_register_phy(struct mii_bus *mdio,
> +static int of_mdiobus_register_phy(struct mii_bus *mdio,
>   struct device_node *child, u32 addr)
>  {
>   struct phy_device *phy;
> @@ -60,9 +60,13 @@ static void of_mdiobus_register_phy(struct mii_bus *mdio,
>   else
>   phy = get_phy_device(mdio, addr, is_c45);
>   if (IS_ERR(phy))
> - return;
> + return PTR_ERR(phy);
>  
> - rc = irq_of_parse_and_map(child, 0);
> + rc = of_irq_get(child, 0);
> + if (rc == -EPROBE_DEFER) {
> + phy_device_free(phy);
> + return rc;
> + }
>   if (rc > 0) {
>   phy->irq = rc;
>   mdio->irq[addr] = rc;
> @@ -84,22 +88,23 @@ static void of_mdiobus_register_phy(struct mii_bus *mdio,
>   if (rc) {
>   phy_device_free(phy);
>   of_node_put(child);
> - return;
> + return rc;
>   }
>  
>   dev_dbg(&mdio->dev, "registered phy %s at address %i\n",
>   child->name, addr);
> + return 0;
>  }
>  
> -static void of_mdiobus_register_device(struct mii_bus *mdio,
> -struct device_node *child, u32 addr)
> +static int of_mdiobus_register_device(struct mii_bus *mdio,
> +   struct device_node *child, u32 addr)
>  {
>   struct mdio_device *mdiodev;
>   int rc;
>  
>   mdiodev = mdio_device_create(mdio, addr);
>   if (IS_ERR(mdiodev))
> - return;
> + return PTR_ERR(mdiodev);
>  
>   /* Associate the OF node with the device structure so it
>* can be looked up later.
> @@ -112,11 +117,12 @@ static void of_mdiobus_register_device(struct mii_bus 
> *mdio,
>   if (rc) {
>   mdio_device_free(mdiodev);
>   of_node_put(child);
> - return;
> + return rc;
>   }
>  
>   dev_dbg(&mdio->dev, "registered mdio device %s at address %i\n",
>   child->name, addr);
> + return 0;
>  }
>  
>  int of_mdio_parse_addr(struct device *dev, const struct device_node *np)
> @@ -242,9 +248,11 @@ int of_mdiobus_register(struct mii_bus *mdio, struct 
> device_node *np)
>   }
>  
>   if (of_mdiobus_child_is_phy(child))
> - of_mdiobus_register_phy(mdio, child, addr);
> + rc = of_mdiobus_register_phy(mdio, child, addr);
>   else
> - of_mdiobus_register_device(mdio, child, addr);
> + rc = of_mdiobus_register_device(mdio, child, addr);
> + if (rc)
> +

[PATCH net] tcp: initialize rcv_mss to TCP_MIN_MSS instead of 0

2017-05-18 Thread Wei Wang

From: Wei Wang 

When tcp_disconnect() is called, inet_csk_delack_init() sets
icsk->icsk_ack.rcv_mss to 0.
This could potentially cause tcp_recvmsg() => tcp_cleanup_rbuf() =>
__tcp_select_window() call path to have division by 0 issue.
So this patch initializes rcv_mss to TCP_MIN_MSS instead of 0.

Reported-by: Andrey Konovalov  
Signed-off-by: Wei Wang 
Signed-off-by: Eric Dumazet 
Signed-off-by: Neal Cardwell 
Signed-off-by: Yuchung Cheng 
---
 net/ipv4/tcp.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 1e4c76d2b827..842b575f8fdd 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -2320,6 +2320,10 @@ int tcp_disconnect(struct sock *sk, int flags)
tcp_set_ca_state(sk, TCP_CA_Open);
tcp_clear_retrans(tp);
inet_csk_delack_init(sk);
+   /* Initialize rcv_mss to TCP_MIN_MSS to avoid division by 0
+* issue in __tcp_select_window()
+*/
+   icsk->icsk_ack.rcv_mss = TCP_MIN_MSS;
tcp_init_send_head(sk);
memset(&tp->rx_opt, 0, sizeof(tp->rx_opt));
__sk_dst_reset(sk);
-- 
2.13.0.303.g4ebf302169-goog
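
The failure mode described in the commit message is an integer division by rcv_mss. The sketch below is a simplified model, not the real __tcp_select_window(): the function names are hypothetical, and TCP_MIN_MSS is defined locally with the value 88, which is assumed to match include/net/tcp.h.

```c
#include <assert.h>

#define TCP_MIN_MSS 88U	/* assumed to match the kernel's definition */

/* Round the free receive space down to a multiple of the MSS estimate.
 * With rcv_mss == 0 this division would trap, which is the case the
 * patch avoids.
 */
static unsigned int round_window_to_mss(unsigned int free_space,
					unsigned int rcv_mss)
{
	assert(rcv_mss != 0);
	return (free_space / rcv_mss) * rcv_mss;
}

/* Value rcv_mss takes after a disconnect once the patch is applied. */
static unsigned int rcv_mss_after_disconnect(void)
{
	return TCP_MIN_MSS;
}
```

With the patched reset value, a window computation on a disconnected socket divides by 88 instead of 0; for 1000 bytes of free space the rounded window would be 968.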

Re: [PATCH linux-firmware] qed: Add firmware 8.20.0.0

2017-05-18 Thread Kyle McMartin

On Wed, May 17, 2017 at 02:39:24PM +0300, Yuval Mintz wrote:
> The new QED firmware has 2 main purposes -
> First, it contains fixes to various initializations and firmware
> logic including:
>  - Corrects iSCSI fast retransmit when data digest is enabled.
>  - Stop draining packets when receiving several consecutive PFCs.
>  - Prevent possible assertion when consecutively opening/closing
>many connections.
>  - Prevent possible assertion due to too long BDQ fetch time.
> 
> In addition, this firmware contains sufficient infrastructure on which
> we'll add iWARP support in our drivers.
> 
> Signed-off-by: Yuval Mintz 
> ---
> Hi,
> 
> Please consider applying this to `linux-firmware'.
> 

applied, thanks Yuval.

regards, Kyle

[GIT] Networking

2017-05-18 Thread David Miller


1) Don't allow negative TCP reordering values, from Soheil Hassas Yeganeh.

2) Don't overflow while parsing ipv6 header options, from Craig Gallek.

3) Handle more cleanly the case where an individual route entry during
   a dump will not fit into the allocated netlink SKB, from David Ahern.

4) Add missing CONFIG_INET dependency for mlx5e, from Arnd Bergmann.

5) Allow neighbour updates to converge more quickly via gratuitous
   ARPs, from Ihar Hrachyshka.

6) Fix compile error when CONFIG_INET is disabled, from Eric Dumazet.

7) Fix use after free in x25 protocol init, from Lin Zhang.

8) Validate VLAN pvid ranges passed into br_validate(), from Tobias
   Jungel.

9) NULL out address lists in child sockets in SCTP, this is similar
   to the fix we made for inet connection sockets last week. From
   Eric Dumazet.

10) Fix NULL deref in mlxsw driver, from Ido Schimmel.

Please pull, thanks a lot!

The following changes since commit a95cfad947d5f40cfbf9ad3019575aac1d8ac7a6:

  Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net (2017-05-15 15:50:49 -0700)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git 

for you to fetch changes up to c0e01eac7ada785fdeaea1ae5476ec1cf3b00374:

  mlxsw: spectrum: Avoid possible NULL pointer dereference (2017-05-18 11:27:21 -0400)


Arkadi Sharshevsky (2):
  mlxsw: spectrum_dpipe: Fix incorrect entry index
  mlxsw: spectrum_router: Fix rif counter freeing routine

Arnd Bergmann (1):
  mlx5e: add CONFIG_INET dependency

Bjørn Mork (1):
  qmi_wwan: add another Lenovo EM74xx device ID

Christoph Hellwig (1):
  net/smc: Add warning about remote memory exposure

Craig Gallek (1):
  ipv6: Prevent overrun when parsing v6 header options

Daniel Borkmann (1):
  bpf: adjust verifier heuristics

David Ahern (1):
  net: Improve handling of failures on link and route dumps

David S. Miller (3):
  Merge branch 'bnxt_en-DCBX-fixes'
  ipv6: Check ip6_find_1stfragopt() return value properly.
  Merge branch 'mlxsw-fixes'

Eric Dumazet (2):
  net: fix compile error in skb_orphan_partial()
  sctp: do not inherit ipv6_{mc|ac|fl}_list from parent

Ganesh Goudar (1):
  cxgb4: update latest firmware version supported

Geert Uytterhoeven (2):
  sh_eth: Use platform device for printing before register_netdev()
  sh_eth: Do not print an error message for probe deferral

Greentime Hu (1):
  net: ethernet: faraday: To support device tree usage.

Ido Schimmel (1):
  mlxsw: spectrum: Avoid possible NULL pointer dereference

Ihar Hrachyshka (2):
  arp: honour gratuitous ARP _replies_
  neighbour: update neigh timestamps iff update is effective

Michael Chan (2):
  bnxt_en: Call bnxt_dcb_init() after getting firmware DCBX configuration.
  bnxt_en: Check status of firmware DCBX agent before setting DCB_CAP_DCBX_HOST.

Paolo Abeni (1):
  udp: make *udp*_queue_rcv_skb() functions static

Soheil Hassas Yeganeh (1):
  tcp: eliminate negative reordering in tcp_clean_rtx_queue

Thomas Winter (1):
  ipmr: vrf: Find VIFs using the actual device

Tobias Jungel (1):
  bridge: netlink: check vlan_default_pvid range

Ursula Braun (1):
  smc: switch to usage of IB_PD_UNSAFE_GLOBAL_RKEY

Yonghong Song (1):
  selftests/bpf: fix broken build due to types.h

linzhang (1):
  net: x25: fix one potential use-after-free issue

 drivers/net/ethernet/broadcom/bnxt/bnxt.c                |  3 +--
 drivers/net/ethernet/broadcom/bnxt/bnxt_dcb.c            |  6 --
 drivers/net/ethernet/chelsio/cxgb4/t4fw_version.h        |  6 +++---
 drivers/net/ethernet/faraday/ftmac100.c                  |  7 +++
 drivers/net/ethernet/mellanox/mlx5/core/Kconfig          |  2 +-
 drivers/net/ethernet/mellanox/mlxsw/spectrum_dpipe.c     |  3 ++-
 drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c    |  3 +++
 drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c |  6 ++
 drivers/net/ethernet/renesas/sh_eth.c                    |  3 ++-
 drivers/net/usb/qmi_wwan.c                               |  2 ++
 include/net/x25.h                                        |  4 ++--
 kernel/bpf/verifier.c                                    | 12 +++-
 net/bridge/br_netlink.c                                  |  7 +++
 net/core/neighbour.c                                     | 14 ++
 net/core/rtnetlink.c                                     | 36
 net/core/sock.c                                          |  3 ---
 net/ipv4/arp.c                                           | 16 ++--
 net/ipv4/fib_frontend.c                                  | 15 +++
 net/ipv4/fib_trie.c                                      | 26 ++
 net/ipv4/ipmr.c                                          | 18 --

[PATCH] net1080: Remove unused function nc_dump_ttl()

2017-05-18 Thread Matthias Kaehlcke

The function is not used, removing it fixes the following warning when
building with clang:

drivers/net/usb/net1080.c:271:20: error: unused function
'nc_dump_ttl' [-Werror,-Wunused-function]

Also remove the definition of TTL_THIS, which is only used in
nc_dump_ttl()

Signed-off-by: Matthias Kaehlcke 
---
 drivers/net/usb/net1080.c | 9 -
 1 file changed, 9 deletions(-)

diff --git a/drivers/net/usb/net1080.c b/drivers/net/usb/net1080.c
index 4cbdb1307f3e..3202c19df83d 100644
--- a/drivers/net/usb/net1080.c
+++ b/drivers/net/usb/net1080.c
@@ -264,17 +264,9 @@ static inline void nc_dump_status(struct usbnet *dev, u16 status)
  * TTL register
  */
 
-#define TTL_THIS(ttl)   (0x00ff & ttl)
 #define TTL_OTHER(ttl)  (0x00ff & (ttl >> 8))
 #define MK_TTL(this,other) ((u16)(((other)<<8)|(0x00ff&(this))))
 
-static inline void nc_dump_ttl(struct usbnet *dev, u16 ttl)
-{
-   netif_dbg(dev, link, dev->net, "net1080 %s-%s ttl 0x%x this = %d, other = %d\n",
- dev->udev->bus->bus_name, dev->udev->devpath,
- ttl, TTL_THIS(ttl), TTL_OTHER(ttl));
-}
-
 /*-*/
 
 static int net1080_reset(struct usbnet *dev)
@@ -308,7 +300,6 @@ static int net1080_reset(struct usbnet *dev)
goto done;
}
ttl = vp;
-   // nc_dump_ttl(dev, ttl);
 
nc_register_write(dev, REG_TTL,
MK_TTL(NC_READ_TTL_MS, TTL_OTHER(ttl)) );
-- 
2.13.0.303.g4ebf302169-goog

[PATCH] r8152: Remove unused function usb_ocp_read()

2017-05-18 Thread Matthias Kaehlcke

The function is not used, removing it fixes the following warning when
building with clang:

drivers/net/usb/r8152.c:825:5: error: unused function 'usb_ocp_read'
[-Werror,-Wunused-function]

Signed-off-by: Matthias Kaehlcke 
---
 drivers/net/usb/r8152.c | 6 --
 1 file changed, 6 deletions(-)

diff --git a/drivers/net/usb/r8152.c b/drivers/net/usb/r8152.c
index ddc62cb69be8..e902df9595b9 100644
--- a/drivers/net/usb/r8152.c
+++ b/drivers/net/usb/r8152.c
@@ -841,12 +841,6 @@ int pla_ocp_write(struct r8152 *tp, u16 index, u16 byteen, u16 size, void *data)
 }
 
 static inline
-int usb_ocp_read(struct r8152 *tp, u16 index, u16 size, void *data)
-{
-   return generic_ocp_read(tp, index, size, data, MCU_TYPE_USB);
-}
-
-static inline
 int usb_ocp_write(struct r8152 *tp, u16 index, u16 byteen, u16 size, void *data)
 {
return generic_ocp_write(tp, index, byteen, size, data, MCU_TYPE_USB);
-- 
2.13.0.303.g4ebf302169-goog

Re: [net-intel-i40e] question about assignment overwrite

2017-05-18 Thread Gustavo A. R. Silva


Hi Jeff,

Quoting Jeff Kirsher :


On Wed, 2017-05-17 at 15:48 -0500, Gustavo A. R. Silva wrote:

While looking into Coverity ID 1408956 I ran into the following piece
of code at drivers/net/ethernet/intel/i40e/i40e_main.c:8807:

8807    if (pf->hw.mac.type == I40E_MAC_X722) {
8808    pf->flags |= I40E_FLAG_RSS_AQ_CAPABLE
8809 | I40E_FLAG_128_QP_RSS_CAPABLE
8810 | I40E_FLAG_HW_ATR_EVICT_CAPABLE
8811 | I40E_FLAG_OUTER_UDP_CSUM_CAPABLE
8812 | I40E_FLAG_WB_ON_ITR_CAPABLE
8813 | I40E_FLAG_MULTIPLE_TCP_UDP_RSS_PCTYPE
8814 | I40E_FLAG_NO_PCI_LINK_CHECK
8815 | I40E_FLAG_USE_SET_LLDP_MIB
8816 | I40E_FLAG_GENEVE_OFFLOAD_CAPABLE
8817 | I40E_FLAG_PTP_L4_CAPABLE
8818 | I40E_FLAG_WOL_MC_MAGIC_PKT_WAKE;
8819    } else if ((pf->hw.aq.api_maj_ver > 1) ||
8820   ((pf->hw.aq.api_maj_ver == 1) &&
8821    (pf->hw.aq.api_min_ver > 4))) {
8822    /* Supported in FW API version higher than 1.4 */
8823    pf->flags |= I40E_FLAG_GENEVE_OFFLOAD_CAPABLE;
8824    pf->flags = I40E_FLAG_HW_ATR_EVICT_CAPABLE;
8825    } else {
8826    pf->flags = I40E_FLAG_HW_ATR_EVICT_CAPABLE;
8827    }

The issue here is that the assignment at line 8823 is overwritten by
the code at line 8824.

I suspect that line 8824 should be removed and a patch like the
following can be applied:

index d5c9c9e..48ffa73 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -8821,7 +8821,6 @@ static int i40e_sw_init(struct i40e_pf *pf)
 (pf->hw.aq.api_min_ver > 4))) {
 /* Supported in FW API version higher than 1.4 */
 pf->flags |= I40E_FLAG_GENEVE_OFFLOAD_CAPABLE;
-   pf->flags = I40E_FLAG_HW_ATR_EVICT_CAPABLE;
 } else {
 pf->flags = I40E_FLAG_HW_ATR_EVICT_CAPABLE;
 }

What do you think?


This issue is already fixed in my dev-queue branch on my next-queue
tree.


Great, it's good to know.

Thanks!
--
Gustavo A. R. Silva
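
The overwrite discussed in this thread can be reproduced in a few lines of userspace C. This is only a sketch: the two flag bits are illustrative stand-ins rather than the real I40E_FLAG_* values, and the thread does not say which fix was applied in the dev-queue branch, so flags_fixed() shows just one possibility (OR-ing both capabilities into the existing flags).

```c
#include <stdint.h>

/* Illustrative flag bits; the real I40E_FLAG_* values differ. */
#define FLAG_GENEVE_OFFLOAD	(1ULL << 0)
#define FLAG_HW_ATR_EVICT	(1ULL << 1)

/* Buggy variant: the plain assignment discards every bit set before it,
 * including the one OR-ed in on the previous line.
 */
uint64_t flags_buggy(uint64_t flags)
{
	flags |= FLAG_GENEVE_OFFLOAD;
	flags = FLAG_HW_ATR_EVICT;
	return flags;
}

/* One possible fix (an assumption, not taken from the thread):
 * accumulate both capabilities on top of the existing flags.
 */
uint64_t flags_fixed(uint64_t flags)
{
	flags |= FLAG_GENEVE_OFFLOAD;
	flags |= FLAG_HW_ATR_EVICT;
	return flags;
}
```

Calling flags_buggy() with any earlier bits set returns only FLAG_HW_ATR_EVICT, which is the pattern Coverity flagged.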

Re: [PATCH] of_mdio: Fix broken PHY IRQ in case of probe deferral

2017-05-18 Thread Geert Uytterhoeven

Hi Andrew,

On Thu, May 18, 2017 at 6:33 PM, Andrew Lunn  wrote:
>> >>   phy = get_phy_device(mdio, addr, is_c45);
>> >>   if (IS_ERR(phy))
>> >> - return;
>> >> + return PTR_ERR(phy);
>> >>
>> >> - rc = irq_of_parse_and_map(child, 0);
>> >> + rc = of_irq_get(child, 0);
>> >> + if (rc == -EPROBE_DEFER) {
>> >> + phy_device_free(phy);
>> >> + return rc;
>> >> + }
>> >
>> > Maybe this should be consistent. All other places there is an error,
>> > you return it. Here however, you only return the error if it is
>> > EPROBE_DEFER.
>>
>> That's because of the "else" branch in the code below:
>>
>> if (rc > 0) {
>> phy->irq = rc;
>> mdio->irq[addr] = rc;
>> } else {
>> phy->irq = mdio->irq[addr];
>> }
>>
>> cfr. the marked part of the patch description.
>> I didn't want to change that behavior, as it's not clear to me why it's
>> handled that way.
>
> So there seems to be 3 conditions that need handling:
>
> 1) of_irq_get() gives us an interrupt number.
> 2) of_irq_get() indicates there is no irq in the device tree.
> 3) of_irq_get() indicates a real error
>
> 1) We have.
>
> 2) We should fall back to using the mdio busses irq for the
> device. There are a couple of mdio drivers which do this, e.g.
> stmicro/stmmac/stmmac_mdio.c. mdiobus_alloc() ensures it is set to
> PHY_POLL, so if the driver does not set it, we poll.
>
> 3) This is new. We have two choices. Ignore the error and poll. Or
> return the error. Historically we have ignored the error. But should
> we? I would probably return the error, now that we can. But...

The issue itself isn't new, though.
I reported it in "of_mdiobus_register_phy() and deferred probe"
(https://lkml.org/lkml/2015/10/22/377), and posted a workaround in "[PATCH v2]
irqchip/renesas-irqc: Postpone driver initialization"
(https://lkml.org/lkml/2016/11/8/794).

Due to the fallback to polling, so far it was easier to complain when
someone broke polling, than to fix the real problem ;-)

But when I saw Thomas' patch[*] for of_irq_to_resource(), the time was ripe
to tackle the root cause.

[*] 
https://git.kernel.org/pub/scm/linux/kernel/git/robh/linux.git/commit/?h=dt/next&id=7a4228bbff769ebf449981a4248616db9f0cffec

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds

Re: [Patch RFC net-next] net: usb: r8152: Fix rx_bytes/tx_bytes to include FCS

2017-05-18 Thread Andrew Lunn

Hi Florian

I agree we should define this, and we can add it to
Documentation/ABI/testing/sysfs-class-net-statistics

> - BQL cares about bytes sent on the wire, so that should not include
> pre/appended descriptors nor the FCS (nor the Ethernet preamble),
> tx_bytes should be equivalent to that

Can you point me at some documentation/code which shows this?

pre/appended descriptors i can understand, since it does not make it
to the wire. FCS does. Preamble and inter-frame gap also does make it
to the wire, and contributes to the overall load on the medium. But i
would expect BQL is tolerant to this. We are talking about an error of
about 0.26% for a full MTU frame if FCS is included when it should not
be.

If BQL really does care about not including the FCS, we probably have
a lot less to do. People should have audited their code when they added
support for BQL :-)

Andrew
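
The 0.26% figure above can be checked with a one-line calculation. The sketch assumes that "a full MTU frame" means 1500 bytes of payload plus a 14-byte Ethernet header (1514 bytes) and that the FCS is 4 bytes; overcount_percent() is a hypothetical helper, not an existing API.

```c
/* Relative over-count, in percent, when extra_bytes are counted on top
 * of a frame of frame_bytes.
 */
double overcount_percent(unsigned int frame_bytes, unsigned int extra_bytes)
{
	return 100.0 * extra_bytes / frame_bytes;
}
```

For overcount_percent(1514, 4) the result is a little over 0.26, and the relative error grows as frames get shorter.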

Re: [PATCH net-next] ibmvnic: fix missing unlock on error in __ibmvnic_reset()

2017-05-18 Thread Nathan Fontenot

On 05/18/2017 10:24 AM, Wei Yongjun wrote:
> From: Wei Yongjun 
> 
> Add the missing unlock before return from function __ibmvnic_reset()
> in the error handling case.
> 
> Fixes: ed651a10875f ("ibmvnic: Updated reset handling")
> Signed-off-by: Wei Yongjun 

Reviewed-by: Nathan Fontenot 

> ---
>  drivers/net/ethernet/ibm/ibmvnic.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/drivers/net/ethernet/ibm/ibmvnic.c 
> b/drivers/net/ethernet/ibm/ibmvnic.c
> index 4f2d329..27f7933 100644
> --- a/drivers/net/ethernet/ibm/ibmvnic.c
> +++ b/drivers/net/ethernet/ibm/ibmvnic.c
> @@ -1313,6 +1313,7 @@ static void __ibmvnic_reset(struct work_struct *work)
> 
>   if (rc) {
>   free_all_rwi(adapter);
> + mutex_unlock(&adapter->reset_lock);
>   return;
>   }
>

Re: [PATCH v2 net-next] qed: Utilize FW 8.20.0.0

2017-05-18 Thread David Miller

From: Yuval Mintz 
Date: Thu, 18 May 2017 19:41:04 +0300

> This pushes qed [and as result, all qed* drivers] into using 8.20.0.0
> firmware. The changes are mostly contained in qed with minor changes
> to qedi due to some HSI changes.
> 
> Content-wise, the firmware contains fixes to various issues exposed
> since the release of the previous firmware, including:
>  - Corrects iSCSI fast retransmit when data digest is enabled.
>  - Stop draining packets when receiving several consecutive PFCs.
>  - Prevent possible assertion when consecutively opening/closing
>    many connections.
>  - Prevent possible assertion due to too long BDQ fetch time.
> 
> In addition, the new firmware would allow us to later add iWARP support
> in qed and qedr.
> 
> Changes from previous version
> -
>  - V2: Fix warning in qed_debug.c 
> 
> Signed-off-by: Chad Dupuis 
> Signed-off-by: Ram Amrani 
> Signed-off-by: Tomer Tayar 
> Signed-off-by: Manish Rangankar 
> Signed-off-by: Yuval Mintz 

Applied, hopefully this one goes more smoothly.

Thanks.

Re: [PATCH net-next] tcp: fix tcp_rearm_rto()

2017-05-18 Thread David Miller

From: Eric Dumazet 
Date: Thu, 18 May 2017 09:15:58 -0700

> From: Eric Dumazet 
> 
> skbs in (re)transmit queue no longer have a copy of jiffies
> at the time of the transmit : skb->skb_mstamp is now in usec unit,
> with no correlation to tcp_jiffies32.
> 
> We have to convert rto from jiffies to usec, compute a time difference
> in usec, then convert the delta to HZ units.
> 
> Fixes: 9a568de4818d ("tcp: switch TCP TS option (RFC 7323) to 1ms clock")
> Signed-off-by: Eric Dumazet 

Applied, thanks Eric.

Re: [Patch RFC net-next] net: usb: r8152: Fix rx_bytes/tx_bytes to include FCS

2017-05-18 Thread Florian Fainelli

On 05/18/2017 08:22 AM, David Miller wrote:
> From: Andrew Lunn 
> Date: Thu, 18 May 2017 17:09:25 +0200
> 
>> Since these are software counters, they can be consistent. From a
>> practical point of view, i doubt they ever will all be consistent,
>> there are simply too many drivers to test and change if
>> needed. However, for the ones somebody cares about, they can be made
>> consistent.
>>
>> I care about r8152, and would like to make it consistent with asix,
>> dsa, e1000e.
> 
> No objection from me for making software counters consistent.
> 

No objection from me as well, but I think we need to agree on what these
software counters represent, since there are several cases:

- BQL cares about bytes sent on the wire, so that should not include
pre/appended descriptors nor the FCS (nor the Ethernet preamble),
tx_bytes should be equivalent to that

- if we don't include the FCS on transmit, why should we include it on
receive? rx_bytes should have the same rules as tx_bytes: no
status/descriptor bytes, no FCS etc.
-- 
Florian

Re: [PATCH net-next] tcp: fix tcp_rearm_rto()

2017-05-18 Thread Soheil Hassas Yeganeh

On Thu, May 18, 2017 at 12:15 PM, Eric Dumazet  wrote:
> From: Eric Dumazet 
>
> skbs in (re)transmit queue no longer have a copy of jiffies
> at the time of the transmit : skb->skb_mstamp is now in usec unit,
> with no correlation to tcp_jiffies32.
>
> We have to convert rto from jiffies to usec, compute a time difference
> in usec, then convert the delta to HZ units.
>
> Fixes: 9a568de4818d ("tcp: switch TCP TS option (RFC 7323) to 1ms clock")
> Signed-off-by: Eric Dumazet 

Acked-by: Soheil Hassas Yeganeh 

Thank you for the quick fix, Eric!

Re: [PATCH] net1080: Mark nc_dump_ttl() as __maybe_unused

2017-05-18 Thread Matthias Kaehlcke

Hi David,

El Thu, May 18, 2017 at 10:48:08AM -0400 David Miller ha dit:

> From: Matthias Kaehlcke 
> Date: Wed, 17 May 2017 15:17:08 -0700
> 
> > The function is not used, but it looks useful for debugging. Adding the
> > attribute fixes the following clang warning:
> > 
> > drivers/net/usb/net1080.c:271:20: error: unused function
> > 'nc_dump_ttl' [-Werror,-Wunused-function]
> > 
> > Signed-off-by: Matthias Kaehlcke 
> 
> For this and the r8152 patch, I definitely prefer that the function is
> removed.
> 
> If someone needs them, they can pull it out of the GIT history.

Thanks for your comments, I'll send out updated patches soon.

Re: [PATCH v2 1/3] bpf: Use 1<<16 as ceiling for immediate alignment in verifier.

2017-05-18 Thread Edward Cree

On 18/05/17 15:49, Edward Cree wrote:
> Here's one idea that seemed to work when I did a couple of experiments:
> let A = (a;am), B = (b;bm) where the m are the masks
> Σ = am + bm + a + b
> χ = Σ ^ (a + b) /* unknown carries */
> μ = χ | am | bm /* mask of result */
> then A + B = ((a + b) & ~μ; μ)
>
> The idea is that we find which bits change between the case "all x are
>  1" and "all x are 0", and those become xs too.
And now I've found a similar algorithm for subtraction, which (again) I
 can't prove but it seems to work.
α = a + am - b
β = a - b - bm
χ = α ^ β
μ = χ | am | bm
then A - B = ((a - b) & ~μ; μ)
Again we're effectively finding the max. and min. values, and XORing
 them to find unknown carries.

Bitwise operations are easy, of course;
/* By assumption, a & am == b & bm == 0 */
A & B = (a & b; (a | am) & (b | bm) & ~(a & b))
A | B = (a | b; (am | bm) & ~(a | b))
/* It bothers me that & and | aren't symmetric, but I can't fix it */
A ^ B = ((a ^ b) & ~(am | bm); am | bm)

as are shifts by a constant (just shift 0s into both number and mask).

Multiplication by a constant can be done by decomposing into shifts
 and adds; but it can also be done directly; here we find (a;am) * k.
π = a * k
γ = am * k
then A * k = (π; 0) + (0; γ), for which we use our addition algo.

Multiplication of two unknown values is a nightmare, as unknown bits
 can propagate all over the place.  We can do a shift-add
 decomposition where the adds for unknown bits have all the 1s in
 the addend replaced with xs.  A few experiments suggest that this
 works, regardless of the order of operands.  For instance
 110x * x01 comes out as either
     110x
+  xx0x
= xxxxx0x
or
      x0x
    x01
+  x01
= xxxxx0x
We can slightly optimise this by handling all the 1 bits in one go;
 that is, for (a;am) * (b;bm) we first find (a;am) * b using our
 multiplication-by-a-constant algo above, then for each bit in bm
 we find (a;am) * bit and force all its nonzero bits to unknown;
 finally we add all our components.

Don't even ask about division; that scrambles bits so hard that the
 only thing you can say for sure is that the leading 0s in the
 numerator stay 0 in the result.  The only exception is divisions
 by a constant which can be converted into a shift, or divisions
 of a constant by another constant; if the numerator has any xs and
 the denominator has more than one 1, everything to the right of the
 first x is totally unknown in general.

-Ed
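
The addition, subtraction and bitwise rules from this mail can be tried out in userspace with a short sketch. The struct and function names are made up for illustration and are not claimed to match any verifier code. Two assumptions are made on top of the mail: the subtraction mask ORs in am and bm, mirroring the addition rule, and the XOR value is masked so that value & mask stays 0. Multiplication and division are left out.

```c
#include <assert.h>
#include <stdint.h>

/* A partially known number: 'value' holds the known 1 bits, 'mask' the
 * unknown bits. By assumption value & mask == 0.
 */
struct tnum {
	uint64_t value;
	uint64_t mask;
};

struct tnum tnum_make(uint64_t value, uint64_t mask)
{
	assert((value & mask) == 0);
	return (struct tnum){ value, mask };
}

/* A + B: XOR the "all unknowns are 1" sum with the "all unknowns are 0"
 * sum to find the bits whose carries are unknown.
 */
struct tnum tnum_add(struct tnum a, struct tnum b)
{
	uint64_t sum = a.value + b.value;
	uint64_t sigma = a.mask + b.mask + sum;
	uint64_t chi = sigma ^ sum;
	uint64_t mu = chi | a.mask | b.mask;

	return tnum_make(sum & ~mu, mu);
}

/* A - B: alpha is the largest possible difference, beta the smallest. */
struct tnum tnum_sub(struct tnum a, struct tnum b)
{
	uint64_t diff = a.value - b.value;
	uint64_t alpha = diff + a.mask;
	uint64_t beta = diff - b.mask;
	uint64_t chi = alpha ^ beta;
	uint64_t mu = chi | a.mask | b.mask;

	return tnum_make(diff & ~mu, mu);
}

struct tnum tnum_and(struct tnum a, struct tnum b)
{
	uint64_t v = a.value & b.value;

	return tnum_make(v, (a.value | a.mask) & (b.value | b.mask) & ~v);
}

struct tnum tnum_or(struct tnum a, struct tnum b)
{
	uint64_t v = a.value | b.value;

	return tnum_make(v, (a.mask | b.mask) & ~v);
}

struct tnum tnum_xor(struct tnum a, struct tnum b)
{
	uint64_t mu = a.mask | b.mask;

	/* Mask the value so the invariant value & mask == 0 holds. */
	return tnum_make((a.value ^ b.value) & ~mu, mu);
}
```

As a check against the multiplication example, adding 110x, i.e. (12;1), to xx0x00, i.e. (0;52), with tnum_add() gives (0;125), which is xxxxx0x.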

Re: [PATCH 2/2] sh_eth: Do not print an error message for probe deferral

2017-05-18 Thread Sergei Shtylyov


On 05/18/2017 04:01 PM, Geert Uytterhoeven wrote:


EPROBE_DEFER is not an error, hence printing an error message like

sh-eth ee70.ethernet: failed to initialise MDIO

may confuse the user.

To fix this, suppress the error message in case of probe deferral.
While at it, shorten the message, and add the actual error code.

Signed-off-by: Geert Uytterhoeven 


Acked-by: Sergei Shtylyov 

MBR, Sergei

Re: [PATCH] of_mdio: Fix broken PHY IRQ in case of probe deferral

2017-05-18 Thread Andrew Lunn

> >>   phy = get_phy_device(mdio, addr, is_c45);
> >>   if (IS_ERR(phy))
> >> - return;
> >> + return PTR_ERR(phy);
> >>
> >> - rc = irq_of_parse_and_map(child, 0);
> >> + rc = of_irq_get(child, 0);
> >> + if (rc == -EPROBE_DEFER) {
> >> + phy_device_free(phy);
> >> + return rc;
> >> + }
> >
> > Maybe this should be consistent. All other places there is an error,
> > you return it. Here however, you only return the error if it is
> > EPROBE_DEFER.
> 
> That's because of the "else" branch in the code below:
> 
> if (rc > 0) {
> phy->irq = rc;
> mdio->irq[addr] = rc;
> } else {
> phy->irq = mdio->irq[addr];
> }
> 
> cfr. the marked part of the patch description.
> I didn't want to change that behavior, as it's not clear to me why it's
> handled that way.

So there seems to be 3 conditions that need handling:

1) of_irq_get() gives us an interrupt number.
2) of_irq_get() indicates there is no irq in the device tree.
3) of_irq_get() indicates a real error

1) We have.

2) We should fall back to using the mdio busses irq for the
device. There are a couple of mdio drivers which do this, e.g.
stmicro/stmmac/stmmac_mdio.c. mdiobus_alloc() ensures it is set to
PHY_POLL, so if the driver does not set it, we poll.

3) This is new. We have two choices. Ignore the error and poll. Or
return the error. Historically we have ignored the error. But should
we? I would probably return the error, now that we can. But...

Florian?

Andrew
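
The three cases listed above could be folded into one decision helper along these lines. This is a userspace sketch of the "return the error" option, not the code from the patch: pick_phy_irq() is a hypothetical name, and how "no irq in the device tree" is told apart from a real error is left open here, as in the thread, so that classification is passed in as the no_irq_described argument. PHY_POLL and EPROBE_DEFER are given their kernel values.

```c
#include <errno.h>

#ifndef EPROBE_DEFER
#define EPROBE_DEFER 517	/* kernel-internal errno, absent from userspace headers */
#endif
#define PHY_POLL (-1)

/* rc is the result of the interrupt lookup, bus_irq the bus default for
 * this address (PHY_POLL unless the MDIO driver set something else).
 * Returns 0 and stores the interrupt to use in *irq, or a negative errno.
 */
int pick_phy_irq(int rc, int no_irq_described, int bus_irq, int *irq)
{
	if (rc > 0) {			/* 1) we got an interrupt number */
		*irq = rc;
		return 0;
	}
	if (rc < 0 && !no_irq_described)
		return rc;		/* 3) real error, -EPROBE_DEFER included */

	*irq = bus_irq;			/* 2) nothing described: use the bus default */
	return 0;
}
```

Ignoring errors and polling instead, the historical behaviour, would amount to dropping the middle branch for everything except -EPROBE_DEFER.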

Re: [PATCH 1/2] sh_eth: Use platform device for printing before register_netdev()

2017-05-18 Thread Sergei Shtylyov


On 05/18/2017 04:01 PM, Geert Uytterhoeven wrote:


The MDIO initialization failure message is printed using the network
device, before it has been registered, leading to:

 (null): failed to initialise MDIO

Use the platform device instead to fix this:

sh-eth ee70.ethernet: failed to initialise MDIO

Fixes: daacf03f0bbfefee ("sh_eth: Register MDIO bus before registering the network device")
Signed-off-by: Geert Uytterhoeven 


Acked-by: Sergei Shtylyov 

MBR, Sergei

[PATCH net-next] net: sched: provide stubs for tcf_chain_{get,put} for CONFIG_NET_CLS=n

2017-05-18 Thread Sabrina Dubroca

This also changes tcf_chain_get() to return an error pointer instead of
NULL, so that tcf_action_goto_chain_init() can differentiate memory
allocation failure from lack of support.

Fixes: 5bc1701881e3 ("net: sched: introduce multichain support for filters")
Signed-off-by: Sabrina Dubroca 
---
I'm not sure this EOPNOTSUPP is really necessary, ie if we can really
reach the tcf_action_goto_chain_init() call when CONFIG_NET_CLS=n.
If not, a simpler patch would add a tcf_chain_get() stub that just
returns NULL, as we wouldn't have to care about returning an incorrect
error code from tcf_action_goto_chain_init().

 include/net/pkt_cls.h |  7 +++
 net/sched/act_api.c   | 10 +++---
 net/sched/cls_api.c   | 10 +-
 3 files changed, 19 insertions(+), 8 deletions(-)

diff --git a/include/net/pkt_cls.h b/include/net/pkt_cls.h
index 2c213a69c196..ad0d2899529f 100644
--- a/include/net/pkt_cls.h
+++ b/include/net/pkt_cls.h
@@ -27,6 +27,13 @@ int tcf_classify(struct sk_buff *skb, const struct tcf_proto *tp,
 struct tcf_result *res, bool compat_mode);
 
 #else
+static inline struct tcf_chain *tcf_chain_get(struct tcf_block *block, u32 chain_index)
+{
+   return ERR_PTR(-EOPNOTSUPP);
+}
+static inline void tcf_chain_put(struct tcf_chain *chain)
+{
+}
 static inline
 int tcf_block_get(struct tcf_block **p_block,
  struct tcf_proto __rcu **p_filter_chain)
diff --git a/net/sched/act_api.c b/net/sched/act_api.c
index 0ecf2a858767..502e0bbf35a6 100644
--- a/net/sched/act_api.c
+++ b/net/sched/act_api.c
@@ -31,12 +31,16 @@
 static int tcf_action_goto_chain_init(struct tc_action *a, struct tcf_proto *tp)
 {
u32 chain_index = a->tcfa_action & TC_ACT_EXT_VAL_MASK;
+   struct tcf_chain *chain;
 
if (!tp)
return -EINVAL;
-   a->goto_chain = tcf_chain_get(tp->chain->block, chain_index);
-   if (!a->goto_chain)
-   return -ENOMEM;
+
+   chain = tcf_chain_get(tp->chain->block, chain_index);
+   if (IS_ERR(chain))
+   return PTR_ERR(chain);
+
+   a->goto_chain = chain;
return 0;
 }
 
diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index 4020b8d932a1..8c14af3b77ae 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -193,7 +193,7 @@ static struct tcf_chain *tcf_chain_create(struct tcf_block *block,
 
chain = kzalloc(sizeof(*chain), GFP_KERNEL);
if (!chain)
-   return NULL;
+   return ERR_PTR(-ENOMEM);
list_add_tail(&chain->list, &block->chain_list);
chain->block = block;
chain->index = chain_index;
@@ -256,8 +256,8 @@ int tcf_block_get(struct tcf_block **p_block,
INIT_LIST_HEAD(&block->chain_list);
/* Create chain 0 by default, it has to be always present. */
chain = tcf_chain_create(block, 0);
-   if (!chain) {
-   err = -ENOMEM;
+   if (IS_ERR(chain)) {
+   err = PTR_ERR(chain);
goto err_chain_create;
}
tcf_chain_filter_chain_ptr_set(chain, p_filter_chain);
@@ -503,8 +503,8 @@ static int tc_ctl_tfilter(struct sk_buff *skb, struct nlmsghdr *n,
goto errout;
}
chain = tcf_chain_get(block, chain_index);
-   if (!chain) {
-   err = -ENOMEM;
+   if (IS_ERR(chain)) {
+   err = PTR_ERR(chain);
goto errout;
}
 
-- 
2.13.0
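
The error-pointer convention this patch switches to can be shown with userspace stand-ins. The three helpers below imitate the kernel's ERR_PTR(), PTR_ERR() and IS_ERR(); struct chain, chain_get() and goto_chain_init() are hypothetical and only mirror the shape of the patched functions, with the supported argument playing the role of CONFIG_NET_CLS.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

#define MAX_ERRNO 4095

/* Userspace stand-ins for the kernel's error-pointer helpers. */
static inline void *ERR_PTR(long error) { return (void *)error; }
static inline long PTR_ERR(const void *ptr) { return (long)ptr; }
static inline int IS_ERR(const void *ptr)
{
	/* The top MAX_ERRNO addresses are reserved for encoded errnos. */
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

struct chain {
	unsigned int index;
};

/* Hypothetical lookup: two distinct failures, both encoded in the pointer. */
struct chain *chain_get(int supported, unsigned int index)
{
	struct chain *c;

	if (!supported)
		return ERR_PTR(-EOPNOTSUPP);
	c = calloc(1, sizeof(*c));
	if (!c)
		return ERR_PTR(-ENOMEM);
	c->index = index;
	return c;
}

/* The caller passes the precise error on instead of guessing -ENOMEM. */
int goto_chain_init(int supported, unsigned int index, struct chain **out)
{
	struct chain *c = chain_get(supported, index);

	if (IS_ERR(c))
		return (int)PTR_ERR(c);
	*out = c;
	return 0;
}
```

With a NULL return the caller could only report -ENOMEM; with an error pointer it can tell lack of support from allocation failure, which is the motivation given in the commit message.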

[PATCH net-next] tcp: fix tcp_rearm_rto()

2017-05-18 Thread Eric Dumazet

From: Eric Dumazet 

skbs in (re)transmit queue no longer have a copy of jiffies
at the time of the transmit : skb->skb_mstamp is now in usec unit,
with no correlation to tcp_jiffies32.

We have to convert rto from jiffies to usec, compute a time difference
in usec, then convert the delta to HZ units.

Fixes: 9a568de4818d ("tcp: switch TCP TS option (RFC 7323) to 1ms clock")
Signed-off-by: Eric Dumazet 
---
 net/ipv4/tcp_input.c |   12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 9a5a9e8eda899666501cca06b37948ab64ae79b2..6db6b47e2bbc09aae2627a109e5a1ee9a3f4fe4e 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -3002,14 +3002,14 @@ void tcp_rearm_rto(struct sock *sk)
if (icsk->icsk_pending == ICSK_TIME_REO_TIMEOUT ||
icsk->icsk_pending == ICSK_TIME_LOSS_PROBE) {
struct sk_buff *skb = tcp_write_queue_head(sk);
-   const u32 rto_time_stamp =
-   tcp_skb_timestamp(skb) + rto;
-   s32 delta = (s32)(rto_time_stamp - tcp_jiffies32);
-   /* delta may not be positive if the socket is locked
+   u64 rto_time_stamp = skb->skb_mstamp +
+jiffies_to_usecs(rto);
+   s64 delta_us = rto_time_stamp - tp->tcp_mstamp;
+   /* delta_us may not be positive if the socket is locked
 * when the retrans timer fires and is rescheduled.
 */
-   if (delta > 0)
-   rto = delta;
+   if (delta_us > 0)
+   rto = usecs_to_jiffies(delta_us);
}
inet_csk_reset_xmit_timer(sk, ICSK_TIME_RETRANS, rto,
  TCP_RTO_MAX);
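
The unit handling in this hunk can be followed with a userspace sketch. It assumes a tick rate of 1000 (SKETCH_HZ) and uses made-up helper names; rounding up when converting back to ticks is this sketch's choice. The point is only the order of operations: convert the RTO to microseconds, take the difference in microseconds, then convert the remainder back to ticks.

```c
#include <stdint.h>

#define SKETCH_HZ	1000U			/* assumed tick rate */
#define USEC_PER_TICK	(1000000U / SKETCH_HZ)

uint64_t ticks_to_usecs(uint32_t ticks)
{
	return (uint64_t)ticks * USEC_PER_TICK;
}

/* Round up so that a partial tick still counts as a full one. */
uint32_t usecs_to_ticks(uint64_t usecs)
{
	return (uint32_t)((usecs + USEC_PER_TICK - 1) / USEC_PER_TICK);
}

/* sent_us: when the head of the queue was sent, now_us: current time,
 * both in microseconds; rto in ticks. Returns the remaining timeout in
 * ticks, or rto unchanged if the deadline has already passed.
 */
uint32_t remaining_rto(uint64_t sent_us, uint64_t now_us, uint32_t rto)
{
	uint64_t deadline_us = sent_us + ticks_to_usecs(rto);
	int64_t delta_us = (int64_t)(deadline_us - now_us);

	if (delta_us > 0)
		rto = usecs_to_ticks((uint64_t)delta_us);
	return rto;
}
```

For a packet sent at 1000000 us with an RTO of 200 ticks, remaining_rto() at 1050000 us yields 150 ticks.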

Re: [PATCH] of_mdio: Fix broken PHY IRQ in case of probe deferral

2017-05-18 Thread Geert Uytterhoeven

Hi Andrew,

On Thu, May 18, 2017 at 6:09 PM, Andrew Lunn  wrote:
> On Thu, May 18, 2017 at 02:59:05PM +0200, Geert Uytterhoeven wrote:
>> If an Ethernet PHY is initialized before the interrupt controller it is
>> connected to, a message like the following is printed:
>>
>> irq: no irq domain found for /interrupt-controller@e61c !
>>
>> However, the actual error is ignored, leading to a non-functional (-1)
>> PHY interrupt later:
>>
>> Micrel KSZ8041RNLI ee70.ethernet-:01: attached PHY driver [Micrel KSZ8041RNLI] (mii_bus:phy_addr=ee70.ethernet-:01, irq=-1)
>>
>> Depending on whether the PHY driver will fall back to polling, Ethernet
>> may or may not work.
>>
>> To fix this:
>>   1. Switch of_mdiobus_register_phy() from irq_of_parse_and_map() to
>>  of_irq_get().
>>  Unlike the former, the latter returns -EPROBE_DEFER if the
>>  interrupt controller is not yet available, so this condition can be
>>  detected.
>>  Other errors are handled the same as before, i.e. use the passed
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>>  mdio->irq[addr] as interrupt.
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>>   2. Propagate and handle errors from of_mdiobus_register_phy() and
>>  of_mdiobus_register_device().
>>
>> Signed-off-by: Geert Uytterhoeven 
>> ---
>> Seen on r8a7791/koelsch when using the new CPG/MSSR clock driver.
>> I assume it always happened on RZ/G1 in mainline.
>> ---
>>  drivers/of/of_mdio.c | 39 +++
>>  1 file changed, 27 insertions(+), 12 deletions(-)
>>
>> diff --git a/drivers/of/of_mdio.c b/drivers/of/of_mdio.c
>> index 7e4c80f9b6cda0d3..f9ac2893f56184be 100644
>> --- a/drivers/of/of_mdio.c
>> +++ b/drivers/of/of_mdio.c
>> @@ -44,7 +44,7 @@ static int of_get_phy_id(struct device_node *device, u32 *phy_id)
>>   return -EINVAL;
>>  }
>>
>> -static void of_mdiobus_register_phy(struct mii_bus *mdio,
>> +static int of_mdiobus_register_phy(struct mii_bus *mdio,
>>   struct device_node *child, u32 addr)
>>  {
>>   struct phy_device *phy;
>> @@ -60,9 +60,13 @@ static void of_mdiobus_register_phy(struct mii_bus *mdio,
>>   else
>>   phy = get_phy_device(mdio, addr, is_c45);
>>   if (IS_ERR(phy))
>> - return;
>> + return PTR_ERR(phy);
>>
>> - rc = irq_of_parse_and_map(child, 0);
>> + rc = of_irq_get(child, 0);
>> + if (rc == -EPROBE_DEFER) {
>> + phy_device_free(phy);
>> + return rc;
>> + }
>
> Maybe this should be consistent. All other places there is an error,
> you return it. Here however, you only return the error if it is
> EPROBE_DEFER.

That's because of the "else" branch in the code below:

if (rc > 0) {
phy->irq = rc;
mdio->irq[addr] = rc;
} else {
phy->irq = mdio->irq[addr];
}

cfr. the marked part of the patch description.
I didn't want to change that behavior, as it's not clear to me why it's handled
that way.

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds

Re: [PATCH] of_mdio: Fix broken PHY IRQ in case of probe deferral

2017-05-18 Thread Andrew Lunn

On Thu, May 18, 2017 at 02:59:05PM +0200, Geert Uytterhoeven wrote:
> If an Ethernet PHY is initialized before the interrupt controller it is
> connected to, a message like the following is printed:
> 
> irq: no irq domain found for /interrupt-controller@e61c !
> 
> However, the actual error is ignored, leading to a non-functional (-1)
> PHY interrupt later:
> 
> Micrel KSZ8041RNLI ee70.ethernet-:01: attached PHY driver [Micrel KSZ8041RNLI] (mii_bus:phy_addr=ee70.ethernet-:01, irq=-1)
> 
> Depending on whether the PHY driver will fall back to polling, Ethernet
> may or may not work.
> 
> To fix this:
>   1. Switch of_mdiobus_register_phy() from irq_of_parse_and_map() to
>  of_irq_get().
>  Unlike the former, the latter returns -EPROBE_DEFER if the
>  interrupt controller is not yet available, so this condition can be
>  detected.
>  Other errors are handled the same as before, i.e. use the passed
>  mdio->irq[addr] as interrupt.
>   2. Propagate and handle errors from of_mdiobus_register_phy() and
>  of_mdiobus_register_device().
> 
> Signed-off-by: Geert Uytterhoeven 
> ---
> Seen on r8a7791/koelsch when using the new CPG/MSSR clock driver.
> I assume it always happened on RZ/G1 in mainline.
> ---
>  drivers/of/of_mdio.c | 39 +++
>  1 file changed, 27 insertions(+), 12 deletions(-)
> 
> diff --git a/drivers/of/of_mdio.c b/drivers/of/of_mdio.c
> index 7e4c80f9b6cda0d3..f9ac2893f56184be 100644
> --- a/drivers/of/of_mdio.c
> +++ b/drivers/of/of_mdio.c
> @@ -44,7 +44,7 @@ static int of_get_phy_id(struct device_node *device, u32 *phy_id)
>   return -EINVAL;
>  }
>  
> -static void of_mdiobus_register_phy(struct mii_bus *mdio,
> +static int of_mdiobus_register_phy(struct mii_bus *mdio,
>   struct device_node *child, u32 addr)
>  {
>   struct phy_device *phy;
> @@ -60,9 +60,13 @@ static void of_mdiobus_register_phy(struct mii_bus *mdio,
>   else
>   phy = get_phy_device(mdio, addr, is_c45);
>   if (IS_ERR(phy))
> - return;
> + return PTR_ERR(phy);
>  
> - rc = irq_of_parse_and_map(child, 0);
> + rc = of_irq_get(child, 0);
> + if (rc == -EPROBE_DEFER) {
> + phy_device_free(phy);
> + return rc;
> + }

Maybe this should be consistent. All other places there is an error,
you return it. Here however, you only return the error if it is
EPROBE_DEFER.

Andrew


>   if (rc > 0) {
>   phy->irq = rc;
>   mdio->irq[addr] = rc;
> @@ -84,22 +88,23 @@ static void of_mdiobus_register_phy(struct mii_bus *mdio,
>   if (rc) {
>   phy_device_free(phy);
>   of_node_put(child);
> - return;
> + return rc;
>   }
>  
>   dev_dbg(&mdio->dev, "registered phy %s at address %i\n",
>   child->name, addr);
> + return 0;
>  }
>
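
The distinction raised in the review above can be sketched outside the
kernel. This is a minimal user-space model, not the driver code:
fake_irq_get() and register_phy() are hypothetical stand-ins, and only
the numeric value of EPROBE_DEFER (517) is taken from the kernel's
errno definitions. It shows the patched behaviour, where a deferred
probe is propagated and any other lookup failure falls back to the
IRQ supplied by the bus.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#ifndef EPROBE_DEFER
#define EPROBE_DEFER 517	/* kernel-internal value, absent from user-space errno.h */
#endif

/* Hypothetical stand-in for of_irq_get(): >0 is an IRQ, <=0 is a failure. */
static int fake_irq_get(int simulated)
{
	return simulated;
}

/*
 * Model of the patched control flow: defer the whole probe when the
 * interrupt controller is not ready yet, and fall back to the
 * bus-provided IRQ on any other failure.
 */
static int register_phy(int simulated_irq, int fallback_irq, int *irq_out)
{
	int rc;

	assert(irq_out != NULL);

	rc = fake_irq_get(simulated_irq);
	if (rc == -EPROBE_DEFER)
		return rc;		/* caller is expected to retry later */

	*irq_out = (rc > 0) ? rc : fallback_irq;
	return 0;
}
```

Returning every negative value instead, as the review suggests, would
turn the fallback branch into a hard failure; whether that is wanted
depends on how existing device trees rely on the fallback.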

Re: [PATCH] xfrm: fix state migration replay sequence numbers

2017-05-18 Thread Richard Guy Briggs

On 2017-05-18 16:39, Antony Antony wrote:
> During xfrm migration replay and preplay sequence numbers are not 
> copied from the previous state. 
> 
> Here is tcpdump output showing the problem.
> 10.0.10.46 is running vanilla kernel, IKE/IPsec responder.
> After the migration it sent wrong sequence number, reset to 1.
> The migration is from 10.0.0.52 to 10.0.0.53.
> 
> IP 10.0.0.52.4500 > 10.0.10.46.4500: UDP-encap: 
> ESP(spi=0x43ef462d,seq=0x7cf), length 136
> IP 10.0.10.46.4500 > 10.0.0.52.4500: UDP-encap: 
> ESP(spi=0xca1c282d,seq=0x7cf), length 136
> IP 10.0.0.52.4500 > 10.0.10.46.4500: UDP-encap: 
> ESP(spi=0x43ef462d,seq=0x7d0), length 136
> IP 10.0.10.46.4500 > 10.0.0.52.4500: UDP-encap: 
> ESP(spi=0xca1c282d,seq=0x7d0), length 136
> 
> IP 10.0.0.53.4500 > 10.0.10.46.4500: NONESP-encap: isakmp: child_sa  inf2[I]
> IP 10.0.10.46.4500 > 10.0.0.53.4500: NONESP-encap: isakmp: child_sa  inf2[R]
> IP 10.0.0.53.4500 > 10.0.10.46.4500: NONESP-encap: isakmp: child_sa  inf2[I]
> IP 10.0.10.46.4500 > 10.0.0.53.4500: NONESP-encap: isakmp: child_sa  inf2[R]
> 
> IP 10.0.0.53.4500 > 10.0.10.46.4500: UDP-encap: 
> ESP(spi=0x43ef462d,seq=0x7d1), length 136
> 
> NOTE: next sequence is wrong 0x1
> 
> IP 10.0.10.46.4500 > 10.0.0.53.4500: UDP-encap: ESP(spi=0xca1c282d,seq=0x1), 
> length 136
> IP 10.0.0.53.4500 > 10.0.10.46.4500: UDP-encap: 
> ESP(spi=0x43ef462d,seq=0x7d2), length 136
> IP 10.0.10.46.4500 > 10.0.0.53.4500: UDP-encap: ESP(spi=0xca1c282d,seq=0x2), 
> length 136
> 
> The attached patch fixes it by copying replay and preplay.
> 
> regards,
> -antony
> 
> Antony Antony (1):
>   xfrm: fix state migration replay sequence numbers
> 
>  net/xfrm/xfrm_state.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> -- 
> 2.9.3
> 

> From 1241e8b4c38ad2bf7399599165f763af38aba8d9 Mon Sep 17 00:00:00 2001
> From: Antony Antony 
> Date: Thu, 18 May 2017 12:19:32 +0200
> Subject: [PATCH] xfrm: fix state migration copy replay sequence numbers
> To: netdev@vger.kernel.org, Herbert Xu , Steffen 
> Klassert 
> Cc: Richard Guy Briggs 
> 
> During xfrm migration copy replay and preplay sequence numbers
> from the previous state.
> 
> Signed-off-by: Antony Antony 
> ---
>  net/xfrm/xfrm_state.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/net/xfrm/xfrm_state.c b/net/xfrm/xfrm_state.c
> index fc3c5aa..2e291bc 100644
> --- a/net/xfrm/xfrm_state.c
> +++ b/net/xfrm/xfrm_state.c
> @@ -1383,6 +1383,8 @@ static struct xfrm_state *xfrm_state_clone(struct 
> xfrm_state *orig)
>   x->curlft.add_time = orig->curlft.add_time;
>   x->km.state = orig->km.state;
>   x->km.seq = orig->km.seq;
> + x->replay = orig->replay;
> + x->preplay = orig->preplay;
>  
>   return x;
>  
> -- 
> 2.9.3

This looks reasonable to me.  With a bit more out-of-band information from
Antony and Paul Wouters we have:

https://tools.ietf.org/html/rfc4555#section-3.5

so while it is not explicit about what is to be copied, it only indicates that
the IPsec SA is to be updated with the new address whereas this implementation
creates a new IPsec SA and copies over the values, missing some.
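
The effect of the two added assignments can be modelled in a few lines.
This is only a sketch under stated assumptions: struct replay_state and
struct sa_state are reduced, hypothetical versions of the kernel
structures with illustrative field names, and sa_migrate() stands in
for the clone step.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, reduced models of the kernel structures. */
struct replay_state {
	uint32_t oseq;		/* last sequence number sent */
	uint32_t seq;		/* highest sequence number received */
	uint32_t bitmap;	/* anti-replay window */
};

struct sa_state {
	uint32_t spi;
	uint32_t addr;		/* endpoint address, changes on migration */
	struct replay_state replay;
	struct replay_state preplay;
};

/* Clone an SA for a new address, carrying the replay counters over. */
static struct sa_state sa_migrate(const struct sa_state *orig, uint32_t new_addr)
{
	struct sa_state x = { .spi = 0, .addr = new_addr };

	assert(orig != NULL);
	x.spi = orig->spi;
	x.replay = orig->replay;	/* without these two copies the   */
	x.preplay = orig->preplay;	/* clone would restart at zero    */
	return x;
}

/* Next outbound sequence number, as it would be stamped on a packet. */
static uint32_t sa_next_seq(struct sa_state *x)
{
	return ++x->replay.oseq;
}
```

With the counters copied, a state that last sent 0x7d0 continues with
0x7d1 after migration instead of dropping back to 0x1 as in the
capture.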

(Note: using "git format-patch --cover-letter --cc ... -o " and "git
send-email --to ... " work really well together.)

Reviewed-by: Richard Guy Briggs 




slainte mhath, RGB

--
Richard Guy Briggs   --  ~\-- ~\ 
 --  \___   o \@  @Ride yer bike!
Ottawa, ON, CANADA  --  Lo_>__M__\\/\%__\\/\%
Vote! -- _GTVS6#790__(*)__(*)(*)(*)_

Re: [PATCH net-next] net/mlx5e: Fix possible memory leak

2017-05-18 Thread Yuval Shaia

On Thu, May 18, 2017 at 03:34:41PM +, Wei Yongjun wrote:
> From: Wei Yongjun 
> 
> 'encap_header' is allocated and must be freed before leaving through
> the error handling paths, otherwise memory is leaked.
> 
> Fixes: 232c001398ae ("net/mlx5e: Add support to neighbour update flow")
> Signed-off-by: Wei Yongjun 
> ---
>  drivers/net/ethernet/mellanox/mlx5/core/en_tc.c | 8 
>  1 file changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c 
> b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
> index 11c27e4..a72ecbc 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
> @@ -1404,8 +1404,8 @@ static int mlx5e_create_encap_header_ipv4(struct 
> mlx5e_priv *priv,
>  
>   if (!(nud_state & NUD_VALID)) {
>   neigh_event_send(n, NULL);
> - neigh_release(n);
> - return -EAGAIN;
> + err = -EAGAIN;
> + goto out;
>   }
>  
>   err = mlx5_encap_alloc(priv->mdev, e->tunnel_type,
> @@ -1510,8 +1510,8 @@ static int mlx5e_create_encap_header_ipv6(struct 
> mlx5e_priv *priv,
>  
>   if (!(nud_state & NUD_VALID)) {
>   neigh_event_send(n, NULL);
> - neigh_release(n);
> - return -EAGAIN;
> + err = -EAGAIN;
> + goto out;
>   }

Reviewed-by: Yuval Shaia 

>  
>   err = mlx5_encap_alloc(priv->mdev, e->tunnel_type,
> 

[PATCH net-next] xfrm: Make function xfrm_dev_register static

2017-05-18 Thread Wei Yongjun

From: Wei Yongjun 

Fixes the following sparse warning:

net/xfrm/xfrm_device.c:141:5: warning:
 symbol 'xfrm_dev_register' was not declared. Should it be static?

Signed-off-by: Wei Yongjun 
---
 net/xfrm/xfrm_device.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/xfrm/xfrm_device.c b/net/xfrm/xfrm_device.c
index 8ec8a3f..50ec733 100644
--- a/net/xfrm/xfrm_device.c
+++ b/net/xfrm/xfrm_device.c
@@ -138,7 +138,7 @@ bool xfrm_dev_offload_ok(struct sk_buff *skb, struct 
xfrm_state *x)
 }
 EXPORT_SYMBOL_GPL(xfrm_dev_offload_ok);
 
-int xfrm_dev_register(struct net_device *dev)
+static int xfrm_dev_register(struct net_device *dev)
 {
if ((dev->features & NETIF_F_HW_ESP) && !dev->xfrmdev_ops)
return NOTIFY_BAD;

Re: [PATCH net-next 3/6] net: bridge: break if __br_mdb_del fails

2017-05-18 Thread Vivien Didelot

Hi Nikolay,

Nikolay Aleksandrov  writes:

>> OK good to know. That intention wasn't obvious. I can make __br_mdb_del
>> return void instead? What about the rest of the patchset if I do so?
>
> If you make it return void we will not be able to return proper error value
> when doing a single operation (the else case). About the rest I see only some
> minor style issues, I'll comment on the respective patches. Another minor nit 
> is 
> using switch() instead of if/else for the message types but that is really up 
> to 
> you, I don't mind either way. :-)

Ho OK I understand better the batch vs single delete operation now.
__br_mdb_do hardly makes sense now, because we don't know which case we
are handling... But factorizing br_mdb_do still makes sense. I'll come
up with something.

Thanks,

Vivien

[RFC net-next PATCH 5/5] mlx5: add XDP rxhash feature for driver mlx5

2017-05-18 Thread Jesper Dangaard Brouer


---
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c |3 +
 drivers/net/ethernet/mellanox/mlx5/core/en_rx.c   |   98 ++---
 2 files changed, 70 insertions(+), 31 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index e43411d232ee..3ae90dbdd3de 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -3956,6 +3956,9 @@ static void mlx5e_build_nic_netdev(struct net_device 
*netdev)
netdev->hw_features  |= NETIF_F_HW_VLAN_CTAG_RX;
netdev->hw_features  |= NETIF_F_HW_VLAN_CTAG_FILTER;
 
+   /* XDP_DRV_F_ENABLED is added in register_netdevice() */
+   netdev->xdp_features = XDP_DRV_F_RXHASH;
+
if (mlx5e_vxlan_allowed(mdev)) {
netdev->hw_features |= NETIF_F_GSO_UDP_TUNNEL |
   NETIF_F_GSO_UDP_TUNNEL_CSUM |
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
index ae66fad98244..eb9d859bf09d 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
@@ -514,14 +514,28 @@ static void mlx5e_lro_update_hdr(struct sk_buff *skb, 
struct mlx5_cqe64 *cqe,
}
 }
 
-static inline void mlx5e_skb_set_hash(struct mlx5_cqe64 *cqe,
- struct sk_buff *skb)
+u8 mlx5_htype_l3_to_xdp[4] = {
+   0,  /* 00 - none */
+   XDP_HASH_TYPE_L3_IPV4,  /* 01 - IPv4 */
+   XDP_HASH_TYPE_L3_IPV6,  /* 10 - IPv6 */
+   0,  /* 11 - Reserved */
+};
+
+u8 mlx5_htype_l4_to_xdp[4] = {
+   0,  /* 00 - none */
+   XDP_HASH_TYPE_L4_TCP,   /* 01 - TCP  */
+   XDP_HASH_TYPE_L4_UDP,   /* 10 - UDP  */
+   0,  /* 11 - IPSEC.SPI */
+};
+
+static inline void mlx5e_xdp_set_hash(struct mlx5_cqe64 *cqe,
+ struct xdp_buff *xdp)
 {
u8 cht = cqe->rss_hash_type;
-   int ht = (cht & CQE_RSS_HTYPE_L4) ? PKT_HASH_TYPE_L4 :
-(cht & CQE_RSS_HTYPE_IP) ? PKT_HASH_TYPE_L3 :
-   PKT_HASH_TYPE_NONE;
-   skb_set_hash(skb, be32_to_cpu(cqe->rss_hash_result), ht);
+   u32 ht = (mlx5_htype_l4_to_xdp[((cht & CQE_RSS_HTYPE_L4) >> 6)] | \
+ mlx5_htype_l3_to_xdp[((cht & CQE_RSS_HTYPE_IP) >> 2)]);
+
+   xdp_record_hash(xdp, be32_to_cpu(cqe->rss_hash_result), ht);
 }
 
 static inline bool is_first_ethertype_ip(struct sk_buff *skb)
@@ -570,7 +584,8 @@ static inline void mlx5e_handle_csum(struct net_device 
*netdev,
 static inline void mlx5e_build_rx_skb(struct mlx5_cqe64 *cqe,
  u32 cqe_bcnt,
  struct mlx5e_rq *rq,
- struct sk_buff *skb)
+ struct sk_buff *skb,
+ struct xdp_buff *xdp)
 {
struct net_device *netdev = rq->netdev;
struct mlx5e_tstamp *tstamp = rq->tstamp;
@@ -593,8 +608,7 @@ static inline void mlx5e_build_rx_skb(struct mlx5_cqe64 
*cqe,
 
skb_record_rx_queue(skb, rq->ix);
 
-   if (likely(netdev->features & NETIF_F_RXHASH))
-   mlx5e_skb_set_hash(cqe, skb);
+   xdp_set_skb_hash(xdp, skb);
 
if (cqe_has_vlan(cqe))
__vlan_hwaccel_put_tag(skb, htons(ETH_P_8021Q),
@@ -609,11 +623,12 @@ static inline void mlx5e_build_rx_skb(struct mlx5_cqe64 
*cqe,
 static inline void mlx5e_complete_rx_cqe(struct mlx5e_rq *rq,
 struct mlx5_cqe64 *cqe,
 u32 cqe_bcnt,
-struct sk_buff *skb)
+struct sk_buff *skb,
+struct xdp_buff *xdp)
 {
rq->stats.packets++;
rq->stats.bytes += cqe_bcnt;
-   mlx5e_build_rx_skb(cqe, cqe_bcnt, rq, skb);
+   mlx5e_build_rx_skb(cqe, cqe_bcnt, rq, skb, xdp);
 }
 
 static inline void mlx5e_xmit_xdp_doorbell(struct mlx5e_xdpsq *sq)
@@ -696,27 +711,27 @@ static inline bool mlx5e_xmit_xdp_frame(struct mlx5e_rq 
*rq,
 /* returns true if packet was consumed by xdp */
 static inline int mlx5e_xdp_handle(struct mlx5e_rq *rq,
   struct mlx5e_dma_info *di,
-  void *va, u16 *rx_headroom, u32 *len)
+  struct xdp_buff *xdp, void *va,
+  u16 *rx_headroom, u32 *len)
 {
const struct bpf_prog *prog = READ_ONCE(rq->xdp_prog);
-   struct xdp_buff xdp;
u32 act;
 
if (!prog)
return false;
 
-   xdp.data = va + *rx_headroom;
-   xdp.data_end = xdp.data + *len;
-   xdp.data_hard_start = va;
+

[RFC net-next PATCH 4/5] net: new XDP feature for reading HW rxhash from drivers

2017-05-18 Thread Jesper Dangaard Brouer

Introducing a new XDP feature and associated bpf helper bpf_xdp_rxhash.

The rxhash and type allow filtering on packets without touching
packet memory.  The performance difference on my system with a
100 Gbit/s mlx5 NIC is 12Mpps to 19Mpps.

TODO: desc RXHASH and associated type, and how XDP choose to map
and export these to bpf_prog's.

TODO: desc how this interacts with XDP driver features system.
---
 include/linux/filter.h  |   31 -
 include/linux/netdev_features.h |4 ++
 include/uapi/linux/bpf.h|   56 +-
 kernel/bpf/verifier.c   |3 ++
 net/core/dev.c  |   16 -
 net/core/filter.c   |   73 +++
 samples/bpf/bpf_helpers.h   |2 +
 tools/include/uapi/linux/bpf.h  |   10 +
 8 files changed, 190 insertions(+), 5 deletions(-)

diff --git a/include/linux/filter.h b/include/linux/filter.h
index 9a7786db14fa..33a254ccd47d 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -413,7 +413,8 @@ struct bpf_prog {
locked:1,   /* Program image locked? */
gpl_compatible:1, /* Is filter GPL compatible? 
*/
cb_access:1,/* Is control block accessed? */
-   dst_needed:1;   /* Do we need dst entry? */
+   dst_needed:1,   /* Do we need dst entry? */
+   xdp_rxhash_needed:1;/* Req XDP RXHASH support */
kmemcheck_bitfield_end(meta);
enum bpf_prog_type  type;   /* Type of BPF program */
u32 len;/* Number of filter blocks */
@@ -444,12 +445,40 @@ struct bpf_skb_data_end {
void *data_end;
 };
 
+/* Kernel internal xdp_buff->flags */
+#define XDP_CTX_F_RXHASH_TYPE_MASK XDP_HASH_TYPE_MASK
+#define XDP_CTX_F_RXHASH_TYPE_BITS XDP_HASH_TYPE_BITS
+#define XDP_CTX_F_RXHASH_SW(1ULL <<  XDP_CTX_F_RXHASH_TYPE_BITS)
+#define XDP_CTX_F_RXHASH_HW(1ULL << (XDP_CTX_F_RXHASH_TYPE_BITS+1))
+
 struct xdp_buff {
void *data;
void *data_end;
void *data_hard_start;
+   u64 flags;
+   u32 rxhash;
 };
 
+/* helper functions for driver setting rxhash */
+static inline void
+xdp_record_hash(struct xdp_buff *xdp, u32 hash, u32 type)
+{
+   xdp->flags |= XDP_CTX_F_RXHASH_HW;
+   xdp->flags |= type & XDP_CTX_F_RXHASH_TYPE_MASK;
+   xdp->rxhash = hash;
+}
+
+static inline void
+xdp_set_skb_hash(struct xdp_buff *xdp, struct sk_buff *skb)
+{
+   if (likely(xdp->flags & (XDP_CTX_F_RXHASH_HW|XDP_CTX_F_RXHASH_SW))) {
+   bool is_sw = !!(xdp->flags & XDP_CTX_F_RXHASH_SW);
+   bool is_l4 = !!(xdp->flags & XDP_HASH_TYPE_L4_MASK);
+
+   __skb_set_hash(skb, xdp->rxhash, is_sw, is_l4);
+   }
+}
+
 /* compute the linear packet data range [data, data_end) which
  * will be accessed by cls_bpf, act_bpf and lwt programs
  */
diff --git a/include/linux/netdev_features.h b/include/linux/netdev_features.h
index ff81ee231410..4b50e8c606c5 100644
--- a/include/linux/netdev_features.h
+++ b/include/linux/netdev_features.h
@@ -219,11 +219,13 @@ enum {
 /* XDP driver flags */
 enum {
XDP_DRV_F_ENABLED_BIT,
+   XDP_DRV_F_RXHASH_BIT,
 };
 
 #define __XDP_DRV_F_BIT(bit)   ((netdev_features_t)1 << (bit))
 #define __XDP_DRV_F(name)  __XDP_DRV_F_BIT(XDP_DRV_F_##name##_BIT)
 #define XDP_DRV_F_ENABLED  __XDP_DRV_F(ENABLED)
+#define XDP_DRV_F_RXHASH   __XDP_DRV_F(RXHASH)
 
 /* XDP driver MUST support these features, else kernel MUST reject
  * bpf_prog to guarantee safe access to data structures
@@ -233,7 +235,7 @@ enum {
 /* Some XDP features are under development. Based on bpf_prog loading
  * detect if kernel feature can be activated.
  */
-#define XDP_DRV_FEATURES_DEVEL 0
+#define XDP_DRV_FEATURES_DEVEL XDP_DRV_F_RXHASH
 
 /* Some XDP features are optional, like action return code, as they
  * are handled safely runtime.
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 945a1f5f63c5..1d9d3a46217d 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -482,6 +482,9 @@ union bpf_attr {
  * Get the owner uid of the socket stored inside sk_buff.
  * @skb: pointer to skb
  * Return: uid of the socket owner on success or overflowuid if failed.
+ *
+ * u64 bpf_xdp_rxhash(xdp_md, new_hash, type, flags)
+ * TODO: MISSING DESC
  */
 #define __BPF_FUNC_MAPPER(FN)  \
FN(unspec), \
@@ -531,7 +534,8 @@ union bpf_attr {
FN(xdp_adjust_head),\
FN(probe_read_str), \
FN(get_socket_cookie),  \
-   FN(get_socket_uid),
+   FN(get_socket_uid), \
+   FN(xdp_rxhash),
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
  * function
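
The flag handling in the xdp_record_hash() and xdp_set_skb_hash()
helpers above can be illustrated with a small stand-alone sketch. The
bit layout here is an assumption: HASH_TYPE_BITS is a hypothetical
width, since the real XDP_HASH_TYPE_BITS definition is not part of the
quoted hunks, and struct buf_meta is a stand-in for the two new
xdp_buff fields.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed layout: low bits carry the hash type, two flag bits above them. */
#define HASH_TYPE_BITS	8			/* hypothetical width */
#define HASH_TYPE_MASK	((1ULL << HASH_TYPE_BITS) - 1)
#define F_RXHASH_SW	(1ULL << HASH_TYPE_BITS)
#define F_RXHASH_HW	(1ULL << (HASH_TYPE_BITS + 1))

struct buf_meta {
	uint64_t flags;
	uint32_t rxhash;
};

/* Driver side: record a hardware-computed hash and its type. */
static void record_hw_hash(struct buf_meta *m, uint32_t hash, uint32_t type)
{
	assert(m != NULL);
	m->flags |= F_RXHASH_HW;
	m->flags |= type & HASH_TYPE_MASK;
	m->rxhash = hash;
}

/* Consumer side: a flag is tested with bitwise AND; OR is never zero. */
static bool hash_is_sw(const struct buf_meta *m)
{
	return !!(m->flags & F_RXHASH_SW);
}

/* A hash is usable if either the hardware or the software flag is set. */
static bool hash_is_valid(const struct buf_meta *m)
{
	return !!(m->flags & (F_RXHASH_HW | F_RXHASH_SW));
}
```

Keeping the type in the low bits and the origin flags above them lets
one 64-bit word answer both "is there a hash" and "what kind" without
touching packet memory.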

[RFC net-next PATCH 3/5] net: introduce XDP driver features interface

2017-05-18 Thread Jesper Dangaard Brouer

There is a fundamental difference between normal eBPF programs
and (XDP) eBPF programs getting attached in a driver. For normal
eBPF programs it is easy to add a new bpf feature, like a bpf
helper, because it is strongly tied to the feature being
available in the current core kernel code.  When drivers invoke a
bpf_prog, it is not sufficient to simply rely on whether
a bpf_helper exists or not.  When a driver hasn't implemented a
given feature yet, it is possible to expose uninitialized
parts of xdp_buff.  The driver passes in a pointer to xdp_buff,
usually "allocated" on the stack, which must not be exposed.

Only two user visible NETIF_F_XDP_* net_device feature flags are
exposed via ethtool (-k) seen as "xdp" and "xdp-partial".
The "xdp-partial" is detected when there is not feature equality
between kernel and driver, and a netdev_warn is given.

The idea is that XDP_DRV_* feature bits define a contract between
the driver and the kernel, giving a reliable way to know which XDP
features a driver has promised to implement, and thus which bpf
side features are safe to allow.

There are 3 levels of features: "required", "devel" and "optional".

The motivation is pushing driver vendors forward to support all
the new XDP features.  Once a given feature bit is moved into
the "required" features, the kernel will reject loading an XDP
program if the feature isn't implemented by the driver.  Features
under development require help from the bpf infrastructure to
detect when a given helper or direct-access is used, using a
bpf_prog bit to mark a need for the feature, and pulling in this
bit in the xdp_features_check().  When all drivers have implemented
a "devel" feature, it can be moved to the "required" features and
the bpf_prog bit can be refurbished. The "optional" features are
for things that are handled safely at runtime, but drivers will
still get flagged as "xdp-partial" if not implementing those.
---
 include/linux/netdev_features.h |   32 
 include/linux/netdevice.h   |1 +
 net/core/dev.c  |   34 ++
 net/core/ethtool.c  |2 ++
 4 files changed, 69 insertions(+)

diff --git a/include/linux/netdev_features.h b/include/linux/netdev_features.h
index 1d4737cffc71..ff81ee231410 100644
--- a/include/linux/netdev_features.h
+++ b/include/linux/netdev_features.h
@@ -77,6 +77,8 @@ enum {
NETIF_F_HW_ESP_BIT, /* Hardware ESP transformation offload 
*/
NETIF_F_HW_ESP_TX_CSUM_BIT, /* ESP with TX checksum offload */
 
+   NETIF_F_XDP_BASELINE_BIT,   /* Driver supports XDP */
+   NETIF_F_XDP_PARTIAL_BIT,/* not supporting all XDP features */
/*
 * Add your fresh new feature above and remember to update
 * netdev_features_strings[] in net/core/ethtool.c and maybe
@@ -140,6 +142,8 @@ enum {
 #define NETIF_F_HW_TC  __NETIF_F(HW_TC)
 #define NETIF_F_HW_ESP __NETIF_F(HW_ESP)
 #define NETIF_F_HW_ESP_TX_CSUM __NETIF_F(HW_ESP_TX_CSUM)
+#define NETIF_F_XDP_BASELINE   __NETIF_F(XDP_BASELINE)
+#define NETIF_F_XDP_PARTIAL__NETIF_F(XDP_PARTIAL)
 
 #define for_each_netdev_feature(mask_addr, bit)\
for_each_set_bit(bit, (unsigned long *)mask_addr, NETDEV_FEATURE_COUNT)
@@ -212,4 +216,32 @@ enum {
 NETIF_F_GSO_UDP_TUNNEL |   \
 NETIF_F_GSO_UDP_TUNNEL_CSUM)
 
+/* XDP driver flags */
+enum {
+   XDP_DRV_F_ENABLED_BIT,
+};
+
+#define __XDP_DRV_F_BIT(bit)   ((netdev_features_t)1 << (bit))
+#define __XDP_DRV_F(name)  __XDP_DRV_F_BIT(XDP_DRV_F_##name##_BIT)
+#define XDP_DRV_F_ENABLED  __XDP_DRV_F(ENABLED)
+
+/* XDP driver MUST support these features, else kernel MUST reject
+ * bpf_prog to guarantee safe access to data structures
+ */
+#define XDP_DRV_FEATURES_REQUIRED  XDP_DRV_F_ENABLED
+
+/* Some XDP features are under development. Based on bpf_prog loading
+ * detect if kernel feature can be activated.
+ */
+#define XDP_DRV_FEATURES_DEVEL 0
+
+/* Some XDP features are optional, like action return code, as they
+ * are handled safely runtime.
+ */
+#define XDP_DRV_FEATURES_OPTIONAL  0
+
+#define XDP_DRV_FEATURES_MASK  (XDP_DRV_FEATURES_REQUIRED |\
+XDP_DRV_FEATURES_DEVEL |   \
+XDP_DRV_FEATURES_OPTIONAL)
+
 #endif /* _LINUX_NETDEV_FEATURES_H */
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 9c23bd2efb56..329ae156ff65 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1685,6 +1685,7 @@ struct net_device {
netdev_features_t   hw_enc_features;
netdev_features_t   mpls_features;
netdev_features_t   gso_partial_features;
+   netdev_features_t   xdp_features;
 
int ifindex;
int group;
diff --git
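
The three feature levels described in the commit message can be
sketched as plain bitmask checks. Everything named here is
hypothetical: the F_* bits, the grouping macros and both helper
functions are illustrations of the described contract, not the
xdp_features_check() from the patch, whose body is not shown above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical feature bits and groupings, for illustration only. */
#define F_ENABLED	(1u << 0)
#define F_RXHASH	(1u << 1)
#define F_SOME_ACTION	(1u << 2)

#define FEATURES_REQUIRED	F_ENABLED
#define FEATURES_DEVEL		F_RXHASH
#define FEATURES_OPTIONAL	F_SOME_ACTION
#define FEATURES_MASK	(FEATURES_REQUIRED | FEATURES_DEVEL | FEATURES_OPTIONAL)

/*
 * May a program load on this driver?  "required" bits must always be
 * present; "devel" bits only when the program was marked as needing them.
 */
static bool features_allow_load(uint32_t drv, uint32_t prog_needs)
{
	uint32_t must = FEATURES_REQUIRED | (prog_needs & FEATURES_DEVEL);

	assert((FEATURES_REQUIRED & FEATURES_DEVEL) == 0);	/* groups are disjoint */
	return (drv & must) == must;
}

/* "xdp-partial": the driver lacks some feature the kernel knows about. */
static bool features_partial(uint32_t drv)
{
	return (drv & FEATURES_MASK) != FEATURES_MASK;
}
```

In this model a missing "optional" bit never blocks loading; it only
makes features_partial() report true, matching the "xdp-partial"
flagging described above.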

[RFC net-next PATCH 2/5] mlx5: fix bug reading rss_hash_type from CQE

2017-05-18 Thread Jesper Dangaard Brouer

Masks for extracting parts of the Completion Queue Entry (CQE)
field rss_hash_type were swapped, namely CQE_RSS_HTYPE_IP and
CQE_RSS_HTYPE_L4.

The bug resulted in setting skb->l4_hash even though the
rss_hash_type indicated that the hash was NOT computed over the
L4 (UDP or TCP) part of the packet.

Added comments from the datasheet to make it clearer what
these masks are selecting.

Signed-off-by: Jesper Dangaard Brouer 
---
 include/linux/mlx5/device.h |   10 --
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/include/linux/mlx5/device.h b/include/linux/mlx5/device.h
index dd9a263ed368..a940ec6a046c 100644
--- a/include/linux/mlx5/device.h
+++ b/include/linux/mlx5/device.h
@@ -787,8 +787,14 @@ enum {
 };
 
 enum {
-   CQE_RSS_HTYPE_IP= 0x3 << 6,
-   CQE_RSS_HTYPE_L4= 0x3 << 2,
+   CQE_RSS_HTYPE_IP= 0x3 << 2,
+   /* cqe->rss_hash_type[3:2] - IP destination selected for hash
+* (00 = none,  01 = IPv4, 10 = IPv6, 11 = Reserved)
+*/
+   CQE_RSS_HTYPE_L4= 0x3 << 6,
+   /* cqe->rss_hash_type[7:6] - L4 destination selected for hash
+* (00 = none, 01 = TCP, 10 = UDP, 11 = IPSEC.SPI)
+*/
 };
 
 enum {
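
To make the corrected bit positions concrete, here is a small
stand-alone decoder. The two mask values are the ones from the patch;
the enum names and the rss_l3()/rss_l4() helpers are hypothetical
labels for the 2-bit values listed in the new comments.

```c
#include <assert.h>
#include <stdint.h>

/* Masks as corrected by the patch. */
#define CQE_RSS_HTYPE_IP	(0x3 << 2)	/* rss_hash_type[3:2] */
#define CQE_RSS_HTYPE_L4	(0x3 << 6)	/* rss_hash_type[7:6] */

/* Hypothetical labels for the 2-bit field values from the patch comments. */
enum l3_type { L3_NONE, L3_IPV4, L3_IPV6, L3_RESERVED };
enum l4_type { L4_NONE, L4_TCP, L4_UDP, L4_IPSEC_SPI };

/* Extract the IP part of the hash type from bits [3:2]. */
static enum l3_type rss_l3(uint8_t cht)
{
	assert((CQE_RSS_HTYPE_IP & CQE_RSS_HTYPE_L4) == 0);	/* fields do not overlap */
	return (enum l3_type)((cht & CQE_RSS_HTYPE_IP) >> 2);
}

/* Extract the L4 part of the hash type from bits [7:6]. */
static enum l4_type rss_l4(uint8_t cht)
{
	return (enum l4_type)((cht & CQE_RSS_HTYPE_L4) >> 6);
}
```

For example, a value of 0x04 decodes as IPv4 with no L4 component, so
an L4 hash must not be reported for it; with the masks swapped the
same value would have looked like an L4 hash.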

[RFC net-next PATCH 0/5] XDP driver feature API and handling change to xdp_buff

2017-05-18 Thread Jesper Dangaard Brouer

I would like some comments on introducing a feature API between XDP
drivers and XDP/BPF core.  The primary issue arises when extending
struct xdp_buff: today, a driver that does not implement the new
feature can expose uninitialized memory through the bpf helper
associated with that feature.

---

Jesper Dangaard Brouer (5):
  samples/bpf: xdp_tx_iptunnel make use of map_data[]
  mlx5: fix bug reading rss_hash_type from CQE
  net: introduce XDP driver features interface
  net: new XDP feature for reading HW rxhash from drivers
  mlx5: add XDP rxhash feature for driver mlx5


 drivers/net/ethernet/mellanox/mlx5/core/en_main.c |3 +
 drivers/net/ethernet/mellanox/mlx5/core/en_rx.c   |   98 ++---
 include/linux/filter.h|   31 ++-
 include/linux/mlx5/device.h   |   10 ++
 include/linux/netdev_features.h   |   34 +++
 include/linux/netdevice.h |1 
 include/uapi/linux/bpf.h  |   56 
 kernel/bpf/verifier.c |3 +
 net/core/dev.c|   48 ++
 net/core/ethtool.c|2 
 net/core/filter.c |   73 
 samples/bpf/bpf_helpers.h |2 
 samples/bpf/xdp_tx_iptunnel_common.h  |2 
 samples/bpf/xdp_tx_iptunnel_kern.c|2 
 samples/bpf/xdp_tx_iptunnel_user.c|   14 ++-
 tools/include/uapi/linux/bpf.h|   10 ++
 16 files changed, 345 insertions(+), 44 deletions(-)

--

[RFC net-next PATCH 1/5] samples/bpf: xdp_tx_iptunnel make use of map_data[]

2017-05-18 Thread Jesper Dangaard Brouer

There is no reason to use a compile time constant MAX_IPTNL_ENTRIES
shared between the _user.c and _kern.c, when map_data[].def.max_entries
can tell us dynamically the max_entries of the ELF map that the bpf
loader created.

Signed-off-by: Jesper Dangaard Brouer 
---
 samples/bpf/xdp_tx_iptunnel_common.h |2 --
 samples/bpf/xdp_tx_iptunnel_kern.c   |2 +-
 samples/bpf/xdp_tx_iptunnel_user.c   |   14 +-
 3 files changed, 10 insertions(+), 8 deletions(-)

diff --git a/samples/bpf/xdp_tx_iptunnel_common.h 
b/samples/bpf/xdp_tx_iptunnel_common.h
index dd12cc35110f..b065699cacb5 100644
--- a/samples/bpf/xdp_tx_iptunnel_common.h
+++ b/samples/bpf/xdp_tx_iptunnel_common.h
@@ -9,8 +9,6 @@
 
 #include 
 
-#define MAX_IPTNL_ENTRIES 256U
-
 struct vip {
union {
__u32 v6[4];
diff --git a/samples/bpf/xdp_tx_iptunnel_kern.c 
b/samples/bpf/xdp_tx_iptunnel_kern.c
index 0f4f6e8c8611..b19489eb3c22 100644
--- a/samples/bpf/xdp_tx_iptunnel_kern.c
+++ b/samples/bpf/xdp_tx_iptunnel_kern.c
@@ -30,7 +30,7 @@ struct bpf_map_def SEC("maps") vip2tnl = {
.type = BPF_MAP_TYPE_HASH,
.key_size = sizeof(struct vip),
.value_size = sizeof(struct iptnl_info),
-   .max_entries = MAX_IPTNL_ENTRIES,
+   .max_entries = 256,
 };
 
 static __always_inline void count_tx(u32 protocol)
diff --git a/samples/bpf/xdp_tx_iptunnel_user.c 
b/samples/bpf/xdp_tx_iptunnel_user.c
index 92b8bde9337c..0500a5cc75c4 100644
--- a/samples/bpf/xdp_tx_iptunnel_user.c
+++ b/samples/bpf/xdp_tx_iptunnel_user.c
@@ -123,11 +123,6 @@ static int parse_ports(const char *port_str, int 
*min_port, int *max_port)
return 1;
}
 
-   if (tmp_max_port - tmp_min_port + 1 > MAX_IPTNL_ENTRIES) {
-   fprintf(stderr, "Port range (%s) is larger than %u\n",
-   port_str, MAX_IPTNL_ENTRIES);
-   return 1;
-   }
*min_port = tmp_min_port;
*max_port = tmp_max_port;
 
@@ -142,6 +137,7 @@ int main(int argc, char **argv)
int min_port = 0, max_port = 0;
struct iptnl_info tnl = {};
struct rlimit r = {RLIM_INFINITY, RLIM_INFINITY};
+   unsigned int entries, max_entries;
struct vip vip = {};
char filename[256];
int opt;
@@ -238,6 +234,14 @@ int main(int argc, char **argv)
return 1;
}
 
+   entries = max_port - min_port + 1;
+   max_entries = map_data[1].def.max_entries;
+   if (entries > max_entries) {
+   fprintf(stderr, "Req port entries (%u) is larger than max %u\n",
+   entries, max_entries);
+   return 1;
+   }
+
signal(SIGINT, int_exit);
 
while (min_port <= max_port) {
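
The run-time check added to main() boils down to comparing the size of
an inclusive port range with the capacity of the loaded map. A minimal
sketch, assuming a hypothetical helper port_range_fits() and with
max_entries passed in as a plain parameter rather than read from
map_data[]:

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Does the inclusive range [min_port, max_port] fit into a map with
 * max_entries slots?  In the sample, max_entries would come from the
 * loaded map definition at run time.
 */
static bool port_range_fits(int min_port, int max_port, unsigned int max_entries)
{
	unsigned int entries;

	assert(min_port <= max_port);
	entries = (unsigned int)(max_port - min_port) + 1;
	return entries <= max_entries;
}
```

Because the limit is a parameter, changing .max_entries in the _kern.c
map definition needs no matching change on the user-space side.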

[PATCH net-next] net/mlx5e: Fix possible memory leak

2017-05-18 Thread Wei Yongjun

From: Wei Yongjun 

'encap_header' is allocated and must be freed before leaving through
the error handling paths, otherwise memory is leaked.

Fixes: 232c001398ae ("net/mlx5e: Add support to neighbour update flow")
Signed-off-by: Wei Yongjun 
---
 drivers/net/ethernet/mellanox/mlx5/core/en_tc.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
index 11c27e4..a72ecbc 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
@@ -1404,8 +1404,8 @@ static int mlx5e_create_encap_header_ipv4(struct 
mlx5e_priv *priv,
 
if (!(nud_state & NUD_VALID)) {
neigh_event_send(n, NULL);
-   neigh_release(n);
-   return -EAGAIN;
+   err = -EAGAIN;
+   goto out;
}
 
err = mlx5_encap_alloc(priv->mdev, e->tunnel_type,
@@ -1510,8 +1510,8 @@ static int mlx5e_create_encap_header_ipv6(struct 
mlx5e_priv *priv,
 
if (!(nud_state & NUD_VALID)) {
neigh_event_send(n, NULL);
-   neigh_release(n);
-   return -EAGAIN;
+   err = -EAGAIN;
+   goto out;
}
 
err = mlx5_encap_alloc(priv->mdev, e->tunnel_type,
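
The "set err, then goto out" idiom used by this fix can be shown with
a user-space model. This is a sketch, not the driver function:
create_header() and the tracked_alloc()/tracked_free() counters are
hypothetical, and for simplicity the buffer here is scratch space that
is released on every path.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

static int live_allocs;		/* buffers allocated and not yet freed */

static void *tracked_alloc(size_t n)
{
	void *p = malloc(n);

	if (p)
		live_allocs++;
	return p;
}

static void tracked_free(void *p)
{
	if (p) {
		assert(live_allocs > 0);
		live_allocs--;
	}
	free(p);
}

/*
 * Every failure after the allocation leaves through "out", so the
 * buffer is released no matter which check fails.
 */
static int create_header(int neigh_valid)
{
	char *hdr = tracked_alloc(64);
	int err = 0;

	if (!hdr)
		return -ENOMEM;

	if (!neigh_valid) {
		err = -EAGAIN;
		goto out;	/* an early "return -EAGAIN" here would leak hdr */
	}

	hdr[0] = 0;		/* stand-in for building the header */
out:
	tracked_free(hdr);
	return err;
}
```

A single exit label keeps the cleanup in one place, so a later error
check added to the function cannot silently skip the free.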

Re: [PATCH] net: sched: fix a use-after-free error on chain on the error exit path

2017-05-18 Thread David Miller

From: Colin King 
Date: Thu, 18 May 2017 15:07:02 +0100

> From: Colin Ian King 
> 
> Set chain to null after the call to tcf_chain_destroy so that we don't
> call tcf_chain_put on the error exit path, thus avoiding a use-after-free
> error.
> 
> Detected by CoverityScan, CID#1436357 ("Use after free")
> 
> Signed-off-by: Colin Ian King 

Colin, you really need to make some adjustments to how you are submitting
these kinds of patches.

First of all, you must indicate the target tree in your Subject line
as "[PATCH net-next] " in this case.

Also, you need to add an appropriate Fixes: tag right before your
signoff.

Thank you.

Re: [PATCH net-next 5/6] net: bridge: get msgtype from nlmsghdr in mdb ops

2017-05-18 Thread Nikolay Aleksandrov


On 5/18/17 12:27 AM, Vivien Didelot wrote:

Retrieve the message type from the nlmsghdr structure instead of
hardcoding it in both br_mdb_add and br_mdb_del.

Signed-off-by: Vivien Didelot 
---
  net/bridge/br_mdb.c | 10 ++
  1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/net/bridge/br_mdb.c b/net/bridge/br_mdb.c
index a72d5e6f339f..d280b20587cb 100644
--- a/net/bridge/br_mdb.c
+++ b/net/bridge/br_mdb.c
@@ -569,6 +569,7 @@ static int br_mdb_add(struct sk_buff *skb, struct nlmsghdr 
*nlh,
struct net_bridge_port *p;
struct net_bridge_vlan *v;
struct net_bridge *br;
+   int msgtype = nlh->nlmsg_type;


minor nits:
nlmsg_type is a u16, also please keep the order and arrange these from longest 
to shortest



int err;
  
  	err = br_mdb_parse(skb, nlh, &dev, &entry);

@@ -595,12 +596,12 @@ static int br_mdb_add(struct sk_buff *skb, struct 
nlmsghdr *nlh,
if (br_vlan_enabled(br) && vg && entry->vid == 0) {
list_for_each_entry(v, &vg->vlan_list, vlist) {
entry->vid = v->vid;
-   err = __br_mdb_do(p, entry, RTM_NEWMDB);
+   err = __br_mdb_do(p, entry, msgtype);
if (err)
break;
}
} else {
-   err = __br_mdb_do(p, entry, RTM_NEWMDB);
+   err = __br_mdb_do(p, entry, msgtype);
}
  
  	return err;

@@ -677,6 +678,7 @@ static int br_mdb_del(struct sk_buff *skb, struct nlmsghdr 
*nlh,
struct net_bridge_port *p;
struct net_bridge_vlan *v;
struct net_bridge *br;
+   int msgtype = nlh->nlmsg_type;


same here


int err;
  
  	err = br_mdb_parse(skb, nlh, &dev, &entry);

@@ -703,12 +705,12 @@ static int br_mdb_del(struct sk_buff *skb, struct 
nlmsghdr *nlh,
if (br_vlan_enabled(br) && vg && entry->vid == 0) {
list_for_each_entry(v, &vg->vlan_list, vlist) {
entry->vid = v->vid;
-   err = __br_mdb_do(p, entry, RTM_DELMDB);
+   err = __br_mdb_do(p, entry, msgtype);
if (err)
break;
}
} else {
-   err = __br_mdb_do(p, entry, RTM_DELMDB);
+   err = __br_mdb_do(p, entry, msgtype);
}
  
  	return err;

RE: [PATCH net-next] qed: Utilize FW 8.20.0.0

2017-05-18 Thread Mintz, Yuval

> >> This pushes qed [and as result, all qed* drivers] into using 8.20.0.0
> >> firmware. The changes are mostly contained in qed with minor changes
> >> to qedi due to some HSI changes.
> >>
> >> Content-wise, the firmware contains fixes to various issues exposed
> >> since the release of the previous firmware, including:
> >>  - Corrects iSCSI fast retransmit when data digest is enabled.
> >>  - Stop draining packets when receiving several consecutive PFCs.
> >>  - Prevent possible assertion when consecutively opening/closing
> >>many connections.
> >>  - Prevent possible assertion due to too long BDQ fetch time.
> >>
> >> In addition, the new firmware would allow us to later add iWARP
> >> support in qed and qedr.
> >>
> >> Signed-off-by: Chad Dupuis 
> >> Signed-off-by: Ram Amrani 
> >> Signed-off-by: Tomer Tayar 
> >> Signed-off-by: Manish Rangankar 
> >> Signed-off-by: Yuval Mintz 
> >
> > Applied.
> 
> Actually I had to revert.  Please look at the compiler output before
> submitting changes:
> 
> drivers/net/ethernet/qlogic/qed/qed_debug.c: In function ‘qed_grc_dump’:
> drivers/net/ethernet/qlogic/qed/qed_debug.c:2425:6: warning: ‘addr’ may
> be used uninitialized in this function [-Wmaybe-uninitialized]
>   u32 byte_addr = DWORDS_TO_BYTES(addr), offset = 0, i;
>   ^
> drivers/net/ethernet/qlogic/qed/qed_debug.c:3534:7: note: ‘addr’ was
> declared here
>u32 addr, size = RSS_REG_RSS_RAM_DATA_SIZE;
>^
> 
> 'addr' is never, ever, assigned a value, yet it is passed into a function as 
> an
> argument.

Sorry about that. Will send v2 [hopefully] later today.

Re: [patch net] mlxsw: spectrum: Avoid possible NULL pointer dereference

2017-05-18 Thread David Miller

From: Jiri Pirko 
Date: Thu, 18 May 2017 13:03:52 +0200

> From: Ido Schimmel 
> 
> In case we got an FDB notification for a port that doesn't exist we
> execute an FDB entry delete to prevent it from re-appearing the next
> time we poll for notifications.
> 
> If the operation failed we would trigger a NULL pointer dereference as
> 'mlxsw_sp_port' is NULL.
> 
> Fix it by reporting the error using the underlying bus device instead.
> 
> Fixes: 12f1501e7511 ("mlxsw: spectrum: remove FDB entry in case we get 
> unknown object notification")
> Signed-off-by: Ido Schimmel 
> Signed-off-by: Jiri Pirko 

Applied, thank you.

[PATCH net-next] qed: Remove unused including

2017-05-18 Thread Wei Yongjun

From: Wei Yongjun 

Remove including  that is not needed.

Signed-off-by: Wei Yongjun 
---
 drivers/net/ethernet/qlogic/qed/qed_fcoe.c  | 1 -
 drivers/net/ethernet/qlogic/qed/qed_iscsi.c | 1 -
 drivers/net/ethernet/qlogic/qed/qed_l2.c| 1 -
 drivers/net/ethernet/qlogic/qed/qed_ll2.c   | 1 -
 drivers/net/ethernet/qlogic/qed/qed_main.c  | 1 -
 5 files changed, 5 deletions(-)

diff --git a/drivers/net/ethernet/qlogic/qed/qed_fcoe.c 
b/drivers/net/ethernet/qlogic/qed/qed_fcoe.c
index 21a58ff..690dd2b 100644
--- a/drivers/net/ethernet/qlogic/qed/qed_fcoe.c
+++ b/drivers/net/ethernet/qlogic/qed/qed_fcoe.c
@@ -43,7 +43,6 @@
 #include 
 #include 
 #include 
-#include 
 #include 
 #include 
 #include 
diff --git a/drivers/net/ethernet/qlogic/qed/qed_iscsi.c 
b/drivers/net/ethernet/qlogic/qed/qed_iscsi.c
index 339c91d..f2fd09c 100644
--- a/drivers/net/ethernet/qlogic/qed/qed_iscsi.c
+++ b/drivers/net/ethernet/qlogic/qed/qed_iscsi.c
@@ -44,7 +44,6 @@
 #include 
 #include 
 #include 
-#include 
 #include 
 #include 
 #include 
diff --git a/drivers/net/ethernet/qlogic/qed/qed_l2.c 
b/drivers/net/ethernet/qlogic/qed/qed_l2.c
index 746fed4..fab6e69 100644
--- a/drivers/net/ethernet/qlogic/qed/qed_l2.c
+++ b/drivers/net/ethernet/qlogic/qed/qed_l2.c
@@ -43,7 +43,6 @@
 #include 
 #include 
 #include 
-#include 
 #include 
 #include 
 #include 
diff --git a/drivers/net/ethernet/qlogic/qed/qed_ll2.c 
b/drivers/net/ethernet/qlogic/qed/qed_ll2.c
index 09c8641..b04dfc4 100644
--- a/drivers/net/ethernet/qlogic/qed/qed_ll2.c
+++ b/drivers/net/ethernet/qlogic/qed/qed_ll2.c
@@ -38,7 +38,6 @@
 #include 
 #include 
 #include 
-#include 
 #include 
 #include 
 #include 
diff --git a/drivers/net/ethernet/qlogic/qed/qed_main.c 
b/drivers/net/ethernet/qlogic/qed/qed_main.c
index 537d123..f286daa 100644
--- a/drivers/net/ethernet/qlogic/qed/qed_main.c
+++ b/drivers/net/ethernet/qlogic/qed/qed_main.c
@@ -34,7 +34,6 @@
 #include 
 #include 
 #include 
-#include 
 #include 
 #include 
 #include

Re: [PATCH net-next 3/6] net: bridge: break if __br_mdb_del fails

2017-05-18 Thread Nikolay Aleksandrov


On 5/18/17 6:08 PM, Vivien Didelot wrote:
> Hi Nikolay,
>
> Nikolay Aleksandrov  writes:
>
>>>  err = __br_mdb_del(br, entry);
>>> -if (!err)
>>> -__br_mdb_notify(dev, p, entry, RTM_DELMDB);
>>> +if (err)
>>> +break;
>>> +__br_mdb_notify(dev, p, entry, RTM_DELMDB);
>>>  }
>>>  } else {
>>>  err = __br_mdb_del(br, entry);
>>>
>>
>> This can potentially break user-space scripts that rely on the best-effort
>> behaviour, this is the normal "delete without vid & enabled vlan filtering".
>> You can check the fdb delete code which does the same, this was intentional.
>>
>> You can add an mdb entry without a vid to all vlans, add a vlan and then try
>> to remove it from all vlans where it is present - with this patch obviously
>> that will fail at the new vlan.
>
> OK good to know. That intention wasn't obvious. I can make __br_mdb_del
> return void instead? What about the rest of the patchset if I do so?
>
> Thanks,
>
>  Vivien

If you make it return void we will not be able to return proper error value
when doing a single operation (the else case). About the rest I see only some
minor style issues, I'll comment on the respective patches. Another minor nit
is using switch() instead of if/else for the message types but that is really
up to you, I don't mind either way. :-)


Cheers,
 Nik
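
For readers following the best-effort argument, here is a minimal user-space
sketch of a delete loop that keeps going past a failing VLAN instead of
breaking out on the first error. The names del_best_effort, del_fn and
fake_del are made up for illustration and are not bridge code; returning the
result of the last attempt is an assumption of this sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical delete callback: returns 0 on success, negative on error. */
typedef int (*del_fn)(unsigned short vid);

/* Example callback for illustration: the entry is absent on VLAN 30. */
static int fake_del(unsigned short vid)
{
	return vid == 30 ? -EINVAL : 0;
}

/* Try to delete the entry on every VLAN in vids[]. A failure on one VLAN
 * does not stop the loop. The return value is the result of the last
 * attempt, and *deleted counts the successes (the points where a
 * notification would be sent). */
static int del_best_effort(const unsigned short *vids, size_t n,
			   del_fn del, size_t *deleted)
{
	int err = 0;
	size_t i;

	assert(del != NULL && deleted != NULL);
	*deleted = 0;
	for (i = 0; i < n; i++) {
		err = del(vids[i]);
		if (!err)
			(*deleted)++;	/* notify only on success */
	}
	return err;
}
```

With VLANs 10, 30 and 20, the loop still removes the entry from 10 and 20
even though 30 fails. A variant that breaks on the first error would stop
at 30 and leave 20 untouched, which is the behaviour change objected to above.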

Re: [PATCH] liquidio: make the spinlock octeon_devices_lock static

2017-05-18 Thread David Miller

From: Colin King 
Date: Thu, 18 May 2017 10:14:01 +0100

> From: Colin Ian King 
> 
> octeon_devices_lock can be made static as it does not need to be
> in global scope.
> 
> Cleans up sparse warning: "warning: symbol 'octeon_devices_lock'
> was not declared. Should it be static?"
> 
> Signed-off-by: Colin Ian King 

Applied.

[PATCH net-next] ibmvnic: fix missing unlock on error in __ibmvnic_reset()

2017-05-18 Thread Wei Yongjun

From: Wei Yongjun 

Add the missing unlock before return from function __ibmvnic_reset()
in the error handling case.

Fixes: ed651a10875f ("ibmvnic: Updated reset handling")
Signed-off-by: Wei Yongjun 
---
 drivers/net/ethernet/ibm/ibmvnic.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/ethernet/ibm/ibmvnic.c 
b/drivers/net/ethernet/ibm/ibmvnic.c
index 4f2d329..27f7933 100644
--- a/drivers/net/ethernet/ibm/ibmvnic.c
+++ b/drivers/net/ethernet/ibm/ibmvnic.c
@@ -1313,6 +1313,7 @@ static void __ibmvnic_reset(struct work_struct *work)
 
if (rc) {
free_all_rwi(adapter);
+   mutex_unlock(&adapter->reset_lock);
return;
}
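
The pattern behind this fix, sketched in user space: every return path that
follows the lock acquisition has to release the lock. The toy_lock type and
do_reset() are hypothetical stand-ins, not the ibmvnic code, and the lock
only tracks whether it is held.

```c
#include <assert.h>

/* Hypothetical lock for this sketch: just tracks whether it is held. */
struct toy_lock {
	int held;
};

static void toy_lock_acquire(struct toy_lock *l)
{
	assert(!l->held);	/* would deadlock on a leaked lock */
	l->held = 1;
}

static void toy_lock_release(struct toy_lock *l)
{
	assert(l->held);
	l->held = 0;
}

/* Returns rc unchanged. The point is that the lock is dropped on both
 * the error path and the success path. */
static int do_reset(struct toy_lock *l, int rc)
{
	toy_lock_acquire(l);

	if (rc) {
		/* error path: clean up, then unlock before returning */
		toy_lock_release(l);
		return rc;
	}

	/* ... normal reset work would go here ... */
	toy_lock_release(l);
	return 0;
}
```

If the release on the error path were missing, the next call to do_reset()
would trip the assertion in toy_lock_acquire(), which mirrors a second reset
blocking forever on the leaked mutex.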

Re: [Patch RFC net-next] net: usb: r8152: Fix rx_bytes/tx_bytes to include FCS

2017-05-18 Thread David Miller

From: Andrew Lunn 
Date: Thu, 18 May 2017 17:09:25 +0200

> Since these are software counters, they can be consistent. From a
> practical point of view, I doubt they ever will all be consistent,
> there are simply too many drivers to test and change if
> needed. However, for the ones somebody cares about, they can be made
> consistent.
> 
> I care about r8152, and would like to make it consistent with asix,
> dsa, e1000e.

No objection from me for making software counters consistent.

[PATCH net-next] xen/9pfs: p9_trans_xen_init and p9_trans_xen_exit can be static

2017-05-18 Thread Wei Yongjun

From: Wei Yongjun 

Fixes the following sparse warnings:

net/9p/trans_xen.c:528:5: warning:
 symbol 'p9_trans_xen_init' was not declared. Should it be static?
net/9p/trans_xen.c:540:6: warning:
 symbol 'p9_trans_xen_exit' was not declared. Should it be static?

Signed-off-by: Wei Yongjun 
---
 net/9p/trans_xen.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/9p/trans_xen.c b/net/9p/trans_xen.c
index 71e8564..3deb17f 100644
--- a/net/9p/trans_xen.c
+++ b/net/9p/trans_xen.c
@@ -525,7 +525,7 @@ static struct xenbus_driver xen_9pfs_front_driver = {
.otherend_changed = xen_9pfs_front_changed,
 };
 
-int p9_trans_xen_init(void)
+static int p9_trans_xen_init(void)
 {
if (!xen_domain())
return -ENODEV;
@@ -537,7 +537,7 @@ int p9_trans_xen_init(void)
 }
 module_init(p9_trans_xen_init);
 
-void p9_trans_xen_exit(void)
+static void p9_trans_xen_exit(void)
 {
v9fs_unregister_trans(&p9_xen_trans);
return xenbus_unregister_driver(&xen_9pfs_front_driver);

Re: [PATCH] of_mdio: Fix broken PHY IRQ in case of probe deferral

2017-05-18 Thread David Miller

From: Geert Uytterhoeven 
Date: Thu, 18 May 2017 14:59:05 +0200

> If an Ethernet PHY is initialized before the interrupt controller it is
> connected to, a message like the following is printed:
> 
> irq: no irq domain found for /interrupt-controller@e61c !
> 
> However, the actual error is ignored, leading to a non-functional (-1)
> PHY interrupt later:
> 
> Micrel KSZ8041RNLI ee70.ethernet-:01: attached PHY driver 
> [Micrel KSZ8041RNLI] (mii_bus:phy_addr=ee70.ethernet-:01, irq=-1)
> 
> Depending on whether the PHY driver will fall back to polling, Ethernet
> may or may not work.
> 
> To fix this:
>   1. Switch of_mdiobus_register_phy() from irq_of_parse_and_map() to
>  of_irq_get().
>  Unlike the former, the latter returns -EPROBE_DEFER if the
>  interrupt controller is not yet available, so this condition can be
>  detected.
>  Other errors are handled the same as before, i.e. use the passed
>  mdio->irq[addr] as interrupt.
>   2. Propagate and handle errors from of_mdiobus_register_phy() and
>  of_mdiobus_register_device().
> 
> Signed-off-by: Geert Uytterhoeven 

Florian or someone similarly knowledgeable, please review.
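
A rough sketch of the decision described in step 1 of the quoted patch, for
anyone reviewing: propagate probe deferral, use the mapped interrupt when
there is one, and otherwise fall back as before. pick_phy_irq() is a
hypothetical helper written for this note, not a function from the patch,
and EPROBE_DEFER is defined locally because it is a kernel-internal errno.

```c
#include <assert.h>
#include <stddef.h>

/* Kernel-internal errno, not exposed in userspace <errno.h>. Defined here
 * only so the sketch compiles on its own. */
#define EPROBE_DEFER 517

/* Hypothetical helper: irq_lookup is what an of_irq_get()-style lookup
 * returned, fallback_irq is the bus-provided default (mdio->irq[addr] in
 * the patch description). On success the chosen IRQ is stored in *irq. */
static int pick_phy_irq(int irq_lookup, int fallback_irq, int *irq)
{
	assert(irq != NULL);

	if (irq_lookup == -EPROBE_DEFER)
		return -EPROBE_DEFER;	/* controller not ready: retry later */

	if (irq_lookup > 0)
		*irq = irq_lookup;	/* mapped successfully */
	else
		*irq = fallback_irq;	/* any other failure: same as before */
	return 0;
}
```

The caller is expected to pass the deferral up so the whole probe is retried
once the interrupt controller has shown up, rather than continuing with
irq=-1.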

Re: [PATCH 1/2] sh_eth: Use platform device for printing before register_netdev()

2017-05-18 Thread David Miller

From: Geert Uytterhoeven 
Date: Thu, 18 May 2017 15:01:34 +0200

> The MDIO initialization failure message is printed using the network
> device, before it has been registered, leading to:
> 
>  (null): failed to initialise MDIO
> 
> Use the platform device instead to fix this:
> 
> sh-eth ee70.ethernet: failed to initialise MDIO
> 
> Fixes: daacf03f0bbfefee ("sh_eth: Register MDIO bus before registering the network device")
> Signed-off-by: Geert Uytterhoeven 

Applied.

Re: [PATCH 2/2] sh_eth: Do not print an error message for probe deferral

2017-05-18 Thread David Miller

From: Geert Uytterhoeven 
Date: Thu, 18 May 2017 15:01:35 +0200

> EPROBE_DEFER is not an error, hence printing an error message like
> 
> sh-eth ee70.ethernet: failed to initialise MDIO
> 
> may confuse the user.
> 
> To fix this, suppress the error message in case of probe deferral.
> While at it, shorten the message, and add the actual error code.
> 
> Signed-off-by: Geert Uytterhoeven 

Applied.

Re: [PATCH] net: ieee802154: fix net_device reference release too early

2017-05-18 Thread Stefan Schmidt

Hello.

On Thu, 2017-05-18 at 15:14, Stefan Schmidt wrote:
> Hello.
> 
> On Thu, 2017-05-18 at 15:50, linzhang wrote:
> > This patch fixes a kernel oops caused by releasing the net_device
> > reference too early. In raw_sendmsg() (I think dgram_sendmsg() has the
> > same problem), there is a race condition between dev_put and
> > dev_queue_xmit when the device is gone, which may lead dev_queue_xmit
> > to see an illegal net_device pointer.
> > 
> 
> You have a test case to reproduce this oops? I fear I have not seen
> one.

If you have a test case handy, adding it to the commit message would be
helpful. If you do not have one around, we can do without.

> > So I think that dev_put should come after dev_queue_xmit.
> > 
> > Also, explicitly setting skb->sk is needless; sock_alloc_send_skb
> > already sets it.
> 
> You could have put this fixup in a different patch.

I actually would request you to split this into two patches. One for the
removal of the sk setting and one for the race condition fix.

> > Signed-off-by: linzhang 
> 
> This looks more like a username instead of a real name. If you have Lin
> Zhang as your English real name that would be better here. :)

This would also be appreciated.

> > ---
> >  net/ieee802154/socket.c | 10 --
> >  1 file changed, 4 insertions(+), 6 deletions(-)
> > 
> > diff --git a/net/ieee802154/socket.c b/net/ieee802154/socket.c
> > index eedba76..a60658c 100644
> > --- a/net/ieee802154/socket.c
> > +++ b/net/ieee802154/socket.c
> > @@ -301,15 +301,14 @@ static int raw_sendmsg(struct sock *sk, struct msghdr *msg, size_t size)
> > goto out_skb;
> >  
> > skb->dev = dev;
> > -   skb->sk  = sk;
> > skb->protocol = htons(ETH_P_IEEE802154);
> >  
> > -   dev_put(dev);
> > -
> > err = dev_queue_xmit(skb);
> > if (err > 0)
> > err = net_xmit_errno(err);
> >  
> > +   dev_put(dev);
> > +
> > return err ?: size;
> >  
> >  out_skb:
> > @@ -690,15 +689,14 @@ static int dgram_sendmsg(struct sock *sk, struct msghdr *msg, size_t size)
> > goto out_skb;
> >  
> > skb->dev = dev;
> > -   skb->sk  = sk;
> > skb->protocol = htons(ETH_P_IEEE802154);
> >  
> > -   dev_put(dev);
> > -
> > err = dev_queue_xmit(skb);
> > if (err > 0)
> > err = net_xmit_errno(err);
> >  
> > +   dev_put(dev);
> > +
> > return err ?: size;
> 
> Going to give this a test ride here now.

I gave it a ride in my testbed and encountered no problems. While I have never
seen the race and oops myself, doing the dev_put before the xmit can surely
lead to such a race, and the fix is valid.

Once you have made the changes requested above and re-submitted your two
patches, you can add my

Acked-by: Stefan Schmidt 

to both of them.

regards
Stefan Schmidt
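
To make the ordering argument concrete, a small single-threaded sketch
follows. It does not reproduce the race. It only shows the rule the fix
follows: keep the reference until the last use of the device has returned.
All toy_* names are hypothetical, and "freed" is a flag rather than a real
free() so the sketch is safe to run.

```c
#include <assert.h>

/* Hypothetical device with a reference count. */
struct toy_dev {
	int refcnt;
	int freed;
	int tx_packets;
};

static void toy_dev_hold(struct toy_dev *d)
{
	d->refcnt++;
}

static void toy_dev_put(struct toy_dev *d)
{
	assert(d->refcnt > 0);
	if (--d->refcnt == 0)
		d->freed = 1;	/* last reference gone: device may go away */
}

/* Stand-in for the transmit call. It must never see a freed device. */
static int toy_xmit(struct toy_dev *d)
{
	assert(!d->freed);
	d->tx_packets++;
	return 0;
}

/* Keep our reference until the transmit call has returned. */
static int toy_send(struct toy_dev *d)
{
	int err;

	toy_dev_hold(d);	/* plays the role of the lookup taking a ref */
	err = toy_xmit(d);
	toy_dev_put(d);		/* dropped only after the device was used */
	return err;
}
```

Swapping the toy_xmit() and toy_dev_put() calls gives the pre-patch order,
where another path dropping the last reference in between could leave the
transmit call with a stale pointer.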

Re: [PATCH net-next] geneve: add rtnl changelink support

2017-05-18 Thread Girish Moodalbail

TL;DR: There is indeed a race between geneve_changelink() and the geneve
transmit path w.r.t. attributes being changed while the old values of those
attributes are being used in the transmit path. I will resubmit V2 of the
patch with those issues addressed. Thanks!


Please see in-line for my other comments..




Signed-off-by: Girish Moodalbail 
---
 drivers/net/geneve.c | 149 ---
 1 file changed, 117 insertions(+), 32 deletions(-)


...

@@ -1169,45 +1181,58 @@ static void init_tnl_info(struct ip_tunnel_info *info, __u16 dst_port)
info->key.tp_dst = htons(dst_port);
 }

-static int geneve_newlink(struct net *net, struct net_device *dev,
- struct nlattr *tb[], struct nlattr *data[])
+static int geneve_nl2info(struct net_device *dev, struct nlattr *tb[],
+ struct nlattr *data[], struct ip_tunnel_info *info,
+ bool *metadata, bool *use_udp6_rx_checksums,
+ bool changelink)
 {
-   bool use_udp6_rx_checksums = false;
-   struct ip_tunnel_info info;
-   bool metadata = false;
+   struct geneve_dev *geneve = netdev_priv(dev);

-   init_tnl_info(&info, GENEVE_UDP_PORT);
+   if (changelink) {
+   /* if changelink operation, start with old existing info */
+   memcpy(info, &geneve->info, sizeof(*info));
+   *metadata = geneve->collect_md;
+   *use_udp6_rx_checksums = geneve->use_udp6_rx_checksums;
+   } else {
+   init_tnl_info(info, GENEVE_UDP_PORT);
+   }

if (data[IFLA_GENEVE_REMOTE] && data[IFLA_GENEVE_REMOTE6])
return -EINVAL;

if (data[IFLA_GENEVE_REMOTE]) {
-   info.key.u.ipv4.dst =
+   info->key.u.ipv4.dst =
nla_get_in_addr(data[IFLA_GENEVE_REMOTE]);

-   if (IN_MULTICAST(ntohl(info.key.u.ipv4.dst))) {
+   if (IN_MULTICAST(ntohl(info->key.u.ipv4.dst))) {
netdev_dbg(dev, "multicast remote is unsupported\n");
return -EINVAL;
}
+   if (changelink &&
+   ip_tunnel_info_af(&geneve->info) == AF_INET6) {
+   info->mode &= ~IP_TUNNEL_INFO_IPV6;
+   info->key.tun_flags &= ~TUNNEL_CSUM;
+   *use_udp6_rx_checksums = false;
+   }

This allows changelink to change ipv4 address but there are no changes
made to the geneve tunnel port hash table after this update.


The following code in geneve_changelink() does what you are asking for

+if (!geneve_dst_addr_equal(&geneve->info, &info))
+dst_cache_reset(&info.dst_cache);

geneve_nl2info() accrues all the allowed changes to be made and captures it in 
ip_tunnel_info structure and then the above code in geneve_changelink() ensures 
that all the route cache associated with the old remote address are released 
when the next lookup occurs.



We also
need to check to see if there is any conflicts with existing ports.


This is not needed since we don't support changing the remote port.



What is the barrier between the rx/tx threads and changelink process?


There is an issue here like you pointed out (thanks!). Will fix that issue.




}

if (data[IFLA_GENEVE_REMOTE6]) {
  #if IS_ENABLED(CONFIG_IPV6)
-   info.mode = IP_TUNNEL_INFO_IPV6;
-   info.key.u.ipv6.dst =
+   info->mode = IP_TUNNEL_INFO_IPV6;
+   info->key.u.ipv6.dst =
nla_get_in6_addr(data[IFLA_GENEVE_REMOTE6]);

-   if (ipv6_addr_type(&info.key.u.ipv6.dst) &
+   if (ipv6_addr_type(&info->key.u.ipv6.dst) &
IPV6_ADDR_LINKLOCAL) {
netdev_dbg(dev, "link-local remote is unsupported\n");
return -EINVAL;
}
-   if (ipv6_addr_is_multicast(&info.key.u.ipv6.dst)) {
+   if (ipv6_addr_is_multicast(&info->key.u.ipv6.dst)) {
netdev_dbg(dev, "multicast remote is unsupported\n");
return -EINVAL;
}
-   info.key.tun_flags |= TUNNEL_CSUM;
-   use_udp6_rx_checksums = true;
+   info->key.tun_flags |= TUNNEL_CSUM;
+   *use_udp6_rx_checksums = true;

Same here. We need to check/fix the geneve tunnel hash table according
to new IP address.


This is taken care of by the call to dst_cache_reset() whenever the remote
address changes. This function already takes care of races and contention:


8<-8<--
/**
 *  dst_cache_reset - invalidate the cache contents
 *  @dst_cache: the cache
 *
 *  This do not free the cached dst to avoid races and contentions.
 *  the dst will be freed on later cache lookup.
 */
static inline void dst_cache_reset(struct dst_cache *dst_cache)
{
dst_cache->reset_ts = jiffies;
}
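
As a rough illustration of why a reset that only records a timestamp is
enough, here is a user-space sketch of lazy invalidation: the reset stores
the current time, and a later lookup drops any entry stored before it. The
lazy_cache type and its helpers are hypothetical and much simpler than the
kernel's dst_cache. A logical counter stands in for jiffies.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for this sketch, not the kernel's struct dst_cache. */
struct lazy_cache {
	unsigned long reset_ts;    /* logical time of the last reset */
	unsigned long refresh_ts;  /* logical time the entry was stored */
	const char *entry;         /* cached route, NULL when empty */
};

/* Logical clock standing in for jiffies, bumped on every store and reset. */
static unsigned long lazy_clock;

static void lazy_cache_set(struct lazy_cache *c, const char *entry)
{
	assert(c != NULL);
	c->entry = entry;
	c->refresh_ts = ++lazy_clock;
}

/* Invalidate without touching the entry: only record when it happened. */
static void lazy_cache_reset(struct lazy_cache *c)
{
	assert(c != NULL);
	c->reset_ts = ++lazy_clock;
}

/* Return the entry, or NULL if it is missing or older than the last reset.
 * A stale entry is dropped here, on lookup, rather than in reset. */
static const char *lazy_cache_get(struct lazy_cache *c)
{
	assert(c != NULL);
	if (c->entry == NULL)
		return NULL;
	if (c->reset_ts > c->refresh_ts) {
		c->entry = NULL;
		return NULL;
	}
	return c->entry;
}
```

Because the reset path never frees anything, it does not have to coordinate
with a lookup that may be using the entry at that moment. The stale entry
is discarded by whichever lookup comes next.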

Re: [PATCH net-next 3/6] net: bridge: break if __br_mdb_del fails

2017-05-18 Thread Vivien Didelot

Hi Nikolay,

Nikolay Aleksandrov  writes:

>>  err = __br_mdb_del(br, entry);
>> -if (!err)
>> -__br_mdb_notify(dev, p, entry, RTM_DELMDB);
>> +if (err)
>> +break;
>> +__br_mdb_notify(dev, p, entry, RTM_DELMDB);
>>  }
>>  } else {
>>  err = __br_mdb_del(br, entry);
>> 
>
> This can potentially break user-space scripts that rely on the best-effort
> behaviour, this is the normal "delete without vid & enabled vlan filtering".
> You can check the fdb delete code which does the same, this was intentional.
>
> You can add an mdb entry without a vid to all vlans, add a vlan and then try
> to remove it from all vlans where it is present - with this patch obviously
> that will fail at the new vlan.

OK good to know. That intention wasn't obvious. I can make __br_mdb_del
return void instead? What about the rest of the patchset if I do so?

Thanks,

Vivien

Re: [Patch RFC net-next] net: usb: r8152: Fix rx_bytes/tx_bytes to include FCS

2017-05-18 Thread Andrew Lunn

> I am afraid that we won't be able to enforce a consistent behavior,
> because the HW itself is not consistent, both on the NIC and on the
> switch side.

Hi Florian

I agree with that, for MIB counters. They tend to come direct from the
hardware.

However, rx_bytes and tx_bytes are not from the hardware. They are
software stats, kept by the drivers. Just grep in drivers/net/ethernet
and you see:

broadcom/bcmsysport.c:  ndev->stats.rx_bytes += len;
broadcom/sb1250-mac.c:  dev->stats.rx_bytes += len;
mellanox/mlx5/core/en_main.c:   s->rx_bytes += rq_stats->bytes;
microchip/encx24j600.c: dev->stats.rx_bytes += rsv->len;
neterion/vxge/vxge-main.c:  net_stats->rx_bytes += bytes;
nuvoton/w90p910_ether.c:dev->stats.rx_bytes += length;

etc.

Since these are software counters, they can be consistent. From a
practical point of view, I doubt they ever will all be consistent,
there are simply too many drivers to test and change if
needed. However, for the ones somebody cares about, they can be made
consistent.

I care about r8152, and would like to make it consistent with asix,
dsa, e1000e.

 Andrew

Re: [PATCH net 3/3] virtio-net: enable TSO/checksum offloads for Q-in-Q vlans

2017-05-18 Thread Michael S. Tsirkin

On Thu, May 18, 2017 at 09:31:05AM -0400, Vladislav Yasevich wrote:
> Since virtio does not provide its own ndo_features_check handler,
> TSO, and now checksum offload, are disabled for stacked vlans.
> Re-enable the support and let the host take care of it.  This
> restores/improves Guest-to-Guest performance over Q-in-Q vlans.
> 
> CC: "Michael S. Tsirkin" 
> CC: Jason Wang 
> CC: virtualizat...@lists.linux-foundation.org
> Signed-off-by: Vladislav Yasevich 

Acked-by: Michael S. Tsirkin 

> ---
>  drivers/net/virtio_net.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index 8324a5e..341fb96 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -2028,6 +2028,7 @@ static const struct net_device_ops virtnet_netdev = {
>   .ndo_poll_controller = virtnet_netpoll,
>  #endif
>   .ndo_xdp= virtnet_xdp,
> + .ndo_features_check = passthru_features_check,
>  };
>  
>  static void virtnet_config_changed_work(struct work_struct *work)
> -- 
> 2.7.4

Re: [patch net 0/2] mlxsw: couple of fixes

2017-05-18 Thread David Miller

From: Jiri Pirko 
Date: Thu, 18 May 2017 09:18:51 +0200

> Couple of fixes from Arkadi

Series applied.

Re: [patch net-next] mlxsw: spectrum_dpipe: Fix sparse warnings

2017-05-18 Thread David Miller

From: Jiri Pirko 
Date: Thu, 18 May 2017 09:22:45 +0200

> From: Arkadi Sharshevsky 
> 
> drivers/net/ethernet/mellanox/mlxsw//spectrum_dpipe.c:221:52: warning:
> Using plain integer as NULL pointer
> drivers/net/ethernet/mellanox/mlxsw//spectrum_dpipe.c:221:74: warning:
> Using plain integer as NULL pointer
> 
> Signed-off-by: Arkadi Sharshevsky 
> Reviewed-by: Ido Schimmel 
> Signed-off-by: Jiri Pirko 

Applied.

Re: [PATCH v3] net: dsa: b53: Add compatible strings for the Cygnus-family BCM11360.

2017-05-18 Thread David Miller

From: Eric Anholt 
Date: Wed, 17 May 2017 17:32:12 -0700

> Cygnus is a small family of SoCs, of which we currently have
> devicetree for BCM11360 and BCM58300.  The 11360's B53 is mostly the
> same as 58xx, just requiring a tiny bit of setup that was previously
> missing.
> 
> Signed-off-by: Eric Anholt 
> Reviewed-by: Florian Fainelli 
> Acked-by: Rob Herring 

Applied to net-next, thanks.

Re: [RFC] iproute: Add support for extended ack to rtnl_talk

2017-05-18 Thread Stephen Hemminger

On Thu, 18 May 2017 12:02:07 +0200
Daniel Borkmann  wrote:

> On 05/16/2017 06:36 PM, Stephen Hemminger wrote:
> > On Sat, 13 May 2017 19:29:57 -0600
> > David Ahern  wrote:
> >  
> >> On 5/4/17 2:43 PM, Phil Sutter wrote:  
> >>> So in summary, given that very little change happens to iproute2's
> >>> internal libnetlink, I don't see much urge to make it use libmnl as
> >>> backend. In my opinion it just adds another potential source of errors.
> >>>
> >>> Eventually this should be a maintainer level decision, though. :)  
> >>
> >> What is the decision on this?  
> >
> > I am waiting a while longer before committing anything. This was to allow
> > for a wider range of distribution maintainer feedback.
> >
> > The most likely outcome for 4.12 is to use libmnl for extended ack,
> > and to continue to support building without mnl, with loss of functionality.
> >
> > As far as conversion of all of iproute2 to libmnl. I have better things
> > to do... But for new functionality like extended ack, devlink, tipc, using
> > libmnl is easy, safe and it works well. I will continue to not accept
> > new  code that depends on the other library (libnl). That has come up
> > a couple of times.  
> 
> So effectively this means libmnl has to be used for new stuff, no one
> has time to do the work to convert the existing tooling over (which
> by itself might be a challenge in testing everything to make sure
> there are no regressions) given there's not much activity around
> lib/libnetlink.c anyway, and existing users not using libmnl today
> won't see/notice new improvements on netlink side when they do an
> upgrade. So we'll be stuck with that dual library mess pretty much
> for a very long time. :(
> 
> If there's such high desire to use libmnl (?), can't there be a
> one time effort wrapping the core netlink code over, making a hard
> cut for everyone where from one release to another the dependency
> becomes really mandatory rather than optional? That's more work
> initially, but still seems a lot better than growing a wild mix
> of both over time where users see different behavior of the tools
> depending on their setup. (This could perhaps also make actual
> conversion much harder later on.)

If nothing else, it would be a simple experiment to write libnetlink-to-libmnl
wrappers in libnetlink.h.

> Can't you add that lib conversion as a Google summer of code project,
> so that someone is actively taking care of that initial work?

Agreed

Fw: [Bug 195807] New: general protection fault in ping_v4_sendmsg

2017-05-18 Thread Stephen Hemminger



Begin forwarded message:

Date: Thu, 18 May 2017 03:36:33 +
From: bugzilla-dae...@bugzilla.kernel.org
To: step...@networkplumber.org
Subject: [Bug 195807] New: general protection fault in ping_v4_sendmsg


https://bugzilla.kernel.org/show_bug.cgi?id=195807

Bug ID: 195807
   Summary: general protection fault in ping_v4_sendmsg
   Product: Networking
   Version: 2.5
Kernel Version: 4.4 to 4.10-rc7
  Hardware: x86-64
OS: Linux
  Tree: Mainline
Status: NEW
  Severity: high
  Priority: P1
 Component: IPV4
  Assignee: step...@networkplumber.org
  Reporter: you...@ruc.edu.cn
Regression: No

Created attachment 256607
  --> https://bugzilla.kernel.org/attachment.cgi?id=256607=edit  
poc and kernel config

I got a general protection fault (use after free) when fuzzing the bpf system
call.
Attached is the PoC that can reproduce this issue in kernel version from 4.4 to
4.10-rc7.

Following is the dmesg output when executing the PoC on kernel version 4.10-rc7
[   32.949367] kasan: CONFIG_KASAN_INLINE enabled
[   32.949915] kasan: GPF could be caused by NULL-ptr deref or user memory
access
[   32.950602] general protection fault:  [#1] SMP KASAN
[   32.951089] Dumping ftrace buffer:
[   32.951396](ftrace buffer empty)
[   32.951579] Modules linked in:
[   32.951579] CPU: 0 PID: 4145 Comm: poc-NB1 Not tainted 4.10.0-rc7 #1
[   32.951579] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
rel-1.10.2-0-g5f4c7b1-prebuilt.qemu-project.org 04/01/2014
[   32.951579] task: 880064f51bc0 task.stack: 880056568000
[   32.951579] RIP: 0010:ping_v4_sendmsg+0xcbd/0x1240
[   32.951579] RSP: 0018:88005656f9b8 EFLAGS: 00010206
[   32.951579] RAX: dc00 RBX: 88005656fc20 RCX:
11000a9ad033
[   32.951579] RDX: 0018 RSI: 0008 RDI:
00c2
[   32.951579] RBP: 88005656fc48 R08: 0008 R09:

[   32.951579] R10: 017f R11:  R12:
880054d68040
[   32.951579] R13:  R14: 88005656fb40 R15:
88005656fac0
[   32.951579] FS:  7fc22df907c0() GS:88006ca0()
knlGS:
[   32.951579] CS:  0010 DS:  ES:  CR0: 80050033
[   32.951579] CR2: 20007000 CR3: 656e CR4:
06f0
[   32.951579] Call Trace:
[   32.951579]  ? ping_queue_rcv_skb+0x60/0x60
[   32.951579]  ? depot_save_stack+0x133/0x4a0
[   32.951579]  ? save_stack+0xb1/0xd0
[   32.951579]  ? save_stack_trace+0x16/0x20
[   32.951579]  ? save_stack+0x46/0xd0
[   32.951579]  ? __anon_vma_prepare+0x30e/0x570
[   32.951579]  ? handle_mm_fault+0xdb0/0x1e30
[   32.951579]  ? __do_page_fault+0x5b9/0xc50
[   32.951579]  ? do_page_fault+0x2a/0x30
[   32.951579]  ? page_fault+0x22/0x30
[   32.951579]  ? ip4_datagram_release_cb+0xf3/0x6e0
[   32.951579]  ? _raw_write_unlock_bh+0x3c/0x50
[   32.951579]  ? ping_get_port+0x37d/0x5e0
[   32.951579]  ? _raw_spin_unlock_bh+0x3c/0x50
[   32.951579]  ? release_sock+0x194/0x1d0
[   32.951579]  inet_sendmsg+0x141/0x3e0
[   32.951579]  ? inet_recvmsg+0x430/0x430
[   32.951579]  sock_sendmsg+0xde/0x120
[   32.951579]  SYSC_sendto+0x23f/0x3a0
[   32.951579]  ? SYSC_connect+0x320/0x320
[   32.951579]  ? __page_set_anon_rmap+0x1cc/0x2b0
[   32.951579]  ? __lru_cache_add+0x114/0x1a0
[   32.951579]  ? handle_mm_fault+0x6ff/0x1e30
[   32.951579]  ? get_unused_fd_flags+0xd0/0xd0
[   32.951579]  ? find_vma+0x3f/0x190
[   32.951579]  ? __do_page_fault+0x3ae/0xc50
[   32.951579]  SyS_sendto+0x4a/0x60
[   32.951579]  entry_SYSCALL_64_fastpath+0x13/0x94
[   32.951579] RIP: 0033:0x7fc22dac6b79
[   32.951579] RSP: 002b:7ffc4ecef988 EFLAGS: 0206 ORIG_RAX:
002c
[   32.951579] RAX: ffda RBX:  RCX:
7fc22dac6b79
[   32.951579] RDX: 0008 RSI: 20004ff5 RDI:
0003
[   32.951579] RBP: 7ffc4ecefa00 R08: 20007000 R09:
0010
[   32.951579] R10: 483c R11: 0206 R12:
00400b20
[   32.951579] R13: 7ffc4ecefb30 R14:  R15:

[   32.951579] Code: ff c1 e2 10 66 31 c0 01 d0 15 ff ff 00 00 f7 d0 48 89 fa
c1 e8 10 48 c1 ea 03 66 89 83 a2 fe ff ff 48 b8 00 00 00 00 00 fc ff df <0f> b6
14 02 48 89 f8 83 e0 07 83 c0 01 38 d0 7c 08 84 d2 0f 85
[   32.951579] RIP: ping_v4_sendmsg+0xcbd/0x1240 RSP: 88005656f9b8
[   32.978078] ---[ end trace 3d206c2ba5fde6a4 ]---
[   32.978505] Kernel panic - not syncing: Fatal exception
[   32.979052] Dumping ftrace buffer:
[   32.979052](ftrace buffer empty)
[   32.979052] Kernel Offset: disabled
[   32.979052] Rebooting in 86400 seconds..

-- 
You are receiving this mail because:
You are the assignee for the bug.

Re: [PATCH v2 1/3] bpf: Use 1<<16 as ceiling for immediate alignment in verifier.

2017-05-18 Thread Edward Cree

On 18/05/17 03:48, Alexei Starovoitov wrote:
> Would it be easier to represent this logic via (mask_of_unknown, value)
> instead of (mask0, mask1) ?
Yes, I like this.
> As far as upper bits we can tweak the algorithm to eat into
> one or more bits of known bits due to carry.
> Like
> 00xx11 + 00xx11 = 0xxx10
> we will eat only one bit (second from left) and the highest bit
> is known to stay zero, since carry can only compromise 2nd from left.
> Such logic should work for sparse representation of unknown bits too
> Like:
> 10xx01xx10 +
> 01xx01xx00 =
> 1xxx10
> both upper two bits would be unknown, but only one middle bit becomes
> unknown.
Yes, that is the behaviour we want.  But it's unclear how to efficiently
 compute it, without just iterating over the bits and computing carry
 possibilities.
Here's one idea that seemed to work when I did a couple of experiments:
let A = (a;am), B = (b;bm) where the m are the masks
Σ = am + bm + a + b
χ = Σ ^ (a + b) /* unknown carries */
μ = χ | am | bm /* mask of result */
then A + B = ((a + b) & ~μ; μ)

The idea is that we find which bits change between the case "all x are
 1" and "all x are 0", and those become xs too.  But I'm not certain
 that that's always going to cover all possible values in between.
It worked on the tests I came up with, and also your example above, but
 I can't quite prove it'll always work.

-Ed
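
The Σ/χ/μ steps above can be written out in C as follows. This is only a
transcription of the formula for experimenting with it. It does not settle
the open question of whether the result always covers every possible sum.
The struct name known_bits and the function name are invented for this
sketch. A set bit in mask means "unknown", and value is assumed to be zero
in those positions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical type for this sketch: a partially known 64-bit number.
 * A set bit in mask means "unknown". value holds the known bits and is
 * kept zero wherever mask is set. */
struct known_bits {
	uint64_t value;
	uint64_t mask;
};

/* Add two partially known numbers using the steps from the mail. */
static struct known_bits known_bits_add(struct known_bits a,
					struct known_bits b)
{
	struct known_bits r;
	uint64_t sv, sm, sigma, chi, mu;

	/* Assumed invariant: no bit is both known-1 and unknown. */
	assert((a.value & a.mask) == 0);
	assert((b.value & b.mask) == 0);

	sv = a.value + b.value;		/* a + b: the sum when every x is 0 */
	sm = a.mask + b.mask;		/* am + bm */
	sigma = sm + sv;		/* sigma: the sum when every x is 1 */
	chi = sigma ^ sv;		/* chi: bits differing between the two */
	mu = chi | a.mask | b.mask;	/* mu: mask of the result */

	r.value = sv & ~mu;
	r.mask = mu;
	return r;
}
```

For the first example quoted above, 00xx11 + 00xx11, both inputs are
value 0x03 with mask 0x0c, and the function returns value 0x02 with
mask 0x1c, i.e. 0xxx10.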

Re: [PATCH] net1080: Mark nc_dump_ttl() as __maybe_unused

2017-05-18 Thread David Miller

From: Matthias Kaehlcke 
Date: Wed, 17 May 2017 15:17:08 -0700

> The function is not used, but it looks useful for debugging. Adding the
> attribute fixes the following clang warning:
> 
> drivers/net/usb/net1080.c:271:20: error: unused function
> 'nc_dump_ttl' [-Werror,-Wunused-function]
> 
> Signed-off-by: Matthias Kaehlcke 

For this and the r8152 patch, I definitely prefer that the function is
removed.

If someone needs them, they can pull it out of the GIT history.

Re: [PATCH v2] e1000e: Don't return uninitialized stats

2017-05-18 Thread David Miller

From: Benjamin Poirier 
Date: Wed, 17 May 2017 16:24:13 -0400

> Some statistics passed to ethtool are garbage because e1000e_get_stats64()
> doesn't write them, for example: tx_heartbeat_errors. This leaks kernel
> memory to userspace and confuses users.
> 
> Do like ixgbe and use dev_get_stats() which first zeroes out
> rtnl_link_stats64.
> 
> Fixes: 5944701df90d ("net: remove useless memset's in drivers get_stats64")
> Reported-by: Stefan Priebe 
> Signed-off-by: Benjamin Poirier 

Jeff, please be sure to pick this up, thanks.
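
The idea in the quoted commit message, sketched in user space: zero the
whole statistics structure before the driver fills it, so counters the
driver never writes read back as 0 rather than as whatever was in memory.
toy_stats, toy_driver_fill() and toy_get_stats() are hypothetical names
for this sketch and are not the e1000e or core networking code.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, trimmed-down statistics block for this sketch. */
struct toy_stats {
	uint64_t rx_bytes;
	uint64_t tx_bytes;
	uint64_t tx_heartbeat_errors;
};

/* Driver callback type: fills in only the counters it tracks. */
typedef void (*fill_fn)(struct toy_stats *out);

/* Example driver that, like the case in the mail, never writes
 * tx_heartbeat_errors. */
static void toy_driver_fill(struct toy_stats *out)
{
	out->rx_bytes = 1500;
	out->tx_bytes = 64;
}

/* Zero the whole structure first, then let the driver fill it, so
 * counters the driver skips read as 0 instead of stale memory. */
static void toy_get_stats(struct toy_stats *out, fill_fn fill)
{
	assert(out != NULL && fill != NULL);
	memset(out, 0, sizeof(*out));
	fill(out);
}
```

Calling toy_driver_fill() directly on an uninitialized structure would
leave tx_heartbeat_errors holding stale bytes, which is the leak described
above.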

[PATCH] xfrm: fix state migration replay sequence numbers

2017-05-18 Thread Antony Antony

During xfrm migration, the replay and preplay sequence numbers are not
copied from the previous state.

Here is tcpdump output showing the problem.
10.0.10.46 is the IKE/IPsec responder, running a vanilla kernel.
After the migration it sent the wrong sequence number, reset to 1.
The migration is from 10.0.0.52 to 10.0.0.53.

IP 10.0.0.52.4500 > 10.0.10.46.4500: UDP-encap: ESP(spi=0x43ef462d,seq=0x7cf), length 136
IP 10.0.10.46.4500 > 10.0.0.52.4500: UDP-encap: ESP(spi=0xca1c282d,seq=0x7cf), length 136
IP 10.0.0.52.4500 > 10.0.10.46.4500: UDP-encap: ESP(spi=0x43ef462d,seq=0x7d0), length 136
IP 10.0.10.46.4500 > 10.0.0.52.4500: UDP-encap: ESP(spi=0xca1c282d,seq=0x7d0), length 136

IP 10.0.0.53.4500 > 10.0.10.46.4500: NONESP-encap: isakmp: child_sa  inf2[I]
IP 10.0.10.46.4500 > 10.0.0.53.4500: NONESP-encap: isakmp: child_sa  inf2[R]
IP 10.0.0.53.4500 > 10.0.10.46.4500: NONESP-encap: isakmp: child_sa  inf2[I]
IP 10.0.10.46.4500 > 10.0.0.53.4500: NONESP-encap: isakmp: child_sa  inf2[R]

IP 10.0.0.53.4500 > 10.0.10.46.4500: UDP-encap: ESP(spi=0x43ef462d,seq=0x7d1), length 136

NOTE: next sequence is wrong 0x1

IP 10.0.10.46.4500 > 10.0.0.53.4500: UDP-encap: ESP(spi=0xca1c282d,seq=0x1), length 136
IP 10.0.0.53.4500 > 10.0.10.46.4500: UDP-encap: ESP(spi=0x43ef462d,seq=0x7d2), length 136
IP 10.0.10.46.4500 > 10.0.0.53.4500: UDP-encap: ESP(spi=0xca1c282d,seq=0x2), length 136

The attached patch fixes it by copying replay and preplay.

regards,
-antony

Antony Antony (1):
  xfrm: fix state migration replay sequence numbers

 net/xfrm/xfrm_state.c | 2 ++
 1 file changed, 2 insertions(+)

-- 
2.9.3

From 1241e8b4c38ad2bf7399599165f763af38aba8d9 Mon Sep 17 00:00:00 2001
From: Antony Antony 
Date: Thu, 18 May 2017 12:19:32 +0200
Subject: [PATCH] xfrm: fix state migration copy replay sequence numbers
To: netdev@vger.kernel.org, Herbert Xu , Steffen 
Klassert 
Cc: Richard Guy Briggs 

During xfrm migration, copy the replay and preplay sequence numbers
from the previous state.

Signed-off-by: Antony Antony 
---
 net/xfrm/xfrm_state.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/net/xfrm/xfrm_state.c b/net/xfrm/xfrm_state.c
index fc3c5aa..2e291bc 100644
--- a/net/xfrm/xfrm_state.c
+++ b/net/xfrm/xfrm_state.c
@@ -1383,6 +1383,8 @@ static struct xfrm_state *xfrm_state_clone(struct 
xfrm_state *orig)
x->curlft.add_time = orig->curlft.add_time;
x->km.state = orig->km.state;
x->km.seq = orig->km.seq;
+   x->replay = orig->replay;
+   x->preplay = orig->preplay;
 
return x;
 
-- 
2.9.3
