I found a strange place while reading “net/ipv6/reassembly.c”

2018-08-14 Thread Ttttabcd
Hello kernel developers,

I first tried to contact the original author, but his email address no longer
works, so I am asking here instead.

My question is about net/ipv6/reassembly.c (author: Pedro Roque). While
reading this file in Linux kernel 4.18 I noticed what looks like a redundant
check.

In the function "ip6_frag_queue":

	offset = ntohs(fhdr->frag_off) & ~0x7;
	end = offset + (ntohs(ipv6_hdr(skb)->payload_len) -
			((u8 *)(fhdr + 1) - (u8 *)(ipv6_hdr(skb) + 1)));

	if ((unsigned int)end > IPV6_MAXPLEN) {
		*prob_offset = (u8 *)&fhdr->frag_off - skb_network_header(skb);
		return -1;
	}

This checks the payload length.

And in the function "ip6_frag_reasm":

	payload_len = ((head->data - skb_network_header(head)) -
		       sizeof(struct ipv6hdr) + fq->q.len -
		       sizeof(struct frag_hdr));
	if (payload_len > IPV6_MAXPLEN)
		goto out_oversize;

	...
out_oversize:
	net_dbg_ratelimited("ip6_frag_reasm: payload len = %d\n", payload_len);
	goto out_fail;

This checks the payload length again, so the length is checked twice.

In my tests the code under the "out_oversize:" label never runs, because
"ip6_frag_queue" has already returned an error by then.

Only when I comment out the payload length check in "ip6_frag_queue" does the
code under "out_oversize:" run.

So, is the second check redundant?
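
To make the comparison concrete, here is a minimal userspace sketch of the
arithmetic in the two snippets. It is not kernel code. The function names,
the unfrag_len parameter (bytes of extension headers between the IPv6 header
and the fragment header) and the assumption that head->data points just past
the fragment header are all mine, so treat the result as a property of this
model only.

```c
#include <stdint.h>

#define MODEL_IPV6_MAXPLEN 65535u /* assumed to match IPV6_MAXPLEN */
#define MODEL_IPV6_HDR_LEN 40u    /* sizeof(struct ipv6hdr) */
#define MODEL_FRAG_HDR_LEN 8u     /* sizeof(struct frag_hdr) */

/*
 * Per-fragment check, modelled on the ip6_frag_queue() snippet.
 * offset:      fragment offset in bytes (already masked with ~0x7)
 * payload_len: payload_len field of this fragment's IPv6 header
 * unfrag_len:  hypothetical input, see the lead-in
 * Returns 1 if the model would reject the fragment.
 */
static int model_queue_rejects(uint32_t offset, uint32_t payload_len,
			       uint32_t unfrag_len)
{
	/* (u8 *)(fhdr + 1) - (u8 *)(ipv6_hdr(skb) + 1) */
	uint32_t hdrs = unfrag_len + MODEL_FRAG_HDR_LEN;
	uint32_t end = offset + (payload_len - hdrs);

	return end > MODEL_IPV6_MAXPLEN;
}

/*
 * Reassembly check, modelled on the ip6_frag_reasm() snippet.
 * q_len stands in for fq->q.len (the largest accepted "end").
 * Returns 1 if the model would jump to out_oversize.
 */
static int model_reasm_rejects(uint32_t q_len, uint32_t unfrag_len)
{
	/* Assumption: head->data - skb_network_header(head) */
	uint32_t head_data_off = MODEL_IPV6_HDR_LEN + unfrag_len +
				 MODEL_FRAG_HDR_LEN;
	uint32_t payload_len = head_data_off - MODEL_IPV6_HDR_LEN + q_len -
			       MODEL_FRAG_HDR_LEN;

	return payload_len > MODEL_IPV6_MAXPLEN;
}
```

In this model the first check subtracts the headers in front of the fragment
data, while the second one adds unfrag_len back, so the two only coincide
when unfrag_len is 0. Whether that matters in the real code path would need
to be confirmed against the kernel source.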


Re: [PATCH] net: macb: Fix regression breaking non-MDIO fixed-link PHYs

2018-08-14 Thread Andrew Lunn
On Tue, Aug 14, 2018 at 05:58:12PM +0200, Uwe Kleine-König wrote:
> Hello Ahmad,
> 
> 
> On Tue, Aug 14, 2018 at 04:12:40PM +0200, Ahmad Fatoum wrote:
> > The referenced commit broke initializing macb on the EVB-KSZ9477 eval board.
> > There, of_mdiobus_register was called even for the fixed-link representing
> > the SPI-connected switch PHY, with the result that the driver attempts to
> > enumerate PHYs on a non-existent MDIO bus:
> > 
> > libphy: MACB_mii_bus: probed
> > mdio_bus f0028000.ethernet-: fixed-link has invalid PHY address
> > mdio_bus f0028000.ethernet-: scan phy fixed-link at address 0
> > [snip]
> > mdio_bus f0028000.ethernet-: scan phy fixed-link at address 31
> > macb f0028000.ethernet: broken fixed-link specification
> > 
> > Cc: 
> > Fixes: 739de9a1563a ("net: macb: Reorganize macb_mii bringup")
> 
> I added the people involved in 739de9a1563a to Cc. Maybe they want to
> comment. So not stripping the remaining part of the original mail.

Thanks Uwe for Cc:ing me.

Ahmad, where is the device tree for the EVB-KSZ9477?

Thanks
Andrew  


Re: [PATCH] net: macb: Fix regression breaking non-MDIO fixed-link PHYs

2018-08-14 Thread Brad Mouring
Hello Ahmad, Uwe,

On Tue, Aug 14, 2018 at 05:58:12PM +0200, Uwe Kleine-König wrote:
> Hello Ahmad,
> 
> 
> On Tue, Aug 14, 2018 at 04:12:40PM +0200, Ahmad Fatoum wrote:
> > The referenced commit broke initializing macb on the EVB-KSZ9477 eval board.
> > There, of_mdiobus_register was called even for the fixed-link representing
> > the SPI-connected switch PHY, with the result that the driver attempts to
> > enumerate PHYs on a non-existent MDIO bus:
> > 
> > libphy: MACB_mii_bus: probed
> > mdio_bus f0028000.ethernet-: fixed-link has invalid PHY address
> > mdio_bus f0028000.ethernet-: scan phy fixed-link at address 0
> > [snip]
> > mdio_bus f0028000.ethernet-: scan phy fixed-link at address 31
> > macb f0028000.ethernet: broken fixed-link specification
> > 
> > Cc: 
> > Fixes: 739de9a1563a ("net: macb: Reorganize macb_mii bringup")
> 
> I added the people involved in 739de9a1563a to Cc. Maybe they want to
> comment. So not stripping the remaining part of the original mail.
> 
> Best regards
> Uwe

You should probably prod Andrew Lunn, he suggested that I move the fixed
link code from macb_mii_init() to _probe(). Here, you're at least partially
directly undoing that.

(ref: https://www.mail-archive.com/netdev@vger.kernel.org/msg221018.html)

> > Signed-off-by: Ahmad Fatoum 
> > ---
> >  drivers/net/ethernet/cadence/macb_main.c | 26 +++-
> >  1 file changed, 16 insertions(+), 10 deletions(-)
> > 
> > diff --git a/drivers/net/ethernet/cadence/macb_main.c 
> > b/drivers/net/ethernet/cadence/macb_main.c
> > index a6c911bb5ce2..d202a03c42ed 100644
> > --- a/drivers/net/ethernet/cadence/macb_main.c
> > +++ b/drivers/net/ethernet/cadence/macb_main.c
> > @@ -481,11 +481,6 @@ static int macb_mii_probe(struct net_device *dev)
> >  
> > if (np) {
> > if (of_phy_is_fixed_link(np)) {
> > -   if (of_phy_register_fixed_link(np) < 0) {
> > -   dev_err(&bp->pdev->dev,
> > -   "broken fixed-link specification\n");
> > -   return -ENODEV;
> > -   }
> > bp->phy_node = of_node_get(np);
> > } else {
> > bp->phy_node = of_parse_phandle(np, "phy-handle", 0);
> > @@ -568,7 +563,7 @@ static int macb_mii_init(struct macb *bp)
> >  {
> > struct macb_platform_data *pdata;
> > struct device_node *np;
> > -   int err;
> > +   int err = -ENXIO;
> >  
> > /* Enable management port */
> > macb_writel(bp, NCR, MACB_BIT(MPE));
> > @@ -591,10 +586,21 @@ static int macb_mii_init(struct macb *bp)
> > dev_set_drvdata(&bp->dev->dev, bp->mii_bus);
> >  
> > np = bp->pdev->dev.of_node;
> > -   if (pdata)
> > -   bp->mii_bus->phy_mask = pdata->phy_mask;
> > +   if (np && of_phy_is_fixed_link(np)) {
> > +   if (of_phy_register_fixed_link(np) < 0) {
> > +   dev_err(&bp->pdev->dev,
> > +   "broken fixed-link specification\n");
> > +   goto err_out_free_mdiobus;
> > +   }
> > +
> > +   err = mdiobus_register(bp->mii_bus);
> > +   } else {
> > +   if (pdata)
> > +   bp->mii_bus->phy_mask = pdata->phy_mask;
> > +
> > +   err = of_mdiobus_register(bp->mii_bus, np);
> > +   }
> >  
> > -   err = of_mdiobus_register(bp->mii_bus, np);
> > if (err)
> > goto err_out_free_mdiobus;
> >  
> > @@ -606,9 +612,9 @@ static int macb_mii_init(struct macb *bp)
> >  
> >  err_out_unregister_bus:
> > mdiobus_unregister(bp->mii_bus);
> > +err_out_free_mdiobus:
> > if (np && of_phy_is_fixed_link(np))
> > of_phy_deregister_fixed_link(np);
> > -err_out_free_mdiobus:
> > of_node_put(bp->phy_node);
> > mdiobus_free(bp->mii_bus);
> >  err_out:
> > -- 
> > 2.18.0
> > 
> > 
> > 
> 
> -- 
> Pengutronix e.K.   | Uwe Kleine-König|
> Industrial Linux Solutions | http://www.pengutronix.de/  |

-- 
Brad Mouring
Senior Software Engineer
National Instruments
512-683-6610 / bmour...@ni.com


Re: [PATCH v2 net-next] veth: Free queues on link delete

2018-08-14 Thread Toshiaki Makita
On 2018/08/15 10:29, David Ahern wrote:
> On 8/14/18 7:16 PM, Toshiaki Makita wrote:
>> Hmm, on second thought these queues need to be freed after veth_close()
>> to make sure no packet will reference them. That means we need to free
>> them in .ndo_uninit() or destructor.
>> (rtnl_delete_link() calls dellink() before unregister_netdevice_many()
>> which calls dev_close_many() through rollback_registered_many())
>>
>> Currently veth has destructor veth_dev_free() for vstats, so we can free
>> queues in the function.
>> To be in line with vstats, allocation also should be moved to
>> veth_dev_init().
> 
> given that, can you take care of the free in the proper location?

Sure, will cook a patch.
Thanks!

-- 
Toshiaki Makita



Re: [PATCH v2 net-next] veth: Free queues on link delete

2018-08-14 Thread David Ahern
On 8/14/18 7:16 PM, Toshiaki Makita wrote:
> Hmm, on second thought these queues need to be freed after veth_close()
> to make sure no packet will reference them. That means we need to free
> them in .ndo_uninit() or destructor.
> (rtnl_delete_link() calls dellink() before unregister_netdevice_many()
> which calls dev_close_many() through rollback_registered_many())
> 
> Currently veth has destructor veth_dev_free() for vstats, so we can free
> queues in the function.
> To be in line with vstats, allocation also should be moved to
> veth_dev_init().

given that, can you take care of the free in the proper location?


Re: [PATCH v2 net-next] veth: Free queues on link delete

2018-08-14 Thread Toshiaki Makita
On 2018/08/15 10:04, dsah...@kernel.org wrote:
> From: David Ahern 
> 
> kmemleak reported new suspected memory leaks.
> $ cat /sys/kernel/debug/kmemleak
> unreferenced object 0x8800354d5c00 (size 1024):
>   comm "ip", pid 836, jiffies 4294722952 (age 25.904s)
>   hex dump (first 32 bytes):
> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  
> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  
>   backtrace:
> [<(ptrval)>] kmemleak_alloc+0x70/0x94
> [<(ptrval)>] slab_post_alloc_hook+0x42/0x52
> [<(ptrval)>] __kmalloc+0x101/0x142
> [<(ptrval)>] kmalloc_array.constprop.20+0x1e/0x26 [veth]
> [<(ptrval)>] veth_newlink+0x147/0x3ac [veth]
> ...
> unreferenced object 0x88002e009c00 (size 1024):
>   comm "ip", pid 836, jiffies 4294722958 (age 25.898s)
>   hex dump (first 32 bytes):
> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  
> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  
>   backtrace:
> [<(ptrval)>] kmemleak_alloc+0x70/0x94
> [<(ptrval)>] slab_post_alloc_hook+0x42/0x52
> [<(ptrval)>] __kmalloc+0x101/0x142
> [<(ptrval)>] kmalloc_array.constprop.20+0x1e/0x26 [veth]
> [<(ptrval)>] veth_newlink+0x219/0x3ac [veth]
> 
> The allocations in question are veth_alloc_queues for the dev and its peer.
> 
> Free the queues on a delete.
> 
> Fixes: 638264dc90227 ("veth: Support per queue XDP ring")
> Signed-off-by: David Ahern 
> ---
> v2
> - free peer dev queues as well
> 
>  drivers/net/veth.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/drivers/net/veth.c b/drivers/net/veth.c
> index e3202af72df5..2a3ce60631ef 100644
> --- a/drivers/net/veth.c
> +++ b/drivers/net/veth.c
> @@ -1205,6 +1205,7 @@ static void veth_dellink(struct net_device *dev, struct 
> list_head *head)
>   struct veth_priv *priv;
>   struct net_device *peer;
>  
> + veth_free_queues(dev);
>   priv = netdev_priv(dev);
>   peer = rtnl_dereference(priv->peer);
>  
> @@ -1216,6 +1217,7 @@ static void veth_dellink(struct net_device *dev, struct 
> list_head *head)
>   unregister_netdevice_queue(dev, head);
>  
>   if (peer) {
> + veth_free_queues(peer);
>   priv = netdev_priv(peer);
>   RCU_INIT_POINTER(priv->peer, NULL);
>   unregister_netdevice_queue(peer, head);

Hmm, on second thought these queues need to be freed after veth_close()
to make sure no packet will reference them. That means we need to free
them in .ndo_uninit() or destructor.
(rtnl_delete_link() calls dellink() before unregister_netdevice_many()
which calls dev_close_many() through rollback_registered_many())

Currently veth has destructor veth_dev_free() for vstats, so we can free
queues in the function.
To be in line with vstats, allocation also should be moved to
veth_dev_init().
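
The ordering argument above can be sketched with a tiny userspace model.
This is not veth code: struct model_dev and every model_* function are
hypothetical stand-ins, and the only thing modelled is the order
dellink -> close -> destructor described in the parenthesis.

```c
#include <stdlib.h>

/* Hypothetical stand-in for a veth device. */
struct model_dev {
	int *queues;        /* stands in for the per-queue rings */
	int stale_at_close; /* set if close ran after the queues were freed */
};

static void model_alloc_queues(struct model_dev *d)
{
	d->queues = calloc(4, sizeof(*d->queues));
	d->stale_at_close = 0;
}

static void model_free_queues(struct model_dev *d)
{
	free(d->queues);
	d->queues = NULL;
}

/* Stands in for veth_close(): until here packets may reference the queues. */
static void model_close(struct model_dev *d)
{
	if (!d->queues)
		d->stale_at_close = 1;
}

/*
 * Runs the modelled teardown. free_in_dellink != 0 frees the queues at
 * dellink time; otherwise they are freed in the destructor step.
 * Returns 1 if close saw already-freed queues.
 */
static int model_teardown(struct model_dev *d, int free_in_dellink)
{
	if (free_in_dellink)
		model_free_queues(d); /* dellink placement */
	model_close(d);
	if (!free_in_dellink)
		model_free_queues(d); /* destructor placement */
	return d->stale_at_close;
}
```

With free_in_dellink set, the model reports that close ran against freed
queues; with the destructor placement it does not.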

-- 
Toshiaki Makita



[PATCH v2 net-next] veth: Free queues on link delete

2018-08-14 Thread dsahern
From: David Ahern 

kmemleak reported new suspected memory leaks.
$ cat /sys/kernel/debug/kmemleak
unreferenced object 0x8800354d5c00 (size 1024):
  comm "ip", pid 836, jiffies 4294722952 (age 25.904s)
  hex dump (first 32 bytes):
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  
  backtrace:
[<(ptrval)>] kmemleak_alloc+0x70/0x94
[<(ptrval)>] slab_post_alloc_hook+0x42/0x52
[<(ptrval)>] __kmalloc+0x101/0x142
[<(ptrval)>] kmalloc_array.constprop.20+0x1e/0x26 [veth]
[<(ptrval)>] veth_newlink+0x147/0x3ac [veth]
...
unreferenced object 0x88002e009c00 (size 1024):
  comm "ip", pid 836, jiffies 4294722958 (age 25.898s)
  hex dump (first 32 bytes):
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  
  backtrace:
[<(ptrval)>] kmemleak_alloc+0x70/0x94
[<(ptrval)>] slab_post_alloc_hook+0x42/0x52
[<(ptrval)>] __kmalloc+0x101/0x142
[<(ptrval)>] kmalloc_array.constprop.20+0x1e/0x26 [veth]
[<(ptrval)>] veth_newlink+0x219/0x3ac [veth]

The allocations in question are veth_alloc_queues for the dev and its peer.

Free the queues on a delete.

Fixes: 638264dc90227 ("veth: Support per queue XDP ring")
Signed-off-by: David Ahern 
---
v2
- free peer dev queues as well

 drivers/net/veth.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/net/veth.c b/drivers/net/veth.c
index e3202af72df5..2a3ce60631ef 100644
--- a/drivers/net/veth.c
+++ b/drivers/net/veth.c
@@ -1205,6 +1205,7 @@ static void veth_dellink(struct net_device *dev, struct 
list_head *head)
struct veth_priv *priv;
struct net_device *peer;
 
+   veth_free_queues(dev);
priv = netdev_priv(dev);
peer = rtnl_dereference(priv->peer);
 
@@ -1216,6 +1217,7 @@ static void veth_dellink(struct net_device *dev, struct 
list_head *head)
unregister_netdevice_queue(dev, head);
 
if (peer) {
+   veth_free_queues(peer);
priv = netdev_priv(peer);
RCU_INIT_POINTER(priv->peer, NULL);
unregister_netdevice_queue(peer, head);
-- 
2.11.0



Re: [PATCH net] veth: Free queues on link delete

2018-08-14 Thread David Ahern
On 8/14/18 6:37 PM, Toshiaki Makita wrote:
> On 2018/08/15 7:36, dsah...@kernel.org wrote:
>> From: David Ahern 
>>
>> kmemleak reported new suspected memory leaks.
>> $ cat /sys/kernel/debug/kmemleak
>> unreferenced object 0x880130b6ec00 (size 1024):
>>   comm "ip", pid 916, jiffies 4296194668 (age 7251.672s)
>>   hex dump (first 32 bytes):
>> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  
>> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  
>>   backtrace:
>> [<1ed37cc9>] kmemleak_alloc+0x70/0x94
>> [<646dfdeb>] slab_post_alloc_hook+0x42/0x52
>> [<04aba61b>] __kmalloc+0x101/0x142
>> [<54d50e21>] kmalloc_array.constprop.20+0x1e/0x26 [veth]
>> [<8238855a>] veth_newlink+0x147/0x3ac [veth]
>> ...
>>
>> The allocation in question is veth_alloc_queues.
>>
>> Free the queues on a delete.
> 
> Oops, thanks for catching this.
> 
>> Fixes: 638264dc90227 ("veth: Support per queue XDP ring")
>> Signed-off-by: David Ahern 
>> ---
>>  drivers/net/veth.c | 1 +
>>  1 file changed, 1 insertion(+)
>>
>> diff --git a/drivers/net/veth.c b/drivers/net/veth.c
>> index e3202af72df5..bef7d212f04e 100644
>> --- a/drivers/net/veth.c
>> +++ b/drivers/net/veth.c
>> @@ -1205,6 +1205,7 @@ static void veth_dellink(struct net_device *dev, 
>> struct list_head *head)
>>  struct veth_priv *priv;
>>  struct net_device *peer;
>>  
>> +veth_free_queues(dev);
>>  priv = netdev_priv(dev);
>>  peer = rtnl_dereference(priv->peer);
> 
> We need to free up peer queues as well.

missed that. Odd that kmemleak was not complaining.

> Also isn't this for net-next though it is now closed?
> 

yes. was not sure if net-next is now net.

will send a v2.


Re: [PATCH net] veth: Free queues on link delete

2018-08-14 Thread Toshiaki Makita
On 2018/08/15 7:36, dsah...@kernel.org wrote:
> From: David Ahern 
> 
> kmemleak reported new suspected memory leaks.
> $ cat /sys/kernel/debug/kmemleak
> unreferenced object 0x880130b6ec00 (size 1024):
>   comm "ip", pid 916, jiffies 4296194668 (age 7251.672s)
>   hex dump (first 32 bytes):
> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  
> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  
>   backtrace:
> [<1ed37cc9>] kmemleak_alloc+0x70/0x94
> [<646dfdeb>] slab_post_alloc_hook+0x42/0x52
> [<04aba61b>] __kmalloc+0x101/0x142
> [<54d50e21>] kmalloc_array.constprop.20+0x1e/0x26 [veth]
> [<8238855a>] veth_newlink+0x147/0x3ac [veth]
> ...
> 
> The allocation in question is veth_alloc_queues.
> 
> Free the queues on a delete.

Oops, thanks for catching this.

> Fixes: 638264dc90227 ("veth: Support per queue XDP ring")
> Signed-off-by: David Ahern 
> ---
>  drivers/net/veth.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/drivers/net/veth.c b/drivers/net/veth.c
> index e3202af72df5..bef7d212f04e 100644
> --- a/drivers/net/veth.c
> +++ b/drivers/net/veth.c
> @@ -1205,6 +1205,7 @@ static void veth_dellink(struct net_device *dev, struct 
> list_head *head)
>   struct veth_priv *priv;
>   struct net_device *peer;
>  
> + veth_free_queues(dev);
>   priv = netdev_priv(dev);
>   peer = rtnl_dereference(priv->peer);

We need to free up peer queues as well.
Also isn't this for net-next though it is now closed?

-- 
Toshiaki Makita



[RFC][bug?] "net/act_pedit: Introduce 'add' operation" is broken for anything wider than an octet

2018-08-14 Thread Al Viro
The code doing addition in that commit is

+   switch (cmd) {
+   case TCA_PEDIT_KEY_EX_CMD_SET:
+   val = tkey->val;
+   break;
+   case TCA_PEDIT_KEY_EX_CMD_ADD:
+   val = (*ptr + tkey->val) & ~tkey->mask;
+   break;
+   default:
+   pr_info("tc filter pedit bad command (%d)\n",
+   cmd);
+   goto bad;
+   }
+
+   *ptr = ((*ptr & tkey->mask) ^ val);


Any net-endian field wider than an octet will have the carry between
octets handled wrong on little-endian hosts.  Should we at least
verify that ~mask fits into one octet?

As it is, consider e.g. an attempt to subtract 1 from a 16bit field
at offset 2 in a word.  We want {0,0,0,1} (0x01000000 from host POV)
to turn into 0, so the value to add would be 0xff000000.  Except that
{0, 0, 1, 0} would turn into {0, 0, 1, 0xff} that way, not the
expected {0, 0, 0, 0xff}.

Granted, there's not a lot of wider-than-octet fields where arithmetics
would've made sense, but we probably ought to refuse allowing such
operations.  Especially since on big-endian hosts they will work
just fine until you try to move that over to a little-endian box...

Alternatively, we could do something like
val = htonl(be32_to_cpup(ptr) + ntohl(tkey->val)) & ~tkey->mask;
but I'm not sure if that's worth doing.  It's not as if there would be
a major overhead, but still...

Comments?
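
The carry problem can be reproduced in userspace with a small sketch. It
emulates a little-endian host by loading the four bytes explicitly, so it
behaves the same on any machine. The model_* and load/store helper names are
made up for illustration, and the second function is only my reading of the
htonl(be32_to_cpup(ptr) + ntohl(tkey->val)) idea, not a tested kernel change.

```c
#include <stdint.h>

/* Load/store 4 bytes the way a little-endian host sees a u32. */
static uint32_t load_le32(const uint8_t *b)
{
	return (uint32_t)b[0] | ((uint32_t)b[1] << 8) |
	       ((uint32_t)b[2] << 16) | ((uint32_t)b[3] << 24);
}

static void store_le32(uint8_t *b, uint32_t w)
{
	b[0] = (uint8_t)w;
	b[1] = (uint8_t)(w >> 8);
	b[2] = (uint8_t)(w >> 16);
	b[3] = (uint8_t)(w >> 24);
}

/* Load/store 4 bytes in network (big-endian) order. */
static uint32_t load_be32(const uint8_t *b)
{
	return ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16) |
	       ((uint32_t)b[2] << 8) | (uint32_t)b[3];
}

static void store_be32(uint8_t *b, uint32_t w)
{
	b[0] = (uint8_t)(w >> 24);
	b[1] = (uint8_t)(w >> 16);
	b[2] = (uint8_t)(w >> 8);
	b[3] = (uint8_t)w;
}

/* The quoted "add" as a little-endian host would execute it. */
static void model_add_le_host(uint8_t *b, uint32_t val, uint32_t mask)
{
	uint32_t w = load_le32(b);
	uint32_t v = (w + val) & ~mask;

	store_le32(b, (w & mask) ^ v);
}

/*
 * Byte-order-aware variant: add in network order, then keep only the
 * bits of the field. "field" is the big-endian view of ~mask.
 */
static void model_add_net_order(uint8_t *b, uint32_t add, uint32_t field)
{
	uint32_t w = load_be32(b);
	uint32_t v = (w + add) & field;

	store_be32(b, (w & ~field) ^ v);
}
```

Calling model_add_le_host() on {0, 0, 1, 0} with val 0xff000000 and mask
0x0000ffff leaves {0, 0, 1, 0xff} in the buffer, while
model_add_net_order() with add 0xffff and field 0xffff yields
{0, 0, 0, 0xff}.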


Re: [PATCH bpf] bpf: fix a rcu usage warning in bpf_prog_array_copy_core()

2018-08-14 Thread Roman Gushchin
On Tue, Aug 14, 2018 at 04:59:45PM -0700, Alexei Starovoitov wrote:
> On Tue, Aug 14, 2018 at 11:01:12AM -0700, Yonghong Song wrote:
> > Commit 394e40a29788 ("bpf: extend bpf_prog_array to store pointers
> > to the cgroup storage") refactored the bpf_prog_array_copy_core()
> > to accommodate new structure bpf_prog_array_item which contains
> > bpf_prog array itself.
> > 
> > In the old code, we had
> >perf_event_query_prog_array():
> >  mutex_lock(...)
> >  bpf_prog_array_copy_call():
> >prog = rcu_dereference_check(array, 1)->progs
> >bpf_prog_array_copy_core(prog, ...)
> >  mutex_unlock(...)
> > 
> > With the above commit, we had
> >perf_event_query_prog_array():
> >  mutex_lock(...)
> >  bpf_prog_array_copy_call():
> >bpf_prog_array_copy_core(array, ...):
> >  item = rcu_dereference(array)->items;
> >  ...
> >  mutex_unlock(...)
> > 
> > The new code will trigger a lockdep rcu checking warning.
> > The fix is to change rcu_dereference() to rcu_dereference_check()
> > to prevent such a warning.
> > 
> > Reported-by: syzbot+6e72317008eef84a2...@syzkaller.appspotmail.com
> > Fixes: 394e40a29788 ("bpf: extend bpf_prog_array to store pointers to the 
> > cgroup storage")
> > Cc: Roman Gushchin 
> > Signed-off-by: Yonghong Song 
> 
> makes sense to me
> Acked-by: Alexei Starovoitov 
> 
> Roman, would you agree?
> 

rcu_dereference_check(<>, 1) always looks a bit strange to me,
but if it's the only reasonable way to silence the warning,
of course I'm fine with it.

Thanks!
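
For readers wondering why the constant 1 silences the warning: as far as I
understand it, lockdep accepts rcu_dereference_check(p, c) when either c is
true or an RCU read-side critical section is active. The sketch below models
only that rule in userspace. All model_* names are hypothetical and none of
this is the kernel's implementation.

```c
/* >0 while inside a modelled rcu_read_lock() section */
static int model_rcu_depth;
/* counts modelled lockdep complaints */
static int model_warnings;

static void model_rcu_read_lock(void)
{
	model_rcu_depth++;
}

static void model_rcu_read_unlock(void)
{
	model_rcu_depth--;
}

/* Complains unless cond holds or a read-side section is active. */
static void *model_deref_check(void *p, int cond)
{
	if (!(cond || model_rcu_depth > 0))
		model_warnings++;
	return p;
}

/* Plain dereference: no extra condition to satisfy the check. */
static void *model_deref(void *p)
{
	return model_deref_check(p, 0);
}
```

A caller that holds only a mutex (no modelled read lock) trips the counter
with model_deref() but not with model_deref_check(p, 1), which mirrors the
situation in perf_event_query_prog_array() described in the patch.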


virtio_net failover and initramfs (was: Re: [PATCH net-next v11 2/5] netvsc: refactor notifier/event handling code to use the failover framework)

2018-08-14 Thread Siwei Liu
Are we sure all userspace apps skip and ignore slave interfaces by
just looking at "IFLA_MASTER" attribute?

When STANDBY is enabled on virtio-net, a failover master interface
appears and automatically enslaves the virtio device. However, we found
that iSCSI (or any other network boot) cannot bootstrap over the new
failover interface when only a standby virtio device is present (without
any VF or PT device in place).

Dracut (initramfs) ends up timing out and dropping into the emergency shell:

[  228.170425] dracut-initqueue[377]: Warning: dracut-initqueue
timeout - starting timeout scripts
[  228.171788] dracut-initqueue[377]: Warning: Could not boot.
 Starting Dracut Emergency Shell...
Generating "/run/initramfs/rdsosreport.txt"
Entering emergency mode. Exit the shell to continue.
Type "journalctl" to view system logs.
You might want to save "/run/initramfs/rdsosreport.txt" to a USB stick or /boot
after mounting them and attach it to a bug report.
dracut:/# ip l sh
1: lo:  mtu 65536 qdisc noqueue state UNKNOWN
mode DEFAULT group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: eth0:  mtu 1500 qdisc noqueue
state UP mode DEFAULT group default qlen 1000
link/ether 9a:46:22:ae:33:54 brd ff:ff:ff:ff:ff:ff
3: eth1:  mtu 1500 qdisc pfifo_fast
master eth0 state UP mode DEFAULT group default qlen 1000
link/ether 9a:46:22:ae:33:54 brd ff:ff:ff:ff:ff:ff
dracut:/#

If the dracut code is changed to ignore eth1 (which has the IFLA_MASTER
attribute), network boot starts to work.

The reason is that dracut has its own means to differentiate virtual
interfaces for network boot: it does not use IFLA_MASTER to detect and
ignore slave interfaces. Instead, users have to provide an explicit
option, e.g. bond=eth0,eth1, on the boot line, so that dracut knows
the config and ignores the slave interfaces.

However, with automatic creation of the failover interface that assumption
no longer holds. Can we change dracut to ignore all slave interfaces
by checking IFLA_MASTER? I don't think so: it would have a large impact on
existing configs.
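
For illustration only, this is roughly what such a check could look like
from userspace without netlink. It assumes that an enslaved interface
exposes a "master" entry under /sys/class/net/<ifname>/ and that this entry
reflects the same relationship as IFLA_MASTER; model_has_master() is a
made-up helper, not something dracut provides.

```c
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/*
 * Returns 1 if /sys/class/net/<ifname>/master exists, 0 if it does not,
 * and -1 if ifname is not a plausible interface name.
 */
static int model_has_master(const char *ifname)
{
	char path[64];
	struct stat st;

	/* Interface names are short and must not contain path separators. */
	if (!ifname || !*ifname || strchr(ifname, '/') || strlen(ifname) > 15)
		return -1;

	snprintf(path, sizeof(path), "/sys/class/net/%s/master", ifname);

	/* lstat() so that the entry itself is tested, not its target. */
	return lstat(path, &st) == 0;
}
```

In the setup shown above one would expect model_has_master("eth1") to
return 1 and model_has_master("eth0") to return 0, but I have not run this
inside an initramfs.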

What's a feasible solution then? Check the driver name for failover as well?

Thanks,
-Siwei



On Tue, May 22, 2018 at 10:38 AM, Jiri Pirko  wrote:
> Tue, May 22, 2018 at 06:52:21PM CEST, m...@redhat.com wrote:
>>On Tue, May 22, 2018 at 05:45:01PM +0200, Jiri Pirko wrote:
>>> Tue, May 22, 2018 at 05:32:30PM CEST, m...@redhat.com wrote:
>>> >On Tue, May 22, 2018 at 05:13:43PM +0200, Jiri Pirko wrote:
>>> >> Tue, May 22, 2018 at 03:39:33PM CEST, m...@redhat.com wrote:
>>> >> >On Tue, May 22, 2018 at 03:26:26PM +0200, Jiri Pirko wrote:
>>> >> >> Tue, May 22, 2018 at 03:17:37PM CEST, m...@redhat.com wrote:
>>> >> >> >On Tue, May 22, 2018 at 03:14:22PM +0200, Jiri Pirko wrote:
>>> >> >> >> Tue, May 22, 2018 at 03:12:40PM CEST, m...@redhat.com wrote:
>>> >> >> >> >On Tue, May 22, 2018 at 11:08:53AM +0200, Jiri Pirko wrote:
>>> >> >> >> >> Tue, May 22, 2018 at 11:06:37AM CEST, j...@resnulli.us wrote:
>>> >> >> >> >> >Tue, May 22, 2018 at 04:06:18AM CEST, 
>>> >> >> >> >> >sridhar.samudr...@intel.com wrote:
>>> >> >> >> >> >>Use the registration/notification framework supported by the 
>>> >> >> >> >> >>generic
>>> >> >> >> >> >>failover infrastructure.
>>> >> >> >> >> >>
>>> >> >> >> >> >>Signed-off-by: Sridhar Samudrala 
>>> >> >> >> >> >
>>> >> >> >> >> >In previous patchset versions, the common code did
>>> >> >> >> >> >netdev_rx_handler_register() and netdev_upper_dev_link() etc
>>> >> >> >> >> >(netvsc_vf_join()). Now, this is still done in netvsc. Why?
>>> >> >> >> >> >
>>> >> >> >> >> >This should be part of the common "failover" code.
>>> >> >> >> >> >
>>> >> >> >> >>
>>> >> >> >> >> Also note that in the current patchset you use IFF_FAILOVER 
>>> >> >> >> >> flag for
>>> >> >> >> >> master, yet for the slave you use IFF_SLAVE. That is wrong.
>>> >> >> >> >> IFF_FAILOVER_SLAVE should be used.
>>> >> >> >> >
>>> >> >> >> >Or drop IFF_FAILOVER_SLAVE and set both IFF_FAILOVER and 
>>> >> >> >> >IFF_SLAVE?
>>> >> >> >>
>>> >> >> >> No. IFF_SLAVE is for bonding.
>>> >> >> >
>>> >> >> >What breaks if we reuse it for failover?
>>> >> >>
>>> >> >> This is exposed to userspace. IFF_SLAVE is expected for bonding 
>>> >> >> slaves.
>>> >> >> And failover slave is not a bonding slave.
>>> >> >
>>> >> >That does not really answer the question.  I'd claim it's sufficiently
>>> >> >like a bond slave for IFF_SLAVE to make sense.
>>> >> >
>>> >> >In fact you will find that netvsc already sets IFF_SLAVE, and so
>>> >>
>>> >> netvsc does the whole failover thing in a wrong way. This patchset is
>>> >> trying to fix it.
>>> >
>>> >Maybe, but we don't need gratuitous changes either, especially if they
>>> >break userspace.
>>>
>>> What do you mean by the "break"? It was a mistake to reuse IFF_SLAVE at
>>> the first place, lets fix it. If some userspace depends on that flag, it
>>> is broken anyway.
>>>
>>>
>>> >
>>> >> >does e.g. the eql driver.
>>> >> >
>>> >> >The advantage of using IFF_SLAVE is that userspace 

Re: [PATCH bpf] bpf: fix a rcu usage warning in bpf_prog_array_copy_core()

2018-08-14 Thread Alexei Starovoitov
On Tue, Aug 14, 2018 at 11:01:12AM -0700, Yonghong Song wrote:
> Commit 394e40a29788 ("bpf: extend bpf_prog_array to store pointers
> to the cgroup storage") refactored the bpf_prog_array_copy_core()
> to accommodate new structure bpf_prog_array_item which contains
> bpf_prog array itself.
> 
> In the old code, we had
>perf_event_query_prog_array():
>  mutex_lock(...)
>  bpf_prog_array_copy_call():
>prog = rcu_dereference_check(array, 1)->progs
>bpf_prog_array_copy_core(prog, ...)
>  mutex_unlock(...)
> 
> With the above commit, we had
>perf_event_query_prog_array():
>  mutex_lock(...)
>  bpf_prog_array_copy_call():
>bpf_prog_array_copy_core(array, ...):
>  item = rcu_dereference(array)->items;
>  ...
>  mutex_unlock(...)
> 
> The new code will trigger a lockdep rcu checking warning.
> The fix is to change rcu_dereference() to rcu_dereference_check()
> to prevent such a warning.
> 
> Reported-by: syzbot+6e72317008eef84a2...@syzkaller.appspotmail.com
> Fixes: 394e40a29788 ("bpf: extend bpf_prog_array to store pointers to the 
> cgroup storage")
> Cc: Roman Gushchin 
> Signed-off-by: Yonghong Song 

makes sense to me
Acked-by: Alexei Starovoitov 

Roman, would you agree?



Re: [PATCH net-next] net: sched: act_ife: disable bh when taking ife_mod_lock

2018-08-14 Thread Cong Wang
On Tue, Aug 14, 2018 at 10:35 AM Vlad Buslov  wrote:
>
>
> On Mon 13 Aug 2018 at 23:18, Cong Wang  wrote:
> > Hi, Vlad,
> >
> > Could you help to test my fixes?
> >
> > I just pushed them into my own git repo:
> > https://github.com/congwang/linux/commits/net-sched-fixes
> >
> > Particularly, this is the revert:
> > https://github.com/congwang/linux/commit/b3f51c4ab8272cc8d3244848e528fce1426c4659
> > and this is my fix for the lockdep warning you reported:
> > https://github.com/congwang/linux/commit/ecadcde94919183e9f0d5bc376f05e731baf2661
> >
> > I don't have environment to test ife modules.
>
> Hi Cong,
>
> I've run the test with your patch applied and couldn't reproduce the
> lockdep warning.
>

Thank you! I will add your Tested-by and send them out.


Re: [PATCH net-next] net: sched: act_ife: always release ife action on init error

2018-08-14 Thread Cong Wang
On Tue, Aug 14, 2018 at 10:30 AM Vlad Buslov  wrote:
>
> Action init API was changed to always take reference to action, even when
> overwriting existing action. Substitute conditional action release, which
> was executed only if action is newly created, with unconditional release in
> tcf_ife_init() error handling code to prevent double free or memory leak in
> case of overwrite.
>
> Fixes: 4e8ddd7f1758 ("net: sched: don't release reference on action 
> overwrite")
> Reported-by: Cong Wang 
> Signed-off-by: Vlad Buslov 

Looks good,

Acked-by: Cong Wang 


[PATCH net] veth: Free queues on link delete

2018-08-14 Thread dsahern
From: David Ahern 

kmemleak reported new suspected memory leaks.
$ cat /sys/kernel/debug/kmemleak
unreferenced object 0x880130b6ec00 (size 1024):
  comm "ip", pid 916, jiffies 4296194668 (age 7251.672s)
  hex dump (first 32 bytes):
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  
  backtrace:
[<1ed37cc9>] kmemleak_alloc+0x70/0x94
[<646dfdeb>] slab_post_alloc_hook+0x42/0x52
[<04aba61b>] __kmalloc+0x101/0x142
[<54d50e21>] kmalloc_array.constprop.20+0x1e/0x26 [veth]
[<8238855a>] veth_newlink+0x147/0x3ac [veth]
...

The allocation in question is veth_alloc_queues.

Free the queues on a delete.

Fixes: 638264dc90227 ("veth: Support per queue XDP ring")
Signed-off-by: David Ahern 
---
 drivers/net/veth.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/veth.c b/drivers/net/veth.c
index e3202af72df5..bef7d212f04e 100644
--- a/drivers/net/veth.c
+++ b/drivers/net/veth.c
@@ -1205,6 +1205,7 @@ static void veth_dellink(struct net_device *dev, struct 
list_head *head)
struct veth_priv *priv;
struct net_device *peer;
 
+   veth_free_queues(dev);
priv = netdev_priv(dev);
peer = rtnl_dereference(priv->peer);
 
-- 
2.11.0



Re: [PATCH net] cls_matchall: fix tcf_unbind_filter missing

2018-08-14 Thread Cong Wang
On Tue, Aug 14, 2018 at 2:28 AM Hangbin Liu  wrote:
>
> Fix tcf_unbind_filter missing in cls_matchall as this will trigger
> WARN_ON() in cbq_destroy_class().
>
> Fixes: fd62d9f5c575f ("net/sched: matchall: Fix configuration race")
> Reported-by: Li Shuang 
> Signed-off-by: Hangbin Liu 

Acked-by: Cong Wang 


[Patch net-next] ila: make lockdep happy again

2018-08-14 Thread Cong Wang
Previously, alloc_ila_locks() and bucket_table_alloc() call
spin_lock_init() separately, therefore they have two different
lock names and lock class keys. However, after commit b893281715ab
("ila: Call library function alloc_bucket_locks") they both call
helper alloc_bucket_spinlocks() which now only has one lock
name and lock class key. This causes a few bogus lockdep warnings
as reported by syzbot.

Fix this by making alloc_bucket_spinlocks() a macro that passes the
declaration name as the lock name and a static lock class key defined
inside the macro.
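
The per-call-site effect of a static variable inside a macro can be seen in
a small userspace sketch. Everything here (struct model_key, struct
model_locks, the model_* functions) is a hypothetical stand-in for the
lockdep types, and the statement expression is a GNU C extension, as in the
patch below.

```c
/* Stand-in for struct lock_class_key: only its address matters. */
struct model_key {
	char unused;
};

/* Records what name and key a set of locks was initialised with. */
struct model_locks {
	const char *name;
	const struct model_key *key;
};

static int model_alloc_locks_impl(struct model_locks *l, const char *name,
				  const struct model_key *key)
{
	l->name = name;
	l->key = key;
	return 0;
}

/* Each expansion gets its own static key and its own stringified name. */
#define model_alloc_locks(l)					\
	({							\
		static struct model_key key;			\
		model_alloc_locks_impl((l), #l, &key);		\
	})

/* Two separate call sites, as with alloc_ila_locks() and bucket_table_alloc(). */
static int model_site_a(struct model_locks *a_locks)
{
	return model_alloc_locks(a_locks);
}

static int model_site_b(struct model_locks *b_locks)
{
	return model_alloc_locks(b_locks);
}
```

Repeated calls through model_site_a() share one key, while model_site_b()
gets a different key and a different name, which is the separation the
commit message is after.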

Fixes: b893281715ab ("ila: Call library function alloc_bucket_locks")
Reported-by: 
Cc: Tom Herbert 
Signed-off-by: Cong Wang 
---
 include/linux/spinlock.h | 17 ++---
 lib/bucket_locks.c   | 11 +++
 2 files changed, 21 insertions(+), 7 deletions(-)

diff --git a/include/linux/spinlock.h b/include/linux/spinlock.h
index fd57888d4942..a81040719fb9 100644
--- a/include/linux/spinlock.h
+++ b/include/linux/spinlock.h
@@ -432,9 +432,20 @@ extern int _atomic_dec_and_lock_irqsave(atomic_t *atomic, 
spinlock_t *lock,
 #define atomic_dec_and_lock_irqsave(atomic, lock, flags) \
__cond_lock(lock, _atomic_dec_and_lock_irqsave(atomic, lock, 
&(flags)))
 
-int alloc_bucket_spinlocks(spinlock_t **locks, unsigned int *lock_mask,
-  size_t max_size, unsigned int cpu_mult,
-  gfp_t gfp);
+int __alloc_bucket_spinlocks(spinlock_t **locks, unsigned int *lock_mask,
+size_t max_size, unsigned int cpu_mult,
+gfp_t gfp, const char *name,
+struct lock_class_key *key);
+
+#define alloc_bucket_spinlocks(locks, lock_mask, max_size, cpu_mult, gfp)\
+   ({   \
+   static struct lock_class_key key;\
+   int ret; \
+\
+   ret = __alloc_bucket_spinlocks(locks, lock_mask, max_size,   \
+  cpu_mult, gfp, #locks, &key); \
+   ret; \
+   })
 
 void free_bucket_spinlocks(spinlock_t *locks);
 
diff --git a/lib/bucket_locks.c b/lib/bucket_locks.c
index ade3ce6c4af6..64b92e1dbace 100644
--- a/lib/bucket_locks.c
+++ b/lib/bucket_locks.c
@@ -11,8 +11,9 @@
  * to a power of 2 to be suitable as a hash table.
  */
 
-int alloc_bucket_spinlocks(spinlock_t **locks, unsigned int *locks_mask,
-  size_t max_size, unsigned int cpu_mult, gfp_t gfp)
+int __alloc_bucket_spinlocks(spinlock_t **locks, unsigned int *locks_mask,
+size_t max_size, unsigned int cpu_mult, gfp_t gfp,
+const char *name, struct lock_class_key *key)
 {
spinlock_t *tlocks = NULL;
unsigned int i, size;
@@ -33,8 +34,10 @@ int alloc_bucket_spinlocks(spinlock_t **locks, unsigned int 
*locks_mask,
tlocks = kvmalloc_array(size, sizeof(spinlock_t), gfp);
if (!tlocks)
return -ENOMEM;
-   for (i = 0; i < size; i++)
+   for (i = 0; i < size; i++) {
 spin_lock_init(&tlocks[i]);
+   lockdep_init_map(&tlocks[i].dep_map, name, key, 0);
+   }
}
 
*locks = tlocks;
@@ -42,7 +45,7 @@ int alloc_bucket_spinlocks(spinlock_t **locks, unsigned int 
*locks_mask,
 
return 0;
 }
-EXPORT_SYMBOL(alloc_bucket_spinlocks);
+EXPORT_SYMBOL(__alloc_bucket_spinlocks);
 
 void free_bucket_spinlocks(spinlock_t *locks)
 {
-- 
2.14.4



Re: [endianness bug] cxgb4: mk_act_open_req() buggers ->{local,peer}_ip on big-endian hosts

2018-08-14 Thread Al Viro
How can cxgb4/cxgb4_tc_flower.c handling of 16bit
fields possibly work on b-e?  Look:
case TCA_PEDIT_KEY_EX_HDR_TYPE_TCP:
switch (offset) {
case PEDIT_TCP_SPORT_DPORT:
if (~mask & PEDIT_TCP_UDP_SPORT_MASK)
offload_pedit(fs, cpu_to_be32(val) >> 16,
  cpu_to_be32(mask) >> 16,
  TCP_SPORT);

OK, we are feeding two results of >> 16 (i.e. the values in
range 0..65535 from the host POV) to offload_pedit().  Which does

static void offload_pedit(struct ch_filter_specification *fs, u32 val, u32 mask,
  u8 field)
{
u32 set_val = val & ~mask;

OK, it's a value in range 0..65535.

u32 offset = 0;
u8 size = 1;
int i;

for (i = 0; i < ARRAY_SIZE(pedits); i++) {
if (pedits[i].field == field) {
go until we finally find this:
PEDIT_FIELDS(TCP_, SPORT, 2, nat_fport, 0),
i.e.
{TCP_SPORT, 2, offsetof(struct ch_filter_specification, nat_fport)}
offset = pedits[i].offset;
size = pedits[i].size;
... resulting in offset = offsetof(..., nat_fport), size = 2
break;
}
}
memcpy((u8 *)fs + offset, &set_val, size);
... and we copy the first two bytes of set_val to fs->nat_fport, right?

On little-endian, assuming that val & 0xffff was 256 * V0 + V1 and
mask & 0xffff was 256 * M0 + M1, we get cpu_to_be32(val) >> 16 equal to
256 * V1 + V0, and similar for mask, resulting in set_val containing
{V0 & ~M0, V1 & ~M1, 0, 0}, with the first two bytes copied to fs->nat_fport.

Now, think what will happen on big-endian.  The value in set_val has upper
16 bits all zero, no matter what - shift anything 32bit down by 16 and you'll
get that.  And on big-endian that's first two bytes of memory representation,
so this memcpy() is absolutely guaranteed to set fs->nat_fport to zero.
No matter how fancy the hardware is, it can't guess what had the other two
bytes been - CPU has discarded those before the NIC had a chance to see
them.
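
The argument above can be checked on any host with a small userspace model.
The sketch below is an illustration only, not driver code: store_u32(),
swab32() and old_pedit_port() are hypothetical helpers, and the byte order is
passed in as a parameter instead of coming from the CPU. It mimics
cpu_to_be32(val) >> 16, the val & ~mask step, and the memcpy() of the first
two bytes of set_val.

```c
#include <stdint.h>
#include <string.h>

/* Lay out a 32-bit value in memory with an explicit byte order, so both
 * layouts can be inspected regardless of the host CPU. */
static void store_u32(uint8_t out[4], uint32_t v, int big_endian)
{
	for (int i = 0; i < 4; i++) {
		int shift = big_endian ? 8 * (3 - i) : 8 * i;

		out[i] = (uint8_t)(v >> shift);
	}
}

/* Unconditional byte swap: what cpu_to_be32() does on a little-endian host. */
static uint32_t swab32(uint32_t v)
{
	return ((v & 0xffu) << 24) | ((v & 0xff00u) << 8) |
	       ((v >> 8) & 0xff00u) | (v >> 24);
}

/* Model of the old source-port path (hypothetical helper, not kernel code):
 * shift the converted value and mask down by 16, combine them, then copy
 * the first two bytes of the in-memory representation of set_val. */
static void old_pedit_port(uint8_t port[2], uint32_t val, uint32_t mask,
			   int big_endian)
{
	/* cpu_to_be32() is a no-op on big-endian, a swap on little-endian */
	uint32_t v = big_endian ? val : swab32(val);
	uint32_t m = big_endian ? mask : swab32(mask);
	uint32_t set_val = (v >> 16) & ~(m >> 16);
	uint8_t mem[4];

	store_u32(mem, set_val, big_endian);
	memcpy(port, mem, 2);
}
```

With val = 0x00001234 and mask = 0xffff0000, the little-endian model yields
the port bytes {0x12, 0x34}. The big-endian model yields {0, 0} for any input,
because the upper 16 bits of set_val are always zero and those are the bytes
that come first in memory.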

Am I right assuming that the val is supposed to be {S1, S0, D1, D0},
with sport == S1 * 256 + S0, dport == D1 * 256 + D0?  If so, the following
ought to work [== COMPLETELY UNTESTED, in other words] on l-e same as the
current code does and do the right thing on b-e.  Objections?

offload_pedit() is broken for big-endian; it's actually easier to spell the
memcpy (and in case of ports - memcpy-with-byteswap) explicitly, avoiding
both the b-e problems and getting rid of a lot of LoC, including an unpleasant
macro.

Signed-off-by: Al Viro 
---
diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_tc_flower.c 
b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_tc_flower.c
index 3db969eefba9..020ca0121fb4 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_tc_flower.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_tc_flower.c
@@ -43,27 +43,6 @@
 
 #define STATS_CHECK_PERIOD (HZ / 2)
 
-static struct ch_tc_pedit_fields pedits[] = {
-   PEDIT_FIELDS(ETH_, DMAC_31_0, 4, dmac, 0),
-   PEDIT_FIELDS(ETH_, DMAC_47_32, 2, dmac, 4),
-   PEDIT_FIELDS(ETH_, SMAC_15_0, 2, smac, 0),
-   PEDIT_FIELDS(ETH_, SMAC_47_16, 4, smac, 2),
-   PEDIT_FIELDS(IP4_, SRC, 4, nat_fip, 0),
-   PEDIT_FIELDS(IP4_, DST, 4, nat_lip, 0),
-   PEDIT_FIELDS(IP6_, SRC_31_0, 4, nat_fip, 0),
-   PEDIT_FIELDS(IP6_, SRC_63_32, 4, nat_fip, 4),
-   PEDIT_FIELDS(IP6_, SRC_95_64, 4, nat_fip, 8),
-   PEDIT_FIELDS(IP6_, SRC_127_96, 4, nat_fip, 12),
-   PEDIT_FIELDS(IP6_, DST_31_0, 4, nat_lip, 0),
-   PEDIT_FIELDS(IP6_, DST_63_32, 4, nat_lip, 4),
-   PEDIT_FIELDS(IP6_, DST_95_64, 4, nat_lip, 8),
-   PEDIT_FIELDS(IP6_, DST_127_96, 4, nat_lip, 12),
-   PEDIT_FIELDS(TCP_, SPORT, 2, nat_fport, 0),
-   PEDIT_FIELDS(TCP_, DPORT, 2, nat_lport, 0),
-   PEDIT_FIELDS(UDP_, SPORT, 2, nat_fport, 0),
-   PEDIT_FIELDS(UDP_, DPORT, 2, nat_lport, 0),
-};
-
 static struct ch_tc_flower_entry *allocate_flower_entry(void)
 {
struct ch_tc_flower_entry *new = kzalloc(sizeof(*new), GFP_KERNEL);
@@ -306,81 +285,63 @@ static int cxgb4_validate_flow_match(struct net_device 
*dev,
return 0;
 }
 
-static void offload_pedit(struct ch_filter_specification *fs, u32 val, u32 
mask,
- u8 field)
-{
-   u32 set_val = val & ~mask;
-   u32 offset = 0;
-   u8 size = 1;
-   int i;
-
-   for (i = 0; i < ARRAY_SIZE(pedits); i++) {
-   if (pedits[i].field == field) {
-   offset = pedits[i].offset;
-   size = pedits[i].size;
-   break;
-   }
-   }
-   memcpy((u8 *)fs + offset, &set_val, size);
-}
-
-static void process_pedit_field(struct ch_filter_specification *fs, u32 val,
-   u32 mask, u32 offset, u8 

Re: [PATCH next-queue 3/8] ixgbe: add VF ipsec management

2018-08-14 Thread Shannon Nelson

On 8/13/2018 10:31 PM, kbuild test robot wrote:

Hi Shannon,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on jkirsher-next-queue/dev-queue]
[also build test ERROR on v4.18 next-20180813]
[if your patch is applied to the wrong git tree, please drop us a note to help 
improve the system]

url:
https://github.com/0day-ci/linux/commits/Shannon-Nelson/ixgbe-ixgbevf-IPsec-offload-support-for-VFs/20180814-074800
base:   https://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue.git 
dev-queue
config: x86_64-randconfig-v0-08131550 (attached as .config)
compiler: gcc-7 (Debian 7.3.0-16) 7.3.0
reproduce:
 # save the attached .config to linux build tree
 make ARCH=x86_64

All errors (new ones prefixed by >>):

drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.o: In function 
`ixgbe_ipsec_vf_add_sa':

drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c:917: undefined reference to 
`xfrm_aead_get_byname'

make[1]: *** [vmlinux] Error 1
make[1]: Target '_all' not remade because of errors.



Huh, odd.  I'm not able to reproduce this error using your config file 
in a net-next tree of vintage v4.18-rc8 or in Jeff's dev-queue branch. 
It looks like I'm using an older compiler (4.8.5) but that shouldn't 
make a difference here.


sln



[PATCH net-next] net: sched: always disable bh when taking tcf_lock

2018-08-14 Thread Vlad Buslov
Recently, ops->init() and ops->dump() of all actions were modified to
always obtain tcf_lock when accessing private action state. Actions that
don't depend on tcf_lock for synchronization with their data path use
non-bh locking API. However, tcf_lock is also used to protect rate
estimator stats in softirq context by timer callback.

Change ops->init() and ops->dump() of all actions to disable bh when using
tcf_lock to prevent deadlock reported by following lockdep warning:

[  105.470398] 
[  105.475014] WARNING: inconsistent lock state
[  105.479628] 4.18.0-rc8+ #664 Not tainted
[  105.483897] 
[  105.488511] inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.
[  105.494871] swapper/16/0 [HC0[0]:SC1[1]:HE1:SE0] takes:
[  105.500449] f86c012e (&(>tcfa_lock)->rlock){+.?.}, at: 
est_fetch_counters+0x3c/0xa0
[  105.509696] {SOFTIRQ-ON-W} state was registered at:
[  105.514925]   _raw_spin_lock+0x2c/0x40
[  105.519022]   tcf_bpf_init+0x579/0x820 [act_bpf]
[  105.523990]   tcf_action_init_1+0x4e4/0x660
[  105.528518]   tcf_action_init+0x1ce/0x2d0
[  105.532880]   tcf_exts_validate+0x1d8/0x200
[  105.537416]   fl_change+0x55a/0x268b [cls_flower]
[  105.542469]   tc_new_tfilter+0x748/0xa20
[  105.546738]   rtnetlink_rcv_msg+0x56a/0x6d0
[  105.551268]   netlink_rcv_skb+0x18d/0x200
[  105.555628]   netlink_unicast+0x2d0/0x370
[  105.559990]   netlink_sendmsg+0x3b9/0x6a0
[  105.564349]   sock_sendmsg+0x6b/0x80
[  105.568271]   ___sys_sendmsg+0x4a1/0x520
[  105.572547]   __sys_sendmsg+0xd7/0x150
[  105.576655]   do_syscall_64+0x72/0x2c0
[  105.580757]   entry_SYSCALL_64_after_hwframe+0x49/0xbe
[  105.586243] irq event stamp: 489296
[  105.590084] hardirqs last  enabled at (489296): [] 
_raw_spin_unlock_irq+0x29/0x40
[  105.599765] hardirqs last disabled at (489295): [] 
_raw_spin_lock_irq+0x15/0x50
[  105.609277] softirqs last  enabled at (489292): [] 
irq_enter+0x83/0xa0
[  105.618001] softirqs last disabled at (489293): [] 
irq_exit+0x140/0x190
[  105.626813]
   other info that might help us debug this:
[  105.633976]  Possible unsafe locking scenario:

[  105.640526]CPU0
[  105.643325]
[  105.646125]   lock(&(>tcfa_lock)->rlock);
[  105.650747]   
[  105.653717] lock(&(>tcfa_lock)->rlock);
[  105.658514]
*** DEADLOCK ***

[  105.665349] 1 lock held by swapper/16/0:
[  105.669629]  #0: a640ad99 ((>timer)){+.-.}, at: 
call_timer_fn+0x10b/0x550
[  105.678200]
   stack backtrace:
[  105.683194] CPU: 16 PID: 0 Comm: swapper/16 Not tainted 4.18.0-rc8+ #664
[  105.690249] Hardware name: Supermicro SYS-2028TP-DECR/X10DRT-P, BIOS 2.0b 
03/30/2017
[  105.698626] Call Trace:
[  105.701421]  
[  105.703791]  dump_stack+0x92/0xeb
[  105.707461]  print_usage_bug+0x336/0x34c
[  105.711744]  mark_lock+0x7c9/0x980
[  105.715500]  ? print_shortest_lock_dependencies+0x2e0/0x2e0
[  105.721424]  ? check_usage_forwards+0x230/0x230
[  105.726315]  __lock_acquire+0x923/0x26f0
[  105.730597]  ? debug_show_all_locks+0x240/0x240
[  105.735478]  ? mark_lock+0x493/0x980
[  105.739412]  ? check_chain_key+0x140/0x1f0
[  105.743861]  ? __lock_acquire+0x836/0x26f0
[  105.748323]  ? lock_acquire+0x12e/0x290
[  105.752516]  lock_acquire+0x12e/0x290
[  105.756539]  ? est_fetch_counters+0x3c/0xa0
[  105.761084]  _raw_spin_lock+0x2c/0x40
[  105.765099]  ? est_fetch_counters+0x3c/0xa0
[  105.769633]  est_fetch_counters+0x3c/0xa0
[  105.773995]  est_timer+0x87/0x390
[  105.777670]  ? est_fetch_counters+0xa0/0xa0
[  105.782210]  ? lock_acquire+0x12e/0x290
[  105.786410]  call_timer_fn+0x161/0x550
[  105.790512]  ? est_fetch_counters+0xa0/0xa0
[  105.795055]  ? del_timer_sync+0xd0/0xd0
[  105.799249]  ? __lock_is_held+0x93/0x110
[  105.803531]  ? mark_held_locks+0x20/0xe0
[  105.807813]  ? _raw_spin_unlock_irq+0x29/0x40
[  105.812525]  ? est_fetch_counters+0xa0/0xa0
[  105.817069]  ? est_fetch_counters+0xa0/0xa0
[  105.821610]  run_timer_softirq+0x3c4/0x9f0
[  105.826064]  ? lock_acquire+0x12e/0x290
[  105.830257]  ? __bpf_trace_timer_class+0x10/0x10
[  105.835237]  ? __lock_is_held+0x25/0x110
[  105.839517]  __do_softirq+0x11d/0x7bf
[  105.843542]  irq_exit+0x140/0x190
[  105.847208]  smp_apic_timer_interrupt+0xac/0x3b0
[  105.852182]  apic_timer_interrupt+0xf/0x20
[  105.856628]  
[  105.859081] RIP: 0010:cpuidle_enter_state+0xd8/0x4d0
[  105.864395] Code: 46 ff 48 89 44 24 08 0f 1f 44 00 00 31 ff e8 cf ec 46 ff 
80 7c 24 07 00 0f 85 1d 02 00 00 e8 9f 90 4b ff fb 66 0f 1f 44 00 00 <4c> 8b 6c 
24 08 4d 29 fd 0f 80 36 03 00 00 4c 89 e8 48 ba cf f7 53
[  105.884288] RSP: 0018:8803ad94fd20 EFLAGS: 0246 ORIG_RAX: 
ff13
[  105.892494] RAX:  RBX: e8fb300829c0 RCX: b41e19e1
[  105.899988] RDX: 0007 RSI: dc00 RDI: 8803ad9358ac
[  105.907503] RBP: b6636300 R08: 0004 R09: 
[  105.914997] R10:  

[PATCH bpf] bpf: fix a rcu usage warning in bpf_prog_array_copy_core()

2018-08-14 Thread Yonghong Song
Commit 394e40a29788 ("bpf: extend bpf_prog_array to store pointers
to the cgroup storage") refactored the bpf_prog_array_copy_core()
to accommodate new structure bpf_prog_array_item which contains
bpf_prog array itself.

In the old code, we had
   perf_event_query_prog_array():
 mutex_lock(...)
 bpf_prog_array_copy_call():
   prog = rcu_dereference_check(array, 1)->progs
   bpf_prog_array_copy_core(prog, ...)
 mutex_unlock(...)

With the above commit, we had
   perf_event_query_prog_array():
 mutex_lock(...)
 bpf_prog_array_copy_call():
   bpf_prog_array_copy_core(array, ...):
 item = rcu_dereference(array)->items;
 ...
 mutex_unlock(...)

The new code will trigger a lockdep rcu checking warning.
The fix is to change rcu_dereference() to rcu_dereference_check()
to prevent such a warning.

Reported-by: syzbot+6e72317008eef84a2...@syzkaller.appspotmail.com
Fixes: 394e40a29788 ("bpf: extend bpf_prog_array to store pointers to the 
cgroup storage")
Cc: Roman Gushchin 
Signed-off-by: Yonghong Song 
---
 kernel/bpf/core.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
index 4d09e610777f..3f5bf1af0826 100644
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -1579,7 +1579,7 @@ static bool bpf_prog_array_copy_core(struct 
bpf_prog_array __rcu *array,
struct bpf_prog_array_item *item;
int i = 0;
 
-   item = rcu_dereference(array)->items;
+   item = rcu_dereference_check(array, 1)->items;
for (; item->prog; item++) {
if (item->prog == &dummy_bpf_prog.prog)
continue;
-- 
2.17.1



Re: [PATCH][bpf-next] bpf: test: fix spelling mistake "REUSEEPORT" -> "REUSEPORT"

2018-08-14 Thread David Miller
From: Alexei Starovoitov 
Date: Tue, 14 Aug 2018 10:39:12 -0700

> On Mon, Aug 13, 2018 at 03:00:32PM +0100, Colin King wrote:
>> From: Colin Ian King 
>> 
>> Trivial fix to spelling mistake in error message
>> 
>> Signed-off-by: Colin Ian King 
> 
> Acked-by: Alexei Starovoitov 
> 
> Dave, maybe you can take this one directly, since PR for net-next is not yet 
> sent to Linus?

Sure, done.


Re: [PATCH] hv/netvsc: Fix NULL dereference at single queue mode fallback

2018-08-14 Thread Takashi Iwai
On Tue, 14 Aug 2018 19:29:32 +0200,
David Miller wrote:
> 
> From: Takashi Iwai 
> Date: Tue, 14 Aug 2018 19:10:50 +0200
> 
> > The recent commit 916c5e1413be ("hv/netvsc: fix handling of fallback
> > to single queue mode") tried to fix the fallback behavior to a single
> > queue mode, but it changed the function to return zero incorrectly,
> > while the function should return an object pointer.  Eventually this
> > leads to a NULL dereference at the callers that expect non-NULL
> > value.
> > 
> > Fix it by returning the proper net_device object.
> > 
> > Fixes: 916c5e1413be ("hv/netvsc: fix handling of fallback to single queue 
> > mode")
> > Signed-off-by: Takashi Iwai 
> 
> Applied and queued up for -stable.
> 
> Please do not put explicit "CC: stable" notations in networking patches, I 
> queue
> up and submit networking patches to -stable explicitly.

OK, noted for the next time.  Thanks!


Takashi


Re: [PATCH][bpf-next] bpf: test: fix spelling mistake "REUSEEPORT" -> "REUSEPORT"

2018-08-14 Thread Alexei Starovoitov
On Mon, Aug 13, 2018 at 03:00:32PM +0100, Colin King wrote:
> From: Colin Ian King 
> 
> Trivial fix to spelling mistake in error message
> 
> Signed-off-by: Colin Ian King 

Acked-by: Alexei Starovoitov 

Dave, maybe you can take this one directly, since PR for net-next is not yet 
sent to Linus?



Re: [PATCH net-next] net: sched: act_ife: disable bh when taking ife_mod_lock

2018-08-14 Thread Vlad Buslov


On Mon 13 Aug 2018 at 23:18, Cong Wang  wrote:
> Hi, Vlad,
>
> Could you help to test my fixes?
>
> I just pushed them into my own git repo:
> https://github.com/congwang/linux/commits/net-sched-fixes
>
> Particularly, this is the revert:
> https://github.com/congwang/linux/commit/b3f51c4ab8272cc8d3244848e528fce1426c4659
> and this is my fix for the lockdep warning you reported:
> https://github.com/congwang/linux/commit/ecadcde94919183e9f0d5bc376f05e731baf2661
>
> I don't have environment to test ife modules.

Hi Cong,

I've run the test with your patch applied and couldn't reproduce the
lockdep warning.

>
> BTW, this is the fix for the deadlock I spotted:
> https://github.com/congwang/linux/commit/44f3d7f5b6ed2d4a46177e6c658fa23b76141afa
>
> Thanks!



[PATCH net-next] net: sched: act_ife: always release ife action on init error

2018-08-14 Thread Vlad Buslov
Action init API was changed to always take reference to action, even when
overwriting existing action. Substitute conditional action release, which
was executed only if action is newly created, with unconditional release in
tcf_ife_init() error handling code to prevent double free or memory leak in
case of overwrite.

Fixes: 4e8ddd7f1758 ("net: sched: don't release reference on action overwrite")
Reported-by: Cong Wang 
Signed-off-by: Vlad Buslov 
---
 net/sched/act_ife.c | 8 ++--
 1 file changed, 2 insertions(+), 6 deletions(-)

diff --git a/net/sched/act_ife.c b/net/sched/act_ife.c
index 5d200495e467..c524edcad900 100644
--- a/net/sched/act_ife.c
+++ b/net/sched/act_ife.c
@@ -551,9 +551,6 @@ static int tcf_ife_init(struct net *net, struct nlattr *nla,
   NULL, NULL);
if (err) {
 metadata_parse_err:
-   if (ret == ACT_P_CREATED)
-   tcf_idr_release(*a, bind);
-
if (exists)
spin_unlock_bh(&ife->tcf_lock);
tcf_idr_release(*a, bind);
@@ -574,11 +571,10 @@ static int tcf_ife_init(struct net *net, struct nlattr 
*nla,
 */
err = use_all_metadata(ife);
if (err) {
-   if (ret == ACT_P_CREATED)
-   tcf_idr_release(*a, bind);
-
if (exists)
spin_unlock_bh(&ife->tcf_lock);
+   tcf_idr_release(*a, bind);
+
kfree(p);
return err;
}
-- 
2.7.5



Re: [PATCH] hv/netvsc: Fix NULL dereference at single queue mode fallback

2018-08-14 Thread David Miller
From: Takashi Iwai 
Date: Tue, 14 Aug 2018 19:10:50 +0200

> The recent commit 916c5e1413be ("hv/netvsc: fix handling of fallback
> to single queue mode") tried to fix the fallback behavior to a single
> queue mode, but it changed the function to return zero incorrectly,
> while the function should return an object pointer.  Eventually this
> leads to a NULL dereference at the callers that expect non-NULL
> value.
> 
> Fix it by returning the proper net_device object.
> 
> Fixes: 916c5e1413be ("hv/netvsc: fix handling of fallback to single queue 
> mode")
> Signed-off-by: Takashi Iwai 

Applied and queued up for -stable.

Please do not put explicit "CC: stable" notations in networking patches, I queue
up and submit networking patches to -stable explicitly.

Thank you.


Re: [PATCH net-next v6 08/11] net: sched: don't release reference on action overwrite

2018-08-14 Thread Vlad Buslov
On Mon 13 Aug 2018 at 23:00, Cong Wang  wrote:
> On Thu, Jul 5, 2018 at 7:24 AM Vlad Buslov  wrote:
>> diff --git a/net/sched/act_ife.c b/net/sched/act_ife.c
>> index 89a761395c94..acea3feae762 100644
>> --- a/net/sched/act_ife.c
>> +++ b/net/sched/act_ife.c
> ...
>> @@ -548,6 +546,8 @@ static int tcf_ife_init(struct net *net, struct nlattr 
>> *nla,
>>
>> if (exists)
>> spin_unlock_bh(&ife->tcf_lock);
>> +   tcf_idr_release(*a, bind);
>> +
>> kfree(p);
>> return err;
>> }
>
> With this change, you seem to release it twice when nla_parse_nested() fails
> for ACT_P_CREATED case...?

Thank you, great catch!

>
> Looks like what you want is the following?
>
> if (err) {
> tcf_idr_release(*a, bind);
> kfree(p);
> return err;
> }

Yes. Sending the fix.


Re: [PATCH] hv/netvsc: Fix NULL dereference at single queue mode fallback

2018-08-14 Thread Stephen Hemminger
On Tue, 14 Aug 2018 19:10:50 +0200
Takashi Iwai  wrote:

> The recent commit 916c5e1413be ("hv/netvsc: fix handling of fallback
> to single queue mode") tried to fix the fallback behavior to a single
> queue mode, but it changed the function to return zero incorrectly,
> while the function should return an object pointer.  Eventually this
> leads to a NULL dereference at the callers that expect non-NULL
> value.
> 
> Fix it by returning the proper net_device object.
> 
> Fixes: 916c5e1413be ("hv/netvsc: fix handling of fallback to single queue 
> mode")
> Cc: 
> Signed-off-by: Takashi Iwai 

Reviewed-by: Stephen Hemminger 


Re: [Intel-wired-lan] [PATCH next-queue 0/8] ixgbe/ixgbevf: IPsec offload support for VFs

2018-08-14 Thread Shannon Nelson

On 8/14/2018 8:30 AM, Alexander Duyck wrote:

On Mon, Aug 13, 2018 at 11:43 AM Shannon Nelson
 wrote:


This set of patches implements IPsec hardware offload for VF devices in
Intel's 10Gbe x540 family of Ethernet devices.


[...]



So the one question I would have about this patch set is what happens
if you are setting up an ipsec connection between the PF and one of the
VFs on the same port/function? Do the ipsec offloads get translated
across the Tx loopback or do they end up causing issues? Specifically
I would be interested in seeing the results of a test either between
two VFs, or the PF and one of the VFs on the same port.

- Alex



There is definitely something funky in the internal switch connection, 
as messages going from PF to VF with an offloaded encryption don't seem 
to get received by the VF, at least when in a VEB setup.  If I only set 
up offloads on the Rx on both PF and VF, and don't offload the Tx, then 
things work.


I don't have a setup to test this, but I suspect that in a VEPA 
configuration, with packets going out to the switch and turned around 
back in, the Tx encryption offload would happen as expected.


sln


[PATCH] hv/netvsc: Fix NULL dereference at single queue mode fallback

2018-08-14 Thread Takashi Iwai
The recent commit 916c5e1413be ("hv/netvsc: fix handling of fallback
to single queue mode") tried to fix the fallback behavior to a single
queue mode, but it changed the function to return zero incorrectly,
while the function should return an object pointer.  Eventually this
leads to a NULL dereference at the callers that expect non-NULL
value.

Fix it by returning the proper net_device object.

Fixes: 916c5e1413be ("hv/netvsc: fix handling of fallback to single queue mode")
Cc: 
Signed-off-by: Takashi Iwai 
---
 drivers/net/hyperv/rndis_filter.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/hyperv/rndis_filter.c 
b/drivers/net/hyperv/rndis_filter.c
index 408ece27131c..2a5209f23f29 100644
--- a/drivers/net/hyperv/rndis_filter.c
+++ b/drivers/net/hyperv/rndis_filter.c
@@ -1338,7 +1338,7 @@ struct netvsc_device *rndis_filter_device_add(struct 
hv_device *dev,
/* setting up multiple channels failed */
net_device->max_chn = 1;
net_device->num_chn = 1;
-   return 0;
+   return net_device;
 
 err_dev_remv:
rndis_filter_device_remove(dev, net_device);
-- 
2.18.0



Re: [PATCH net] cxgb4: Add new T5 PCI device ids 0x50af and 0x50b0

2018-08-14 Thread David Miller
From: Ganesh Goudar 
Date: Tue, 14 Aug 2018 16:21:37 +0530

> Signed-off-by: Ganesh Goudar 

Applied.


Re: [PATCH net-next] net: dsa: mv88e6xxx: missing unlock on error path

2018-08-14 Thread David Miller
From: Dan Carpenter 
Date: Tue, 14 Aug 2018 12:09:05 +0300

> We added a new error path, but we need to drop the lock before we return.
> 
> Fixes: 2d2e1dd29962 ("net: dsa: mv88e6xxx: Cache the port cmode")
> Signed-off-by: Dan Carpenter 

Applied.


Re: [PATCH net-next] net: dsa: mv88e6xxx: bitwise vs logical bug

2018-08-14 Thread David Miller
From: Dan Carpenter 
Date: Tue, 14 Aug 2018 12:06:43 +0300

> We are trying to test if these flags are set but there are some && vs &
> typos.
> 
> Fixes: efd1ba6af93f ("net: dsa: mv88e6xxx: Add SERDES phydev_mac_change up 
> for 6390")
> Signed-off-by: Dan Carpenter 

Applied.
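
For readers unfamiliar with this class of bug, the difference between the two
operators can be sketched as follows. This is a generic illustration, not the
mv88e6xxx code: the flag names and values are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical flag bits, chosen only for illustration. */
#define FLAG_LINK_UP	0x0001u
#define FLAG_SERDES	0x0800u

/* Buggy form: logical AND is true whenever both operands are non-zero,
 * so it does not test whether the specific bit is set. */
static bool flag_set_logical(unsigned int reg, unsigned int flag)
{
	return reg && flag;
}

/* Intended form: bitwise AND isolates the requested bit. */
static bool flag_set_bitwise(unsigned int reg, unsigned int flag)
{
	return (reg & flag) != 0;
}
```

With reg = FLAG_LINK_UP, the logical form reports FLAG_SERDES as set even
though that bit is clear, while the bitwise form reports it as clear.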


Re: [PATCH] 9p/xen: fix check for xenbus_read error in front_probe

2018-08-14 Thread Stefano Stabellini
On Tue, 14 Aug 2018, Dominique Martinet wrote:
> From: Dominique Martinet 
> 
> If the xen bus exists but does not expose the proper interface, it is
> possible to get a non-zero length but still some error, leading to
> strcmp failing trying to load invalid memory addresses e.g.
> fffe.
> 
> There is then no need to check length when there is no error, as the
> xenbus driver guarantees that the string is nul-terminated.
> 
> Signed-off-by: Dominique Martinet 
> Cc: Stefano Stabellini 
> Cc: Eric Van Hensbergen 
> Cc: Latchesar Ionkov 

Reviewed-by: Stefano Stabellini 


> ---
> 
> This is a trivial bug I stumbled on when setting up xen with p9fs and
> running the VM in pvm: it had enough in the bus to trigger the probe
> but then there was no version and it tried to return ENOENT but len
> was set to the lower-level message size.
> 
>  net/9p/trans_xen.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/net/9p/trans_xen.c b/net/9p/trans_xen.c
> index 1a5b38892eb4..f76beadddfc3 100644
> --- a/net/9p/trans_xen.c
> +++ b/net/9p/trans_xen.c
> @@ -391,9 +391,9 @@ static int xen_9pfs_front_probe(struct xenbus_device *dev,
>   unsigned int max_rings, max_ring_order, len = 0;
>  
>   versions = xenbus_read(XBT_NIL, dev->otherend, "versions", &len);
> - if (!len)
> - return -EINVAL;
> + if (IS_ERR(versions))
> + return PTR_ERR(versions);
>   if (strcmp(versions, "1")) {
>   kfree(versions);
>   return -EINVAL;
>   }
> -- 
> 2.17.1
> 
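
The failure mode described in the quoted patch can be modelled in userspace.
The sketch below is an assumption-laden illustration, not the xenbus API:
err_ptr(), ptr_err() and is_err() are local stand-ins for the kernel's
ERR_PTR(), PTR_ERR() and IS_ERR(), and fake_read() is a hypothetical
replacement for xenbus_read() that returns an error pointer while leaving a
non-zero length behind.

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>

#define MAX_ERRNO 4095

/* Userspace stand-ins for the kernel's error-pointer helpers. */
static void *err_ptr(long error)
{
	return (void *)(intptr_t)error;
}

static long ptr_err(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

static int is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical reader: on failure it returns an error pointer but still
 * reports a non-zero length, which is the situation the patch describes. */
static char *fake_read(int fail, unsigned int *len)
{
	static char versions[] = "1";

	if (fail) {
		*len = 24;	/* non-zero length despite the error */
		return err_ptr(-ENOENT);
	}
	*len = (unsigned int)strlen(versions);
	return versions;
}

/* Old check: only looks at len. Returns 1 if the caller would go on to
 * pass the (possibly invalid) pointer to strcmp(). */
static int old_check_passes(int fail)
{
	unsigned int len = 0;

	(void)fake_read(fail, &len);
	return len != 0;
}

/* New check: test the pointer itself before touching the string. */
static int new_check(int fail)
{
	unsigned int len = 0;
	char *versions = fake_read(fail, &len);

	if (is_err(versions))
		return (int)ptr_err(versions);
	return strcmp(versions, "1") ? -EINVAL : 0;
}
```

In the failing case old_check_passes() lets the error pointer through, while
new_check() returns -ENOENT without dereferencing it.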


Re: [PATCH net-next] ieee802154: hwsim: using right kind of iteration

2018-08-14 Thread David Miller
From: Alexander Aring 
Date: Sun, 12 Aug 2018 16:24:56 -0400

> This patch fixes the error path to unsubscribe all other phy's from
> current phy. The actual code, which uses the wrong kind of list iteration,
> may have been copied from the case to unsubscribe the current phy from all
> other phy's.
> 
> Cc: Stefan Schmidt 
> Reported-by: Dan Carpenter 
> Fixes: f25da51fdc38 ("ieee802154: hwsim: add replacement for fakelb")
> Signed-off-by: Alexander Aring 

I'll apply this directly, thanks.


Re: [PATCH] net: macb: Fix regression breaking non-MDIO fixed-link PHYs

2018-08-14 Thread Uwe Kleine-König
Hello Ahmad,


On Tue, Aug 14, 2018 at 04:12:40PM +0200, Ahmad Fatoum wrote:
> The referenced commit broke initializing macb on the EVB-KSZ9477 eval board.
> There, of_mdiobus_register was called even for the fixed-link representing
> the SPI-connected switch PHY, with the result that the driver attempts to
> enumerate PHYs on a non-existent MDIO bus:
> 
>   libphy: MACB_mii_bus: probed
>   mdio_bus f0028000.ethernet-: fixed-link has invalid PHY address
>   mdio_bus f0028000.ethernet-: scan phy fixed-link at address 0
> [snip]
>   mdio_bus f0028000.ethernet-: scan phy fixed-link at address 31
>   macb f0028000.ethernet: broken fixed-link specification
> 
> Cc: 
> Fixes: 739de9a1563a ("net: macb: Reorganize macb_mii bringup")

I added the people involved in 739de9a1563a to Cc. Maybe they want to
comment. So not stripping the remaining part of the original mail.

Best regards
Uwe

> Signed-off-by: Ahmad Fatoum 
> ---
>  drivers/net/ethernet/cadence/macb_main.c | 26 +++-
>  1 file changed, 16 insertions(+), 10 deletions(-)
> 
> diff --git a/drivers/net/ethernet/cadence/macb_main.c 
> b/drivers/net/ethernet/cadence/macb_main.c
> index a6c911bb5ce2..d202a03c42ed 100644
> --- a/drivers/net/ethernet/cadence/macb_main.c
> +++ b/drivers/net/ethernet/cadence/macb_main.c
> @@ -481,11 +481,6 @@ static int macb_mii_probe(struct net_device *dev)
>  
>   if (np) {
>   if (of_phy_is_fixed_link(np)) {
> - if (of_phy_register_fixed_link(np) < 0) {
> - dev_err(&bp->pdev->dev,
> - "broken fixed-link specification\n");
> - return -ENODEV;
> - }
>   bp->phy_node = of_node_get(np);
>   } else {
>   bp->phy_node = of_parse_phandle(np, "phy-handle", 0);
> @@ -568,7 +563,7 @@ static int macb_mii_init(struct macb *bp)
>  {
>   struct macb_platform_data *pdata;
>   struct device_node *np;
> - int err;
> + int err = -ENXIO;
>  
>   /* Enable management port */
>   macb_writel(bp, NCR, MACB_BIT(MPE));
> @@ -591,10 +586,21 @@ static int macb_mii_init(struct macb *bp)
>   dev_set_drvdata(&bp->dev->dev, bp->mii_bus);
>  
>   np = bp->pdev->dev.of_node;
> - if (pdata)
> - bp->mii_bus->phy_mask = pdata->phy_mask;
> + if (np && of_phy_is_fixed_link(np)) {
> + if (of_phy_register_fixed_link(np) < 0) {
> + dev_err(&bp->pdev->dev,
> + "broken fixed-link specification\n");
> + goto err_out_free_mdiobus;
> + }
> +
> + err = mdiobus_register(bp->mii_bus);
> + } else {
> + if (pdata)
> + bp->mii_bus->phy_mask = pdata->phy_mask;
> +
> + err = of_mdiobus_register(bp->mii_bus, np);
> + }
>  
> - err = of_mdiobus_register(bp->mii_bus, np);
>   if (err)
>   goto err_out_free_mdiobus;
>  
> @@ -606,9 +612,9 @@ static int macb_mii_init(struct macb *bp)
>  
>  err_out_unregister_bus:
>   mdiobus_unregister(bp->mii_bus);
> +err_out_free_mdiobus:
>   if (np && of_phy_is_fixed_link(np))
>   of_phy_deregister_fixed_link(np);
> -err_out_free_mdiobus:
>   of_node_put(bp->phy_node);
>   mdiobus_free(bp->mii_bus);
>  err_out:
> -- 
> 2.18.0
> 
> 
> 

-- 
Pengutronix e.K.   | Uwe Kleine-König|
Industrial Linux Solutions | http://www.pengutronix.de/  |


Re: [Intel-wired-lan] [PATCH next-queue 0/8] ixgbe/ixgbevf: IPsec offload support for VFs

2018-08-14 Thread Alexander Duyck
On Mon, Aug 13, 2018 at 11:43 AM Shannon Nelson
 wrote:
>
> This set of patches implements IPsec hardware offload for VF devices in
> Intel's 10Gbe x540 family of Ethernet devices.
>
> The IPsec HW offload feature has been in the x540/Niantic family of
> network devices since their release in 2009, but there was no Linux
> kernel support for the offload until 2017.  After the XFRM code added
> support for the offload last year, the hw offload was added to the ixgbe
> PF driver.
>
> Since the related x540 VF device uses same setup as the PF for implementing
> the offload, adding the feature to the ixgbevf seemed like a good idea.
> In this case, the PF owns the device registers, so the VF simply packages
> up the request information into a VF<->PF message and the PF does the
> device configuration.  The resulting IPsec throughput is roughly equivalent
> to what we see in the PF - nearly line-rate, with the expected drop in CPU
> cycles burned.  (I'm not great at performance statistics, I'll let better
> folks do the actual measurements as they pertain to their own usage)
>
> To make use of the capability, first two things are needed: the PF must
> be told to enable the offload for VFs (it is off by default) and the VF
> must be trusted.  A new ethtool priv-flag for ixgbe is added to control
> VF offload support.  For example:
>
> ethtool --set-priv-flags eth0 vf-ipsec on
> ip link set eth0 vf 1 trust on
>
> Once those are set up and the VF device is UP, the user can add SAs the
> same as for PFs, whether the VF is in the host or has been assigned to
> a VM.
>
> Note that the x540 chip supports a total of 1024 Rx plus 1024 Tx Security
> Associations (SAs), shared among the PF and VFs that might request them.
> It is entirely possible for a single VF to soak up all the offload
> capability, which would likely annoy some people.  It seems rather
> arbitrary to try to set a limit for how many a VF could be allowed,
> but this is mitigated somewhat by the need for "trust" and "vf-ipsec"
> to be enabled.  I suppose we could come up with a way to make a limit
> configurable, but there is no existing method for adding that kind of
> configuration so I'll leave that to a future discussion.
>
> Currently this doesn't support Tx offload as the hardware encryption
> engine doesn't seem to engage on the Tx packets.  This may be a lingering
> driver bug, more investigation is needed.  Until then, requests for a Tx
> offload are failed and the userland requester will need to add Tx SAs
> without the offload attribute.
>
> Given that we don't have Tx offload support, the benefit here is less
> than it could be, but is definitely still noticeable.  For example, with
> informal iperf testing over a 10Gbps link, with full offload in a PF on
> one side and a VF in a VM on the other side on a CPU with AES instructions:
>
> Reference:
> No IPsec: 9.4 Gbps
> IPsec offload btwn two PFs:   9.2 Gbps
> VF as the iperf receiver:
> IPsec offload on PF, none on VF:  6.8 Gbps
> IPsec offload on PF and VF:   9.2 Gbps   << biggest benefit
> VF as the iperf sender:
> IPsec offload on PF, none on VF:  4.8 Gbps
> IPsec offload on PF and VF:   4.8 Gbps
>
> The iperf traffic is primarily uni-directional, and we can see the most
> benefit when VF is the iperf server and is receiving the test traffic.
> Watching output from sar also shows a nice decrease in CPU utilization.
>

So the one question I would have about this patch set is what happens
if you are setting up an ipsec connection between the PF and one of the
VFs on the same port/function? Do the ipsec offloads get translated
across the Tx loopback or do they end up causing issues? Specifically
I would be interested in seeing the results of a test either between
two VFs, or the PF and one of the VFs on the same port.

- Alex


Re: [PATCH RFC net-next] openvswitch: Queue upcalls to userspace in per-port round-robin order

2018-08-14 Thread Stefano Brivio
Hi William,

On Fri, 10 Aug 2018 07:11:01 -0700
William Tu  wrote:

> > int rr_select_srcport(struct dp_upcall_info *upcall)
> > {
> > /* look up source port from upcall->skb... */
> > }
> >
> > And we could then easily extend this to use BPF with maps one day.
> >
> >  
> Hi Stefano,
> 
> If you want to experiment with BPF, Joe and I have some prototype.
> We implemented the upcall mechanism using BPF perf event helper function
> https://github.com/williamtu/ovs-ebpf/blob/master/bpf/datapath.c#L62
> 
> And there are threads polling the perf ring buffer to receive packets from
> BPF.
> https://github.com/williamtu/ovs-ebpf/blob/master/lib/perf-event.c#L232

Interesting, thanks for the pointers!

> If I follow the discussion correctly, before upcall, you need to queue
> packets based on different configurations (vport/hash/vni/5-tuple/...)
> and queue to different buckets when congestion happens.

Yes, correct.

> In this case, you
> probably need a BPF map to enqueue/dequeue the packet. A BPF queue map is
> not supported yet, but there is a patch available:
> [iovisor-dev] [RFC PATCH 1/3] bpf: add bpf queue map
> 
> So how to enqueue and dequeue packets depends on user's BPF implementation.
> This allows fairness scheme to be extensible.

For the time being, we'll try to ensure that BPF can be plugged in there
rather easily. I see the advantage, but I'd rather do this as a second
step.

-- 
Stefano
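
The per-port round-robin queueing discussed in this thread can be sketched
in plain C. This is only an illustration, not code from the openvswitch
datapath: the names rr_sched, rr_enqueue() and rr_dequeue(), and the
constants NPORTS and QLEN, are hypothetical, and packets are reduced to
integer ids.

```c
#include <stddef.h>

#define NPORTS 4      /* hypothetical number of source ports */
#define QLEN   8      /* hypothetical per-port queue depth */

struct port_queue {
    int pkts[QLEN];   /* packet ids, stored as a ring buffer */
    size_t head, count;
};

struct rr_sched {
    struct port_queue q[NPORTS];
    size_t next;      /* port to try first on the next dequeue */
};

/* Queue a packet for its source port; returns -1 if the port is
 * out of range or that port's queue is full. */
int rr_enqueue(struct rr_sched *s, size_t port, int pkt)
{
    struct port_queue *q;

    if (port >= NPORTS)
        return -1;
    q = &s->q[port];
    if (q->count == QLEN)
        return -1;
    q->pkts[(q->head + q->count) % QLEN] = pkt;
    q->count++;
    return 0;
}

/* Take one packet, visiting ports in round-robin order;
 * returns -1 if every queue is empty. */
int rr_dequeue(struct rr_sched *s, int *pkt)
{
    size_t i;

    for (i = 0; i < NPORTS; i++) {
        size_t port = (s->next + i) % NPORTS;
        struct port_queue *q = &s->q[port];

        if (q->count == 0)
            continue;
        *pkt = q->pkts[q->head];
        q->head = (q->head + 1) % QLEN;
        q->count--;
        s->next = (port + 1) % NPORTS;  /* start after this port next time */
        return 0;
    }
    return -1;
}
```

With three packets queued on port 0 and one on port 2, the dequeue order
interleaves the two ports instead of draining port 0 first, which is the
fairness property the thread is after.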


[PATCH] net: macb: Fix regression breaking non-MDIO fixed-link PHYs

2018-08-14 Thread Ahmad Fatoum
The referenced commit broke initializing macb on the EVB-KSZ9477 eval board.
There, of_mdiobus_register was called even for the fixed-link representing
the SPI-connected switch PHY, with the result that the driver attempts to
enumerate PHYs on a non-existent MDIO bus:

libphy: MACB_mii_bus: probed
mdio_bus f0028000.ethernet-: fixed-link has invalid PHY address
mdio_bus f0028000.ethernet-: scan phy fixed-link at address 0
[snip]
mdio_bus f0028000.ethernet-: scan phy fixed-link at address 31
macb f0028000.ethernet: broken fixed-link specification

Cc: 
Fixes: 739de9a1563a ("net: macb: Reorganize macb_mii bringup")
Signed-off-by: Ahmad Fatoum 
---
 drivers/net/ethernet/cadence/macb_main.c | 26 +++-
 1 file changed, 16 insertions(+), 10 deletions(-)

diff --git a/drivers/net/ethernet/cadence/macb_main.c 
b/drivers/net/ethernet/cadence/macb_main.c
index a6c911bb5ce2..d202a03c42ed 100644
--- a/drivers/net/ethernet/cadence/macb_main.c
+++ b/drivers/net/ethernet/cadence/macb_main.c
@@ -481,11 +481,6 @@ static int macb_mii_probe(struct net_device *dev)
 
if (np) {
if (of_phy_is_fixed_link(np)) {
-   if (of_phy_register_fixed_link(np) < 0) {
-   dev_err(&bp->pdev->dev,
-   "broken fixed-link specification\n");
-   return -ENODEV;
-   }
bp->phy_node = of_node_get(np);
} else {
bp->phy_node = of_parse_phandle(np, "phy-handle", 0);
@@ -568,7 +563,7 @@ static int macb_mii_init(struct macb *bp)
 {
struct macb_platform_data *pdata;
struct device_node *np;
-   int err;
+   int err = -ENXIO;
 
/* Enable management port */
macb_writel(bp, NCR, MACB_BIT(MPE));
@@ -591,10 +586,21 @@ static int macb_mii_init(struct macb *bp)
dev_set_drvdata(&bp->dev->dev, bp->mii_bus);
 
np = bp->pdev->dev.of_node;
-   if (pdata)
-   bp->mii_bus->phy_mask = pdata->phy_mask;
+   if (np && of_phy_is_fixed_link(np)) {
+   if (of_phy_register_fixed_link(np) < 0) {
+   dev_err(&bp->pdev->dev,
+   "broken fixed-link specification\n");
+   goto err_out_free_mdiobus;
+   }
+
+   err = mdiobus_register(bp->mii_bus);
+   } else {
+   if (pdata)
+   bp->mii_bus->phy_mask = pdata->phy_mask;
+
+   err = of_mdiobus_register(bp->mii_bus, np);
+   }
 
-   err = of_mdiobus_register(bp->mii_bus, np);
if (err)
goto err_out_free_mdiobus;
 
@@ -606,9 +612,9 @@ static int macb_mii_init(struct macb *bp)
 
 err_out_unregister_bus:
mdiobus_unregister(bp->mii_bus);
+err_out_free_mdiobus:
if (np && of_phy_is_fixed_link(np))
of_phy_deregister_fixed_link(np);
-err_out_free_mdiobus:
of_node_put(bp->phy_node);
mdiobus_free(bp->mii_bus);
 err_out:
-- 
2.18.0



Re: [PATCH v2 iproute2-next] Add SKB Priority qdisc support in tc(8)

2018-08-14 Thread David Ahern
On 8/13/18 8:57 PM, Nishanth Devarajan wrote:
> sch_skbprio is a qdisc that prioritizes packets according to their 
> skb->priority
> field. Under congestion, it drops already-enqueued lower priority packets to
> make space available for higher priority packets. Skbprio was conceived as a
> solution for denial-of-service defenses that need to route packets with
> different priorities as a means to overcome DoS attacks.
> 
> Signed-off-by: Nishanth Devarajan 
> Reviewed-by: Michel Machado 
> ---
> v2
> *Patch applies cleanly, fixes for proper code indentation.
> ---
>  man/man8/tc-skbprio.8 | 70 ++
>  tc/Makefile   |  1 +
>  tc/q_skbprio.c| 84 
> +++
>  3 files changed, 155 insertions(+)
>  create mode 100644 man/man8/tc-skbprio.8
>  create mode 100644 tc/q_skbprio.c

applied to iproute2-next. Thanks


Re: [PATCH net-next] net: dsa: mv88e6xxx: missing unlock on error path

2018-08-14 Thread Andrew Lunn
On Tue, Aug 14, 2018 at 12:09:05PM +0300, Dan Carpenter wrote:
> We added a new error path, but we need to drop the lock before we return.
> 
> Fixes: 2d2e1dd29962 ("net: dsa: mv88e6xxx: Cache the port cmode")
> Signed-off-by: Dan Carpenter 

Signed-off-by: Andrew Lunn 

Andrew


Re: [PATCH net-next] net: dsa: mv88e6xxx: bitwise vs logical bug

2018-08-14 Thread Andrew Lunn
On Tue, Aug 14, 2018 at 12:06:43PM +0300, Dan Carpenter wrote:
> We are trying to test if these flags are set but there are some && vs &
> typos.
> 
> Fixes: efd1ba6af93f ("net: dsa: mv88e6xxx: Add SERDES phydev_mac_change up 
> for 6390")
> Signed-off-by: Dan Carpenter 

Reviewed-by: Andrew Lunn 

Andrew


[iproute PATCH 3/3] testsuite: Add a first ss test validating ssfilter

2018-08-14 Thread Phil Sutter
This tests a few ssfilter expressions by selecting sockets from a TCP
dump file. The dump was created using the following command:

| ss -ntaD testsuite/tests/ss/ss1.dump

It is fed into ss via the TCPDIAG_FILE environment variable.

Signed-off-by: Phil Sutter 
---
 testsuite/tests/ss/ss1.dump   | Bin 0 -> 720 bytes
 testsuite/tests/ss/ssfilter.t |  48 ++
 2 files changed, 48 insertions(+)
 create mode 100644 testsuite/tests/ss/ss1.dump
 create mode 100755 testsuite/tests/ss/ssfilter.t

diff --git a/testsuite/tests/ss/ss1.dump b/testsuite/tests/ss/ss1.dump
new file mode 100644
index 
..9c273231c78418593cabda324ca20d5a6d41e1aa
GIT binary patch
literal 720
zcmYdbU|Nin11kdun3n(~QOsv#0-E2u
z3TO>b6#}61K{9Mm>1mU45ek8<2e$al?_I?phHf4@A7mgq)YMKS^IrfxbkZgGJtnxD>cjP}t=IBMn#FbAjW2%?
Wu$m7G%#dh=`5=oj%@O8f3p)VpCN7Tv

literal 0
HcmV?d1

diff --git a/testsuite/tests/ss/ssfilter.t b/testsuite/tests/ss/ssfilter.t
new file mode 100755
index 0..e74f1765cb723
--- /dev/null
+++ b/testsuite/tests/ss/ssfilter.t
@@ -0,0 +1,48 @@
+#!/bin/sh
+
+. lib/generic.sh
+
+# % ./misc/ss -Htna
+# LISTEN  0  128  0.0.0.0:22     0.0.0.0:*
+# ESTAB   0  0    10.0.0.1:22    10.0.0.1:36266
+# ESTAB   0  0    10.0.0.1:36266 10.0.0.1:22
+# ESTAB   0  0    10.0.0.1:22    10.0.0.2:50312
+export TCPDIAG_FILE="$(dirname $0)/ss1.dump"
+
+ts_log "[Testing ssfilter]"
+
+ts_ss "$0" "Match dport = 22" -Htna dport = 22
+test_on "ESTAB0   0 10.0.0.1:36266   
10.0.0.1:22"
+
+ts_ss "$0" "Match dport 22" -Htna dport 22
+test_on "ESTAB0   0 10.0.0.1:36266   
10.0.0.1:22"
+
+ts_ss "$0" "Match (dport)" -Htna '( dport = 22 )'
+test_on "ESTAB0   0 10.0.0.1:36266   
10.0.0.1:22"
+
+ts_ss "$0" "Match src = 0.0.0.0" -Htna src = 0.0.0.0
+test_on "LISTEN 0   1280.0.0.0:22 
0.0.0.0:*"
+
+ts_ss "$0" "Match src 0.0.0.0" -Htna src 0.0.0.0
+test_on "LISTEN 0   1280.0.0.0:22 
0.0.0.0:*"
+
+ts_ss "$0" "Match src sport" -Htna src 0.0.0.0 sport = 22
+test_on "LISTEN 0   1280.0.0.0:22 
0.0.0.0:*"
+
+ts_ss "$0" "Match src and sport" -Htna src 0.0.0.0 and sport = 22
+test_on "LISTEN 0   1280.0.0.0:22 
0.0.0.0:*"
+
+ts_ss "$0" "Match src and sport and dport" -Htna src 10.0.0.1 and sport = 22 
and dport = 50312
+test_on "ESTAB0   0 10.0.0.1:22   
10.0.0.2:50312"
+
+ts_ss "$0" "Match src and sport and (dport)" -Htna 'src 10.0.0.1 and sport = 
22 and ( dport = 50312 )'
+test_on "ESTAB0   0 10.0.0.1:22   
10.0.0.2:50312"
+
+ts_ss "$0" "Match src and (sport and dport)" -Htna 'src 10.0.0.1 and ( sport = 
22 and dport = 50312 )'
+test_on "ESTAB0   0 10.0.0.1:22   
10.0.0.2:50312"
+
+ts_ss "$0" "Match (src and sport) and dport" -Htna '( src 10.0.0.1 and sport = 
22 ) and dport = 50312'
+test_on "ESTAB0   0 10.0.0.1:22   
10.0.0.2:50312"
+
+ts_ss "$0" "Match (src or src) and dst" -Htna '( src 0.0.0.0 or src 10.0.0.1 ) 
and dst 10.0.0.2'
+test_on "ESTAB0   0 10.0.0.1:22   
10.0.0.2:50312"
-- 
2.18.0



[iproute PATCH 0/3] Fix and test ssfilter

2018-08-14 Thread Phil Sutter
This series contains a fix for ssfilter and introduces a test script to
verify correct functionality.

Phil Sutter (3):
  ss: Review ssfilter
  testsuite: Prepare for ss tests
  testsuite: Add a first ss test validating ssfilter

 misc/ssfilter.y   |  36 ++---
 testsuite/Makefile|   2 +-
 testsuite/lib/generic.sh  |  37 ++
 testsuite/tests/ss/ss1.dump   | Bin 0 -> 720 bytes
 testsuite/tests/ss/ssfilter.t |  48 ++
 5 files changed, 84 insertions(+), 39 deletions(-)
 create mode 100644 testsuite/tests/ss/ss1.dump
 create mode 100755 testsuite/tests/ss/ssfilter.t

-- 
2.18.0



[iproute PATCH 1/3] ss: Review ssfilter

2018-08-14 Thread Phil Sutter
The original problem was ssfilter rejecting single expressions if
enclosed in parentheses, such as:

| sport = 22 or ( dport = 22 )

This is fixed by allowing 'expr' to be an 'exprlist' enclosed in
parentheses. The recursion of 'exprlist' being an 'exprlist' enclosed in
parentheses is no longer required and is dropped.

In addition to that, a few other things are changed:

* Remove pointless 'null' prefix in 'applet' before 'exprlist'.
* For simple equals matches, '=' operator was required for ports but not
  allowed for hosts. Make this consistent by making '=' operator
  optional in both cases.

Reported-by: Samuel Mannehed 
Fixes: b2038cc0b2403 ("ssfilter: Eliminate shift/reduce conflicts")
Signed-off-by: Phil Sutter 
---
 misc/ssfilter.y | 36 +---
 1 file changed, 21 insertions(+), 15 deletions(-)

diff --git a/misc/ssfilter.y b/misc/ssfilter.y
index 88d4229a9b241..0413dddaa7584 100644
--- a/misc/ssfilter.y
+++ b/misc/ssfilter.y
@@ -42,24 +42,22 @@ static void yyerror(char *s)
 %nonassoc '!'
 
 %%
-applet: null exprlist
+applet: exprlist
 {
-*yy_ret = $2;
-$$ = $2;
+*yy_ret = $1;
+$$ = $1;
 }
 | null
 ;
+
 null:   /* NOTHING */ { $$ = NULL; }
 ;
+
 exprlist: expr
 | '!' expr
 {
 $$ = alloc_node(SSF_NOT, $2);
 }
-| '(' exprlist ')'
-{
-$$ = $2;
-}
 | exprlist '|' expr
 {
 $$ = alloc_node(SSF_OR, $1);
@@ -77,13 +75,21 @@ exprlist: expr
 }
 ;
 
-expr:  DCOND HOSTCOND
+eq:'='
+   | /* nothing */
+   ;
+
+expr:  '(' exprlist ')'
+   {
+   $$ = $2;
+   }
+   | DCOND eq HOSTCOND
 {
-   $$ = alloc_node(SSF_DCOND, $2);
+   $$ = alloc_node(SSF_DCOND, $3);
 }
-| SCOND HOSTCOND
+| SCOND eq HOSTCOND
 {
-   $$ = alloc_node(SSF_SCOND, $2);
+   $$ = alloc_node(SSF_SCOND, $3);
 }
 | DPORT GEQ HOSTCOND
 {
@@ -101,7 +107,7 @@ expr:   DCOND HOSTCOND
 {
 $$ = alloc_node(SSF_NOT, alloc_node(SSF_D_GE, $3));
 }
-| DPORT '=' HOSTCOND
+| DPORT eq HOSTCOND
 {
$$ = alloc_node(SSF_DCOND, $3);
 }
@@ -126,7 +132,7 @@ expr:   DCOND HOSTCOND
 {
 $$ = alloc_node(SSF_NOT, alloc_node(SSF_S_GE, $3));
 }
-| SPORT '=' HOSTCOND
+| SPORT eq HOSTCOND
 {
$$ = alloc_node(SSF_SCOND, $3);
 }
@@ -134,7 +140,7 @@ expr:   DCOND HOSTCOND
 {
$$ = alloc_node(SSF_NOT, alloc_node(SSF_SCOND, $3));
 }
-| DEVNAME '=' DEVCOND
+| DEVNAME eq DEVCOND
 {
$$ = alloc_node(SSF_DEVCOND, $3);
 }
@@ -142,7 +148,7 @@ expr:   DCOND HOSTCOND
 {
$$ = alloc_node(SSF_NOT, alloc_node(SSF_DEVCOND, $3));
 }
-| FWMARK '=' MARKMASK
+| FWMARK eq MARKMASK
 {
 $$ = alloc_node(SSF_MARKMASK, $3);
 }
-- 
2.18.0
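
The effect of the grammar change can be illustrated with a small
recursive-descent recognizer. This is only a sketch in C under stated
assumptions, not the bison grammar from misc/ssfilter.y: the key list in
is_key(), the pre-split token array and the function filter_is_valid() are
all hypothetical, and only the shape of the rules (a parenthesised
'exprlist' is an 'expr', and '=' is optional) follows the patch.

```c
#include <stddef.h>
#include <string.h>

struct parser {
    const char **tok;   /* pre-split tokens */
    size_t pos, n;
};

static const char *peek(struct parser *p)
{
    return p->pos < p->n ? p->tok[p->pos] : NULL;
}

/* Consume the next token if it equals s. */
static int eat(struct parser *p, const char *s)
{
    const char *t = peek(p);

    if (t && strcmp(t, s) == 0) {
        p->pos++;
        return 1;
    }
    return 0;
}

/* Hypothetical, reduced set of match keys. */
static int is_key(const char *t)
{
    return t && (!strcmp(t, "sport") || !strcmp(t, "dport") ||
                 !strcmp(t, "src") || !strcmp(t, "dst"));
}

static int is_reserved(const char *t)
{
    return !strcmp(t, "(") || !strcmp(t, ")") || !strcmp(t, "=") ||
           !strcmp(t, "and") || !strcmp(t, "or") || !strcmp(t, "!");
}

static int parse_exprlist(struct parser *p);

/* expr: '(' exprlist ')' | KEY ['='] VALUE */
static int parse_expr(struct parser *p)
{
    const char *v;

    if (eat(p, "("))
        return parse_exprlist(p) && eat(p, ")");
    if (!is_key(peek(p)))
        return 0;
    p->pos++;
    eat(p, "=");                /* the '=' operator is optional */
    v = peek(p);
    if (!v || is_reserved(v))
        return 0;
    p->pos++;
    return 1;
}

/* exprlist: ['!'] expr { ['and' | 'or'] expr } */
static int parse_exprlist(struct parser *p)
{
    eat(p, "!");
    if (!parse_expr(p))
        return 0;
    while (peek(p) && strcmp(peek(p), ")") != 0) {
        if (!eat(p, "and"))
            eat(p, "or");       /* no operator means implicit "and" */
        if (!parse_expr(p))
            return 0;
    }
    return 1;
}

/* Returns 1 if the whole token array is a valid filter, else 0. */
int filter_is_valid(const char **tok, size_t n)
{
    struct parser p = { tok, 0, n };

    return parse_exprlist(&p) && p.pos == n;
}
```

Under these assumptions, the token sequence for "sport = 22 or ( dport = 22 )"
is accepted, as are "dport 22" and "src 0.0.0.0 sport = 22", while an
unbalanced "( dport = 22" is rejected.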



[iproute PATCH 2/3] testsuite: Prepare for ss tests

2018-08-14 Thread Phil Sutter
This merges the shared bits of ts_tc() and ts_ip() into a common
function which both now wrap, and adds a third wrapper, ts_ss(), for
testing ss commands.

Signed-off-by: Phil Sutter 
---
 testsuite/Makefile   |  2 +-
 testsuite/lib/generic.sh | 37 ++---
 2 files changed, 15 insertions(+), 24 deletions(-)

diff --git a/testsuite/Makefile b/testsuite/Makefile
index 2a54e5c845e65..8fcbc557ff9a7 100644
--- a/testsuite/Makefile
+++ b/testsuite/Makefile
@@ -65,7 +65,7 @@ endif
TMP_ERR=`mktemp /tmp/tc_testsuite.XX`; \
TMP_OUT=`mktemp /tmp/tc_testsuite.XX`; \
STD_ERR="$$TMP_ERR" STD_OUT="$$TMP_OUT" \
-   TC="$$i/tc/tc" IP="$$i/ip/ip" DEV="$(DEV)" IPVER="$@" 
SNAME="$$i" \
+   TC="$$i/tc/tc" IP="$$i/ip/ip" SS=$$i/misc/ss DEV="$(DEV)" 
IPVER="$@" SNAME="$$i" \
ERRF="$(RESULTS_DIR)/$@.$$o.err" $(KENV) $(PREFIX) tests/$@ > 
$(RESULTS_DIR)/$@.$$o.out; \
if [ "$$?" = "127" ]; then \
echo "SKIPPED"; \
diff --git a/testsuite/lib/generic.sh b/testsuite/lib/generic.sh
index 8cef20fa1b280..f92260fc40cf3 100644
--- a/testsuite/lib/generic.sh
+++ b/testsuite/lib/generic.sh
@@ -26,16 +26,17 @@ ts_skip()
 exit 127
 }
 
-ts_tc()
+__ts_cmd()
 {
+   CMD=$1; shift
SCRIPT=$1; shift
DESC=$1; shift
 
-   $TC $@ 2> $STD_ERR > $STD_OUT
+   $CMD $@ 2> $STD_ERR > $STD_OUT
 
if [ -s $STD_ERR ]; then
ts_err "${SCRIPT}: ${DESC} failed:"
-   ts_err "command: $TC $@"
+   ts_err "command: $CMD $@"
ts_err "stderr output:"
ts_err_cat $STD_ERR
if [ -s $STD_OUT ]; then
@@ -50,29 +51,19 @@ ts_tc()
fi
 }
 
-ts_ip()
+ts_tc()
 {
-   SCRIPT=$1; shift
-   DESC=$1; shift
+   __ts_cmd "$TC" "$@"
+}
 
-   $IP $@ 2> $STD_ERR > $STD_OUT
-RET=$?
+ts_ip()
+{
+   __ts_cmd "$IP" "$@"
+}
 
-   if [ -s $STD_ERR ] || [ "$RET" != "0" ]; then
-   ts_err "${SCRIPT}: ${DESC} failed:"
-   ts_err "command: $IP $@"
-   ts_err "stderr output:"
-   ts_err_cat $STD_ERR
-   if [ -s $STD_OUT ]; then
-   ts_err "stdout output:"
-   ts_err_cat $STD_OUT
-   fi
-   elif [ -s $STD_OUT ]; then
-   echo "${SCRIPT}: ${DESC} succeeded with output:"
-   cat $STD_OUT
-   else
-   echo "${SCRIPT}: ${DESC} succeeded"
-   fi
+ts_ss()
+{
+   __ts_cmd "$SS" "$@"
 }
 
 ts_qdisc_available()
-- 
2.18.0



Re: [PATCH bpf] Revert "xdp: add NULL pointer check in __xdp_return()"

2018-08-14 Thread Björn Töpel
On Fri, 10 Aug 2018 at 18:26, Jakub Kicinski wrote:
>
> On Fri, 10 Aug 2018 17:16:45 +0200, Björn Töpel wrote:
> > On Fri, 10 Aug 2018 at 16:10, Daniel Borkmann wrote:
> > >
> > > On 08/10/2018 11:28 AM, Björn Töpel wrote:
> > > > From: Björn Töpel 
> > > >
> > > > This reverts commit 36e0f12bbfd3016f495904b35e41c5711707509f.
> > > >
> > > > The reverted commit adds a WARN to check against NULL entries in the
> > > > mem_id_ht rhashtable. Any kernel path implementing the XDP (generic or
> > > > driver) fast path is required to make a paired
> > > > xdp_rxq_info_reg/xdp_rxq_info_unreg call for proper function. In
> > > > addition, a driver using a different allocation scheme than the
> > > > default MEM_TYPE_PAGE_SHARED is required to additionally call
> > > > xdp_rxq_info_reg_mem_model.
> > > >
> > > > For MEM_TYPE_ZERO_COPY, an xdp_rxq_info_reg_mem_model call ensures
> > > > that the mem_id_ht rhashtable has a properly inserted allocator id. If
> > > > not, this would be a driver bug. A NULL pointer kernel OOPS is
> > > > preferred to the WARN.
> > > >
> > > > Suggested-by: Jesper Dangaard Brouer 
> > > > Signed-off-by: Björn Töpel 
> > >
> > > Given the last bpf pr went out yesterday night, I've applied this to
> > > bpf-next (worst case we can just route it via stable), thanks!
> >
> > Ah, right! Thanks!
> >
> > bpf-next is OK. (Since this path is not used by any driver yet... :-()
>
> Wasn't this dead code, anyway?  The frame return path is for redirects,
> and one can't convert_to_xdp_frame presently?

Indeed, dead it is. Hmm, I'll remove it as part of the i40e zc submission.


Björn


Re: [Query]: DSA Understanding

2018-08-14 Thread Lad, Prabhakar
Hi Florian,

On Mon, Aug 13, 2018 at 7:57 PM Florian Fainelli  wrote:
>
> On 08/13/2018 08:58 AM, Lad, Prabhakar wrote:
> > Hi Andrew/Florian,
> >
> > On Mon, Aug 13, 2018 at 2:38 PM Andrew Lunn  wrote:
> >>
> >>>> I agree, this should be padding packets correctly, can you still
> >>>> instrument cpsw to make sure that what comes to its ndo_start_xmit() is
> >>>> ETH_ZLEN + tag_len or more?
> 
> >>> Yes I can confirm the skb->len is always >= 62 (ETH_ZLEN + 2)
> >>
> >> Which switch are you using?
> >>
> >> Marvell switches use either 4 or 8 bytes of tag. Broadcom has 4, KSZ
> >> has 1 for packets going to the switch, lan9303 has 4, mtd uses 4, qca
> >> has 2.
> >>
> > I am using the KSZ switch. For ingress it has 1 byte and for egress it
> > has 2 bytes.
> > I came across patch [1] and padded 2 more bytes in ksz_xmit() and I was
> > successfully able to ping from lan4 to PC. Thank you very much for
> > your guidance/support.
> >
> > Now I have stumbled into a different issue:
> >
> > Case 1 Works:
> > =
> > lan0 = 192.168.0.1
> > PC1 = 192.168.0.10
> > For the above ping works from both directions.
> >
> > CASE 2 Doesn’t Work:
> > =
> > lan0 = 192.168.0.1
> > PC1 = 192.168.0.10
> > lan4 = 192.168.0.4
> > PC2 = 192.168.0.11
> >
> > Ping from lan0 to PC1 and PC1 to lan0 works
> > But ping from PC2 to lan4 and lan4 to PC2 fails.
> >
> > CASE 3 Works:
> > =
> > lan0 = 192.168.0.1
> > PC1 = 192.168.0.10
> > lan4 = 192.168.4.4
> > PC2 = 192.168.4.11
> >
> > With the above setup ping works.
> >
> > [Query] Why does ping fail in case 2. Any thoughts what I am missing here ?
> > or is it the expected behaviour ?
>
> For case 2, what I suspect is happening is that the machine that has
> lan1/lan4, because you have put lan1/lan4 in the same subnet, does not
> know how to respond to PC2 because it is unable to select an appropriate
> output interface. In such cases, you might have to add an explicit /32
> route that forces telling the kernel that PC2 is accessible via lan2.
>
That did the trick, thank you. The following is my setup:
lan0 = 192.168.0.1
PC1 = 192.168.0.10
lan4 = 192.168.0.4
PC2 = 192.168.0.11

route add 192.168.0.11  gw 192.168.0.4
route add 192.168.0.10  gw 192.168.0.1

And now ping works both ways. Is adding the routes like this a valid way
to do it? Or is there some standard way I should follow that I am missing?

> Andrew, do you see an other explanation for that?
>
> >
> > [1] https://lore.kernel.org/patchwork/patch/851457/
>
> This patch works, but I think it is still working by "accident" in that
> if you have both VLAN tags + KSZ tag, you would likely still be too
> short by a few bytes.
>
To take care of this, I am now making sure that skb->len is padded to
VLAN_ETH_ZLEN in the ksz_xmit() function in tag_ksz.c.

Cheers,
--Prabhakar Lad
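
The padding arithmetic discussed above can be written out as a small C
sketch. It is not the code from tag_ksz.c: the helpers padded_len() and
tagged_frame_len() are hypothetical, and the sketch assumes a tag that is
appended after the padded frame. The three constants mirror the values of
the kernel macros with the same names.

```c
#include <stddef.h>

#define ETH_ZLEN      60  /* minimum Ethernet frame length, without FCS */
#define VLAN_HLEN      4  /* length of one 802.1Q tag */
#define VLAN_ETH_ZLEN 64  /* ETH_ZLEN + VLAN_HLEN */

/* Length the frame is padded to before the switch tag is appended. */
size_t padded_len(size_t len, size_t min_len)
{
    return len < min_len ? min_len : len;
}

/* Total length handed to the MAC: padded frame plus trailing tag. */
size_t tagged_frame_len(size_t len, size_t min_len, size_t tag_len)
{
    return padded_len(len, min_len) + tag_len;
}
```

For a 42-byte frame and a 2-byte tag, padding to ETH_ZLEN gives 62 bytes,
which matches the "skb->len >= 62" observation quoted above, while padding
to VLAN_ETH_ZLEN gives 66 bytes and leaves room for one VLAN tag.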


Re: [PATCH net 2/2] net/mlx5e: Cleanup of dcbnl related fields

2018-08-14 Thread Håkon Bugge



> On 14 Aug 2018, at 11:01, Yuval Shaia  wrote:
> 
> On Wed, Aug 08, 2018 at 03:48:08PM -0700, Saeed Mahameed wrote:
>> From: Huy Nguyen 
>> 
>> Remove unused netdev_registered_init/remove in en.h
>> Return ENOSUPPORT if the check MLX5_DSCP_SUPPORTED fails.
> 
> s/ENOSUPPORT/EOPNOTSUPP
> (noted by Haakon)

Sure did,

> 
>> Remove extra white space

and I also said that this has nothing to do with the commit (no matter how
tempting it can be) ;-)


Thxs, Håkon

>> 
>> Fixes: 2a5e7a1344f4 ("net/mlx5e: Add dcbnl dscp to priority support")
>> Signed-off-by: Huy Nguyen 
>> Cc: Yuval Shaia 
>> Reviewed-by: Parav Pandit 
>> Signed-off-by: Saeed Mahameed 
>> ---
>> drivers/net/ethernet/mellanox/mlx5/core/en.h  |  2 --
>> .../ethernet/mellanox/mlx5/core/en_dcbnl.c| 30 +++
>> 2 files changed, 11 insertions(+), 21 deletions(-)
>> 
>> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h 
>> b/drivers/net/ethernet/mellanox/mlx5/core/en.h
>> index eb9eb7aa953a..405236cf0b04 100644
>> --- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
>> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
>> @@ -858,8 +858,6 @@ struct mlx5e_profile {
>>  mlx5e_fp_handle_rx_cqe handle_rx_cqe;
>>  mlx5e_fp_handle_rx_cqe handle_rx_cqe_mpwqe;
>>  } rx_handlers;
>> -void(*netdev_registered_init)(struct mlx5e_priv *priv);
>> -void(*netdev_registered_remove)(struct mlx5e_priv *priv);
>>  int max_tc;
>> };
>> 
>> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_dcbnl.c 
>> b/drivers/net/ethernet/mellanox/mlx5/core/en_dcbnl.c
>> index e33afa8d2417..722998d68564 100644
>> --- a/drivers/net/ethernet/mellanox/mlx5/core/en_dcbnl.c
>> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_dcbnl.c
>> @@ -443,16 +443,12 @@ static int mlx5e_dcbnl_ieee_setapp(struct net_device 
>> *dev, struct dcb_app *app)
>>  bool is_new;
>>  int err;
>> 
>> -if (app->selector != IEEE_8021QAZ_APP_SEL_DSCP)
>> -return -EINVAL;
>> -
>> -if (!MLX5_CAP_GEN(priv->mdev, vport_group_manager))
>> -return -EINVAL;
>> -
>> -if (!MLX5_DSCP_SUPPORTED(priv->mdev))
>> -return -EINVAL;
>> +if (!MLX5_CAP_GEN(priv->mdev, vport_group_manager) ||
>> +!MLX5_DSCP_SUPPORTED(priv->mdev))
>> +return -EOPNOTSUPP;
>> 
>> -if (app->protocol >= MLX5E_MAX_DSCP)
>> +if ((app->selector != IEEE_8021QAZ_APP_SEL_DSCP) ||
>> +(app->protocol >= MLX5E_MAX_DSCP))
>>  return -EINVAL;
>> 
>>  /* Save the old entry info */
>> @@ -500,16 +496,12 @@ static int mlx5e_dcbnl_ieee_delapp(struct net_device 
>> *dev, struct dcb_app *app)
>>  struct mlx5e_priv *priv = netdev_priv(dev);
>>  int err;
>> 
>> -if (app->selector != IEEE_8021QAZ_APP_SEL_DSCP)
>> -return -EINVAL;
>> -
>> -if (!MLX5_CAP_GEN(priv->mdev, vport_group_manager))
>> -return -EINVAL;
>> -
>> -if (!MLX5_DSCP_SUPPORTED(priv->mdev))
>> -return -EINVAL;
>> +if  (!MLX5_CAP_GEN(priv->mdev, vport_group_manager) ||
>> + !MLX5_DSCP_SUPPORTED(priv->mdev))
>> +return -EOPNOTSUPP;
>> 
>> -if (app->protocol >= MLX5E_MAX_DSCP)
>> +if ((app->selector != IEEE_8021QAZ_APP_SEL_DSCP) ||
>> +(app->protocol >= MLX5E_MAX_DSCP))
>>  return -EINVAL;
>> 
>>  /* Skip if no dscp app entry */
>> @@ -1146,7 +1138,7 @@ static int mlx5e_set_trust_state(struct mlx5e_priv 
>> *priv, u8 trust_state)
>> {
>>  int err;
>> 
>> -err =  mlx5_set_trust_state(priv->mdev, trust_state);
>> +err = mlx5_set_trust_state(priv->mdev, trust_state);
>>  if (err)
>>  return err;
>>  priv->dcbx_dp.trust_state = trust_state;
>> -- 
>> 2.17.0
>> 



[PATCH net] cxgb4: Add new T5 PCI device ids 0x50af and 0x50b0

2018-08-14 Thread Ganesh Goudar
Signed-off-by: Ganesh Goudar 
---
 drivers/net/ethernet/chelsio/cxgb4/t4_pci_id_tbl.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/net/ethernet/chelsio/cxgb4/t4_pci_id_tbl.h 
b/drivers/net/ethernet/chelsio/cxgb4/t4_pci_id_tbl.h
index e3adf43..60df66f 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/t4_pci_id_tbl.h
+++ b/drivers/net/ethernet/chelsio/cxgb4/t4_pci_id_tbl.h
@@ -189,6 +189,8 @@ CH_PCI_DEVICE_ID_TABLE_DEFINE_BEGIN
CH_PCI_ID_TABLE_FENTRY(0x50ac), /* Custom T540-BT */
CH_PCI_ID_TABLE_FENTRY(0x50ad), /* Custom T520-CR */
CH_PCI_ID_TABLE_FENTRY(0x50ae), /* Custom T540-XL-SO */
+   CH_PCI_ID_TABLE_FENTRY(0x50af), /* Custom T580-KR-SO */
+   CH_PCI_ID_TABLE_FENTRY(0x50b0), /* Custom T520-CR-LOM */
 
/* T6 adapters:
 */
-- 
2.1.0



[PATCH net] cls_matchall: fix tcf_unbind_filter missing

2018-08-14 Thread Hangbin Liu
Add the missing tcf_unbind_filter() call in cls_matchall; without it,
WARN_ON() is triggered in cbq_destroy_class().

Fixes: fd62d9f5c575f ("net/sched: matchall: Fix configuration race")
Reported-by: Li Shuang 
Signed-off-by: Hangbin Liu 
---
 net/sched/cls_matchall.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/net/sched/cls_matchall.c b/net/sched/cls_matchall.c
index 47b207e..7ad65da 100644
--- a/net/sched/cls_matchall.c
+++ b/net/sched/cls_matchall.c
@@ -111,6 +111,8 @@ static void mall_destroy(struct tcf_proto *tp, struct 
netlink_ext_ack *extack)
if (!head)
return;
 
+   tcf_unbind_filter(tp, &head->res);
+
if (!tc_skip_hw(head->flags))
mall_destroy_hw_filter(tp, head, (unsigned long) head, extack);
 
-- 
2.5.5



Re: [Intel-wired-lan] e1000e driver stuck at 10Mbps after reconnection

2018-08-14 Thread Camille Bordignon
On Wednesday 08 August 2018 at 18:00:28 (+0300), Neftin, Sasha wrote:
> On 8/8/2018 17:24, Neftin, Sasha wrote:
> > On 8/7/2018 09:42, Camille Bordignon wrote:
> > > On Monday 06 August 2018 at 15:45:29 (-0700), Alexander Duyck wrote:
> > > > On Mon, Aug 6, 2018 at 4:59 AM, Camille Bordignon
> > > >  wrote:
> > > > > Hello,
> > > > > 
> > > > > Recently we experienced some issues with intel NIC (I219-LM
> > > > > and I219-V).
> > > > > It seems that after a wire reconnection, auto-negotiation "fails" and
> > > > > link speed drops to 10 Mbps.
> > > > > 
> > > > >  From kernel logs:
> > > > > [17616.346150] e1000e: enp0s31f6 NIC Link is Down
> > > > > [17627.003322] e1000e: enp0s31f6 NIC Link is Up 10 Mbps Full
> > > > > Duplex, Flow Control: None
> > > > > [17627.003325] e1000e :00:1f.6 enp0s31f6: 10/100 speed:
> > > > > disabling TSO
> > > > > 
> > > > > 
> > > > > $ethtool enp0s31f6
> > > > > Settings for enp0s31f6:
> > > > >  Supported ports: [ TP ]
> > > > >  Supported link modes:   10baseT/Half 10baseT/Full
> > > > >  100baseT/Half 100baseT/Full
> > > > >  1000baseT/Full
> > > > >  Supported pause frame use: No
> > > > >  Supports auto-negotiation: Yes
> > > > >  Supported FEC modes: Not reported
> > > > >  Advertised link modes:  10baseT/Half 10baseT/Full
> > > > >  100baseT/Half 100baseT/Full
> > > > >  1000baseT/Full
> > > > >  Advertised pause frame use: No
> > > > >  Advertised auto-negotiation: Yes
> > > > >  Advertised FEC modes: Not reported
> > > > >  Speed: 10Mb/s
> > > > >  Duplex: Full
> > > > >  Port: Twisted Pair
> > > > >  PHYAD: 1
> > > > >  Transceiver: internal
> > > > >  Auto-negotiation: on
> > > > >  MDI-X: on (auto)
> > > > >  Supports Wake-on: pumbg
> > > > >  Wake-on: g
> > > > >  Current message level: 0x0007 (7)
> > > > >     drv probe link
> > > > >  Link detected: yes
> > > > > 
> > > > > 
> > > > > Note that if the disconnection lasts less than about 5 seconds,
> > > > > nothing goes wrong.
> > > > > And if, after the last failure, a disconnection / reconnection occurs
> > > > > again and lasts less than 5 seconds, link speed is back to 1000 Mbps.
> > > > > 
> > > > > [18075.350678] e1000e: enp0s31f6 NIC Link is Down
> > > > > [18078.716245] e1000e: enp0s31f6 NIC Link is Up 1000 Mbps
> > > > > Full Duplex, Flow Control: None
> > > > > 
> > > > > The following patch seems to fix this issue.
> > > > > However I don't clearly understand why.
> > > > > 
> > > > > diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c
> > > > > b/drivers/net/ethernet/intel/e1000e/netdev.c
> > > > > index 3ba0c90e7055..763c013960f1 100644
> > > > > --- a/drivers/net/ethernet/intel/e1000e/netdev.c
> > > > > +++ b/drivers/net/ethernet/intel/e1000e/netdev.c
> > > > > @@ -5069,7 +5069,7 @@ static bool e1000e_has_link(struct
> > > > > e1000_adapter *adapter)
> > > > >  case e1000_media_type_copper:
> > > > >  if (hw->mac.get_link_status) {
> > > > >  ret_val = hw->mac.ops.check_for_link(hw);
> > > > > -   link_active = !hw->mac.get_link_status;
> > > > > +   link_active = false;
> > > > >  } else {
> > > > >  link_active = true;
> > > > >  }
> > > > > 
> > > > > Maybe this is related to watchdog task.
> > > > > 
> > > > > I found this fix by comparing with the last commit that works fine:
> > > > > commit 0b76aae741abb9d16d2c0e67f8b1e766576f897d.
> > > > > However I don't know if this information is relevant.
> > > > > 
> > > > > Thank you.
> > > > > Camille Bordignon
> > > > 
> > > > What kernel were you testing this on? I know there have been a number
> > > > of changes over the past few months in this area and it would be
> > > > useful to know exactly what code base you started out with and what
> > > > the latest version of the kernel is you have tested.
> > > > 
> > > > Looking over the code change the net effect of it should be to add a 2
> > > > second delay from the time the link has changed until you actually
> > > > check the speed/duplex configuration. It is possible we could be
> > > > seeing some sort of timing issue and adding the 2 second delay after
> > > > the link event is enough time for things to stabilize and detect the
> > > > link at 1000 instead of 10/100.
> > > > 
> > > > - Alex
> > > 
> > > We found this issue using Fedora 27 (4.17.11-100.fc27.x86_64).
> > > 
> > > Then I tested with a more recent version of the driver, v4.18-rc7, but
> > > the behavior looks the same.
> > > 
> > > Thanks for your reply.
> > > 
> > > Camille Bordignon

[PATCH net-next] net: dsa: mv88e6xxx: missing unlock on error path

2018-08-14 Thread Dan Carpenter
We added a new error path, but we need to drop the lock before we return.

Fixes: 2d2e1dd29962 ("net: dsa: mv88e6xxx: Cache the port cmode")
Signed-off-by: Dan Carpenter 

diff --git a/drivers/net/dsa/mv88e6xxx/chip.c b/drivers/net/dsa/mv88e6xxx/chip.c
index 17752316ab10..8da3d39e3218 100644
--- a/drivers/net/dsa/mv88e6xxx/chip.c
+++ b/drivers/net/dsa/mv88e6xxx/chip.c
@@ -2408,7 +2408,7 @@ static int mv88e6xxx_setup(struct dsa_switch *ds)
if (chip->info->ops->port_get_cmode) {
err = chip->info->ops->port_get_cmode(chip, i, &cmode);
if (err)
-   return err;
+   goto unlock;
 
chip->ports[i].cmode = cmode;
}
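
The pattern this fix restores, a single exit label that always drops the
lock, can be sketched outside the kernel as follows. Everything here is
hypothetical and only for illustration: toy_lock stands in for the chip
register mutex, and query_port() stands in for the per-port cmode read.

```c
#include <assert.h>

/* Toy lock that only records whether it is held (not thread-safe). */
struct toy_lock {
    int held;
};

static void toy_lock_acquire(struct toy_lock *l)
{
    assert(!l->held);
    l->held = 1;
}

static void toy_lock_release(struct toy_lock *l)
{
    assert(l->held);
    l->held = 0;
}

/* Hypothetical per-port query: fails with -5 for port bad_port. */
static int query_port(int port, int bad_port)
{
    return port == bad_port ? -5 : 0;
}

/* Returns 0 on success or a negative error; the lock is always released. */
int setup_ports(struct toy_lock *l, int nports, int bad_port)
{
    int err = 0;
    int i;

    toy_lock_acquire(l);
    for (i = 0; i < nports; i++) {
        err = query_port(i, bad_port);
        if (err)
            goto unlock;    /* a bare "return err" here would leak the lock */
    }
unlock:
    toy_lock_release(l);
    return err;
}
```

Calling setup_ports() with a failing port returns the error and still
leaves the lock released, so a later call can take it again.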


[PATCH net-next] net: dsa: mv88e6xxx: bitwise vs logical bug

2018-08-14 Thread Dan Carpenter
We are trying to test if these flags are set but there are some && vs &
typos.

Fixes: efd1ba6af93f ("net: dsa: mv88e6xxx: Add SERDES phydev_mac_change up for 
6390")
Signed-off-by: Dan Carpenter 

diff --git a/drivers/net/dsa/mv88e6xxx/serdes.c 
b/drivers/net/dsa/mv88e6xxx/serdes.c
index f007d109b385..e82983975754 100644
--- a/drivers/net/dsa/mv88e6xxx/serdes.c
+++ b/drivers/net/dsa/mv88e6xxx/serdes.c
@@ -502,8 +502,8 @@ static irqreturn_t mv88e6390_serdes_thread_fn(int irq, void 
*dev_id)
err = mv88e6390_serdes_irq_status_sgmii(chip, lane, &status);
if (err)
goto out;
-   if (status && (MV88E6390_SGMII_INT_LINK_DOWN ||
-  MV88E6390_SGMII_INT_LINK_UP)) {
+   if (status & (MV88E6390_SGMII_INT_LINK_DOWN |
+ MV88E6390_SGMII_INT_LINK_UP)) {
ret = IRQ_HANDLED;
mv88e6390_serdes_irq_link_sgmii(chip, port->port, lane);
}
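
The difference between the two forms is easy to reproduce in a standalone
C sketch. The flag values below are hypothetical and chosen only for
illustration; they are not the MV88E6390_SGMII_INT_* definitions.

```c
/* Hypothetical flag values, chosen only for illustration. */
#define INT_LINK_DOWN 0x0080u
#define INT_LINK_UP   0x0040u
#define INT_OTHER     0x0001u

/* Buggy form: (A || B) is always 1, so this is true for any
 * non-zero status, whichever bits are set. */
int link_changed_buggy(unsigned int status)
{
    return status && (INT_LINK_DOWN || INT_LINK_UP);
}

/* Fixed form: true only if at least one of the two link bits is set. */
int link_changed_fixed(unsigned int status)
{
    return (status & (INT_LINK_DOWN | INT_LINK_UP)) != 0;
}
```

With only INT_OTHER set, the buggy test reports a link change and the
fixed test does not; with a link bit set, both agree.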


Re: [PATCH net 2/2] net/mlx5e: Cleanup of dcbnl related fields

2018-08-14 Thread Yuval Shaia
On Wed, Aug 08, 2018 at 03:48:08PM -0700, Saeed Mahameed wrote:
> From: Huy Nguyen 
> 
> Remove unused netdev_registered_init/remove in en.h
> Return ENOSUPPORT if the check MLX5_DSCP_SUPPORTED fails.

s/ENOSUPPORT/EOPNOTSUPP
(noted by Haakon)

> Remove extra white space
> 
> Fixes: 2a5e7a1344f4 ("net/mlx5e: Add dcbnl dscp to priority support")
> Signed-off-by: Huy Nguyen 
> Cc: Yuval Shaia 
> Reviewed-by: Parav Pandit 
> Signed-off-by: Saeed Mahameed 
> ---
>  drivers/net/ethernet/mellanox/mlx5/core/en.h  |  2 --
>  .../ethernet/mellanox/mlx5/core/en_dcbnl.c| 30 +++
>  2 files changed, 11 insertions(+), 21 deletions(-)
> 
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h 
> b/drivers/net/ethernet/mellanox/mlx5/core/en.h
> index eb9eb7aa953a..405236cf0b04 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
> @@ -858,8 +858,6 @@ struct mlx5e_profile {
>   mlx5e_fp_handle_rx_cqe handle_rx_cqe;
>   mlx5e_fp_handle_rx_cqe handle_rx_cqe_mpwqe;
>   } rx_handlers;
> - void(*netdev_registered_init)(struct mlx5e_priv *priv);
> - void(*netdev_registered_remove)(struct mlx5e_priv *priv);
>   int max_tc;
>  };
>  
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_dcbnl.c 
> b/drivers/net/ethernet/mellanox/mlx5/core/en_dcbnl.c
> index e33afa8d2417..722998d68564 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/en_dcbnl.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_dcbnl.c
> @@ -443,16 +443,12 @@ static int mlx5e_dcbnl_ieee_setapp(struct net_device 
> *dev, struct dcb_app *app)
>   bool is_new;
>   int err;
>  
> - if (app->selector != IEEE_8021QAZ_APP_SEL_DSCP)
> - return -EINVAL;
> -
> - if (!MLX5_CAP_GEN(priv->mdev, vport_group_manager))
> - return -EINVAL;
> -
> - if (!MLX5_DSCP_SUPPORTED(priv->mdev))
> - return -EINVAL;
> + if (!MLX5_CAP_GEN(priv->mdev, vport_group_manager) ||
> + !MLX5_DSCP_SUPPORTED(priv->mdev))
> + return -EOPNOTSUPP;
>  
> - if (app->protocol >= MLX5E_MAX_DSCP)
> + if ((app->selector != IEEE_8021QAZ_APP_SEL_DSCP) ||
> + (app->protocol >= MLX5E_MAX_DSCP))
>   return -EINVAL;
>  
>   /* Save the old entry info */
> @@ -500,16 +496,12 @@ static int mlx5e_dcbnl_ieee_delapp(struct net_device 
> *dev, struct dcb_app *app)
>   struct mlx5e_priv *priv = netdev_priv(dev);
>   int err;
>  
> - if (app->selector != IEEE_8021QAZ_APP_SEL_DSCP)
> - return -EINVAL;
> -
> - if (!MLX5_CAP_GEN(priv->mdev, vport_group_manager))
> - return -EINVAL;
> -
> - if (!MLX5_DSCP_SUPPORTED(priv->mdev))
> - return -EINVAL;
> + if  (!MLX5_CAP_GEN(priv->mdev, vport_group_manager) ||
> +  !MLX5_DSCP_SUPPORTED(priv->mdev))
> + return -EOPNOTSUPP;
>  
> - if (app->protocol >= MLX5E_MAX_DSCP)
> + if ((app->selector != IEEE_8021QAZ_APP_SEL_DSCP) ||
> + (app->protocol >= MLX5E_MAX_DSCP))
>   return -EINVAL;
>  
>   /* Skip if no dscp app entry */
> @@ -1146,7 +1138,7 @@ static int mlx5e_set_trust_state(struct mlx5e_priv 
> *priv, u8 trust_state)
>  {
>   int err;
>  
> - err =  mlx5_set_trust_state(priv->mdev, trust_state);
> + err = mlx5_set_trust_state(priv->mdev, trust_state);
>   if (err)
>   return err;
>   priv->dcbx_dp.trust_state = trust_state;
> -- 
> 2.17.0
> 
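
The error-code split in the patch above can be sketched outside the driver. This is a minimal userspace illustration, not the mlx5 code: `sketch_dev`, `sketch_app`, `sketch_check_app` and the two constants are hypothetical stand-ins for the driver's capability macros and limits. It shows a missing device capability reported as -EOPNOTSUPP and a malformed request reported as -EINVAL.

```c
#include <errno.h>
#include <stdbool.h>

/* Assumed values for illustration only; the driver uses
 * IEEE_8021QAZ_APP_SEL_DSCP and MLX5E_MAX_DSCP. */
#define SKETCH_APP_SEL_DSCP 5
#define SKETCH_MAX_DSCP     64

/* Hypothetical device capabilities (stand-ins for MLX5_CAP_GEN /
 * MLX5_DSCP_SUPPORTED). */
struct sketch_dev {
	bool vport_group_manager;
	bool dscp_supported;
};

/* Hypothetical request, loosely modelled on struct dcb_app. */
struct sketch_app {
	unsigned char selector;
	unsigned short protocol;
};

static int sketch_check_app(const struct sketch_dev *dev,
			    const struct sketch_app *app)
{
	/* The device cannot do this at all: operation not supported. */
	if (!dev->vport_group_manager || !dev->dscp_supported)
		return -EOPNOTSUPP;

	/* The device could do it, but the request itself is invalid. */
	if (app->selector != SKETCH_APP_SEL_DSCP ||
	    app->protocol >= SKETCH_MAX_DSCP)
		return -EINVAL;

	return 0;
}
```

Checking capabilities first means a caller on unsupported hardware always sees -EOPNOTSUPP, regardless of what arguments it passed.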


[PATCH net-next][RFC] net/tls: Add support for async decryption of tls records

2018-08-14 Thread Vakul Garg
Incoming TLS records which are directly decrypted into user space
application buffer i.e. records which are decrypted in zero-copy mode
are submitted for async decryption. When the decryption cryptoapi
returns -EINPROGRESS, the next tls record is parsed and then submitted
for decryption. The references to records which have been sent for async
decryption are dropped. This happens in a loop for all the records that
can be decrypted in zero-copy mode. For records for which decryption is
not possible in zero-copy mode, asynchronous decryption is not used and
we wait for decryption crypto api to complete.

For crypto requests executing in async fashion, the memory for
aead_request, sglists and skb etc is freed from the decryption
completion handler. The decryption completion handler wakes up the
sleeping user context. This happens when the user context is done
enqueueing all the crypto requests and is waiting for all the async
operations to finish. Since the splice() operation does not use
zero-copy decryption, async remains disabled for splice().
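
The pending-counter handshake described above can be sketched in userspace C11. This is only an illustration under stated assumptions, not the kernel code: every `sketch_*` name is hypothetical, a plain `completed` flag stands in for complete() on the kernel completion, and the waiting side (`sketch_begin_wait`) is a guess at how the submitter arms the notification.

```c
#include <stdatomic.h>
#include <stdbool.h>

struct sketch_rx_ctx {
	atomic_int decrypt_pending;	/* submitted but not yet completed */
	atomic_bool async_notify;	/* set once the submitter starts waiting */
	bool completed;			/* stands in for complete(&completion) */
};

/* Called when a record is handed to the (hypothetical) async engine. */
static void sketch_submit(struct sketch_rx_ctx *ctx)
{
	atomic_fetch_add(&ctx->decrypt_pending, 1);
}

/* Completion handler: the last finisher wakes the waiter, if any. */
static void sketch_decrypt_done(struct sketch_rx_ctx *ctx)
{
	/* atomic_fetch_sub() returns the old value, hence the "- 1". */
	int pending = atomic_fetch_sub(&ctx->decrypt_pending, 1) - 1;

	if (pending == 0 && atomic_load(&ctx->async_notify))
		ctx->completed = true;
}

/* The submitter has enqueued everything; returns true if it must sleep. */
static bool sketch_begin_wait(struct sketch_rx_ctx *ctx)
{
	atomic_store(&ctx->async_notify, true);
	return atomic_load(&ctx->decrypt_pending) != 0;
}
```

Completions that arrive before the submitter starts waiting only decrement the counter; the wake-up fires once the counter reaches zero with `async_notify` set.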

Signed-off-by: Vakul Garg 
---
 include/net/tls.h |   6 +++
 net/tls/tls_sw.c  | 134 +-
 2 files changed, 129 insertions(+), 11 deletions(-)

diff --git a/include/net/tls.h b/include/net/tls.h
index d5c683e8bb22..cd0a65bd92f9 100644
--- a/include/net/tls.h
+++ b/include/net/tls.h
@@ -124,6 +124,12 @@ struct tls_sw_context_rx {
struct sk_buff *recv_pkt;
u8 control;
bool decrypted;
+   atomic_t decrypt_pending;
+   bool async_notify;
+};
+
+struct decrypt_req_ctx {
+   struct sock *sk;
 };
 
 struct tls_record_info {
diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
index 52fbe727d7c1..e2f0df18b6cf 100644
--- a/net/tls/tls_sw.c
+++ b/net/tls/tls_sw.c
@@ -43,12 +43,50 @@
 
 #define MAX_IV_SIZE TLS_CIPHER_AES_GCM_128_IV_SIZE
 
+static void tls_decrypt_done(struct crypto_async_request *req, int err)
+{
+   struct aead_request *aead_req = (struct aead_request *)req;
+   struct decrypt_req_ctx *req_ctx =
+   (struct decrypt_req_ctx *)(aead_req + 1);
+
+   struct scatterlist *sgout = aead_req->dst;
+
+   struct tls_context *tls_ctx = tls_get_ctx(req_ctx->sk);
+   struct tls_sw_context_rx *ctx = tls_sw_ctx_rx(tls_ctx);
+   int pending = atomic_dec_return(&ctx->decrypt_pending);
+   struct scatterlist *sg;
+   unsigned int pages;
+
+   /* Propagate if there was an err */
+   if (err) {
+   ctx->async_wait.err = err;
+   tls_err_abort(req_ctx->sk, err);
+   }
+
+   /* Release the skb, pages and memory allocated for crypto req */
+   kfree_skb(req->data);
+
+   /* Skip the first S/G entry as it points to AAD */
+   for_each_sg(sg_next(sgout), sg, UINT_MAX, pages) {
+   if (!sg)
+   break;
+   put_page(sg_page(sg));
+   }
+
+   kfree(aead_req);
+
+   if (!pending && READ_ONCE(ctx->async_notify))
+   complete(&ctx->async_wait.completion);
+}
+
 static int tls_do_decryption(struct sock *sk,
+struct sk_buff *skb,
 struct scatterlist *sgin,
 struct scatterlist *sgout,
 char *iv_recv,
 size_t data_len,
-struct aead_request *aead_req)
+struct aead_request *aead_req,
+bool async)
 {
struct tls_context *tls_ctx = tls_get_ctx(sk);
struct tls_sw_context_rx *ctx = tls_sw_ctx_rx(tls_ctx);
@@ -59,10 +97,34 @@ static int tls_do_decryption(struct sock *sk,
aead_request_set_crypt(aead_req, sgin, sgout,
   data_len + tls_ctx->rx.tag_size,
   (u8 *)iv_recv);
-   aead_request_set_callback(aead_req, CRYPTO_TFM_REQ_MAY_BACKLOG,
- crypto_req_done, &ctx->async_wait);
 
-   ret = crypto_wait_req(crypto_aead_decrypt(aead_req), &ctx->async_wait);
+   if (async) {
+   struct decrypt_req_ctx *req_ctx;
+
+   req_ctx = (struct decrypt_req_ctx *)(aead_req + 1);
+   req_ctx->sk = sk;
+
+   aead_request_set_callback(aead_req,
+ CRYPTO_TFM_REQ_MAY_BACKLOG,
+ tls_decrypt_done, skb);
+   atomic_inc(&ctx->decrypt_pending);
+   } else {
+   aead_request_set_callback(aead_req,
+ CRYPTO_TFM_REQ_MAY_BACKLOG,
+ crypto_req_done, &ctx->async_wait);
+   }
+
+   ret = crypto_aead_decrypt(aead_req);
+   if (ret == -EINPROGRESS) {
+   if (async)
+   return ret;
+
+   ret = crypto_wait_req(ret, &ctx->async_wait);
+   }
+
+   if (async)
+   

[PATCH] 9p/xen: fix check for xenbus_read error in front_probe

2018-08-14 Thread Dominique Martinet
From: Dominique Martinet 

If the xen bus exists but does not expose the proper interface, it is
possible to get a non-zero length but still some error, leading to
strcmp failing trying to load invalid memory addresses e.g.
fffe.

There is then no need to check length when there is no error, as the
xenbus driver guarantees that the string is nul-terminated.

Signed-off-by: Dominique Martinet 
Cc: Stefano Stabellini 
Cc: Eric Van Hensbergen 
Cc: Latchesar Ionkov 
---

This is a trivial bug I stumbled on when setting up xen with p9fs and
running the VM in pvm: it had enough in the bus to trigger the probe
but then there was no version and it tried to return ENOENT but len
was set to the lower-level message size.
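
The failure mode can be sketched in userspace with an emulation of the kernel's error-pointer helpers. This is a hedged illustration, not the xenbus code: all `sketch_*` names are hypothetical, and `sketch_read` merely imitates a reader that returns an error pointer while still writing a non-zero length.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define SKETCH_MAX_ERRNO 4095

/* Userspace imitations of ERR_PTR(), PTR_ERR() and IS_ERR(). */
static void *sketch_err_ptr(long err)
{
	return (void *)(intptr_t)err;
}

static long sketch_ptr_err(const void *p)
{
	return (long)(intptr_t)p;
}

static int sketch_is_err(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)-SKETCH_MAX_ERRNO;
}

/* Hypothetical reader: on failure it returns an error pointer but has
 * already stored a non-zero length, so "len != 0" proves nothing. */
static char *sketch_read(int have_key, unsigned int *len)
{
	char *s;

	*len = 16;	/* pretend a lower layer stored a message size */
	if (!have_key)
		return sketch_err_ptr(-ENOENT);

	s = malloc(2);
	if (!s)
		return sketch_err_ptr(-ENOMEM);
	strcpy(s, "1");
	*len = 1;
	return s;
}

static int sketch_probe(int have_key)
{
	unsigned int len = 0;
	char *versions = sketch_read(have_key, &len);

	/* Check the returned pointer, not len, before dereferencing. */
	if (sketch_is_err(versions))
		return (int)sketch_ptr_err(versions);

	if (strcmp(versions, "1")) {
		free(versions);
		return -EINVAL;
	}
	free(versions);
	return 0;
}
```

With a `!len` check instead of the error-pointer check, the failing case would pass the small negative value to strcmp() as if it were a string address.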

 net/9p/trans_xen.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/net/9p/trans_xen.c b/net/9p/trans_xen.c
index 1a5b38892eb4..f76beadddfc3 100644
--- a/net/9p/trans_xen.c
+++ b/net/9p/trans_xen.c
@@ -391,9 +391,9 @@ static int xen_9pfs_front_probe(struct xenbus_device *dev,
unsigned int max_rings, max_ring_order, len = 0;
 
versions = xenbus_read(XBT_NIL, dev->otherend, "versions", &len);
-   if (!len)
-   return -EINVAL;
+   if (IS_ERR(versions))
+   return PTR_ERR(versions);
if (strcmp(versions, "1")) {
kfree(versions);
return -EINVAL;
}
-- 
2.17.1



Re: TJA1100 100Base-T1 PHY features via ethtool?

2018-08-14 Thread Michael Grzeschik
On Mon, Aug 13, 2018 at 12:53:35PM -0700, Florian Fainelli wrote:
> On 08/13/2018 12:35 PM, Michael Grzeschik wrote:
> > Hi David,
> > 
> > I use a special 100Base-T1 phy (NXP TJA1100 [1]) that has some features
> > like:
> > 
> > - enabling/disabling test modes
> > - fault detection
> > - switching managed/autonomous mode
> > - signal quality indication
> > - ...
> > 
> > I already implemented the support of the features with the
> > ethtool --get/set-phy-tunables features by adding ethtool_phy_tunables:
> > 
> > ETHTOOL_PHY_TEST_MODE
> > ETHTOOL_PHY_FAULT_DETECTION
> > ETHTOOL_PHY_MANAGED_MODE
> > ETHTOOL_PHY_SIGNAL_QUALITY
> > 
> > Before posting my series I wanted to ensure that this is the preferred
> > interface for those options.
> 
> The tunable interface is there, but is very limited. A few months ago, I
> had started proposing an interface to support PHY test modes [1] (the
> standard IEEE 802.3 defined ones) but a lot of it should now be migrated
> to the work that Michal is doing on the conversion of ethtool to netlink
> [2].
> 
> [1]: https://lkml.org/lkml/2018/4/27/1172
> [2]: https://www.spinics.net/lists/netdev/msg516233.html

The ethtool userspace tool is somewhat odd to program against,
so switching to netlink is definitely a great idea!

> > 
> > I found a series from 2016 [2] that implements the userspace part of
> > the loopback feature for some PHYs. It has not reached mainline so far,
> > which makes me wonder whether ethtool is still the way to go.
> 
> ethtool is being converted to netlink, and that will be a much more
> flexible interface to work with since it is basically easily extensible
> (unlike the current ethtool + ioctl approach).

Yes, netlink sounds absolutely more useful here.

> Back when the patches were proposed, we just had mild disagreement on
> the loopback terminology being used, and then nothing happened.

Right, thanks for the clarification!

> 
> > 
> > [1] https://www.nxp.com/docs/en/data-sheet/TJA1100.pdf
> > [2] https://www.spinics.net/lists/netdev/msg406614.html
> > 
> > Thanks,
> > Michael
> > 
> 
> 
> -- 
> Florian
> 

-- 
Pengutronix e.K.   | |
Industrial Linux Solutions | http://www.pengutronix.de/  |
Peiner Str. 6-8, 31137 Hildesheim, Germany | Phone: +49-5121-206917-0|
Amtsgericht Hildesheim, HRA 2686   | Fax:   +49-5121-206917- |

