Re: [PATCHv4 net-next 2/2] openvswitch: add erspan version I and II support

2018-01-20 Thread Pravin Shelar
On Thu, Jan 18, 2018 at 2:04 PM, William Tu  wrote:
> The patch adds support for openvswitch to configure erspan
> v1 and v2.  The OVS_TUNNEL_KEY_ATTR_ERSPAN_OPTS attr is added
> to uapi as a nested attribute to support all ERSPAN v1 and v2's
> fields.  Note taht Previous commit "openvswitch: Add erspan tunnel
> support." was reverted since it does not design properly.
>
> Signed-off-by: William Tu 
> ---
>  include/uapi/linux/openvswitch.h |  11 +++
>  net/openvswitch/flow_netlink.c   | 154 
> ++-
>  2 files changed, 164 insertions(+), 1 deletion(-)
>
> diff --git a/include/uapi/linux/openvswitch.h 
> b/include/uapi/linux/openvswitch.h
> index dcfab5e3b55c..f1f98fd703fe 100644
> --- a/include/uapi/linux/openvswitch.h
> +++ b/include/uapi/linux/openvswitch.h
> @@ -273,6 +273,16 @@ enum {
>
>  #define OVS_VXLAN_EXT_MAX (__OVS_VXLAN_EXT_MAX - 1)
>
> +enum {
> +   OVS_ERSPAN_OPT_UNSPEC,
> +   OVS_ERSPAN_OPT_IDX, /* u32 index */
> +   OVS_ERSPAN_OPT_VER, /* u8 version number */
> +   OVS_ERSPAN_OPT_DIR, /* u8 direction */
> +   OVS_ERSPAN_OPT_HWID,/* u8 hardware ID */
> +   __OVS_ERSPAN_OPT_MAX,
> +};
> +
> +#define OVS_ERSPAN_OPT_MAX (__OVS_ERSPAN_OPT_MAX - 1)
>
>  /* OVS_VPORT_ATTR_OPTIONS attributes for tunnels.
>   */
> @@ -363,6 +373,7 @@ enum ovs_tunnel_key_attr {
> OVS_TUNNEL_KEY_ATTR_IPV6_SRC,   /* struct in6_addr src IPv6 
> address. */
> OVS_TUNNEL_KEY_ATTR_IPV6_DST,   /* struct in6_addr dst IPv6 
> address. */
> OVS_TUNNEL_KEY_ATTR_PAD,
> +   OVS_TUNNEL_KEY_ATTR_ERSPAN_OPTS,/* Nested OVS_ERSPAN_OPT_* */
> __OVS_TUNNEL_KEY_ATTR_MAX
>  };
>
Rather than passing individual members of erspan_metadata, can you
just pass whole structure between kernel and userspace. So that ovs
kernel module can just handle all erspan options as one binary blob
and does not need to interpret it.


Re: [Patch net-next 1/3] net: introduce helper dev_change_tx_queue_len()

2018-01-20 Thread John Fastabend
On 01/19/2018 03:09 PM, Cong Wang wrote:
> This patch promotes the local change_tx_queue_len() to a core
> helper function, dev_change_tx_queue_len(), so that rtnetlink
> and net-sysfs could share the code. This also prepares for the
> following patch.
> 
> Note, the -EFAULT in the original code doesn't make sense,
> we should propagate the errno from notifiers.
> 
> Cc: John Fastabend 
> Signed-off-by: Cong Wang 
> ---


[...]

>  static ssize_t tx_queue_len_store(struct device *dev,
> struct device_attribute *attr,
> const char *buf, size_t len)
> @@ -355,7 +332,7 @@ static ssize_t tx_queue_len_store(struct device *dev,
>   if (!capable(CAP_NET_ADMIN))
>   return -EPERM;
>  
> - return netdev_store(dev, attr, buf, len, change_tx_queue_len);
> + return netdev_store(dev, attr, buf, len, dev_change_tx_queue_len);
>  }

Is this protected by RTNL lock? If not what happens if this and do_setlink
both try to change tx queue length at the same time? Seems we could get
a race with multiple dev_deactivate/dev_activate sequences in-flight in
the following 2/3 patch.

Thanks,
John

>  NETDEVICE_SHOW_RW(tx_queue_len, fmt_dec);
>  


Re: [PATCH net] net/mlx5e: Fix fixpoint divide exception in mlx5e_am_stats_compare

2018-01-20 Thread Saeed Mahameed


On 01/20/2018 12:10 AM, Sergei Shtylyov wrote:
> Hello!
> 
> On 1/19/2018 11:10 PM, Saeed Mahameed wrote:
> 
>> From: Talat Batheesh 
>>
>> Helmut reported a bug about devision by zero while
> 
>    Division.
> 
>> running traffic and doing physical cable pull test.
>>
>> When the cable unplugged the ppms become zero, so when
>> dividing the current ppms by the previous ppms in the
> 
>    You got it right the on the 2nd time. :-)
> 

BTW, you missed one ;-)
Fixed both in V2.

Thanks !


[PATCH net V2] net/mlx5e: Fix fixpoint divide exception in mlx5e_am_stats_compare

2018-01-20 Thread Saeed Mahameed
From: Talat Batheesh 

Helmut reported a bug about division by zero while
running traffic and doing physical cable pull test.

When the cable unplugged the ppms become zero, so when
dividing the current ppms by the previous ppms in the
next dim iteration there is division by zero.

This patch prevent this division for both ppms and epms.

Fixes: c3164d2fc48f ("net/mlx5e: Added BW check for DIM decision mechanism")
Reported-by: Helmut Grauer 
Signed-off-by: Talat Batheesh 
Signed-off-by: Saeed Mahameed 
---
v2:
- Fix spelling in commit message.

This patch is the equivalent of:
("net/dim: Fix fixpoint divide exception in net_dim_stats_compare")
which was already submitted to net-next. Since mlx5 irq moderation code
got moved in net-next, this patch should only go to net and skip net-next.

for -stable: 4.12.y and later.

 drivers/net/ethernet/mellanox/mlx5/core/en_rx_am.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx_am.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_rx_am.c
index e401d9d245f3..b69a705fd787 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx_am.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx_am.c
@@ -201,9 +201,15 @@ static int mlx5e_am_stats_compare(struct mlx5e_rx_am_stats 
*curr,
return (curr->bpms > prev->bpms) ? MLX5E_AM_STATS_BETTER :
   MLX5E_AM_STATS_WORSE;
 
+   if (!prev->ppms)
+   return curr->ppms ? MLX5E_AM_STATS_BETTER :
+   MLX5E_AM_STATS_SAME;
+
if (IS_SIGNIFICANT_DIFF(curr->ppms, prev->ppms))
return (curr->ppms > prev->ppms) ? MLX5E_AM_STATS_BETTER :
   MLX5E_AM_STATS_WORSE;
+   if (!prev->epms)
+   return MLX5E_AM_STATS_SAME;
 
if (IS_SIGNIFICANT_DIFF(curr->epms, prev->epms))
return (curr->epms < prev->epms) ? MLX5E_AM_STATS_BETTER :
-- 
2.13.0



Re: pull-request: bpf-next 2018-01-19

2018-01-20 Thread David Miller
From: Alexei Starovoitov 
Date: Fri, 19 Jan 2018 21:12:29 -0800

> The following pull-request contains BPF updates for your *net-next* tree.
> 
> The main changes are:
> 
> 1) bpf array map HW offload, from Jakub.
> 
> 2) support for bpf_get_next_key() for LPM map, from Yonghong.
> 
> 3) test_verifier now runs loaded programs, from Alexei.
> 
> 4) xdp cpumap monitoring, from Jesper.
> 
> 5) variety of tests, cleanups and small x64 JIT optimization, from Daniel.
> 
> 6) user space can now retrieve HW JITed program, from Jiong.
> 
> Please consider pulling these changes from:
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git
> 
> Note there is a minor conflict between Russell's arm32 JIT fixes
> and removal of bpf_jit_enable variable by Daniel which should
> be resolved by keeping Russell's comment and removing that variable.

Pulled, thanks for the guidance on the merge conflict.

I hope I interpreted it correctly :-)


Re: [net-next: PATCH 0/8] Armada 7k/8k PP2 ACPI support

2018-01-20 Thread Andrew Lunn
> I'm not familiar with MDIO bus but an alternative to GeneriSerialBus
> would be to follow what SDIO is doing, e.g have the PHY devices listed
> below the MDIO controller and use _ADR to describe their "address" on
> that bus. You can see how _ADR applies to SDIO bus from ACPI spec.

Hi Mika

SDIO is not a serial bus, well it can be in its simplest form, but
high speed implementations have 4 data lines. So i can understand them
not using GenericSerialBus.

MDIO is a serial bus, very similar to SPI, I2C, and UART.

> If you go with the SDIO way then each PHY is described as normal ACPI
> device and you can use ACPI _HID/_CID to match the device to the
> corresponding driver.

Just some background here. If you have a plain PHY as a device on an
MDIO bus, you don't need to match it to a driver within ACPI.
Registers 2 and 3 contain a vendor and product ID. That is what it
used to match the device to the driver.

What you might need to know is the protocol to talk on the bus. Most
devices use clause 22 protocol. A few devices are clause 45. 22 is the
default in Linux, and you need to indicate if 45 should be used. You
can also indicate 22.

It gets more complex when the device on the bus is not a PHY. It is a
generic bus, you can connect anything to it. Ethernet switches can be
on the bus. They generally cannot be identified using registers 2 and
3. So you do need to match the device to the driver. Most do have ID
registers, so the driver can work out what specific device is on the
bus. However, Marvell moved the ID registers on there newer generation
of devices, so we need to give the driver a hint where to look. So in
device tree, we have two different compatible string.

Broadcom really do use it as a generic bus. They have their USB PHYs
and PCIE PHYs on an MDIO bus. In DT, they have compatible strings to
match the device to the driver, as normal.

We need to ensure what we define for ACPI has the same level of
flexibility.

Andrew


Memory corruption with r8169 across several device revisions and kernels

2018-01-20 Thread Oliver Freyermuth
Dear network experts,

please redirect me if this is the wrong place. 

I have reproduced the following issue across three devices with different 
Realtek card revisions
and different Distros (Debian 9, Ubuntu 17.04, Gentoo with kernels 4.9, 4.11.3, 
4.14.12). 

It's safely reproducible with at least:
Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI 
Express Gigabit Ethernet Controller (rev 06)
Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI 
Express Gigabit Ethernet Controller (rev 12)

Memory corruption at physical addresses either in low memory or kernel memory 
or user space memory occurs when reading from:
/proc/self/net/dev
The physical memory addresses which get corrupted change with each boot of the 
system,
and also appear to change with each reload of the kernel module (I have only 
one data point on that). 

To reproduce, execute:
$ while true; do cat /proc/self/net/dev > /dev/null; done
and in parallel, scan memory for corruption, e.g.
$ memtester 15G
Of course, one should try to map all system memory here. 
It usually shows up in the first loop iteration if the "while" loop is executed 
in parallel. 

Depending on the actual memory being corrupted, it may also become visible via
Corrupted low memory at 8800b000 (b000 phys) = 0016e109
in klog, if the low memory corruption scanning is activated. 

The values found in overwritten memory match those contained in 
/proc/self/net/dev for the realtek ethernet device. 

Unloading r8169 or disabling the card in bios "fixes" this issue. 

I have already ended up with two corrupted btrfs filesystems due to this issue, 
and many segfaults in userspace. 

Please include me directly in replies, I may not stay subscribed to the list. 

Cheers,
Oliver


Re: net merged into net-next

2018-01-20 Thread Cong Wang
On Fri, Jan 19, 2018 at 8:02 PM, David Miller  wrote:
>
> Cong, please check my conflict resolution of drivers/net/tun.c, thank
> you.

It looks good to me except I am not sure about the xdp_rxq_info_unreg()
inside tun_cleanup_tx_ring().


backport of "netfilter: nf_tables: fix mismatch in big-endian system" to stable

2018-01-20 Thread Hauke Mehrtens
nftables does not work correctly on big endian systems without this patch:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=10596608c4d62cb8c1c2b806debcbd32fe657e71

With nftables v0.8.1 this rule does not work:
nft add rule ip foo bar ct state new tcp dport 22 accept
I can set it successfully, but it will just not match to port 22.
The examples from this tutorial will not work with kernel 4.9 + nftables
v0.8.1:
https://people.netfilter.org/pablo/nft-tutorial.pdf

This nftables commit broke nftables for me on my MIPS BE system:
https://git.netfilter.org/nftables/commit/?id=0825c57d571bb7121e7048e198b9b023f7e7f358
but the root cause is in the kernel.


I would like to get commit 10596608c4d62cb8c1c2b806debcbd32fe657e71
("netfilter: nf_tables: fix mismatch in big-endian system") backported
to kernel 4.9 or even earlier to make nftable work correctly on big
endian systems.


Hauke


Re: [net-next: PATCH 0/8] Armada 7k/8k PP2 ACPI support

2018-01-20 Thread Mika Westerberg
On Fri, Jan 19, 2018 at 07:07:29PM +0100, Marcin Wojtas wrote:
> Hi Mika,

Hi,

> 2018-01-18 14:00 GMT+01:00 Andrew Lunn :
> >> I CC'ed Mika since he is more familiar with handling these bits of ACPI
> >> specs - I wonder whether this is a problem that cropped up on x86
> >> systems too.
> >
> > Hi Lorenzo
> >
> > There is nothing about MDIO, PHYs, Ethernet switches, etc in version
> > 6.2 of the spec. If x86 has this, it must be after 6.2 was released.
> > I would not be too surprised if x86 has none of this. If you look at
> > the typical drives used on x86, i210, e1000e, ixgb, r8169, etc. They
> > are all PCI devices, and hide all this.
> >
> >> I do not think there is one and only answer but there must be a single
> >> set of bindings and if the ACPI specs already cater for some of them
> >> we have to reuse them.
> >
> > Agreed. Due diligence so far suggests there is nothing already
> > defined. But im a newbie to ACPI, so could be looking in the wrong
> > place. I really hope there is somebody like Rob Herring, the DT
> > maintainer, who keeps an eye on all ACPI talk and would tell us if we
> > are heading off in the wrong direction.
> >
> 
> My initial approach with MDIO bus with PHYs as child nodes was super
> easy to describe and handle in Linux - please refer to:
> - typical representation of mdio bus with the phys -
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/devicetree/bindings/net/mdio.txt?h=v4.15-rc8
> - my patches in the initial series
> However I guess it would be more proper to use the
> GenericSerialBus-based description, as advised in ACPI Spec. The
> question is, whether we should define new type of a bus or not
> (MdioSerialBus, similar to e.g. I2cSerialBus).

I'm not familiar with MDIO bus but an alternative to GeneriSerialBus
would be to follow what SDIO is doing, e.g have the PHY devices listed
below the MDIO controller and use _ADR to describe their "address" on
that bus. You can see how _ADR applies to SDIO bus from ACPI spec.

Of course ACPI spec should then be updated accordingly to describe what
_ADR means for devices on MDIO bus.

> Since I have a code that can be tested and easily modified to use
> different ACPI approaches with real platform MDIO controller
> (mvmdio.c) and NIC (mvpp2.c), in coming weeks I may be able to find
> some time to prepare a proof of concept based on GenericSerialBus.
> Please expect some RFC patches hopefully right after the coming merge
> window is closed.
> 
> Of course, if I come up on some ACPI - specific implementation
> questions, I won't hesitate to ask in this thred. I will also
> appreciate any hints. For now my two main concerns are:
> - The PHY address on the mdio bus - should it be put into _CRS ->
> GenericSerialBus() field or separate _ADR? I'd lean towards first
> option.
> - The PHY type - in Linux it's resolved basing on two generic
> compatible strings
> (https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/devicetree/bindings/net/phy.txt?h=v4.15-rc8).
> I'd put it as a sort of ID into GenericSerialBus(). If you agree - any
> specific filed that you would try to use?

If you go with the SDIO way then each PHY is described as normal ACPI
device and you can use ACPI _HID/_CID to match the device to the
corresponding driver.

> Do I understand correctly that the MDIO controller node should
> comprise OperationRegion() definition of the GenericSerialBus?

I don't think OpRegions are useful in this case because they are mainly
used to allow BIOS AML code to access the hardware through OS driver.


Re: ipv6_addrconf: WARNING about suspicious RCU usage

2018-01-20 Thread Heiner Kallweit
Am 20.01.2018 um 20:19 schrieb Ido Schimmel:
> On Sat, Jan 20, 2018 at 10:49:03AM -0800, Eric Dumazet wrote:
>> On Sat, 2018-01-20 at 15:37 +0200, Ido Schimmel wrote:
>>> On Sat, Jan 20, 2018 at 12:57:01PM +0100, Heiner Kallweit wrote:
 Since some time (didn't bisect it yet) I get the following warning.
 Is it a known issue?
>>>
>>> [...]
>>>
 [86220.126999] BUG: sleeping function called from invalid context at 
 mm/slab.h:420
 [86220.127041] in_atomic(): 1, irqs_disabled(): 0, pid: 1003, name: 
 kworker/0:2
 [86220.127082] 4 locks held by kworker/0:2/1003:
 [86220.127107]  #0:  ((wq_completion)"%s"("ipv6_addrconf")){+.+.}, at: 
 [] process_one_work+0x1de/0x680
 [86220.127179]  #1:  ((addr_chk_work).work){+.+.}, at: 
 [] process_one_work+0x1de/0x680
 [86220.127242]  #2:  (rtnl_mutex){+.+.}, at: [] 
 rtnl_lock+0x12/0x20
 [86220.127300]  #3:  (rcu_read_lock_bh){}, at: [] 
 addrconf_verify_rtnl+0x1e/0x510 [ipv6]
 [86220.127414] CPU: 0 PID: 1003 Comm: kworker/0:2 Not tainted 
 4.15.0-rc7-next-20180110+ #7
 [86220.127463] Hardware name: ZOTAC ZBOX-CI321NANO/ZBOX-CI321NANO, BIOS 
 B246P105 06/01/2015
 [86220.127528] Workqueue: ipv6_addrconf addrconf_verify_work [ipv6]
 [86220.127568] Call Trace:
 [86220.127591]  dump_stack+0x70/0x9e
 [86220.127616]  ___might_sleep+0x14d/0x240
 [86220.127644]  __might_sleep+0x45/0x80
 [86220.127672]  kmem_cache_alloc_trace+0x53/0x250
 [86220.127717]  ? ipv6_add_addr+0xfe/0x6e0 [ipv6]
 [86220.127762]  ipv6_add_addr+0xfe/0x6e0 [ipv6]
 [86220.127807]  ipv6_create_tempaddr+0x24d/0x430 [ipv6]
 [86220.127854]  ? ipv6_create_tempaddr+0x24d/0x430 [ipv6]
 [86220.127903]  addrconf_verify_rtnl+0x339/0x510 [ipv6]
 [86220.127950]  ? addrconf_verify_rtnl+0x339/0x510 [ipv6]
 [86220.127998]  addrconf_verify_work+0xe/0x20 [ipv6]
 [86220.128032]  process_one_work+0x258/0x680
 [86220.128063]  worker_thread+0x35/0x3f0
 [86220.128091]  kthread+0x124/0x140
 [86220.128117]  ? process_one_work+0x680/0x680
 [86220.128146]  ? kthread_create_worker_on_cpu+0x40/0x40
 [86220.128180]  ? umh_complete+0x40/0x40
 [86220.128207]  ? call_usermodehelper_exec_async+0x12a/0x160
 [86220.128243]  ret_from_fork+0x4b/0x60
>>>
>>> Can you please try attached patch (untested)?
>>
>>
>>
>> I would also/instead break rcu section.
> 
> Thanks Eric, this should work. We can continue to block in
> ipv6_create_tempaddr().
> 
> Heiner, can you try Eric's patch instead?
> 
Sure, thanks for the quick response to both of you.

>>
>> Holding RCU (and BH) for whole hash traversal is a recipe for disaster,
>> if we have thousands of IPv6 addresses.
>>
>> diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
>> index 
>> ab99cb641b7cccdda0ad4ae553c09274d7dbc047..adda73466ae1dd0f3b700b3db5fbf3065e4d3f7f
>>  100644
>> --- a/net/ipv6/addrconf.c
>> +++ b/net/ipv6/addrconf.c
>> @@ -4356,9 +4356,11 @@ static void addrconf_verify_rtnl(void)
>>  spin_lock(>lock);
>>  ifpub->regen_count = 0;
>>  spin_unlock(>lock);
>> +rcu_read_unlock_bh();
>>  ipv6_create_tempaddr(ifpub, 
>> ifp, true);
>>  in6_ifa_put(ifpub);
>>  in6_ifa_put(ifp);
>> +rcu_read_lock_bh();
>>  goto restart;
>>  }
>>  } else if (time_before(ifp->tstamp + 
>> ifp->prefered_lft * HZ - regen_advance * HZ, next))
>>
>>
>>
>>
> 



Re: ipv6_addrconf: WARNING about suspicious RCU usage

2018-01-20 Thread Ido Schimmel
On Sat, Jan 20, 2018 at 10:49:03AM -0800, Eric Dumazet wrote:
> On Sat, 2018-01-20 at 15:37 +0200, Ido Schimmel wrote:
> > On Sat, Jan 20, 2018 at 12:57:01PM +0100, Heiner Kallweit wrote:
> > > Since some time (didn't bisect it yet) I get the following warning.
> > > Is it a known issue?
> > 
> > [...]
> > 
> > > [86220.126999] BUG: sleeping function called from invalid context at 
> > > mm/slab.h:420
> > > [86220.127041] in_atomic(): 1, irqs_disabled(): 0, pid: 1003, name: 
> > > kworker/0:2
> > > [86220.127082] 4 locks held by kworker/0:2/1003:
> > > [86220.127107]  #0:  ((wq_completion)"%s"("ipv6_addrconf")){+.+.}, at: 
> > > [] process_one_work+0x1de/0x680
> > > [86220.127179]  #1:  ((addr_chk_work).work){+.+.}, at: 
> > > [] process_one_work+0x1de/0x680
> > > [86220.127242]  #2:  (rtnl_mutex){+.+.}, at: [] 
> > > rtnl_lock+0x12/0x20
> > > [86220.127300]  #3:  (rcu_read_lock_bh){}, at: [] 
> > > addrconf_verify_rtnl+0x1e/0x510 [ipv6]
> > > [86220.127414] CPU: 0 PID: 1003 Comm: kworker/0:2 Not tainted 
> > > 4.15.0-rc7-next-20180110+ #7
> > > [86220.127463] Hardware name: ZOTAC ZBOX-CI321NANO/ZBOX-CI321NANO, BIOS 
> > > B246P105 06/01/2015
> > > [86220.127528] Workqueue: ipv6_addrconf addrconf_verify_work [ipv6]
> > > [86220.127568] Call Trace:
> > > [86220.127591]  dump_stack+0x70/0x9e
> > > [86220.127616]  ___might_sleep+0x14d/0x240
> > > [86220.127644]  __might_sleep+0x45/0x80
> > > [86220.127672]  kmem_cache_alloc_trace+0x53/0x250
> > > [86220.127717]  ? ipv6_add_addr+0xfe/0x6e0 [ipv6]
> > > [86220.127762]  ipv6_add_addr+0xfe/0x6e0 [ipv6]
> > > [86220.127807]  ipv6_create_tempaddr+0x24d/0x430 [ipv6]
> > > [86220.127854]  ? ipv6_create_tempaddr+0x24d/0x430 [ipv6]
> > > [86220.127903]  addrconf_verify_rtnl+0x339/0x510 [ipv6]
> > > [86220.127950]  ? addrconf_verify_rtnl+0x339/0x510 [ipv6]
> > > [86220.127998]  addrconf_verify_work+0xe/0x20 [ipv6]
> > > [86220.128032]  process_one_work+0x258/0x680
> > > [86220.128063]  worker_thread+0x35/0x3f0
> > > [86220.128091]  kthread+0x124/0x140
> > > [86220.128117]  ? process_one_work+0x680/0x680
> > > [86220.128146]  ? kthread_create_worker_on_cpu+0x40/0x40
> > > [86220.128180]  ? umh_complete+0x40/0x40
> > > [86220.128207]  ? call_usermodehelper_exec_async+0x12a/0x160
> > > [86220.128243]  ret_from_fork+0x4b/0x60
> > 
> > Can you please try attached patch (untested)?
> 
> 
> 
> I would also/instead break rcu section.

Thanks Eric, this should work. We can continue to block in
ipv6_create_tempaddr().

Heiner, can you try Eric's patch instead?

> 
> Holding RCU (and BH) for whole hash traversal is a recipe for disaster,
> if we have thousands of IPv6 addresses.
> 
> diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
> index 
> ab99cb641b7cccdda0ad4ae553c09274d7dbc047..adda73466ae1dd0f3b700b3db5fbf3065e4d3f7f
>  100644
> --- a/net/ipv6/addrconf.c
> +++ b/net/ipv6/addrconf.c
> @@ -4356,9 +4356,11 @@ static void addrconf_verify_rtnl(void)
>   spin_lock(>lock);
>   ifpub->regen_count = 0;
>   spin_unlock(>lock);
> + rcu_read_unlock_bh();
>   ipv6_create_tempaddr(ifpub, 
> ifp, true);
>   in6_ifa_put(ifpub);
>   in6_ifa_put(ifp);
> + rcu_read_lock_bh();
>   goto restart;
>   }
>   } else if (time_before(ifp->tstamp + 
> ifp->prefered_lft * HZ - regen_advance * HZ, next))
> 
> 
> 
> 


Re: ipv6_addrconf: WARNING about suspicious RCU usage

2018-01-20 Thread Eric Dumazet
On Sat, 2018-01-20 at 15:37 +0200, Ido Schimmel wrote:
> On Sat, Jan 20, 2018 at 12:57:01PM +0100, Heiner Kallweit wrote:
> > Since some time (didn't bisect it yet) I get the following warning.
> > Is it a known issue?
> 
> [...]
> 
> > [86220.126999] BUG: sleeping function called from invalid context at 
> > mm/slab.h:420
> > [86220.127041] in_atomic(): 1, irqs_disabled(): 0, pid: 1003, name: 
> > kworker/0:2
> > [86220.127082] 4 locks held by kworker/0:2/1003:
> > [86220.127107]  #0:  ((wq_completion)"%s"("ipv6_addrconf")){+.+.}, at: 
> > [] process_one_work+0x1de/0x680
> > [86220.127179]  #1:  ((addr_chk_work).work){+.+.}, at: [] 
> > process_one_work+0x1de/0x680
> > [86220.127242]  #2:  (rtnl_mutex){+.+.}, at: [] 
> > rtnl_lock+0x12/0x20
> > [86220.127300]  #3:  (rcu_read_lock_bh){}, at: [] 
> > addrconf_verify_rtnl+0x1e/0x510 [ipv6]
> > [86220.127414] CPU: 0 PID: 1003 Comm: kworker/0:2 Not tainted 
> > 4.15.0-rc7-next-20180110+ #7
> > [86220.127463] Hardware name: ZOTAC ZBOX-CI321NANO/ZBOX-CI321NANO, BIOS 
> > B246P105 06/01/2015
> > [86220.127528] Workqueue: ipv6_addrconf addrconf_verify_work [ipv6]
> > [86220.127568] Call Trace:
> > [86220.127591]  dump_stack+0x70/0x9e
> > [86220.127616]  ___might_sleep+0x14d/0x240
> > [86220.127644]  __might_sleep+0x45/0x80
> > [86220.127672]  kmem_cache_alloc_trace+0x53/0x250
> > [86220.127717]  ? ipv6_add_addr+0xfe/0x6e0 [ipv6]
> > [86220.127762]  ipv6_add_addr+0xfe/0x6e0 [ipv6]
> > [86220.127807]  ipv6_create_tempaddr+0x24d/0x430 [ipv6]
> > [86220.127854]  ? ipv6_create_tempaddr+0x24d/0x430 [ipv6]
> > [86220.127903]  addrconf_verify_rtnl+0x339/0x510 [ipv6]
> > [86220.127950]  ? addrconf_verify_rtnl+0x339/0x510 [ipv6]
> > [86220.127998]  addrconf_verify_work+0xe/0x20 [ipv6]
> > [86220.128032]  process_one_work+0x258/0x680
> > [86220.128063]  worker_thread+0x35/0x3f0
> > [86220.128091]  kthread+0x124/0x140
> > [86220.128117]  ? process_one_work+0x680/0x680
> > [86220.128146]  ? kthread_create_worker_on_cpu+0x40/0x40
> > [86220.128180]  ? umh_complete+0x40/0x40
> > [86220.128207]  ? call_usermodehelper_exec_async+0x12a/0x160
> > [86220.128243]  ret_from_fork+0x4b/0x60
> 
> Can you please try attached patch (untested)?



I would also/instead break rcu section.

Holding RCU (and BH) for whole hash traversal is a recipe for disaster,
if we have thousands of IPv6 addresses.

diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index 
ab99cb641b7cccdda0ad4ae553c09274d7dbc047..adda73466ae1dd0f3b700b3db5fbf3065e4d3f7f
 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -4356,9 +4356,11 @@ static void addrconf_verify_rtnl(void)
    spin_lock(>lock);
    ifpub->regen_count = 0;
    spin_unlock(>lock);
+   rcu_read_unlock_bh();
    ipv6_create_tempaddr(ifpub, 
ifp, true);
    in6_ifa_put(ifpub);
    in6_ifa_put(ifp);
+   rcu_read_lock_bh();
    goto restart;
    }
    } else if (time_before(ifp->tstamp + 
ifp->prefered_lft * HZ - regen_advance * HZ, next))






Re: [Intel-wired-lan] [RFC PATCH] e1000e: Remove Other from EIAC.

2018-01-20 Thread Alexander Duyck
On Fri, Jan 19, 2018 at 2:55 PM, Benjamin Poirier
 wrote:
> On 2018/01/20 07:45, Benjamin Poirier wrote:
> [...]
>> >
>> > I'm of the mind that we need to cut down on the code thrash.  This
>> > driver is supposed to have been in a "maintenance" mode for the last
>> > year or so as there aren't being any new parts added is my
>> > understanding. As-is I don't see any reason why 16ecba59bc33 ("e1000e:
>> > Do not read ICR in Other interrupt", v4.5-rc1) was submitted or
>> > accepted in the first place. I don't see any notes about it fixing any
>> > bug or addressing any issue and it seems like that is the start of all
>> > the issues we have been having recently with RXO triggering more
>> > interrupts, various link issues, and this most recent VMware issue.
>>
>> I'm sorry to say but you're the one who suggested that change:
>>
>> http://lkml.iu.edu/hypermail/linux/kernel/1510.3/03528.html
>>
>> On 2015/10/28 23:08, Alexander Duyck wrote:
>> > On 10/22/2015 05:32 PM, Benjamin Poirier wrote:
>> [...]
>> >
>> > I would argue your first patch probably didn't go far enough to remove dead
>> > code.  Specifically you should only ever get into this function if LSC is
>> > set.  There are no other causes that should trigger this.  As such you 
>> > could
>> > probably remove the ICR read, and instead replace it with an ICR write of
>> > the LSC bit since OTHER is already cleared via EIAC.
>> >
>
> ... The assumption that "There are no other causes that should trigger
> this." turned out to be wrong and that caused the RXO problems. Clearing
> OTHER via EIAC is causing the problems with vmware now. I don't think
> you foresaw those problems back in 2015 and neither did I.

Well that explains why I felt like I was explaining things to a
younger version of myself. I was a bit more relaxed in terms of being
willing to make arbitrary changes a few years ago. I tend to be a bit
more conservative now, at least as far as having justifications as to
why I want to do things. With any change you end up taking on risk,
and so usually a patch has a justification as to why you are making
the change.

Unfortunately at the time I didn't have all the information and based
my suggestion on a bad assumption. I would guess at the time I was
thinking of doing general code cleanup. Other drivers such as igb work
this way, but it led us down the path we are on now where we are
having to make one fix after another. It is leading in the opposite
direction of maintainability as this is all becoming more complex.
Suggesting this was a bad decision on my part at the time. I'm only
human, I make mistakes.. :-)

With further review of the code I am seeing various other issues that
could still pop up as I am not certain we should even have the "other"
interrupt messing with the NAPI polling or packet accounting logic at
all. The question I would have at this point is if we revert
16ecba59bc33 ("e1000e: Do not read ICR in Other interrupt", v4.5-rc1)
and all the related fixes for it, what do we end up with? It seems
like the code is slowly heading back in the direction of where it was
originally anyway since there have been a number of partial reverts.
I'm wondering what would happen if we were to just short-cut that and
audit the patches involved to see what we really need and don't.

Your patch as proposed is essentially another step in that direction.
I'm thinking we may want to drop my currently proposed fix for now and
instead look at going through and figure out what changes after that
first one are still really needed. It doesn't look like my fix will
make it for 4.15 anyway so we might as well focus on making certain to
have things as solid as possible by the time 4.16-rc1 rolls around.

Thanks.

- Alex


Re: ipv6_addrconf: WARNING about suspicious RCU usage

2018-01-20 Thread Ido Schimmel
On Sat, Jan 20, 2018 at 12:57:01PM +0100, Heiner Kallweit wrote:
> Since some time (didn't bisect it yet) I get the following warning.
> Is it a known issue?

[...]

> [86220.126999] BUG: sleeping function called from invalid context at 
> mm/slab.h:420
> [86220.127041] in_atomic(): 1, irqs_disabled(): 0, pid: 1003, name: 
> kworker/0:2
> [86220.127082] 4 locks held by kworker/0:2/1003:
> [86220.127107]  #0:  ((wq_completion)"%s"("ipv6_addrconf")){+.+.}, at: 
> [] process_one_work+0x1de/0x680
> [86220.127179]  #1:  ((addr_chk_work).work){+.+.}, at: [] 
> process_one_work+0x1de/0x680
> [86220.127242]  #2:  (rtnl_mutex){+.+.}, at: [] 
> rtnl_lock+0x12/0x20
> [86220.127300]  #3:  (rcu_read_lock_bh){}, at: [] 
> addrconf_verify_rtnl+0x1e/0x510 [ipv6]
> [86220.127414] CPU: 0 PID: 1003 Comm: kworker/0:2 Not tainted 
> 4.15.0-rc7-next-20180110+ #7
> [86220.127463] Hardware name: ZOTAC ZBOX-CI321NANO/ZBOX-CI321NANO, BIOS 
> B246P105 06/01/2015
> [86220.127528] Workqueue: ipv6_addrconf addrconf_verify_work [ipv6]
> [86220.127568] Call Trace:
> [86220.127591]  dump_stack+0x70/0x9e
> [86220.127616]  ___might_sleep+0x14d/0x240
> [86220.127644]  __might_sleep+0x45/0x80
> [86220.127672]  kmem_cache_alloc_trace+0x53/0x250
> [86220.127717]  ? ipv6_add_addr+0xfe/0x6e0 [ipv6]
> [86220.127762]  ipv6_add_addr+0xfe/0x6e0 [ipv6]
> [86220.127807]  ipv6_create_tempaddr+0x24d/0x430 [ipv6]
> [86220.127854]  ? ipv6_create_tempaddr+0x24d/0x430 [ipv6]
> [86220.127903]  addrconf_verify_rtnl+0x339/0x510 [ipv6]
> [86220.127950]  ? addrconf_verify_rtnl+0x339/0x510 [ipv6]
> [86220.127998]  addrconf_verify_work+0xe/0x20 [ipv6]
> [86220.128032]  process_one_work+0x258/0x680
> [86220.128063]  worker_thread+0x35/0x3f0
> [86220.128091]  kthread+0x124/0x140
> [86220.128117]  ? process_one_work+0x680/0x680
> [86220.128146]  ? kthread_create_worker_on_cpu+0x40/0x40
> [86220.128180]  ? umh_complete+0x40/0x40
> [86220.128207]  ? call_usermodehelper_exec_async+0x12a/0x160
> [86220.128243]  ret_from_fork+0x4b/0x60

Can you please try attached patch (untested)?
>From 70e4bc2a8fe08fb30251f786990a91d3ed2232e6 Mon Sep 17 00:00:00 2001
From: Ido Schimmel 
Date: Sat, 20 Jan 2018 15:29:33 +0200
Subject: [PATCH] ipv6: Do not sleep in RCU read-side critical section

Signed-off-by: Ido Schimmel 
---
 net/ipv6/addrconf.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index ab99cb641b7c..9ad1365c91ca 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -4356,7 +4356,8 @@ static void addrconf_verify_rtnl(void)
spin_lock(>lock);
ifpub->regen_count = 0;
spin_unlock(>lock);
-   ipv6_create_tempaddr(ifpub, 
ifp, true);
+   ipv6_create_tempaddr(ifpub, ifp,
+false);
in6_ifa_put(ifpub);
in6_ifa_put(ifp);
goto restart;
-- 
2.14.3



Re: KASAN: use-after-free Read in fib6_ifup (2)

2018-01-20 Thread Ido Schimmel
On Thu, Jan 18, 2018 at 09:58:01PM -0800, syzbot wrote:
> Hello,
> 
> syzbot hit the following crash on net-next commit
> 564737f981fb4b4b3266901508bb9b90d9d43de8
> 
> So far this crash happened 18 times on mmots, net-next.
> C reproducer is attached.
> syzkaller reproducer is attached.
> Raw console output is attached.
> compiler: gcc (GCC) 7.1.1 20170620
> .config is attached.

[...]

> syzbot will keep track of this bug report.
> If you forgot to add the Reported-by tag, once the fix for this bug is
> merged
> into any tree, please reply to this email with:
> #syz fix: exact-commit-title

The reproducer configures a default source-specific route which triggers
a bug that is now fixed in net-next, so:

#syz fix: ipv6: don't let tb6_root node share routes with other node

Verified it on my machine using the reproducer attached by the bot.


ipv6_addrconf: WARNING about suspicious RCU usage

2018-01-20 Thread Heiner Kallweit
Since some time (didn't bisect it yet) I get the following warning.
Is it a known issue?

[86220.125562] =
[86220.125586] WARNING: suspicious RCU usage
[86220.125612] 4.15.0-rc7-next-20180110+ #7 Not tainted
[86220.125641] -
[86220.125666] kernel/sched/core.c:6026 Illegal context switch in RCU-bh 
read-side critical section!
[86220.125711]
   other info that might help us debug this:

[86220.125755]
   rcu_scheduler_active = 2, debug_locks = 1
[86220.125792] 4 locks held by kworker/0:2/1003:
[86220.125817]  #0:  ((wq_completion)"%s"("ipv6_addrconf")){+.+.}, at: 
[] process_one_work+0x1de/0x680
[86220.125895]  #1:  ((addr_chk_work).work){+.+.}, at: [] 
process_one_work+0x1de/0x680
[86220.125959]  #2:  (rtnl_mutex){+.+.}, at: [] 
rtnl_lock+0x12/0x20
[86220.126017]  #3:  (rcu_read_lock_bh){}, at: [] 
addrconf_verify_rtnl+0x1e/0x510 [ipv6]
[86220.126111]
   stack backtrace:
[86220.126142] CPU: 0 PID: 1003 Comm: kworker/0:2 Not tainted 
4.15.0-rc7-next-20180110+ #7
[86220.126185] Hardware name: ZOTAC ZBOX-CI321NANO/ZBOX-CI321NANO, BIOS 
B246P105 06/01/2015
[86220.126250] Workqueue: ipv6_addrconf addrconf_verify_work [ipv6]
[86220.126288] Call Trace:
[86220.126312]  dump_stack+0x70/0x9e
[86220.126337]  lockdep_rcu_suspicious+0xce/0xf0
[86220.126365]  ___might_sleep+0x1d3/0x240
[86220.126390]  __might_sleep+0x45/0x80
[86220.126416]  kmem_cache_alloc_trace+0x53/0x250
[86220.126458]  ? ipv6_add_addr+0xfe/0x6e0 [ipv6]
[86220.126498]  ipv6_add_addr+0xfe/0x6e0 [ipv6]
[86220.126538]  ipv6_create_tempaddr+0x24d/0x430 [ipv6]
[86220.126580]  ? ipv6_create_tempaddr+0x24d/0x430 [ipv6]
[86220.126623]  addrconf_verify_rtnl+0x339/0x510 [ipv6]
[86220.126664]  ? addrconf_verify_rtnl+0x339/0x510 [ipv6]
[86220.126708]  addrconf_verify_work+0xe/0x20 [ipv6]
[86220.126738]  process_one_work+0x258/0x680
[86220.126765]  worker_thread+0x35/0x3f0
[86220.126790]  kthread+0x124/0x140
[86220.126813]  ? process_one_work+0x680/0x680
[86220.126839]  ? kthread_create_worker_on_cpu+0x40/0x40
[86220.126869]  ? umh_complete+0x40/0x40
[86220.126893]  ? call_usermodehelper_exec_async+0x12a/0x160
[86220.126926]  ret_from_fork+0x4b/0x60
[86220.126999] BUG: sleeping function called from invalid context at 
mm/slab.h:420
[86220.127041] in_atomic(): 1, irqs_disabled(): 0, pid: 1003, name: kworker/0:2
[86220.127082] 4 locks held by kworker/0:2/1003:
[86220.127107]  #0:  ((wq_completion)"%s"("ipv6_addrconf")){+.+.}, at: 
[] process_one_work+0x1de/0x680
[86220.127179]  #1:  ((addr_chk_work).work){+.+.}, at: [] 
process_one_work+0x1de/0x680
[86220.127242]  #2:  (rtnl_mutex){+.+.}, at: [] 
rtnl_lock+0x12/0x20
[86220.127300]  #3:  (rcu_read_lock_bh){}, at: [] 
addrconf_verify_rtnl+0x1e/0x510 [ipv6]
[86220.127414] CPU: 0 PID: 1003 Comm: kworker/0:2 Not tainted 
4.15.0-rc7-next-20180110+ #7
[86220.127463] Hardware name: ZOTAC ZBOX-CI321NANO/ZBOX-CI321NANO, BIOS 
B246P105 06/01/2015
[86220.127528] Workqueue: ipv6_addrconf addrconf_verify_work [ipv6]
[86220.127568] Call Trace:
[86220.127591]  dump_stack+0x70/0x9e
[86220.127616]  ___might_sleep+0x14d/0x240
[86220.127644]  __might_sleep+0x45/0x80
[86220.127672]  kmem_cache_alloc_trace+0x53/0x250
[86220.127717]  ? ipv6_add_addr+0xfe/0x6e0 [ipv6]
[86220.127762]  ipv6_add_addr+0xfe/0x6e0 [ipv6]
[86220.127807]  ipv6_create_tempaddr+0x24d/0x430 [ipv6]
[86220.127854]  ? ipv6_create_tempaddr+0x24d/0x430 [ipv6]
[86220.127903]  addrconf_verify_rtnl+0x339/0x510 [ipv6]
[86220.127950]  ? addrconf_verify_rtnl+0x339/0x510 [ipv6]
[86220.127998]  addrconf_verify_work+0xe/0x20 [ipv6]
[86220.128032]  process_one_work+0x258/0x680
[86220.128063]  worker_thread+0x35/0x3f0
[86220.128091]  kthread+0x124/0x140
[86220.128117]  ? process_one_work+0x680/0x680
[86220.128146]  ? kthread_create_worker_on_cpu+0x40/0x40
[86220.128180]  ? umh_complete+0x40/0x40
[86220.128207]  ? call_usermodehelper_exec_async+0x12a/0x160
[86220.128243]  ret_from_fork+0x4b/0x60


[net-next:master 308/357] gemini.c:undefined reference to `devm_ioremap_resource'

2018-01-20 Thread kbuild test robot
tree:   https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git 
master
head:   8565d26bcb2ff6df646e946d2913fcf706d46b66
commit: 4d5ae32f5e1e13f7f36d6439ec3257993b9f5b88 [308/357] net: ethernet: Add a 
driver for Gemini gigabit ethernet
config: um-allyesconfig (attached as .config)
compiler: gcc-7 (Debian 7.2.0-12) 7.2.1 20171025
reproduce:
git checkout 4d5ae32f5e1e13f7f36d6439ec3257993b9f5b88
# save the attached .config to linux build tree
make ARCH=um 

All errors (new ones prefixed by >>):

   arch/um/drivers/vde.o: In function `vde_open_real':
   (.text+0x951): warning: Using 'getgrnam' in statically linked applications 
requires at runtime the shared libraries from the glibc version used for linking
   (.text+0x79c): warning: Using 'getpwuid' in statically linked applications 
requires at runtime the shared libraries from the glibc version used for linking
   (.text+0xab5): warning: Using 'getaddrinfo' in statically linked 
applications requires at runtime the shared libraries from the glibc version 
used for linking
   arch/um/drivers/pcap.o: In function `pcap_nametoaddr':
   (.text+0xdee5): warning: Using 'gethostbyname' in statically linked 
applications requires at runtime the shared libraries from the glibc version 
used for linking
   arch/um/drivers/pcap.o: In function `pcap_nametonetaddr':
   (.text+0xdf85): warning: Using 'getnetbyname' in statically linked 
applications requires at runtime the shared libraries from the glibc version 
used for linking
   arch/um/drivers/pcap.o: In function `pcap_nametoproto':
   (.text+0xe1a5): warning: Using 'getprotobyname' in statically linked 
applications requires at runtime the shared libraries from the glibc version 
used for linking
   arch/um/drivers/pcap.o: In function `pcap_nametoport':
   (.text+0xdfd7): warning: Using 'getservbyname' in statically linked 
applications requires at runtime the shared libraries from the glibc version 
used for linking
   drivers/net/ethernet/cortina/gemini.o: In function `geth_cleanup_freeq':
>> gemini.c:(.text+0x97d): undefined reference to `bad_dma_ops'
   gemini.c:(.text+0xad5): undefined reference to `bad_dma_ops'
   gemini.c:(.text+0xb85): undefined reference to `bad_dma_ops'
   drivers/net/ethernet/cortina/gemini.o: In function `gemini_ethernet_probe':
>> gemini.c:(.text+0xd2b): undefined reference to `devm_ioremap_resource'
   drivers/net/ethernet/cortina/gemini.o: In function 
`geth_freeq_alloc_map_page':
   gemini.c:(.text+0x11c4): undefined reference to `bad_dma_ops'
   gemini.c:(.text+0x11d0): undefined reference to `bad_dma_ops'
   gemini.c:(.text+0x13dc): undefined reference to `bad_dma_ops'
   drivers/net/ethernet/cortina/gemini.o: In function `geth_resize_freeq':
   gemini.c:(.text+0x1dcd): undefined reference to `bad_dma_ops'
   gemini.c:(.text+0x1fc4): undefined reference to `bad_dma_ops'
   drivers/net/ethernet/cortina/gemini.o:gemini.c:(.text+0x2167): more 
undefined references to `bad_dma_ops' follow
   drivers/net/ethernet/cortina/gemini.o: In function 
`gemini_ethernet_port_probe':
   gemini.c:(.text+0x2dec): undefined reference to `devm_ioremap_resource'
   gemini.c:(.text+0x2e4c): undefined reference to `devm_ioremap_resource'
   drivers/net/ethernet/cortina/gemini.o: In function `gmac_map_tx_bufs.isra.0':
   gemini.c:(.text+0x37fb): undefined reference to `bad_dma_ops'
   gemini.c:(.text+0x3813): undefined reference to `bad_dma_ops'
   gemini.c:(.text+0x393a): undefined reference to `bad_dma_ops'
   drivers/net/ethernet/cortina/gemini.o: In function `gmac_cleanup_rxq':
   gemini.c:(.text+0x3ffa): undefined reference to `bad_dma_ops'
   gemini.c:(.text+0x409d): undefined reference to `bad_dma_ops'
   drivers/net/ethernet/cortina/gemini.o:gemini.c:(.text+0x4245): more 
undefined references to `bad_dma_ops' follow
   collect2: error: ld returned 1 exit status

---
0-DAY kernel test infrastructureOpen Source Technology Center
https://lists.01.org/pipermail/kbuild-all   Intel Corporation


.config.gz
Description: application/gzip


Re: [PATCH v6 2/3] clocksource/drivers/atcpit100: VDSO support

2018-01-20 Thread Vincent Chen
Correct the composition

Dear Arnd:

1. Setup an additional memory mapping for clock hardware in user space
when establishing vdso-needed memory mapping
 In arch_setup_additional_pages(), kernel establishes memory
mapping for vdso's text and vdata page in user space. In order to make
clock hardware be accessible in user space, we try to establish an
additional memory mapping for clock hardware here based on clock
information from driver. This page is located between vdata page and
vdso text page. For safety, this region for clock accessing is
read-only.

2. Accessing clock hardware in vdso
After step 1, clock hardware is accessible in user space through
memory-mapped IO. However, it is not enough to access a specific
register. Therefore, we store register offset information in vdata
page to make it visible in user space.  Now, vdso can derive the
address of counter register by summation of __get_timerpage() and
counter register offset where __get_timerpage() is used to derive the
virtual address of memory-mapped clock.

2018-01-20 19:11 GMT+08:00 Vincent Chen :
> 2018-01-18 19:08 GMT+08:00 Arnd Bergmann :
>> On Mon, Jan 15, 2018 at 6:57 AM, Greentime Hu  wrote:
>>> From: Rick Chen 
>>>
>>> VDSO needs real-time cycle count to ensure the time accuracy.
>>> Unlike others, nds32 architecture does not define clock source,
>>> hence VDSO needs atcpit100 offering real-time cycle count
>>> to derive the correct time.
>>>
>>> Signed-off-by: Vincent Chen 
>>> Signed-off-by: Rick Chen 
>>> Signed-off-by: Greentime Hu 
>>
>> I'm a bit puzzled by this patch, can you explain how the vdso actually
>> manages to access the clock hardware? It looks like you make the
>> physical address and the register offset available to user space, but
>> how does it end up accessing it?
>>
>>   Arnd
>
> Dear Arnd:
>
> Accessing clock hardware in vdso can be divided to 2 step.
>
> 1. Setup an additional memory mapping for clock hardware in user space
> when establishing
> vdso-needed memory mapping
>
> In arch_setup_additional_pages(), kernel establishes memory
> mapping for vdso's text and vdata page
> in user space. In order to make clock hardware be accessible in
> user space, we try to establish an
> additional memory mapping for clock hardware here based on clock
> information from driver. This page is
> located between vdata page and vdso text page. For safety, this
> region for clock accessing is read-only.
>
> 2. Accessing clock hardware in vdso
>
> After step 1, clock hardware is accessible in user space
> through memory-mapped IO. However, it is not
>enough to access a specific register. Therefore, we store register
> offset information in vdata page to make it
>visible in user space.  Now, vdso can derive the address of counter
> register by summation of __get_timerpage()
>and counter register offset where __get_timerpage() is used to
> derive the virtual address of memory-mapped
>clock.
>
>
> Vincent


Re: [PATCH net-next v4 6/8] net: sched: create tc_can_offload_extack() wrapper

2018-01-20 Thread Jakub Kicinski
On Sat, Jan 20, 2018 at 2:54 AM, Jiri Pirko  wrote:
> Sat, Jan 20, 2018 at 11:33:31AM CET, jakub.kicin...@netronome.com wrote:
>>On Sat, Jan 20, 2018 at 2:22 AM, Jiri Pirko  wrote:
>>> Sat, Jan 20, 2018 at 11:12:25AM CET, jakub.kicin...@netronome.com wrote:
net/sched/sch_prio.c:   if (!tc_can_offload(dev) ||
!dev->netdev_ops->ndo_setup_tc)
net/sched/sch_prio.c:   if (!tc_can_offload(dev) ||
!dev->netdev_ops->ndo_setup_tc)
net/sched/sch_red.c:if (!tc_can_offload(dev) ||
!dev->netdev_ops->ndo_setup_tc)
net/sched/sch_red.c:if (!tc_can_offload(dev) ||
!dev->netdev_ops->ndo_setup_tc)

Do you mean the qdisc offloads too?  The whole lot?
>>>
>>> Yes.
>>
>>Actually looking at the qdisc code and destroy callbacks, if we plumb
>>it through everywhere won't that mean user will see error messages on
>>destroy of qdiscs/filters which were never offloaded?
>>
>>Just looking at prio_offload() as a simple example.  prio_destroy()
>>will always call tc_can_offload().
>
> Hmmm. You are right.
> Either we pass null from there (NL_SET_ERR_MSG can cope), or we leave
> tc_can_offload helper as is and let the caller to set the extack.

I'm afraid it doesn't stop there though :/  Even now if one install a
filter on a netdev with offload abilities and no skip_* flag there is
this:

# tc filter add dev eth0 ingress bpf obj drop.o da
Warning: TC offload is disabled on net device.

I'm not sure we should warn the user if there are no skip_* flags,
just because the device has *some* offload capabilities?   Perhaps we
should put the flags into tc_cls_common_offload and add a helper to
only set extack if skip_sw is set..

> I see that tc_can_offload_extack is probably good idea then.
>
> But please use it in all drivers that are calling tc_can_offload so the
> user experience is consistent for all.

Certainly, will do.


Re: [PATCH v6 2/3] clocksource/drivers/atcpit100: VDSO support

2018-01-20 Thread Vincent Chen
2018-01-18 19:08 GMT+08:00 Arnd Bergmann :
> On Mon, Jan 15, 2018 at 6:57 AM, Greentime Hu  wrote:
>> From: Rick Chen 
>>
>> VDSO needs real-time cycle count to ensure the time accuracy.
>> Unlike others, nds32 architecture does not define clock source,
>> hence VDSO needs atcpit100 offering real-time cycle count
>> to derive the correct time.
>>
>> Signed-off-by: Vincent Chen 
>> Signed-off-by: Rick Chen 
>> Signed-off-by: Greentime Hu 
>
> I'm a bit puzzled by this patch, can you explain how the vdso actually
> manages to access the clock hardware? It looks like you make the
> physical address and the register offset available to user space, but
> how does it end up accessing it?
>
>   Arnd

Dear Arnd:

Accessing clock hardware in vdso can be divided to 2 step.

1. Setup an additional memory mapping for clock hardware in user space
when establishing
vdso-needed memory mapping

In arch_setup_additional_pages(), kernel establishes memory
mapping for vdso's text and vdata page
in user space. In order to make clock hardware be accessible in
user space, we try to establish an
additional memory mapping for clock hardware here based on clock
information from driver. This page is
located between vdata page and vdso text page. For safety, this
region for clock accessing is read-only.

2. Accessing clock hardware in vdso

After step 1, clock hardware is accessible in user space
through memory-mapped IO. However, it is not
   enough to access a specific register. Therefore, we store register
offset information in vdata page to make it
   visible in user space.  Now, vdso can derive the address of counter
register by summation of __get_timerpage()
   and counter register offset where __get_timerpage() is used to
derive the virtual address of memory-mapped
   clock.


Vincent


Hello dear

2018-01-20 Thread verify
  
Hello dear
My name is abudul  Ahassan, I work with one of the leading Banks here in Africa.
 I have a business proposal worth 18$million dollars for more information 
contact me on
my private email

I urgently hope to get your response as soon as possible.
Yours Sincerely, 
Abudu Ahassan   






Re: [PATCH net-next v4 6/8] net: sched: create tc_can_offload_extack() wrapper

2018-01-20 Thread Jiri Pirko
Sat, Jan 20, 2018 at 11:33:31AM CET, jakub.kicin...@netronome.com wrote:
>On Sat, Jan 20, 2018 at 2:22 AM, Jiri Pirko  wrote:
>> Sat, Jan 20, 2018 at 11:12:25AM CET, jakub.kicin...@netronome.com wrote:
>>>net/sched/sch_prio.c:   if (!tc_can_offload(dev) ||
>>>!dev->netdev_ops->ndo_setup_tc)
>>>net/sched/sch_prio.c:   if (!tc_can_offload(dev) ||
>>>!dev->netdev_ops->ndo_setup_tc)
>>>net/sched/sch_red.c:if (!tc_can_offload(dev) ||
>>>!dev->netdev_ops->ndo_setup_tc)
>>>net/sched/sch_red.c:if (!tc_can_offload(dev) ||
>>>!dev->netdev_ops->ndo_setup_tc)
>>>
>>>Do you mean the qdisc offloads too?  The whole lot?
>>
>> Yes.
>
>Actually looking at the qdisc code and destroy callbacks, if we plumb
>it through everywhere won't that mean user will see error messages on
>destroy of qdiscs/filters which were never offloaded?
>
>Just looking at prio_offload() as a simple example.  prio_destroy()
>will always call tc_can_offload().

Hmmm. You are right.
Either we pass null from there (NL_SET_ERR_MSG can cope), or we leave
tc_can_offload helper as is and let the caller to set the extack.
I see that tc_can_offload_extack is probably good idea then.

But please use it in all drivers that are calling tc_can_offload so the
user experience is consistent for all.

Thanks!


Re: [PATCH net-next v4 6/8] net: sched: create tc_can_offload_extack() wrapper

2018-01-20 Thread Jakub Kicinski
On Sat, Jan 20, 2018 at 2:22 AM, Jiri Pirko  wrote:
> Sat, Jan 20, 2018 at 11:12:25AM CET, jakub.kicin...@netronome.com wrote:
>>net/sched/sch_prio.c:   if (!tc_can_offload(dev) ||
>>!dev->netdev_ops->ndo_setup_tc)
>>net/sched/sch_prio.c:   if (!tc_can_offload(dev) ||
>>!dev->netdev_ops->ndo_setup_tc)
>>net/sched/sch_red.c:if (!tc_can_offload(dev) ||
>>!dev->netdev_ops->ndo_setup_tc)
>>net/sched/sch_red.c:if (!tc_can_offload(dev) ||
>>!dev->netdev_ops->ndo_setup_tc)
>>
>>Do you mean the qdisc offloads too?  The whole lot?
>
> Yes.

Actually looking at the qdisc code and destroy callbacks, if we plumb
it through everywhere won't that mean user will see error messages on
destroy of qdiscs/filters which were never offloaded?

Just looking at prio_offload() as a simple example.  prio_destroy()
will always call tc_can_offload().

Hmm...


Re: [PATCH net-next v4 5/8] net: sched: add extack support for offload via tc_cls_common_offload

2018-01-20 Thread Jakub Kicinski
On Sat, Jan 20, 2018 at 12:54 AM, Jiri Pirko  wrote:
> Sat, Jan 20, 2018 at 02:44:47AM CET, jakub.kicin...@netronome.com wrote:
>>From: Quentin Monnet 
>>
>>Add extack support for hardware offload of classifiers. In order
>>to achieve this, a pointer to a struct netlink_ext_ack is added to the
>>struct tc_cls_common_offload that is passed to the callback for setting
>>up the classifier. Function tc_cls_common_offload_init() is updated to
>>support initialization of this new attribute.
>>
>>Signed-off-by: Quentin Monnet 
>>Reviewed-by: Jakub Kicinski 
>
> [...]
>
>
>>diff --git a/net/sched/cls_flower.c b/net/sched/cls_flower.c
>>index f675a92e1b66..727c10378f37 100644
>>--- a/net/sched/cls_flower.c
>>+++ b/net/sched/cls_flower.c
>>@@ -223,7 +223,7 @@ static void fl_hw_destroy_filter(struct tcf_proto *tp, 
>>struct cls_fl_filter *f)
>>   struct tc_cls_flower_offload cls_flower = {};
>>   struct tcf_block *block = tp->chain->block;
>>
>>-  tc_cls_common_offload_init(_flower.common, tp);
>>+  tc_cls_common_offload_init(_flower.common, tp, NULL);
>
> While you are at it, you should propagate extack whenever possible. For
> this, you can easily pass extack to fl_hw_destroy_filter.
> Same for other classifiers.
>
>>   cls_flower.command = TC_CLSFLOWER_DESTROY;
>>   cls_flower.cookie = (unsigned long) f;
>>
>>@@ -243,7 +243,7 @@ static int fl_hw_replace_filter(struct tcf_proto *tp,
>>   bool skip_sw = tc_skip_sw(f->flags);
>>   int err;
>>
>>-  tc_cls_common_offload_init(_flower.common, tp);
>>+  tc_cls_common_offload_init(_flower.common, tp, extack);
>>   cls_flower.command = TC_CLSFLOWER_REPLACE;
>>   cls_flower.cookie = (unsigned long) f;
>>   cls_flower.dissector = dissector;
>>@@ -272,7 +272,7 @@ static void fl_hw_update_stats(struct tcf_proto *tp, 
>>struct cls_fl_filter *f)
>>   struct tc_cls_flower_offload cls_flower = {};
>>   struct tcf_block *block = tp->chain->block;
>>
>>-  tc_cls_common_offload_init(_flower.common, tp);
>>+  tc_cls_common_offload_init(_flower.common, tp, NULL);
>
> For this, it would be probably a bit trickier to get extack, but also
> possible.
> My motivation is to make the extack always available for users of
> tc_cls_common_offload

Not sure I can think of anything other than "IO error" that could stop
us from destroy or dumping stats ;)

Also I'm not sure extack fits well with dumps of multiple entities,
how would user know which entity produced the message?  Extack is best
for configuration requests IMHO..  (I'm not saying we shouldn't plumb
it through to more places, just wondering what you think.)


Re: [PATCH net-next v4 6/8] net: sched: create tc_can_offload_extack() wrapper

2018-01-20 Thread Jiri Pirko
Sat, Jan 20, 2018 at 11:12:25AM CET, jakub.kicin...@netronome.com wrote:
>On Sat, Jan 20, 2018 at 12:59 AM, Jiri Pirko  wrote:
>> Sat, Jan 20, 2018 at 02:44:48AM CET, jakub.kicin...@netronome.com wrote:
>>>From: Quentin Monnet 
>>>
>>>Create a wrapper around tc_can_offload() that takes an additional
>>>extack pointer argument in order to output an error message if TC
>>>offload is disabled on the device.
>>>
>>>In this way, the error message is handled by the core and can be the
>>>same for all drivers.
>>>
>>>Signed-off-by: Quentin Monnet 
>>>Reviewed-by: Jakub Kicinski 
>>>---
>>> include/net/pkt_cls.h | 11 +++
>>> 1 file changed, 11 insertions(+)
>>>
>>>diff --git a/include/net/pkt_cls.h b/include/net/pkt_cls.h
>>>index f497f622580b..2f8f16a4d88e 100644
>>>--- a/include/net/pkt_cls.h
>>>+++ b/include/net/pkt_cls.h
>>>@@ -656,6 +656,17 @@ static inline bool tc_can_offload(const struct 
>>>net_device *dev)
>>>   return dev->features & NETIF_F_HW_TC;
>>> }
>>>
>>>+static inline bool tc_can_offload_extack(const struct net_device *dev,
>>>+   struct netlink_ext_ack *extack)
>>
>> I don't like to add tc_can_offload variant for this. It makes sense
>> the original tc_can_offload to be extended and set the extack message
>> always.
>>
>> It would require some more work in drivers (5), sure, but we endup with
>> nicer and consistent code.
>
>$ git grep tc_can_offload
>drivers/net/ethernet/broadcom/bnxt/bnxt.c:  if
>(!bnxt_tc_flower_enabled(bp) || !tc_can_offload(bp->dev))
>drivers/net/ethernet/broadcom/bnxt/bnxt_vfr.c:  if
>(!bnxt_tc_flower_enabled(vf_rep->bp) || !tc_can_offload(bp->dev))
>drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c:if
>(!tc_can_offload(dev))
>drivers/net/ethernet/intel/ixgbe/ixgbe_main.c:  if
>(!tc_can_offload(adapter->netdev))
>drivers/net/ethernet/mellanox/mlx5/core/en_main.c:  if
>(!tc_can_offload(priv->netdev))
>drivers/net/ethernet/mellanox/mlx5/core/en_rep.c:   if
>(!tc_can_offload(priv->netdev))
>drivers/net/ethernet/mellanox/mlxsw/spectrum.c: if
>(!tc_can_offload(mlxsw_sp_port->dev))
>drivers/net/ethernet/netronome/nfp/bpf/main.c:
>!tc_can_offload(nn->dp.netdev) ||
>drivers/net/ethernet/netronome/nfp/flower/offload.c:if
>(!tc_can_offload(repr->netdev))
>drivers/net/ethernet/netronome/nfp/flower/offload.c:if
>(!tc_can_offload(repr->netdev))
>drivers/net/netdevsim/bpf.c:!tc_can_offload(ns->netdev) ||
>include/net/pkt_cls.h:static inline bool tc_can_offload(const struct
>net_device *dev)
>net/dsa/slave.c:if (!tc_can_offload(dev))
>net/sched/cls_api.c:if (!tc_can_offload(dev) &&
>tcf_block_offload_in_use(block))
>net/sched/sch_mq.c: if (!tc_can_offload(dev) ||
>!dev->netdev_ops->ndo_setup_tc)
>net/sched/sch_prio.c:   if (!tc_can_offload(dev) ||
>!dev->netdev_ops->ndo_setup_tc)
>net/sched/sch_prio.c:   if (!tc_can_offload(dev) ||
>!dev->netdev_ops->ndo_setup_tc)
>net/sched/sch_red.c:if (!tc_can_offload(dev) ||
>!dev->netdev_ops->ndo_setup_tc)
>net/sched/sch_red.c:if (!tc_can_offload(dev) ||
>!dev->netdev_ops->ndo_setup_tc)
>
>Do you mean the qdisc offloads too?  The whole lot?

Yes.



Re: [PATCH net-next v4 6/8] net: sched: create tc_can_offload_extack() wrapper

2018-01-20 Thread Jakub Kicinski
On Sat, Jan 20, 2018 at 12:59 AM, Jiri Pirko  wrote:
> Sat, Jan 20, 2018 at 02:44:48AM CET, jakub.kicin...@netronome.com wrote:
>>From: Quentin Monnet 
>>
>>Create a wrapper around tc_can_offload() that takes an additional
>>extack pointer argument in order to output an error message if TC
>>offload is disabled on the device.
>>
>>In this way, the error message is handled by the core and can be the
>>same for all drivers.
>>
>>Signed-off-by: Quentin Monnet 
>>Reviewed-by: Jakub Kicinski 
>>---
>> include/net/pkt_cls.h | 11 +++
>> 1 file changed, 11 insertions(+)
>>
>>diff --git a/include/net/pkt_cls.h b/include/net/pkt_cls.h
>>index f497f622580b..2f8f16a4d88e 100644
>>--- a/include/net/pkt_cls.h
>>+++ b/include/net/pkt_cls.h
>>@@ -656,6 +656,17 @@ static inline bool tc_can_offload(const struct 
>>net_device *dev)
>>   return dev->features & NETIF_F_HW_TC;
>> }
>>
>>+static inline bool tc_can_offload_extack(const struct net_device *dev,
>>+   struct netlink_ext_ack *extack)
>
> I don't like to add tc_can_offload variant for this. It makes sense
> the original tc_can_offload to be extended and set the extack message
> always.
>
> It would require some more work in drivers (5), sure, but we endup with
> nicer and consistent code.

$ git grep tc_can_offload
drivers/net/ethernet/broadcom/bnxt/bnxt.c:  if
(!bnxt_tc_flower_enabled(bp) || !tc_can_offload(bp->dev))
drivers/net/ethernet/broadcom/bnxt/bnxt_vfr.c:  if
(!bnxt_tc_flower_enabled(vf_rep->bp) || !tc_can_offload(bp->dev))
drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c:if
(!tc_can_offload(dev))
drivers/net/ethernet/intel/ixgbe/ixgbe_main.c:  if
(!tc_can_offload(adapter->netdev))
drivers/net/ethernet/mellanox/mlx5/core/en_main.c:  if
(!tc_can_offload(priv->netdev))
drivers/net/ethernet/mellanox/mlx5/core/en_rep.c:   if
(!tc_can_offload(priv->netdev))
drivers/net/ethernet/mellanox/mlxsw/spectrum.c: if
(!tc_can_offload(mlxsw_sp_port->dev))
drivers/net/ethernet/netronome/nfp/bpf/main.c:
!tc_can_offload(nn->dp.netdev) ||
drivers/net/ethernet/netronome/nfp/flower/offload.c:if
(!tc_can_offload(repr->netdev))
drivers/net/ethernet/netronome/nfp/flower/offload.c:if
(!tc_can_offload(repr->netdev))
drivers/net/netdevsim/bpf.c:!tc_can_offload(ns->netdev) ||
include/net/pkt_cls.h:static inline bool tc_can_offload(const struct
net_device *dev)
net/dsa/slave.c:if (!tc_can_offload(dev))
net/sched/cls_api.c:if (!tc_can_offload(dev) &&
tcf_block_offload_in_use(block))
net/sched/sch_mq.c: if (!tc_can_offload(dev) ||
!dev->netdev_ops->ndo_setup_tc)
net/sched/sch_prio.c:   if (!tc_can_offload(dev) ||
!dev->netdev_ops->ndo_setup_tc)
net/sched/sch_prio.c:   if (!tc_can_offload(dev) ||
!dev->netdev_ops->ndo_setup_tc)
net/sched/sch_red.c:if (!tc_can_offload(dev) ||
!dev->netdev_ops->ndo_setup_tc)
net/sched/sch_red.c:if (!tc_can_offload(dev) ||
!dev->netdev_ops->ndo_setup_tc)

Do you mean the qdisc offloads too?  The whole lot?


[patch iproute2 net-next v12 1/3] tc: introduce tc_qdisc_block_exists helper

2018-01-20 Thread Jiri Pirko
From: Jiri Pirko 

This hepler used qdisc dump to list all qdisc and find if block index in
question is used by any of them. That means the block with specified
index exists.

Signed-off-by: Jiri Pirko 
---
v11->v12:
- fixed return type of tc_qdisc_block_exists
- removed 0 checks for block in tc_qdisc_block_exists
- rever xmas tree variables in tc_qdisc_block_exists
---
 tc/tc_qdisc.c | 61 +++
 tc/tc_util.h  |  2 ++
 2 files changed, 63 insertions(+)

diff --git a/tc/tc_qdisc.c b/tc/tc_qdisc.c
index 70279b9d..54701c26 100644
--- a/tc/tc_qdisc.c
+++ b/tc/tc_qdisc.c
@@ -412,3 +412,64 @@ int do_qdisc(int argc, char **argv)
fprintf(stderr, "Command \"%s\" is unknown, try \"tc qdisc help\".\n", 
*argv);
return -1;
 }
+
+struct tc_qdisc_block_exists_ctx {
+   __u32 block_index;
+   bool found;
+};
+
+static int tc_qdisc_block_exists_cb(const struct sockaddr_nl *who,
+   struct nlmsghdr *n, void *arg)
+{
+   struct tc_qdisc_block_exists_ctx *ctx = arg;
+   struct tcmsg *t = NLMSG_DATA(n);
+   struct rtattr *tb[TCA_MAX+1];
+   int len = n->nlmsg_len;
+
+   if (n->nlmsg_type != RTM_NEWQDISC)
+   return 0;
+
+   len -= NLMSG_LENGTH(sizeof(*t));
+   if (len < 0)
+   return -1;
+
+   parse_rtattr(tb, TCA_MAX, TCA_RTA(t), len);
+
+   if (tb[TCA_KIND] == NULL)
+   return -1;
+
+   if (tb[TCA_INGRESS_BLOCK] &&
+   RTA_PAYLOAD(tb[TCA_INGRESS_BLOCK]) >= sizeof(__u32)) {
+   __u32 block = rta_getattr_u32(tb[TCA_INGRESS_BLOCK]);
+
+   if (block == ctx->block_index)
+   ctx->found = true;
+   }
+
+   if (tb[TCA_EGRESS_BLOCK] &&
+   RTA_PAYLOAD(tb[TCA_EGRESS_BLOCK]) >= sizeof(__u32)) {
+   __u32 block = rta_getattr_u32(tb[TCA_EGRESS_BLOCK]);
+
+   if (block == ctx->block_index)
+   ctx->found = true;
+   }
+   return 0;
+}
+
+bool tc_qdisc_block_exists(__u32 block_index)
+{
+   struct tc_qdisc_block_exists_ctx ctx = { .block_index = block_index };
+   struct tcmsg t = { .tcm_family = AF_UNSPEC };
+
+   if (rtnl_dump_request(, RTM_GETQDISC, , sizeof(t)) < 0) {
+   perror("Cannot send dump request");
+   return false;
+   }
+
+   if (rtnl_dump_filter(, tc_qdisc_block_exists_cb, ) < 0) {
+   perror("Dump terminated\n");
+   return false;
+   }
+
+   return ctx.found;
+}
diff --git a/tc/tc_util.h b/tc/tc_util.h
index 1218610d..682dd4fd 100644
--- a/tc/tc_util.h
+++ b/tc/tc_util.h
@@ -132,4 +132,6 @@ int prio_print_opt(struct qdisc_util *qu, FILE *f, struct 
rtattr *opt);
 int cls_names_init(char *path);
 void cls_names_uninit(void);
 
+bool tc_qdisc_block_exists(__u32 block_index);
+
 #endif
-- 
2.14.3



[patch iproute2 net-next v12 0/3] tc: shared block support

2018-01-20 Thread Jiri Pirko
From: Jiri Pirko 

Kernel allows to share all filters between qdiscs with use
of shared block.

Example:

block number 22. "22" is just an identification:
$ tc qdisc add dev ens7 ingress_block 22 ingress

$ tc qdisc add dev ens8 ingress_block 22 ingress


If we don't specify "block" command line option, no shared block would
be created:
$ tc qdisc add dev ens9 ingress

Now if we list the qdiscs, we will see the block index in the output:

$ tc qdisc
qdisc ingress : dev ens7 parent :fff1 ingress_block 22
qdisc ingress : dev ens8 parent :fff1 ingress_block 22
qdisc ingress : dev ens9 parent :fff1


To make is more visual, the situation looks like this:

   ens7 ingress qdisc ens7 ingress qdisc
  |  |
  |  |
  +-->  block 22  <--+

Unlimited number of qdiscs may share the same block.

Block sharing is also supported for clsact qdisc:
$ tc qdisc add dev ens10 ingress_block 23 egress_block 24 clsact
$ tc qdisc show dev ens10
qdisc clsact : dev ens10 parent :fff1 ingress_block 23 egress_block 24


We can add filter using the block index:

$ tc filter add block 22 protocol ip pref 25 flower dst_ip 192.168.0.0/16 
action drop


Note we cannot use the qdisc for filter manipulations of shared blocks:

$ tc filter add dev ens8 ingress protocol ip pref 1 flower dst_ip 192.168.100.2 
action drop
Error: This filter block is shared. Please use the block index to manipulate 
the filters.


We will see the same output if we list filters for ingress qdisc of
ens7 and ens8, also for the block 22:

$ tc filter show block 22
filter protocol ip pref 25 flower chain 0
filter protocol ip pref 25 flower chain 0 handle 0x1
...

$ tc filter show dev ens7 ingress
filter block 22 protocol ip pref 25 flower chain 0
filter block 22 protocol ip pref 25 flower chain 0 handle 0x1
...

$ tc filter show dev ens8 ingress
filter block 22 protocol ip pref 25 flower chain 0
filter block 22 protocol ip pref 25 flower chain 0 handle 0x1
...

---
v11->v12:
- separate iproute2 patchset, kernel part is merged
- original patch 1 is removed as the header changes are already
  in iproute2 code
- patch 1:
 - fixed return type of tc_qdisc_block_exists
 - removed 0 checks for block in tc_qdisc_block_exists
 - rever xmas tree variables in tc_qdisc_block_exists
- patch 2
 - fixes error message when both dev and block are on the command line

---
Note that if you want me to change the ">=" vs "==" thing, I will do it.
But I think that it is a global iproute2 thing and should be changed in
a separate patch/patchset.

Jiri Pirko (3):
  tc: introduce tc_qdisc_block_exists helper
  tc: introduce support for block-handle for filter operations
  tc: implement ingress/egress block index attributes for qdiscs

 man/man8/tc.8  |  24 -
 tc/tc_filter.c | 105 +++--
 tc/tc_qdisc.c  |  97 
 tc/tc_util.h   |   2 ++
 4 files changed, 210 insertions(+), 18 deletions(-)

-- 
2.14.3



[patch iproute2 net-next v12 3/3] tc: implement ingress/egress block index attributes for qdiscs

2018-01-20 Thread Jiri Pirko
From: Jiri Pirko 

During qdisc creation it is possible to specify shared block for bot
ingress and egress. Pass this values to kernel according to the command
line options.

Signed-off-by: Jiri Pirko 
---
 man/man8/tc.8 |  6 +-
 tc/tc_qdisc.c | 36 
 2 files changed, 41 insertions(+), 1 deletion(-)

diff --git a/man/man8/tc.8 b/man/man8/tc.8
index d65f0583..5ffea373 100644
--- a/man/man8/tc.8
+++ b/man/man8/tc.8
@@ -11,7 +11,11 @@ tc \- show / manipulate traffic control settings
 \fIqdisc-id\fR
 .B | root ]
 .B [ handle
-\fIqdisc-id\fR ] qdisc
+\fIqdisc-id\fR ]
+.B [ ingress_block
+\fIBLOCK_INDEX\fR ]
+.B [ egress_block
+\fIBLOCK_INDEX\fR ] qdisc
 [ qdisc specific parameters ]
 .P
 
diff --git a/tc/tc_qdisc.c b/tc/tc_qdisc.c
index 54701c26..8cc4b73d 100644
--- a/tc/tc_qdisc.c
+++ b/tc/tc_qdisc.c
@@ -31,6 +31,7 @@ static int usage(void)
fprintf(stderr, "   [ handle QHANDLE ] [ root | ingress | clsact | 
parent CLASSID ]\n");
fprintf(stderr, "   [ estimator INTERVAL TIME_CONSTANT ]\n");
fprintf(stderr, "   [ stab [ help | STAB_OPTIONS] ]\n");
+   fprintf(stderr, "   [ ingress_block BLOCK_INDEX ] [ egress_block 
BLOCK_INDEX ]\n");
fprintf(stderr, "   [ [ QDISC_KIND ] [ help | OPTIONS ] ]\n");
fprintf(stderr, "\n");
fprintf(stderr, "   tc qdisc show [ dev STRING ] [ ingress | clsact 
] [ invisible ]\n");
@@ -61,6 +62,8 @@ static int tc_qdisc_modify(int cmd, unsigned int flags, int 
argc, char **argv)
.n.nlmsg_type = cmd,
.t.tcm_family = AF_UNSPEC,
};
+   __u32 ingress_block = 0;
+   __u32 egress_block = 0;
 
while (argc > 0) {
if (strcmp(*argv, "dev") == 0) {
@@ -121,6 +124,14 @@ static int tc_qdisc_modify(int cmd, unsigned int flags, 
int argc, char **argv)
if (parse_size_table(, , ) < 0)
return -1;
continue;
+   } else if (matches(*argv, "ingress_block") == 0) {
+   NEXT_ARG();
+   if (get_u32(_block, *argv, 0) || !ingress_block)
+   invarg("invalid ingress block index value", 
*argv);
+   } else if (matches(*argv, "egress_block") == 0) {
+   NEXT_ARG();
+   if (get_u32(_block, *argv, 0) || !egress_block)
+   invarg("invalid egress block index value", 
*argv);
} else if (matches(*argv, "help") == 0) {
usage();
} else {
@@ -138,6 +149,13 @@ static int tc_qdisc_modify(int cmd, unsigned int flags, 
int argc, char **argv)
if (est.ewma_log)
addattr_l(, sizeof(req), TCA_RATE, , sizeof(est));
 
+   if (ingress_block)
+   addattr32(, sizeof(req),
+ TCA_INGRESS_BLOCK, ingress_block);
+   if (egress_block)
+   addattr32(, sizeof(req),
+ TCA_EGRESS_BLOCK, egress_block);
+
if (q) {
if (q->parse_qopt) {
if (q->parse_qopt(q, argc, argv, , d))
@@ -270,6 +288,24 @@ int print_qdisc(const struct sockaddr_nl *who,
(rta_getattr_u8(tb[TCA_HW_OFFLOAD])))
print_bool(PRINT_ANY, "offloaded", "offloaded ", true);
 
+   if (tb[TCA_INGRESS_BLOCK] &&
+   RTA_PAYLOAD(tb[TCA_INGRESS_BLOCK]) >= sizeof(__u32)) {
+   __u32 block = rta_getattr_u32(tb[TCA_INGRESS_BLOCK]);
+
+   if (block)
+   print_uint(PRINT_ANY, "ingress_block",
+  "ingress_block %u ", block);
+   }
+
+   if (tb[TCA_EGRESS_BLOCK] &&
+   RTA_PAYLOAD(tb[TCA_EGRESS_BLOCK]) >= sizeof(__u32)) {
+   __u32 block = rta_getattr_u32(tb[TCA_EGRESS_BLOCK]);
+
+   if (block)
+   print_uint(PRINT_ANY, "egress_block",
+  "egress_block %u ", block);
+   }
+
/* pfifo_fast is generic enough to warrant the hardcoding --JHS */
if (strcmp("pfifo_fast", RTA_DATA(tb[TCA_KIND])) == 0)
q = get_qdisc_kind("prio");
-- 
2.14.3



[patch iproute2 net-next v12 2/3] tc: introduce support for block-handle for filter operations

2018-01-20 Thread Jiri Pirko
From: Jiri Pirko 

So far, qdisc was the only handle that could be used to manipulate
filters. Kernel added support for using block to manipulate it. So add
the support to use block index to manipulate filters. The magic
TCM_IFINDEX_MAGIC_BLOCK indicates the block index is in use.

Signed-off-by: Jiri Pirko 
---
v11->v12:
- fixes error message when both dev and block are on the command line
---
 man/man8/tc.8  |  18 ++
 tc/tc_filter.c | 105 +++--
 2 files changed, 106 insertions(+), 17 deletions(-)

diff --git a/man/man8/tc.8 b/man/man8/tc.8
index ff071b33..d65f0583 100644
--- a/man/man8/tc.8
+++ b/man/man8/tc.8
@@ -41,6 +41,19 @@ tc \- show / manipulate traffic control settings
 .B flowid
 \fIflow-id\fR
 
+.B tc
+.RI "[ " OPTIONS " ]"
+.B filter [ add | change | replace | delete | get ] block
+\fIBLOCK_INDEX\fR
+.B [ handle \fIfilter-id\fR ]
+.B protocol
+\fIprotocol\fR
+.B prio
+\fIpriority\fR filtertype
+[ filtertype specific parameters ]
+.B flowid
+\fIflow-id\fR
+
 .B tc
 .RI "[ " OPTIONS " ]"
 .RI "[ " FORMAT " ]"
@@ -58,6 +71,11 @@ tc \- show / manipulate traffic control settings
 .RI "[ " OPTIONS " ]"
 .B filter show dev
 \fIDEV\fR
+.P
+.B tc
+.RI "[ " OPTIONS " ]"
+.B filter show block
+\fIBLOCK_INDEX\fR
 
 .P
 .ti 8
diff --git a/tc/tc_filter.c b/tc/tc_filter.c
index 7dd123ab..c619e464 100644
--- a/tc/tc_filter.c
+++ b/tc/tc_filter.c
@@ -28,14 +28,17 @@
 static void usage(void)
 {
fprintf(stderr,
-   "Usage: tc filter [ add | del | change | replace | show ] dev 
STRING\n"
-   "Usage: tc filter get dev STRING parent CLASSID protocol PROTO 
handle FILTERID pref PRIO FILTER_TYPE\n"
+   "Usage: tc filter [ add | del | change | replace | show ] [ dev 
STRING ]\n"
+   "   tc filter [ add | del | change | replace | show ] [ 
block BLOCK_INDEX ]\n"
+   "   tc filter get dev STRING parent CLASSID protocol PROTO 
handle FILTERID pref PRIO FILTER_TYPE\n"
+   "   tc filter get block BLOCK_INDEX protocol PROTO handle 
FILTERID pref PRIO FILTER_TYPE\n"
"   [ pref PRIO ] protocol PROTO [ chain CHAIN_INDEX ]\n"
"   [ estimator INTERVAL TIME_CONSTANT ]\n"
"   [ root | ingress | egress | parent CLASSID ]\n"
"   [ handle FILTERID ] [ [ FILTER_TYPE ] [ help | OPTIONS 
] ]\n"
"\n"
"   tc filter show [ dev STRING ] [ root | ingress | egress 
| parent CLASSID ]\n"
+   "   tc filter show [ block BLOCK_INDEX ]\n"
"Where:\n"
"FILTER_TYPE := { rsvp | u32 | bpf | fw | route | etc. }\n"
"FILTERID := ... format depends on classifier, see there\n"
@@ -58,6 +61,7 @@ static int tc_filter_modify(int cmd, unsigned int flags, int 
argc, char **argv,
int chain_index_set = 0;
char d[IFNAMSIZ] = {};
int protocol_set = 0;
+   __u32 block_index = 0;
char *fhandle = NULL;
__u32 protocol = 0;
__u32 chain_index;
@@ -89,7 +93,21 @@ static int tc_filter_modify(int cmd, unsigned int flags, int 
argc, char **argv,
NEXT_ARG();
if (d[0])
duparg("dev", *argv);
+   if (block_index) {
+   fprintf(stderr, "Error: \"dev\" and \"block\" 
are mutually exlusive\n");
+   return -1;
+   }
strncpy(d, *argv, sizeof(d)-1);
+   } else if (matches(*argv, "block") == 0) {
+   NEXT_ARG();
+   if (block_index)
+   duparg("block", *argv);
+   if (d[0]) {
+   fprintf(stderr, "Error: \"dev\" and \"block\" 
are mutually exlusive\n");
+   return -1;
+   }
+   if (get_u32(_index, *argv, 0) || !block_index)
+   invarg("invalid block index value", *argv);
} else if (strcmp(*argv, "root") == 0) {
if (req->t.tcm_parent) {
fprintf(stderr,
@@ -184,6 +202,9 @@ static int tc_filter_modify(int cmd, unsigned int flags, 
int argc, char **argv,
fprintf(stderr, "Cannot find device \"%s\"\n", d);
return 1;
}
+   } else if (block_index) {
+   req->t.tcm_ifindex = TCM_IFINDEX_MAGIC_BLOCK;
+   req->t.tcm_block_index = block_index;
}
 
if (q) {
@@ -228,6 +249,7 @@ static __u32 filter_prio;
 static __u32 filter_protocol;
 static __u32 filter_chain_index;
 static int filter_chain_index_set;
+static __u32 filter_block_index;
 __u16 f_proto;
 
 int print_filter(const struct 

Re: [patch iproute2 net-next v11 3/4] tc: introduce support for block-handle for filter operations

2018-01-20 Thread Jiri Pirko
Fri, Jan 19, 2018 at 09:51:22PM CET, dsah...@gmail.com wrote:
>On 1/17/18 2:48 AM, Jiri Pirko wrote:
>> @@ -89,7 +93,21 @@ static int tc_filter_modify(int cmd, unsigned int flags, 
>> int argc, char **argv,
>>  NEXT_ARG();
>>  if (d[0])
>>  duparg("dev", *argv);
>> +if (block_index) {
>> +fprintf(stderr, "Error: \"dev\" cannot be used 
>> in the same time as \"block\"\n");
>
>'in the same time' does not sound right. something like: 'dev and block
>are mutually exlusive'

ack


>
>> +return -1;
>> +}
>>  strncpy(d, *argv, sizeof(d)-1);
>> +} else if (matches(*argv, "block") == 0) {
>> +NEXT_ARG();
>> +if (block_index)
>> +duparg("block", *argv);
>> +if (d[0]) {
>> +fprintf(stderr, "Error: \"block\" cannot be 
>> used in the same time as \"dev\"\n");
>
>same here. Correct the ones below as well.

ack


>
>
>> +return -1;
>> +}
>> +if (get_u32(_index, *argv, 0) || !block_index)
>> +invarg("invalid block index value", *argv);


Re: [patch iproute2 net-next v11 2/4] tc: introduce tc_qdisc_block_exists helper

2018-01-20 Thread Jiri Pirko
Fri, Jan 19, 2018 at 09:45:57PM CET, dsah...@gmail.com wrote:
>On 1/17/18 2:48 AM, Jiri Pirko wrote:
>> From: Jiri Pirko 
>> 
>> This hepler used qdisc dump to list all qdisc and find if block index in
>> question is used by any of them. That means the block with specified
>> index exists.
>> 
>> Signed-off-by: Jiri Pirko 
>> ---
>>  tc/tc_qdisc.c | 61 
>> +++
>>  tc/tc_util.h  |  2 ++
>>  2 files changed, 63 insertions(+)
>> 
>> diff --git a/tc/tc_qdisc.c b/tc/tc_qdisc.c
>> index 70279b9..3f91558 100644
>> --- a/tc/tc_qdisc.c
>> +++ b/tc/tc_qdisc.c
>> @@ -412,3 +412,64 @@ int do_qdisc(int argc, char **argv)
>>  fprintf(stderr, "Command \"%s\" is unknown, try \"tc qdisc help\".\n", 
>> *argv);
>>  return -1;
>>  }
>> +
>> +struct tc_qdisc_block_exists_ctx {
>> +__u32 block_index;
>> +bool found;
>> +};
>> +
>> +static int tc_qdisc_block_exists_cb(const struct sockaddr_nl *who,
>> +struct nlmsghdr *n, void *arg)
>> +{
>> +struct tc_qdisc_block_exists_ctx *ctx = arg;
>> +struct tcmsg *t = NLMSG_DATA(n);
>> +int len = n->nlmsg_len;
>> +struct rtattr *tb[TCA_MAX+1];
>
>reverse xmas tree

:) Will fix


>
>> +
>> +if (n->nlmsg_type != RTM_NEWQDISC)
>> +return 0;
>> +
>> +len -= NLMSG_LENGTH(sizeof(*t));
>> +if (len < 0)
>> +return -1;
>> +
>> +parse_rtattr(tb, TCA_MAX, TCA_RTA(t), len);
>> +
>> +if (tb[TCA_KIND] == NULL)
>> +return -1;
>> +
>> +if (tb[TCA_INGRESS_BLOCK] &&
>> +RTA_PAYLOAD(tb[TCA_INGRESS_BLOCK]) >= sizeof(__u32)) {
>
>why not just == sizeof(__u32) since that's what it should be?

test1:~/iproute2$ git grep "== sizeof(__u32)" | grep RTA_PAYLOAD| wc -l
1
test1:~/iproute2$ git grep ">= sizeof(__u32)" | grep RTA_PAYLOAD| wc -l
42

I just go with the flow :)


>
>> +__u32 block = rta_getattr_u32(tb[TCA_INGRESS_BLOCK]);
>> +
>> +if (block && block == ctx->block_index)
>
>block > 0 test should not be needed. If someone is calling
>tc_qdisc_block_exists, then block_index should be > 0 in which case you
>just need block == ctx->block_index

Right. Will Fix.


>
>
>> +ctx->found = true;
>> +}
>> +
>> +if (tb[TCA_EGRESS_BLOCK] &&
>> +RTA_PAYLOAD(tb[TCA_EGRESS_BLOCK]) >= sizeof(__u32)) {
>> +__u32 block = rta_getattr_u32(tb[TCA_EGRESS_BLOCK]);
>> +
>> +if (block && block == ctx->block_index)
>
>same 2 comments for this block

ack


>
>> +ctx->found = true;
>> +}
>> +return 0;
>> +}
>> +
>> +int tc_qdisc_block_exists(__u32 block_index)
>
>This should be bool since you are returning true / false

Ah, missed that. Fixing.


>
>> +{
>> +struct tc_qdisc_block_exists_ctx ctx = { .block_index = block_index };
>> +struct tcmsg t = { .tcm_family = AF_UNSPEC };
>> +
>> +if (rtnl_dump_request(, RTM_GETQDISC, , sizeof(t)) < 0) {
>> +perror("Cannot send dump request");
>> +return false;
>> +}
>> +
>> +if (rtnl_dump_filter(, tc_qdisc_block_exists_cb, ) < 0) {
>> +perror("Dump terminated\n");
>> +return false;
>> +}
>> +
>> +return ctx.found;
>> +}
>> diff --git a/tc/tc_util.h b/tc/tc_util.h
>> index 1218610..72f4282 100644
>> --- a/tc/tc_util.h
>> +++ b/tc/tc_util.h
>> @@ -132,4 +132,6 @@ int prio_print_opt(struct qdisc_util *qu, FILE *f, 
>> struct rtattr *opt);
>>  int cls_names_init(char *path);
>>  void cls_names_uninit(void);
>>  
>> +int tc_qdisc_block_exists(__u32 block_index);
>> +
>>  #endif
>> 
>


Re: [PATCH net-next v4 6/8] net: sched: create tc_can_offload_extack() wrapper

2018-01-20 Thread Jiri Pirko
Sat, Jan 20, 2018 at 02:44:48AM CET, jakub.kicin...@netronome.com wrote:
>From: Quentin Monnet 
>
>Create a wrapper around tc_can_offload() that takes an additional
>extack pointer argument in order to output an error message if TC
>offload is disabled on the device.
>
>In this way, the error message is handled by the core and can be the
>same for all drivers.
>
>Signed-off-by: Quentin Monnet 
>Reviewed-by: Jakub Kicinski 
>---
> include/net/pkt_cls.h | 11 +++
> 1 file changed, 11 insertions(+)
>
>diff --git a/include/net/pkt_cls.h b/include/net/pkt_cls.h
>index f497f622580b..2f8f16a4d88e 100644
>--- a/include/net/pkt_cls.h
>+++ b/include/net/pkt_cls.h
>@@ -656,6 +656,17 @@ static inline bool tc_can_offload(const struct net_device 
>*dev)
>   return dev->features & NETIF_F_HW_TC;
> }
> 
>+static inline bool tc_can_offload_extack(const struct net_device *dev,
>+   struct netlink_ext_ack *extack)

I don't like to add tc_can_offload variant for this. It makes sense
the original tc_can_offload to be extended and set the extack message
always.

It would require some more work in drivers (5), sure, but we endup with
nicer and consistent code.


>+{
>+  bool can = tc_can_offload(dev);
>+
>+  if (!can)
>+  NL_SET_ERR_MSG(extack, "TC offload is disabled on net device");
>+
>+  return can;
>+}
>+
> static inline bool tc_skip_hw(u32 flags)
> {
>   return (flags & TCA_CLS_FLAGS_SKIP_HW) ? true : false;
>-- 
>2.15.1
>


Re: [PATCH net-next v4 5/8] net: sched: add extack support for offload via tc_cls_common_offload

2018-01-20 Thread Jiri Pirko
Sat, Jan 20, 2018 at 02:44:47AM CET, jakub.kicin...@netronome.com wrote:
>From: Quentin Monnet 
>
>Add extack support for hardware offload of classifiers. In order
>to achieve this, a pointer to a struct netlink_ext_ack is added to the
>struct tc_cls_common_offload that is passed to the callback for setting
>up the classifier. Function tc_cls_common_offload_init() is updated to
>support initialization of this new attribute.
>
>Signed-off-by: Quentin Monnet 
>Reviewed-by: Jakub Kicinski 
>---
> include/net/pkt_cls.h| 5 -
> net/sched/cls_bpf.c  | 4 ++--
> net/sched/cls_flower.c   | 6 +++---
> net/sched/cls_matchall.c | 4 ++--
> net/sched/cls_u32.c  | 8 
> 5 files changed, 15 insertions(+), 12 deletions(-)
>
>diff --git a/include/net/pkt_cls.h b/include/net/pkt_cls.h
>index 2e4b8e436d25..f497f622580b 100644
>--- a/include/net/pkt_cls.h
>+++ b/include/net/pkt_cls.h
>@@ -602,15 +602,18 @@ struct tc_cls_common_offload {
>   u32 chain_index;
>   __be16 protocol;
>   u32 prio;
>+  struct netlink_ext_ack *extack;
> };
> 
> static inline void
> tc_cls_common_offload_init(struct tc_cls_common_offload *cls_common,
>- const struct tcf_proto *tp)
>+ const struct tcf_proto *tp,
>+ struct netlink_ext_ack *extack)
> {
>   cls_common->chain_index = tp->chain->index;
>   cls_common->protocol = tp->protocol;
>   cls_common->prio = tp->prio;
>+  cls_common->extack = extack;
> }

[...]


>diff --git a/net/sched/cls_flower.c b/net/sched/cls_flower.c
>index f675a92e1b66..727c10378f37 100644
>--- a/net/sched/cls_flower.c
>+++ b/net/sched/cls_flower.c
>@@ -223,7 +223,7 @@ static void fl_hw_destroy_filter(struct tcf_proto *tp, 
>struct cls_fl_filter *f)
>   struct tc_cls_flower_offload cls_flower = {};
>   struct tcf_block *block = tp->chain->block;
> 
>-  tc_cls_common_offload_init(_flower.common, tp);
>+  tc_cls_common_offload_init(_flower.common, tp, NULL);

While you are at it, you should propagate extack whenever possible. For
this, you can easily pass extack to fl_hw_destroy_filter.
Same for other classifiers.


>   cls_flower.command = TC_CLSFLOWER_DESTROY;
>   cls_flower.cookie = (unsigned long) f;
> 
>@@ -243,7 +243,7 @@ static int fl_hw_replace_filter(struct tcf_proto *tp,
>   bool skip_sw = tc_skip_sw(f->flags);
>   int err;
> 
>-  tc_cls_common_offload_init(_flower.common, tp);
>+  tc_cls_common_offload_init(_flower.common, tp, extack);
>   cls_flower.command = TC_CLSFLOWER_REPLACE;
>   cls_flower.cookie = (unsigned long) f;
>   cls_flower.dissector = dissector;
>@@ -272,7 +272,7 @@ static void fl_hw_update_stats(struct tcf_proto *tp, 
>struct cls_fl_filter *f)
>   struct tc_cls_flower_offload cls_flower = {};
>   struct tcf_block *block = tp->chain->block;
> 
>-  tc_cls_common_offload_init(_flower.common, tp);
>+  tc_cls_common_offload_init(_flower.common, tp, NULL);

For this, it would be probably a bit trickier to get extack, but also
possible.
My motivation is to make the extack always available for users of
tc_cls_common_offload


>   cls_flower.command = TC_CLSFLOWER_STATS;
>   cls_flower.cookie = (unsigned long) f;
>   cls_flower.exts = >exts;

[...]


Re: [PATCH bpf-next 0/4] libbpf: add XDP binding support

2018-01-20 Thread Daniel Borkmann
On 01/20/2018 03:27 AM, Alexei Starovoitov wrote:
> On Sat, Jan 20, 2018 at 03:00:37AM +0100, Daniel Borkmann wrote:
>> On 01/19/2018 12:43 AM, Eric Leblond wrote:
>>> Hello,
>>>
>>> This patchset rebases the libbpf code on latest bpf-next code and addresses
>>> remarks by Daniel.
>>
>> Ok, I think it's a good start. We should later on clean up the
>> netlink handling code a bit, but that's all internal and can be
>> done in a second step. Applied to bpf-next, thanks Eric.
> 
> Sorry, Eric, Daniel.
> I had to revert this patch set. It breaks build on systems
> where headers are not the most recent.
> 
> Since libbpf is used by perf it has to be built cleanly on centos7 at least.
> 
> The errors I got:
> bpf.c: In function ‘bpf_set_link_xdp_fd’:
> bpf.c:456:23: error: ‘SOL_NETLINK’ undeclared (first use in this function)
>   if (setsockopt(sock, SOL_NETLINK, NETLINK_EXT_ACK,
>^~~
> bpf.c:456:23: note: each undeclared identifier is reported only once for each 
> function it appears in
> bpf.c:456:36: error: ‘NETLINK_EXT_ACK’ undeclared (first use in this function)
>   if (setsockopt(sock, SOL_NETLINK, NETLINK_EXT_ACK,
> ^~~
> nlattr.c: In function ‘nla_dump_errormsg’:
> nlattr.c:152:34: error: ‘NLMSGERR_ATTR_MAX’ undeclared (first use in this 
> function)
>   struct nla_policy extack_policy[NLMSGERR_ATTR_MAX + 1] = {

Yeah, fully agree, thanks for catching this, Alexei!


Re: [PATCH net] net/mlx5e: Fix fixpoint divide exception in mlx5e_am_stats_compare

2018-01-20 Thread Sergei Shtylyov

Hello!

On 1/19/2018 11:10 PM, Saeed Mahameed wrote:


From: Talat Batheesh 

Helmut reported a bug about devision by zero while


   Division.


running traffic and doing physical cable pull test.

When the cable unplugged the ppms become zero, so when
dividing the current ppms by the previous ppms in the


   You got it right the on the 2nd time. :-)


next dim iteration there is devision by zero.

This patch prevent this division for both ppms and epms.

Fixes: c3164d2fc48f ("net/mlx5e: Added BW check for DIM decision mechanism")
Reported-by: Helmut Grauer 
Signed-off-by: Talat Batheesh 
Signed-off-by: Saeed Mahameed 

[...]

MBR, Sergei