Re: pull-request: bpf 2018-08-24

2018-08-23 Thread David Miller
From: Daniel Borkmann 
Date: Fri, 24 Aug 2018 01:09:29 +0200

> The following pull-request contains BPF updates for your *net* tree.

Pulled, thanks Daniel.


[Patch net] tipc: fix a missing rhashtable_walk_exit()

2018-08-23 Thread Cong Wang
rhashtable_walk_exit() must be paired with rhashtable_walk_enter().

Fixes: 40f9f4397060 ("tipc: Fix tipc_sk_reinit race conditions")
Cc: Herbert Xu 
Cc: Ying Xue 
Signed-off-by: Cong Wang 
---
 net/tipc/socket.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/net/tipc/socket.c b/net/tipc/socket.c
index c1e93c9515bc..c9a50b62c738 100644
--- a/net/tipc/socket.c
+++ b/net/tipc/socket.c
@@ -2672,6 +2672,8 @@ void tipc_sk_reinit(struct net *net)
 
rhashtable_walk_stop();
} while (tsk == ERR_PTR(-EAGAIN));
+
+   rhashtable_walk_exit();
 }
 
 static struct tipc_sock *tipc_sk_lookup(struct net *net, u32 portid)
-- 
2.14.4



Re: [PATCH bpf] xsk: fix return value of xdp_umem_assign_dev()

2018-08-23 Thread Daniel Borkmann
On 08/23/2018 08:29 PM, Jakub Kicinski wrote:
> On Mon, 20 Aug 2018 09:54:25 +0900, Prashant Bhole wrote:
>> s/ENOTSUPP/EOPNOTSUPP/ in function umem_assign_dev().
>> This function's return value is directly returned by xsk_bind().
>> EOPNOTSUPP is bind()'s possible return value.
>>
>> Fixes: f734607e819b ("xsk: refactor xdp_umem_assign_dev()")
>> Signed-off-by: Prashant Bhole 
> 
> FWIW the refactoring commit just cleaned up the code, is it worth
> submitting a patch to stable to correct 4.18 as well?

Yep, lets do that once it lands in mainline. Thanks!


pull-request: bpf 2018-08-24

2018-08-23 Thread Daniel Borkmann
Hi David,

The following pull-request contains BPF updates for your *net* tree.

The main changes are:

1) Fix BPF sockmap and tls where we get a hang in do_tcp_sendpages()
   when sndbuf is full due to missing calls into underlying socket's
   sk_write_space(), from John.

2) Two BPF sockmap fixes to reject invalid parameters on map creation
   and to fix a map element miscount on allocation failure. Another fix
   for BPF hash tables to use per hash table salt for jhash(), from Daniel.

3) Fix for bpftool's command line parsing in order to terminate on bad
   arguments instead of keeping looping in some border cases, from Quentin.

4) Fix error value of xdp_umem_assign_dev() in order to comply with
   expected bind ops error codes, from Prashant.

Please consider pulling these changes from:

  git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf.git

Thanks a lot!



The following changes since commit b93c1b5ac8643cc08bb74fa8ae21d6c63dfcb23d:

  hv_netvsc: ignore devices that are not PCI (2018-08-21 12:02:11 -0700)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf.git 

for you to fetch changes up to 785e76d7a2051a9e28b9134d5388a45b16f5eb72:

  tools: bpftool: return from do_event_pipe() on bad arguments (2018-08-23 
20:17:57 +0200)


Daniel Borkmann (3):
  bpf, sockmap: fix sock_hash_alloc and reject zero-sized keys
  bpf, sockmap: fix sock hash count in alloc_sock_hash_elem
  bpf: use per htab salt for bucket hash

John Fastabend (2):
  tls: possible hang when do_tcp_sendpages hits sndbuf is full case
  bpf: sockmap: write_space events need to be passed to TCP handler

Prashant Bhole (1):
  xsk: fix return value of xdp_umem_assign_dev()

Quentin Monnet (1):
  tools: bpftool: return from do_event_pipe() on bad arguments

 kernel/bpf/hashtab.c  | 23 +--
 kernel/bpf/sockmap.c  | 11 +--
 net/tls/tls_main.c|  9 +++--
 net/xdp/xdp_umem.c|  4 ++--
 tools/bpf/bpftool/map_perf_ring.c |  5 -
 5 files changed, 35 insertions(+), 17 deletions(-)


RE: [Intel-wired-lan] [PATCH v1 net-next] igb: Use an advanced ctx descriptor for launchtime

2018-08-23 Thread Brown, Aaron F
> From: Intel-wired-lan [mailto:intel-wired-lan-boun...@osuosl.org] On
> Behalf Of Jesus Sanchez-Palencia
> Sent: Thursday, July 26, 2018 10:21 AM
> To: intel-wired-...@lists.osuosl.org
> Cc: netdev@vger.kernel.org; Sanchez-Palencia, Jesus  palen...@intel.com>
> Subject: [Intel-wired-lan] [PATCH v1 net-next] igb: Use an advanced ctx
> descriptor for launchtime
> 
> On i210, Launchtime (TxTime) requires the usage of an "Advanced
> Transmit Context Descriptor" for retrieving the timestamp of a packet.
> 
> The igb driver correctly builds such descriptor on the segmentation
> flow (i.e. igb_tso()) or on the checksum one (i.e. igb_tx_csum()), but the
> feature is broken for AF_PACKET if the IGB_TX_FLAGS_VLAN is not set,
> which happens due to an early return on igb_tx_csum().
> 
> This flag is only set by the kernel when a VLAN interface is used,
> thus we can't just rely on it. Here we are fixing this issue by checking
> if launchtime is enabled for the current tx_ring before performing the
> early return.
> 
> Signed-off-by: Jesus Sanchez-Palencia 
> ---
>  drivers/net/ethernet/intel/igb/igb_main.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 

Tested-by: Aaron Brown 


RE: [PATCH net 1/1] qlge: Fix netdev features configuration.

2018-08-23 Thread Chopra, Manish
> -Original Message-
> From: Manish Chopra 
> Sent: Friday, August 24, 2018 2:02 AM
> To: da...@davemloft.net
> Cc: netdev@vger.kernel.org; Dept-GE Linux NIC Dev  gelinuxnic...@cavium.com>; bpoir...@suse.com
> Subject: [PATCH net 1/1] qlge: Fix netdev features configuration.
> 
> qlge_fix_features() is not supposed to modify hardware or driver state,
> rather it is supposed to only fix requested fetures bits. Currently
> qlge_fix_features() also goes for interface down and up unnecessarily if there
> is not even any change in features set.
> 
> This patch changes/fixes following -
> 
> 1) Move reload of interface or device re-config from
>qlge_fix_features() to qlge_set_features().
> 2) Reload of interface in qlge_set_features() only if
>relevant feature bit (NETIF_F_HW_VLAN_CTAG_RX) is changed.
> 3) Get rid of qlge_fix_features() since driver is not really
>required to fix any features bit.
> 
> Signed-off-by: Manish 
> Reviewed-by: Benjamin Poirier 
> ---
>  drivers/net/ethernet/qlogic/qlge/qlge_main.c | 23 ---
>  1 file changed, 8 insertions(+), 15 deletions(-)
> 
> diff --git a/drivers/net/ethernet/qlogic/qlge/qlge_main.c
> b/drivers/net/ethernet/qlogic/qlge/qlge_main.c
> index 353f1c1..059ba94 100644
> --- a/drivers/net/ethernet/qlogic/qlge/qlge_main.c
> +++ b/drivers/net/ethernet/qlogic/qlge/qlge_main.c
> @@ -2384,26 +2384,20 @@ static int qlge_update_hw_vlan_features(struct
> net_device *ndev,
>   return status;
>  }
> 
> -static netdev_features_t qlge_fix_features(struct net_device *ndev,
> - netdev_features_t features)
> -{
> - int err;
> -
> - /* Update the behavior of vlan accel in the adapter */
> - err = qlge_update_hw_vlan_features(ndev, features);
> - if (err)
> - return err;
> -
> - return features;
> -}
> -
>  static int qlge_set_features(struct net_device *ndev,
>   netdev_features_t features)
>  {
>   netdev_features_t changed = ndev->features ^ features;
> + int err;
> +
> + if (changed & NETIF_F_HW_VLAN_CTAG_RX) {
> + /* Update the behavior of vlan accel in the adapter */
> + err = qlge_update_hw_vlan_features(ndev, features);
> + if (err)
> + return err;
> 
> - if (changed & NETIF_F_HW_VLAN_CTAG_RX)
>   qlge_vlan_mode(ndev, features);
> + }
> 
>   return 0;
>  }
> @@ -4719,7 +4713,6 @@ static int ql_init_device(struct pci_dev *pdev,
> struct net_device *ndev,
>   .ndo_set_mac_address= qlge_set_mac_address,
>   .ndo_validate_addr  = eth_validate_addr,
>   .ndo_tx_timeout = qlge_tx_timeout,
> - .ndo_fix_features   = qlge_fix_features,
>   .ndo_set_features   = qlge_set_features,
>   .ndo_vlan_rx_add_vid= qlge_vlan_rx_add_vid,
>   .ndo_vlan_rx_kill_vid   = qlge_vlan_rx_kill_vid,
> --
> 1.8.3.1

Hello David,  Please ignore this patch and consider the previous same patch. By 
mistake it was submitted twice.

Thanks !!


broken behaviour of TC filter delete

2018-08-23 Thread Roman Mashak



It appears that the following commit changed the behaviour of scenario where a
filter is deleted twice:

commit f71e0ca4db187af7c44987e9d21e9042c3046070
Author: Jiri Pirko 
Date:   Mon Jul 23 09:23:05 2018 +0200

net: sched: Avoid implicit chain 0 creation


Steps to reproduce :

1) create dummy device
   $ ip link add dev dummy0 type dummy

2) create qdisc
   $ tc qdisc add dev dummy0 ingress

3) create simple u32 filter with action attached
   $ tc filter add dev dummy0 parent : protocol ip prio 1 u32 match ip src 
10.10.10.1/32 action ok

4) list the filter
   $ tc filter ls dev dummy0 parent :

5) delete the filter with the given protocol and priority
   $ tc filter del dev dummy0 parent : protocol ip prio 1

6) repeat step 5, this will return -ENOENT ("Error: Filter with specified 
priority/protocol not found.")
However, before the change at step 6 we would get -EINVAL (Error: Cannot find 
specified filter chain.)
and that makes sense.

The change breaks a number of our internal TC tests.


[PATCH net 1/1] qlge: Fix netdev features configuration.

2018-08-23 Thread Manish Chopra
qlge_fix_features() is not supposed to modify hardware or
driver state, rather it is supposed to only fix requested
fetures bits. Currently qlge_fix_features() also goes for
interface down and up unnecessarily if there is not even
any change in features set.

This patch changes/fixes following -

1) Move reload of interface or device re-config from
   qlge_fix_features() to qlge_set_features().
2) Reload of interface in qlge_set_features() only if
   relevant feature bit (NETIF_F_HW_VLAN_CTAG_RX) is changed.
3) Get rid of qlge_fix_features() since driver is not really
   required to fix any features bit.

Signed-off-by: Manish 
Reviewed-by: Benjamin Poirier 
---
 drivers/net/ethernet/qlogic/qlge/qlge_main.c | 23 ---
 1 file changed, 8 insertions(+), 15 deletions(-)

diff --git a/drivers/net/ethernet/qlogic/qlge/qlge_main.c 
b/drivers/net/ethernet/qlogic/qlge/qlge_main.c
index 353f1c1..059ba94 100644
--- a/drivers/net/ethernet/qlogic/qlge/qlge_main.c
+++ b/drivers/net/ethernet/qlogic/qlge/qlge_main.c
@@ -2384,26 +2384,20 @@ static int qlge_update_hw_vlan_features(struct 
net_device *ndev,
return status;
 }
 
-static netdev_features_t qlge_fix_features(struct net_device *ndev,
-   netdev_features_t features)
-{
-   int err;
-
-   /* Update the behavior of vlan accel in the adapter */
-   err = qlge_update_hw_vlan_features(ndev, features);
-   if (err)
-   return err;
-
-   return features;
-}
-
 static int qlge_set_features(struct net_device *ndev,
netdev_features_t features)
 {
netdev_features_t changed = ndev->features ^ features;
+   int err;
+
+   if (changed & NETIF_F_HW_VLAN_CTAG_RX) {
+   /* Update the behavior of vlan accel in the adapter */
+   err = qlge_update_hw_vlan_features(ndev, features);
+   if (err)
+   return err;
 
-   if (changed & NETIF_F_HW_VLAN_CTAG_RX)
qlge_vlan_mode(ndev, features);
+   }
 
return 0;
 }
@@ -4719,7 +4713,6 @@ static int ql_init_device(struct pci_dev *pdev, struct 
net_device *ndev,
.ndo_set_mac_address= qlge_set_mac_address,
.ndo_validate_addr  = eth_validate_addr,
.ndo_tx_timeout = qlge_tx_timeout,
-   .ndo_fix_features   = qlge_fix_features,
.ndo_set_features   = qlge_set_features,
.ndo_vlan_rx_add_vid= qlge_vlan_rx_add_vid,
.ndo_vlan_rx_kill_vid   = qlge_vlan_rx_kill_vid,
-- 
1.8.3.1



[PATCH net 1/1] qlge: Fix netdev features configuration.

2018-08-23 Thread Manish Chopra
qlge_fix_features() is not supposed to modify hardware or
driver state, rather it is supposed to only fix requested
fetures bits. Currently qlge_fix_features() also goes for
interface down and up unnecessarily if there is not even
any change in features set.

This patch changes/fixes following -

1) Move reload of interface or device re-config from
   qlge_fix_features() to qlge_set_features().
2) Reload of interface in qlge_set_features() only if
   relevant feature bit (NETIF_F_HW_VLAN_CTAG_RX) is changed.
3) Get rid of qlge_fix_features() since driver is not really
   required to fix any features bit.

Signed-off-by: Manish 
Reviewed-by: Benjamin Poirier 
---
 drivers/net/ethernet/qlogic/qlge/qlge_main.c | 23 ---
 1 file changed, 8 insertions(+), 15 deletions(-)

diff --git a/drivers/net/ethernet/qlogic/qlge/qlge_main.c 
b/drivers/net/ethernet/qlogic/qlge/qlge_main.c
index 353f1c1..059ba94 100644
--- a/drivers/net/ethernet/qlogic/qlge/qlge_main.c
+++ b/drivers/net/ethernet/qlogic/qlge/qlge_main.c
@@ -2384,26 +2384,20 @@ static int qlge_update_hw_vlan_features(struct 
net_device *ndev,
return status;
 }
 
-static netdev_features_t qlge_fix_features(struct net_device *ndev,
-   netdev_features_t features)
-{
-   int err;
-
-   /* Update the behavior of vlan accel in the adapter */
-   err = qlge_update_hw_vlan_features(ndev, features);
-   if (err)
-   return err;
-
-   return features;
-}
-
 static int qlge_set_features(struct net_device *ndev,
netdev_features_t features)
 {
netdev_features_t changed = ndev->features ^ features;
+   int err;
+
+   if (changed & NETIF_F_HW_VLAN_CTAG_RX) {
+   /* Update the behavior of vlan accel in the adapter */
+   err = qlge_update_hw_vlan_features(ndev, features);
+   if (err)
+   return err;
 
-   if (changed & NETIF_F_HW_VLAN_CTAG_RX)
qlge_vlan_mode(ndev, features);
+   }
 
return 0;
 }
@@ -4719,7 +4713,6 @@ static int ql_init_device(struct pci_dev *pdev, struct 
net_device *ndev,
.ndo_set_mac_address= qlge_set_mac_address,
.ndo_validate_addr  = eth_validate_addr,
.ndo_tx_timeout = qlge_tx_timeout,
-   .ndo_fix_features   = qlge_fix_features,
.ndo_set_features   = qlge_set_features,
.ndo_vlan_rx_add_vid= qlge_vlan_rx_add_vid,
.ndo_vlan_rx_kill_vid   = qlge_vlan_rx_kill_vid,
-- 
1.8.3.1



Re: [net 00/13][pull request] Intel Wired LAN Driver Fixes 2018-08-23

2018-08-23 Thread David Miller
From: Jeff Kirsher 
Date: Thu, 23 Aug 2018 12:14:50 -0700

> This series contains bug fixes to the ice driver.

Pulled, thanks Jeff.


[net 00/13][pull request] Intel Wired LAN Driver Fixes 2018-08-23

2018-08-23 Thread Jeff Kirsher
This series contains bug fixes to the ice driver.

Anirudh provides several fixes, starting with static analysis fixes by
replacing a bitwise-and with a constant value and replace "magic"
numbers with defines.  Fixed the control queue processing by removing
unnecessary read/writes to registers, as well as getting a accurate
value for "pending".  Added additional checks to avoid NULL pointer
dereferences.  Fixed up code formatting issues, by cleaning up code
comments and coding style.

Bruce cleans up a duplicate check for owner, within the same function.
Also cleans up interrupt causes that are not handled or applicable.
Fix checkpatch warning about the use of bool in structures due to the
wasted space and size of bool, so convert struct members to u8 instead.

Jake fixes a number of potential bugs in the reporting of stats via
ethtool, by simply reporting all the queue statistics, even for the
queues that are not activated.  Fixed a compiler warning, as well as
make the code a bit cleaner but just using order_base_2() for
calculating the power-of-2.

Preethi adds a check to avoid a NULL pointer dereference crash during
initialization.

Brett clarifies the code when it comes to port VLANs and regular VLANs,
by renaming defines and field values to match their intended use and
purpose.

Jesse initializes a variable to avoid garbage values being returned to
the caller.

The following are changes since commit 0d092f06faa46b95a8e07b9bb5737b7c0f1176ee:
  Merge branch 'for-upstream' of 
git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth
and are available in the git repository at:
  git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net-queue 100GbE

Anirudh Venkataramanan (5):
  ice: Fix multiple static analyser warnings
  ice: Cleanup magic number
  ice: Fix bugs in control queue processing
  ice: Fix a few null pointer dereference issues
  ice: Trivial formatting fixes

Brett Creeley (1):
  ice: Set VLAN flags correctly

Bruce Allan (3):
  ice: Remove unnecessary node owner check
  ice: Update to interrupts enabled in OICR
  ice: Change struct members from bool to u8

Jacob Keller (2):
  ice: Report stats for allocated queues via ethtool stats
  ice: Use order_base_2 to calculate higher power of 2

Jesse Brandeburg (1):
  ice: Fix potential return of uninitialized value

Preethi Banala (1):
  ice: Clean control queues only when they are initialized

 drivers/net/ethernet/intel/ice/ice.h  |  15 ++-
 .../net/ethernet/intel/ice/ice_adminq_cmd.h   |  25 ++--
 drivers/net/ethernet/intel/ice/ice_common.c   |  30 +++--
 drivers/net/ethernet/intel/ice/ice_controlq.c |  29 +++--
 drivers/net/ethernet/intel/ice/ice_ethtool.c  |  52 ++--
 .../net/ethernet/intel/ice/ice_hw_autogen.h   |   8 --
 .../net/ethernet/intel/ice/ice_lan_tx_rx.h|   1 +
 drivers/net/ethernet/intel/ice/ice_main.c | 115 ++
 drivers/net/ethernet/intel/ice/ice_nvm.c  |   5 +-
 drivers/net/ethernet/intel/ice/ice_sched.c|   3 +-
 drivers/net/ethernet/intel/ice/ice_switch.c   |   4 +-
 drivers/net/ethernet/intel/ice/ice_switch.h   |   6 +-
 drivers/net/ethernet/intel/ice/ice_txrx.h |   2 +-
 drivers/net/ethernet/intel/ice/ice_type.h |  16 +--
 14 files changed, 185 insertions(+), 126 deletions(-)

-- 
2.17.1



[net 03/13] ice: Cleanup magic number

2018-08-23 Thread Jeff Kirsher
From: Anirudh Venkataramanan 

Use define for the unit size shift of the Rx LAN context descriptor base
address instead of the magic number 7.

Signed-off-by: Anirudh Venkataramanan 
Tested-by: Tony Brelinski 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/ice/ice_lan_tx_rx.h | 1 +
 drivers/net/ethernet/intel/ice/ice_main.c  | 2 +-
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/ice/ice_lan_tx_rx.h 
b/drivers/net/ethernet/intel/ice/ice_lan_tx_rx.h
index d23a91665b46..068dbc740b76 100644
--- a/drivers/net/ethernet/intel/ice/ice_lan_tx_rx.h
+++ b/drivers/net/ethernet/intel/ice/ice_lan_tx_rx.h
@@ -265,6 +265,7 @@ enum ice_rx_flex_desc_status_error_0_bits {
 struct ice_rlan_ctx {
u16 head;
u16 cpuid; /* bigger than needed, see above for reason */
+#define ICE_RLAN_BASE_S 7
u64 base;
u16 qlen;
 #define ICE_RLAN_CTX_DBUF_S 7
diff --git a/drivers/net/ethernet/intel/ice/ice_main.c 
b/drivers/net/ethernet/intel/ice/ice_main.c
index 186e764a469a..7d65e0ed3588 100644
--- a/drivers/net/ethernet/intel/ice/ice_main.c
+++ b/drivers/net/ethernet/intel/ice/ice_main.c
@@ -3983,7 +3983,7 @@ static int ice_setup_rx_ctx(struct ice_ring *ring)
/* clear the context structure first */
memset(_ctx, 0, sizeof(rlan_ctx));
 
-   rlan_ctx.base = ring->dma >> 7;
+   rlan_ctx.base = ring->dma >> ICE_RLAN_BASE_S;
 
rlan_ctx.qlen = ring->count;
 
-- 
2.17.1



[net 04/13] ice: Report stats for allocated queues via ethtool stats

2018-08-23 Thread Jeff Kirsher
From: Jacob Keller 

It is not safe to have the string table for statistics change order or
size over the lifetime of a given netdevice. This is because of the
nature of the 3-step process for obtaining stats. First, user space
performs a request for the size of the strings table. Second it performs
a separate request for the strings themselves, after allocating space
for the table. Third, it requests the stats themselves, also allocating
space for the table.

If the size decreased, there is potential to see garbage data or stats
values. In the worst case, we could potentially see stats values become
mis-aligned with their strings, so that it looks like a statistic is
being reported differently than it actually is.

Even worse, if the size increased, there is potential that the strings
table or stats table was not allocated large enough and the stats code
could access and write to memory it should not, potentially resulting in
undefined behavior and system crashes.

It isn't even safe if the size always changes under the RTNL lock. This
is because the calls take place over multiple user space commands, so it
is not possible to hold the RTNL lock for the entire duration of
obtaining strings and stats. Further, not all consumers of the ethtool
API are the user space ethtool program, and it is possible that one
assumes the strings will not change (valid under the current contract),
and thus only requests the stats values when requesting stats in a loop.

Finally, it's not possible in the general case to detect when the size
changes, because it is quite possible that one value which could impact
the stat size increased, while another decreased. This would result in
the same total number of stats, but reordering them so that stats no
longer line up with the strings they belong to. Since only size changes
aren't enough, we would need some sort of hash or token to determine
when the strings no longer match. This would require extending the
ethtool stats commands, but there is no more space in the relevant
structures.

The real solution to resolve this would be to add a completely new API
for stats, probably over netlink.

In the ice driver, the only thing impacting the stats that is not
constant is the number of queues. Instead of reporting stats for each
used queue, report stats for each allocated queue. We do not change the
number of queues allocated for a given netdevice, as we pass this into
the alloc_etherdev_mq() function to set the num_tx_queues and
num_rx_queues.

This resolves the potential bugs at the slight cost of displaying many
queue statistics which will not be activated.

Signed-off-by: Jacob Keller 
Signed-off-by: Anirudh Venkataramanan 
Tested-by: Tony Brelinski 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/ice/ice.h |  7 +++
 drivers/net/ethernet/intel/ice/ice_ethtool.c | 52 +++-
 2 files changed, 46 insertions(+), 13 deletions(-)

diff --git a/drivers/net/ethernet/intel/ice/ice.h 
b/drivers/net/ethernet/intel/ice/ice.h
index d8b5fff581e7..ed071ea75f20 100644
--- a/drivers/net/ethernet/intel/ice/ice.h
+++ b/drivers/net/ethernet/intel/ice/ice.h
@@ -89,6 +89,13 @@ extern const char ice_drv_ver[];
 #define ice_for_each_rxq(vsi, i) \
for ((i) = 0; (i) < (vsi)->num_rxq; (i)++)
 
+/* Macros for each allocated tx/rx ring whether used or not in a VSI */
+#define ice_for_each_alloc_txq(vsi, i) \
+   for ((i) = 0; (i) < (vsi)->alloc_txq; (i)++)
+
+#define ice_for_each_alloc_rxq(vsi, i) \
+   for ((i) = 0; (i) < (vsi)->alloc_rxq; (i)++)
+
 struct ice_tc_info {
u16 qoffset;
u16 qcount;
diff --git a/drivers/net/ethernet/intel/ice/ice_ethtool.c 
b/drivers/net/ethernet/intel/ice/ice_ethtool.c
index 1db304c01d10..c71a9b528d6d 100644
--- a/drivers/net/ethernet/intel/ice/ice_ethtool.c
+++ b/drivers/net/ethernet/intel/ice/ice_ethtool.c
@@ -26,7 +26,7 @@ static int ice_q_stats_len(struct net_device *netdev)
 {
struct ice_netdev_priv *np = netdev_priv(netdev);
 
-   return ((np->vsi->num_txq + np->vsi->num_rxq) *
+   return ((np->vsi->alloc_txq + np->vsi->alloc_rxq) *
(sizeof(struct ice_q_stats) / sizeof(u64)));
 }
 
@@ -218,7 +218,7 @@ static void ice_get_strings(struct net_device *netdev, u32 
stringset, u8 *data)
p += ETH_GSTRING_LEN;
}
 
-   ice_for_each_txq(vsi, i) {
+   ice_for_each_alloc_txq(vsi, i) {
snprintf(p, ETH_GSTRING_LEN,
 "tx-queue-%u.tx_packets", i);
p += ETH_GSTRING_LEN;
@@ -226,7 +226,7 @@ static void ice_get_strings(struct net_device *netdev, u32 
stringset, u8 *data)
p += ETH_GSTRING_LEN;
}
 
-   ice_for_each_rxq(vsi, i) {
+   ice_for_each_alloc_rxq(vsi, i) {
snprintf(p, ETH_GSTRING_LEN,
 "rx-queue-%u.rx_packets", i);
   

[net 06/13] ice: Fix bugs in control queue processing

2018-08-23 Thread Jeff Kirsher
From: Anirudh Venkataramanan 

This patch is a consolidation of multiple bug fixes for control queue
processing.

1)  In ice_clean_adminq_subtask() remove unnecessary reads/writes to
registers. The bits PFINT_FW_CTL, PFINT_MBX_CTL and PFINT_SB_CTL
are not set when an interrupt arrives, which means that clearing them
again can be omitted.

2)  Get an accurate value in "pending" by re-reading the control queue
head register from the hardware.

3)  Fix a corner case involving lost control queue messages by checking
for new control messages (using ice_ctrlq_pending) before exiting the
cleanup routine.

Signed-off-by: Anirudh Venkataramanan 
Tested-by: Tony Brelinski 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/ice/ice_controlq.c |  5 +++-
 drivers/net/ethernet/intel/ice/ice_main.c | 26 ---
 2 files changed, 26 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/intel/ice/ice_controlq.c 
b/drivers/net/ethernet/intel/ice/ice_controlq.c
index c064416080e7..62be72fdc8f3 100644
--- a/drivers/net/ethernet/intel/ice/ice_controlq.c
+++ b/drivers/net/ethernet/intel/ice/ice_controlq.c
@@ -1065,8 +1065,11 @@ ice_clean_rq_elem(struct ice_hw *hw, struct 
ice_ctl_q_info *cq,
 
 clean_rq_elem_out:
/* Set pending if needed, unlock and return */
-   if (pending)
+   if (pending) {
+   /* re-read HW head to calculate actual pending messages */
+   ntu = (u16)(rd32(hw, cq->rq.head) & cq->rq.head_mask);
*pending = (u16)((ntc > ntu ? cq->rq.count : 0) + (ntu - ntc));
+   }
 clean_rq_elem_err:
mutex_unlock(>rq_lock);
 
diff --git a/drivers/net/ethernet/intel/ice/ice_main.c 
b/drivers/net/ethernet/intel/ice/ice_main.c
index 7d65e0ed3588..f3ba4f76b6cb 100644
--- a/drivers/net/ethernet/intel/ice/ice_main.c
+++ b/drivers/net/ethernet/intel/ice/ice_main.c
@@ -916,6 +916,21 @@ static int __ice_clean_ctrlq(struct ice_pf *pf, enum 
ice_ctl_q q_type)
return pending && (i == ICE_DFLT_IRQ_WORK);
 }
 
+/**
+ * ice_ctrlq_pending - check if there is a difference between ntc and ntu
+ * @hw: pointer to hardware info
+ * @cq: control queue information
+ *
+ * returns true if there are pending messages in a queue, false if there aren't
+ */
+static bool ice_ctrlq_pending(struct ice_hw *hw, struct ice_ctl_q_info *cq)
+{
+   u16 ntu;
+
+   ntu = (u16)(rd32(hw, cq->rq.head) & cq->rq.head_mask);
+   return cq->rq.next_to_clean != ntu;
+}
+
 /**
  * ice_clean_adminq_subtask - clean the AdminQ rings
  * @pf: board private structure
@@ -923,7 +938,6 @@ static int __ice_clean_ctrlq(struct ice_pf *pf, enum 
ice_ctl_q q_type)
 static void ice_clean_adminq_subtask(struct ice_pf *pf)
 {
struct ice_hw *hw = >hw;
-   u32 val;
 
if (!test_bit(__ICE_ADMINQ_EVENT_PENDING, pf->state))
return;
@@ -933,9 +947,13 @@ static void ice_clean_adminq_subtask(struct ice_pf *pf)
 
clear_bit(__ICE_ADMINQ_EVENT_PENDING, pf->state);
 
-   /* re-enable Admin queue interrupt causes */
-   val = rd32(hw, PFINT_FW_CTL);
-   wr32(hw, PFINT_FW_CTL, (val | PFINT_FW_CTL_CAUSE_ENA_M));
+   /* There might be a situation where new messages arrive to a control
+* queue between processing the last message and clearing the
+* EVENT_PENDING bit. So before exiting, check queue head again (using
+* ice_ctrlq_pending) and process new messages if any.
+*/
+   if (ice_ctrlq_pending(hw, >adminq))
+   __ice_clean_ctrlq(pf, ICE_CTL_Q_ADMIN);
 
ice_flush(hw);
 }
-- 
2.17.1



[net 02/13] ice: Remove unnecessary node owner check

2018-08-23 Thread Jeff Kirsher
From: Bruce Allan 

There is already a check for owner == ICE_SCHED_NODE_OWNER_LAN at the
beginning of ice_sched_update_vsi_child_nodes. Remove the additional
check to address the static analysis tool smatch issue "warn: we tested
'owner' before and it was 'false'".

Signed-off-by: Bruce Allan 
Signed-off-by: Anirudh Venkataramanan 
Tested-by: Tony Brelinski 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/ice/ice_sched.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/intel/ice/ice_sched.c 
b/drivers/net/ethernet/intel/ice/ice_sched.c
index 2e6c1d92cc88..eeae199469b6 100644
--- a/drivers/net/ethernet/intel/ice/ice_sched.c
+++ b/drivers/net/ethernet/intel/ice/ice_sched.c
@@ -1576,8 +1576,7 @@ ice_sched_update_vsi_child_nodes(struct ice_port_info 
*pi, u16 vsi_id, u8 tc,
return status;
}
 
-   if (owner == ICE_SCHED_NODE_OWNER_LAN)
-   vsi->max_lanq[tc] = new_numqs;
+   vsi->max_lanq[tc] = new_numqs;
 
return status;
 }
-- 
2.17.1



[net 01/13] ice: Fix multiple static analyser warnings

2018-08-23 Thread Jeff Kirsher
From: Anirudh Venkataramanan 

This patch fixes the following smatch errors:

1) Fix "odd binop '0x0 & 0xc'" when performing the bitwise-and with a
   constant value of zero (ICE_AQC_GSET_RSS_LUT_TABLE_SIZE_128_FLAG).
   Remove a similar bitwise-and with 0 in ice_add_marker_act() and use the
   right mask ICE_LG_ACT_GENERIC_OFFSET_M in the expression.

2) Fix a similar issue "odd binop '0x0 & 0x1800' in ice_req_irq_msix_misc.

3) Fix "odd binop '0x38 & 0x7fff8'" in ice_add_marker_act(). Also, use
   a new define ICE_LG_ACT_GENERIC_OFF_RX_DESC_PROF_IDX instead of magic
   number '7'.

4) Fix warn: odd binop '0x0 & 0x18' in ice_set_dflt_vsi_ctx() by removing
   unnecessary logic to explicitly unset bits 3 and 4 in port_vlan_bits.
   These bits are unset already by the memset on ctxt->info.

Reported-by: Dan Carpenter 
Signed-off-by: Anirudh Venkataramanan 
Tested-by: Tony Brelinski 
Signed-off-by: Jeff Kirsher 
---
 .../net/ethernet/intel/ice/ice_adminq_cmd.h   |  1 +
 drivers/net/ethernet/intel/ice/ice_common.c   | 25 +++
 drivers/net/ethernet/intel/ice/ice_main.c | 19 ++
 drivers/net/ethernet/intel/ice/ice_switch.c   |  4 +--
 4 files changed, 25 insertions(+), 24 deletions(-)

diff --git a/drivers/net/ethernet/intel/ice/ice_adminq_cmd.h 
b/drivers/net/ethernet/intel/ice/ice_adminq_cmd.h
index 7541ec2270b3..6d3e11659ba5 100644
--- a/drivers/net/ethernet/intel/ice/ice_adminq_cmd.h
+++ b/drivers/net/ethernet/intel/ice/ice_adminq_cmd.h
@@ -594,6 +594,7 @@ struct ice_sw_rule_lg_act {
 #define ICE_LG_ACT_GENERIC_OFFSET_M(0x7 << ICE_LG_ACT_GENERIC_OFFSET_S)
 #define ICE_LG_ACT_GENERIC_PRIORITY_S  22
 #define ICE_LG_ACT_GENERIC_PRIORITY_M  (0x7 << ICE_LG_ACT_GENERIC_PRIORITY_S)
+#define ICE_LG_ACT_GENERIC_OFF_RX_DESC_PROF_IDX7
 
/* Action = 7 - Set Stat count */
 #define ICE_LG_ACT_STAT_COUNT  0x7
diff --git a/drivers/net/ethernet/intel/ice/ice_common.c 
b/drivers/net/ethernet/intel/ice/ice_common.c
index 71d032cc5fa7..d5300b606d5a 100644
--- a/drivers/net/ethernet/intel/ice/ice_common.c
+++ b/drivers/net/ethernet/intel/ice/ice_common.c
@@ -1619,20 +1619,23 @@ __ice_aq_get_set_rss_lut(struct ice_hw *hw, u16 vsi_id, 
u8 lut_type, u8 *lut,
}
 
/* LUT size is only valid for Global and PF table types */
-   if (lut_size == ICE_AQC_GSET_RSS_LUT_TABLE_SIZE_128) {
-   flags |= (ICE_AQC_GSET_RSS_LUT_TABLE_SIZE_128_FLAG <<
- ICE_AQC_GSET_RSS_LUT_TABLE_SIZE_S) &
-ICE_AQC_GSET_RSS_LUT_TABLE_SIZE_M;
-   } else if (lut_size == ICE_AQC_GSET_RSS_LUT_TABLE_SIZE_512) {
+   switch (lut_size) {
+   case ICE_AQC_GSET_RSS_LUT_TABLE_SIZE_128:
+   break;
+   case ICE_AQC_GSET_RSS_LUT_TABLE_SIZE_512:
flags |= (ICE_AQC_GSET_RSS_LUT_TABLE_SIZE_512_FLAG <<
  ICE_AQC_GSET_RSS_LUT_TABLE_SIZE_S) &
 ICE_AQC_GSET_RSS_LUT_TABLE_SIZE_M;
-   } else if ((lut_size == ICE_AQC_GSET_RSS_LUT_TABLE_SIZE_2K) &&
-  (lut_type == ICE_AQC_GSET_RSS_LUT_TABLE_TYPE_PF)) {
-   flags |= (ICE_AQC_GSET_RSS_LUT_TABLE_SIZE_2K_FLAG <<
- ICE_AQC_GSET_RSS_LUT_TABLE_SIZE_S) &
-ICE_AQC_GSET_RSS_LUT_TABLE_SIZE_M;
-   } else {
+   break;
+   case ICE_AQC_GSET_RSS_LUT_TABLE_SIZE_2K:
+   if (lut_type == ICE_AQC_GSET_RSS_LUT_TABLE_TYPE_PF) {
+   flags |= (ICE_AQC_GSET_RSS_LUT_TABLE_SIZE_2K_FLAG <<
+ ICE_AQC_GSET_RSS_LUT_TABLE_SIZE_S) &
+ICE_AQC_GSET_RSS_LUT_TABLE_SIZE_M;
+   break;
+   }
+   /* fall-through */
+   default:
status = ICE_ERR_PARAM;
goto ice_aq_get_set_rss_lut_exit;
}
diff --git a/drivers/net/ethernet/intel/ice/ice_main.c 
b/drivers/net/ethernet/intel/ice/ice_main.c
index 5299caf55a7f..186e764a469a 100644
--- a/drivers/net/ethernet/intel/ice/ice_main.c
+++ b/drivers/net/ethernet/intel/ice/ice_main.c
@@ -1352,14 +1352,13 @@ static void ice_set_dflt_vsi_ctx(struct ice_vsi_ctx 
*ctxt)
ctxt->info.sw_flags = ICE_AQ_VSI_SW_FLAG_SRC_PRUNE;
/* Traffic from VSI can be sent to LAN */
ctxt->info.sw_flags2 = ICE_AQ_VSI_SW_FLAG_LAN_ENA;
-   /* Allow all packets untagged/tagged */
+   /* By default bits 3 and 4 in port_vlan_flags are 0's which results in
+* legacy behavior (show VLAN, DEI, and UP) in descriptor. Also, allow
+* all packets untagged/tagged.
+*/
ctxt->info.port_vlan_flags = ((ICE_AQ_VSI_PVLAN_MODE_ALL &
   ICE_AQ_VSI_PVLAN_MODE_M) >>
  ICE_AQ_VSI_PVLAN_MODE_S);
-   /* Show VLAN/UP from packets in Rx descriptors */
-   ctxt->info.port_vlan_flags |= ((ICE_AQ_VSI_PVLAN_EMOD_STR_BOTH &
-  

[net 05/13] ice: Clean control queues only when they are initialized

2018-08-23 Thread Jeff Kirsher
From: Preethi Banala 

Clean control queues only when they are initialized. One of the ways to
validate if the basic initialization is done is by checking value of
cq->sq.head and cq->rq.head variables that specify the register address.
This patch adds a check to avoid NULL pointer dereference crash when tried
to shutdown uninitialized control queue.

Signed-off-by: Preethi Banala 
Signed-off-by: Anirudh Venkataramanan 
Tested-by: Tony Brelinski 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/ice/ice_controlq.c | 24 ---
 1 file changed, 16 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/intel/ice/ice_controlq.c 
b/drivers/net/ethernet/intel/ice/ice_controlq.c
index 7c511f144ed6..c064416080e7 100644
--- a/drivers/net/ethernet/intel/ice/ice_controlq.c
+++ b/drivers/net/ethernet/intel/ice/ice_controlq.c
@@ -597,10 +597,14 @@ static enum ice_status ice_init_check_adminq(struct 
ice_hw *hw)
return 0;
 
 init_ctrlq_free_rq:
-   ice_shutdown_rq(hw, cq);
-   ice_shutdown_sq(hw, cq);
-   mutex_destroy(>sq_lock);
-   mutex_destroy(>rq_lock);
+   if (cq->rq.head) {
+   ice_shutdown_rq(hw, cq);
+   mutex_destroy(>rq_lock);
+   }
+   if (cq->sq.head) {
+   ice_shutdown_sq(hw, cq);
+   mutex_destroy(>sq_lock);
+   }
return status;
 }
 
@@ -706,10 +710,14 @@ static void ice_shutdown_ctrlq(struct ice_hw *hw, enum 
ice_ctl_q q_type)
return;
}
 
-   ice_shutdown_sq(hw, cq);
-   ice_shutdown_rq(hw, cq);
-   mutex_destroy(>sq_lock);
-   mutex_destroy(>rq_lock);
+   if (cq->sq.head) {
+   ice_shutdown_sq(hw, cq);
+   mutex_destroy(>sq_lock);
+   }
+   if (cq->rq.head) {
+   ice_shutdown_rq(hw, cq);
+   mutex_destroy(>rq_lock);
+   }
 }
 
 /**
-- 
2.17.1



[net 11/13] ice: Fix potential return of uninitialized value

2018-08-23 Thread Jeff Kirsher
From: Jesse Brandeburg 

In ice_vsi_setup_[tx|rx]_rings, err is uninitialized which can result in
a garbage value return to the caller. Fix that.

Signed-off-by: Jesse Brandeburg 
Signed-off-by: Anirudh Venkataramanan 
Tested-by: Tony Brelinski 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/ice/ice_main.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/intel/ice/ice_main.c 
b/drivers/net/ethernet/intel/ice/ice_main.c
index d5d83c8848f8..e23156515186 100644
--- a/drivers/net/ethernet/intel/ice/ice_main.c
+++ b/drivers/net/ethernet/intel/ice/ice_main.c
@@ -4885,7 +4885,7 @@ int ice_down(struct ice_vsi *vsi)
  */
 static int ice_vsi_setup_tx_rings(struct ice_vsi *vsi)
 {
-   int i, err;
+   int i, err = 0;
 
if (!vsi->num_txq) {
dev_err(>back->pdev->dev, "VSI %d has 0 Tx queues\n",
@@ -4910,7 +4910,7 @@ static int ice_vsi_setup_tx_rings(struct ice_vsi *vsi)
  */
 static int ice_vsi_setup_rx_rings(struct ice_vsi *vsi)
 {
-   int i, err;
+   int i, err = 0;
 
if (!vsi->num_rxq) {
dev_err(>back->pdev->dev, "VSI %d has 0 Rx queues\n",
-- 
2.17.1



[net 12/13] ice: Change struct members from bool to u8

2018-08-23 Thread Jeff Kirsher
From: Bruce Allan 

Recent versions of checkpatch have a new warning based on a documented
preference of Linus to not use bool in structures due to wasted space and
the size of bool is implementation dependent.  For more information, see
the email thread at https://lkml.org/lkml/2017/11/21/384.

Signed-off-by: Bruce Allan 
Signed-off-by: Anirudh Venkataramanan 
Tested-by: Tony Brelinski 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/ice/ice.h|  8 
 drivers/net/ethernet/intel/ice/ice_switch.h |  6 +++---
 drivers/net/ethernet/intel/ice/ice_txrx.h   |  2 +-
 drivers/net/ethernet/intel/ice/ice_type.h   | 16 
 4 files changed, 16 insertions(+), 16 deletions(-)

diff --git a/drivers/net/ethernet/intel/ice/ice.h 
b/drivers/net/ethernet/intel/ice/ice.h
index ed071ea75f20..868f4a1d0f72 100644
--- a/drivers/net/ethernet/intel/ice/ice.h
+++ b/drivers/net/ethernet/intel/ice/ice.h
@@ -196,9 +196,9 @@ struct ice_vsi {
struct list_head tmp_sync_list; /* MAC filters to be synced */
struct list_head tmp_unsync_list;   /* MAC filters to be unsynced */
 
-   bool irqs_ready;
-   bool current_isup;   /* Sync 'link up' logging */
-   bool stat_offsets_loaded;
+   u8 irqs_ready;
+   u8 current_isup; /* Sync 'link up' logging */
+   u8 stat_offsets_loaded;
 
/* queue information */
u8 tx_mapping_mode;  /* ICE_MAP_MODE_[CONTIG|SCATTER] */
@@ -269,7 +269,7 @@ struct ice_pf {
struct ice_hw_port_stats stats;
struct ice_hw_port_stats stats_prev;
struct ice_hw hw;
-   bool stat_prev_loaded;  /* has previous stats been loaded */
+   u8 stat_prev_loaded;/* has previous stats been loaded */
char int_name[ICE_INT_NAME_STR_LEN];
 };
 
diff --git a/drivers/net/ethernet/intel/ice/ice_switch.h 
b/drivers/net/ethernet/intel/ice/ice_switch.h
index 6f4a0d159dbf..9b8ec128ee31 100644
--- a/drivers/net/ethernet/intel/ice/ice_switch.h
+++ b/drivers/net/ethernet/intel/ice/ice_switch.h
@@ -17,7 +17,7 @@ struct ice_vsi_ctx {
u16 vsis_unallocated;
u16 flags;
struct ice_aqc_vsi_props info;
-   bool alloc_from_pool;
+   u8 alloc_from_pool;
 };
 
 enum ice_sw_fwd_act_type {
@@ -94,8 +94,8 @@ struct ice_fltr_info {
u8 qgrp_size;
 
/* Rule creations populate these indicators basing on the switch type */
-   bool lb_en; /* Indicate if packet can be looped back */
-   bool lan_en;/* Indicate if packet can be forwarded to the uplink */
+   u8 lb_en;   /* Indicate if packet can be looped back */
+   u8 lan_en;  /* Indicate if packet can be forwarded to the uplink */
 };
 
 /* Bookkeeping structure to hold bitmap of VSIs corresponding to VSI list id */
diff --git a/drivers/net/ethernet/intel/ice/ice_txrx.h 
b/drivers/net/ethernet/intel/ice/ice_txrx.h
index 567067b650c4..31bc998fe200 100644
--- a/drivers/net/ethernet/intel/ice/ice_txrx.h
+++ b/drivers/net/ethernet/intel/ice/ice_txrx.h
@@ -143,7 +143,7 @@ struct ice_ring {
u16 next_to_use;
u16 next_to_clean;
 
-   bool ring_active;   /* is ring online or not */
+   u8 ring_active; /* is ring online or not */
 
/* stats structs */
struct ice_q_stats  stats;
diff --git a/drivers/net/ethernet/intel/ice/ice_type.h 
b/drivers/net/ethernet/intel/ice/ice_type.h
index 99c8a9a71b5e..97c366e0ca59 100644
--- a/drivers/net/ethernet/intel/ice/ice_type.h
+++ b/drivers/net/ethernet/intel/ice/ice_type.h
@@ -83,7 +83,7 @@ struct ice_link_status {
u64 phy_type_low;
u16 max_frame_size;
u16 link_speed;
-   bool lse_ena;   /* Link Status Event notification */
+   u8 lse_ena; /* Link Status Event notification */
u8 link_info;
u8 an_info;
u8 ext_info;
@@ -101,7 +101,7 @@ struct ice_phy_info {
struct ice_link_status link_info_old;
u64 phy_type_low;
enum ice_media_type media_type;
-   bool get_link_info;
+   u8 get_link_info;
 };
 
 /* Common HW capabilities for SW use */
@@ -167,7 +167,7 @@ struct ice_nvm_info {
u32 oem_ver;  /* OEM version info */
u16 sr_words; /* Shadow RAM size in words */
u16 ver;  /* NVM package version */
-   bool blank_nvm_mode;  /* is NVM empty (no FW present) */
+   u8 blank_nvm_mode;/* is NVM empty (no FW present) */
 };
 
 /* Max number of port to queue branches w.r.t topology */
@@ -181,7 +181,7 @@ struct ice_sched_node {
struct ice_aqc_txsched_elem_data info;
u32 agg_id; /* aggregator group id */
u16 vsi_id;
-   bool in_use;/* suspended or in use */
+   u8 in_use;  /* suspended or in use */
u8 tx_sched_layer;  /* Logical Layer (1-9) */
u8 num_children;
u8 tc_num;

[net 09/13] ice: Update to interrupts enabled in OICR

2018-08-23 Thread Jeff Kirsher
From: Bruce Allan 

Remove the following interrupt causes that are not applicable or not
handled:
- PFINT_OICR_HLP_RDY_M
- PFINT_OICR_CPM_RDY_M
- PFINT_OICR_GPIO_M
- PFINT_OICR_STORM_DETECT_M

Add the following interrupt cause that's actually handled in ice_misc_intr:
- PFINT_OICR_PE_CRITERR_M

Signed-off-by: Bruce Allan 
Signed-off-by: Anirudh Venkataramanan 
Tested-by: Tony Brelinski 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/ice/ice_hw_autogen.h | 8 
 drivers/net/ethernet/intel/ice/ice_main.c   | 9 +++--
 2 files changed, 3 insertions(+), 14 deletions(-)

diff --git a/drivers/net/ethernet/intel/ice/ice_hw_autogen.h 
b/drivers/net/ethernet/intel/ice/ice_hw_autogen.h
index 499904874b3f..6076fc87df9d 100644
--- a/drivers/net/ethernet/intel/ice/ice_hw_autogen.h
+++ b/drivers/net/ethernet/intel/ice/ice_hw_autogen.h
@@ -121,10 +121,6 @@
 #define PFINT_FW_CTL_CAUSE_ENA_S   30
 #define PFINT_FW_CTL_CAUSE_ENA_M   BIT(PFINT_FW_CTL_CAUSE_ENA_S)
 #define PFINT_OICR 0x0016CA00
-#define PFINT_OICR_HLP_RDY_S   14
-#define PFINT_OICR_HLP_RDY_M   BIT(PFINT_OICR_HLP_RDY_S)
-#define PFINT_OICR_CPM_RDY_S   15
-#define PFINT_OICR_CPM_RDY_M   BIT(PFINT_OICR_CPM_RDY_S)
 #define PFINT_OICR_ECC_ERR_S   16
 #define PFINT_OICR_ECC_ERR_M   BIT(PFINT_OICR_ECC_ERR_S)
 #define PFINT_OICR_MAL_DETECT_S19
@@ -133,10 +129,6 @@
 #define PFINT_OICR_GRST_M  BIT(PFINT_OICR_GRST_S)
 #define PFINT_OICR_PCI_EXCEPTION_S 21
 #define PFINT_OICR_PCI_EXCEPTION_M BIT(PFINT_OICR_PCI_EXCEPTION_S)
-#define PFINT_OICR_GPIO_S  22
-#define PFINT_OICR_GPIO_M  BIT(PFINT_OICR_GPIO_S)
-#define PFINT_OICR_STORM_DETECT_S  24
-#define PFINT_OICR_STORM_DETECT_M  BIT(PFINT_OICR_STORM_DETECT_S)
 #define PFINT_OICR_HMC_ERR_S   26
 #define PFINT_OICR_HMC_ERR_M   BIT(PFINT_OICR_HMC_ERR_S)
 #define PFINT_OICR_PE_CRITERR_S28
diff --git a/drivers/net/ethernet/intel/ice/ice_main.c 
b/drivers/net/ethernet/intel/ice/ice_main.c
index 68003fad33d1..34be94a30a60 100644
--- a/drivers/net/ethernet/intel/ice/ice_main.c
+++ b/drivers/net/ethernet/intel/ice/ice_main.c
@@ -1704,15 +1704,12 @@ static void ice_ena_misc_vector(struct ice_pf *pf)
wr32(hw, PFINT_OICR_ENA, 0);/* disable all */
rd32(hw, PFINT_OICR);   /* read to clear */
 
-   val = (PFINT_OICR_HLP_RDY_M |
-  PFINT_OICR_CPM_RDY_M |
-  PFINT_OICR_ECC_ERR_M |
+   val = (PFINT_OICR_ECC_ERR_M |
   PFINT_OICR_MAL_DETECT_M |
   PFINT_OICR_GRST_M |
   PFINT_OICR_PCI_EXCEPTION_M |
-  PFINT_OICR_GPIO_M |
-  PFINT_OICR_STORM_DETECT_M |
-  PFINT_OICR_HMC_ERR_M);
+  PFINT_OICR_HMC_ERR_M |
+  PFINT_OICR_PE_CRITERR_M);
 
wr32(hw, PFINT_OICR_ENA, val);
 
-- 
2.17.1



[net 08/13] ice: Set VLAN flags correctly

2018-08-23 Thread Jeff Kirsher
From: Brett Creeley 

In the struct ice_aqc_vsi_props the field port_vlan_flags is an
overloaded term because it is used for both port VLANs (PVLANs) and
regular VLANs. This is an issue and is very confusing especially when
dealing with VFs because normal VLANs and port VLANs are not the same.
To fix this the field was renamed to vlan_flags and all of the #define's
labeled *_PVLAN_* were renamed to *_VLAN_* if they are not specific to
port VLANs.

Also in ice_vsi_manage_vlan_stripping, set the ICE_AQ_VSI_VLAN_MODE_ALL
bit to allow the driver to add a VLAN tag to all packets it sends.

Signed-off-by: Brett Creeley 
Signed-off-by: Anirudh Venkataramanan 
Tested-by: Tony Brelinski 
Signed-off-by: Jeff Kirsher 
---
 .../net/ethernet/intel/ice/ice_adminq_cmd.h   | 24 +++---
 drivers/net/ethernet/intel/ice/ice_main.c | 31 +++
 2 files changed, 30 insertions(+), 25 deletions(-)

diff --git a/drivers/net/ethernet/intel/ice/ice_adminq_cmd.h 
b/drivers/net/ethernet/intel/ice/ice_adminq_cmd.h
index 6d3e11659ba5..a0614f472658 100644
--- a/drivers/net/ethernet/intel/ice/ice_adminq_cmd.h
+++ b/drivers/net/ethernet/intel/ice/ice_adminq_cmd.h
@@ -329,19 +329,19 @@ struct ice_aqc_vsi_props {
/* VLAN section */
__le16 pvid; /* VLANS include priority bits */
u8 pvlan_reserved[2];
-   u8 port_vlan_flags;
-#define ICE_AQ_VSI_PVLAN_MODE_S0
-#define ICE_AQ_VSI_PVLAN_MODE_M(0x3 << ICE_AQ_VSI_PVLAN_MODE_S)
-#define ICE_AQ_VSI_PVLAN_MODE_UNTAGGED 0x1
-#define ICE_AQ_VSI_PVLAN_MODE_TAGGED   0x2
-#define ICE_AQ_VSI_PVLAN_MODE_ALL  0x3
+   u8 vlan_flags;
+#define ICE_AQ_VSI_VLAN_MODE_S 0
+#define ICE_AQ_VSI_VLAN_MODE_M (0x3 << ICE_AQ_VSI_VLAN_MODE_S)
+#define ICE_AQ_VSI_VLAN_MODE_UNTAGGED  0x1
+#define ICE_AQ_VSI_VLAN_MODE_TAGGED0x2
+#define ICE_AQ_VSI_VLAN_MODE_ALL   0x3
 #define ICE_AQ_VSI_PVLAN_INSERT_PVID   BIT(2)
-#define ICE_AQ_VSI_PVLAN_EMOD_S3
-#define ICE_AQ_VSI_PVLAN_EMOD_M(0x3 << ICE_AQ_VSI_PVLAN_EMOD_S)
-#define ICE_AQ_VSI_PVLAN_EMOD_STR_BOTH (0x0 << ICE_AQ_VSI_PVLAN_EMOD_S)
-#define ICE_AQ_VSI_PVLAN_EMOD_STR_UP   (0x1 << ICE_AQ_VSI_PVLAN_EMOD_S)
-#define ICE_AQ_VSI_PVLAN_EMOD_STR  (0x2 << ICE_AQ_VSI_PVLAN_EMOD_S)
-#define ICE_AQ_VSI_PVLAN_EMOD_NOTHING  (0x3 << ICE_AQ_VSI_PVLAN_EMOD_S)
+#define ICE_AQ_VSI_VLAN_EMOD_S 3
+#define ICE_AQ_VSI_VLAN_EMOD_M (0x3 << ICE_AQ_VSI_VLAN_EMOD_S)
+#define ICE_AQ_VSI_VLAN_EMOD_STR_BOTH  (0x0 << ICE_AQ_VSI_VLAN_EMOD_S)
+#define ICE_AQ_VSI_VLAN_EMOD_STR_UP(0x1 << ICE_AQ_VSI_VLAN_EMOD_S)
+#define ICE_AQ_VSI_VLAN_EMOD_STR   (0x2 << ICE_AQ_VSI_VLAN_EMOD_S)
+#define ICE_AQ_VSI_VLAN_EMOD_NOTHING   (0x3 << ICE_AQ_VSI_VLAN_EMOD_S)
u8 pvlan_reserved2[3];
/* ingress egress up sections */
__le32 ingress_table; /* bitmap, 3 bits per up */
diff --git a/drivers/net/ethernet/intel/ice/ice_main.c 
b/drivers/net/ethernet/intel/ice/ice_main.c
index 3eff1d2d1543..68003fad33d1 100644
--- a/drivers/net/ethernet/intel/ice/ice_main.c
+++ b/drivers/net/ethernet/intel/ice/ice_main.c
@@ -1367,13 +1367,15 @@ static void ice_set_dflt_vsi_ctx(struct ice_vsi_ctx 
*ctxt)
ctxt->info.sw_flags = ICE_AQ_VSI_SW_FLAG_SRC_PRUNE;
/* Traffic from VSI can be sent to LAN */
ctxt->info.sw_flags2 = ICE_AQ_VSI_SW_FLAG_LAN_ENA;
-   /* By default bits 3 and 4 in port_vlan_flags are 0's which results in
-* legacy behavior (show VLAN, DEI, and UP) in descriptor. Also, allow
-* all packets untagged/tagged.
+
+   /* By default bits 3 and 4 in vlan_flags are 0's which results in legacy
+* behavior (show VLAN, DEI, and UP) in descriptor. Also, allow all
+* packets untagged/tagged.
 */
-   ctxt->info.port_vlan_flags = ((ICE_AQ_VSI_PVLAN_MODE_ALL &
-  ICE_AQ_VSI_PVLAN_MODE_M) >>
- ICE_AQ_VSI_PVLAN_MODE_S);
+   ctxt->info.vlan_flags = ((ICE_AQ_VSI_VLAN_MODE_ALL &
+ ICE_AQ_VSI_VLAN_MODE_M) >>
+ICE_AQ_VSI_VLAN_MODE_S);
+
/* Have 1:1 UP mapping for both ingress/egress tables */
table |= ICE_UP_TABLE_TRANSLATE(0, 0);
table |= ICE_UP_TABLE_TRANSLATE(1, 1);
@@ -3732,10 +3734,10 @@ static int ice_vsi_manage_vlan_insertion(struct ice_vsi 
*vsi)
enum ice_status status;
 
/* Here we are configuring the VSI to let the driver add VLAN tags by
-* setting port_vlan_flags to ICE_AQ_VSI_PVLAN_MODE_ALL. The actual VLAN
-* tag insertion happens in the Tx hot path, in ice_tx_map.
+* setting vlan_flags to ICE_AQ_VSI_VLAN_MODE_ALL. The actual VLAN tag
+* insertion happens in the Tx hot path, in ice_tx_map.
 */
-   ctxt.info.port_vlan_flags = ICE_AQ_VSI_PVLAN_MODE_ALL;
+   ctxt.info.vlan_flags = ICE_AQ_VSI_VLAN_MODE_ALL;
 
ctxt.info.valid_sections = cpu_to_le16(ICE_AQ_VSI_PROP_VLAN_VALID);

[net 13/13] ice: Trivial formatting fixes

2018-08-23 Thread Jeff Kirsher
From: Anirudh Venkataramanan 

1) Add missing "\n" when printing link event error message.

2) Update dev_err statement in probe.

3) Add function description for ice_clear_pf_cfg.

4) Fix coding style for ice_acquire_nvm.

5) netdev->mtu is unsigned so use %u.

Signed-off-by: Anirudh Venkataramanan 
Tested-by: Tony Brelinski 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/ice/ice_common.c | 3 +++
 drivers/net/ethernet/intel/ice/ice_main.c   | 6 +++---
 drivers/net/ethernet/intel/ice/ice_nvm.c| 5 ++---
 3 files changed, 8 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/intel/ice/ice_common.c 
b/drivers/net/ethernet/intel/ice/ice_common.c
index ebd701ac9428..661beea6af79 100644
--- a/drivers/net/ethernet/intel/ice/ice_common.c
+++ b/drivers/net/ethernet/intel/ice/ice_common.c
@@ -45,6 +45,9 @@ static enum ice_status ice_set_mac_type(struct ice_hw *hw)
 /**
  * ice_clear_pf_cfg - Clear PF configuration
  * @hw: pointer to the hardware structure
+ *
+ * Clears any existing PF configuration (VSIs, VSI lists, switch rules, port
+ * configuration, flow director filters, etc.).
  */
 enum ice_status ice_clear_pf_cfg(struct ice_hw *hw)
 {
diff --git a/drivers/net/ethernet/intel/ice/ice_main.c 
b/drivers/net/ethernet/intel/ice/ice_main.c
index e23156515186..f1e80eed2fd6 100644
--- a/drivers/net/ethernet/intel/ice/ice_main.c
+++ b/drivers/net/ethernet/intel/ice/ice_main.c
@@ -901,7 +901,7 @@ static int __ice_clean_ctrlq(struct ice_pf *pf, enum 
ice_ctl_q q_type)
case ice_aqc_opc_get_link_status:
if (ice_handle_link_event(pf))
dev_err(>pdev->dev,
-   "Could not handle link event");
+   "Could not handle link event\n");
break;
default:
dev_dbg(>pdev->dev,
@@ -3284,7 +3284,7 @@ static int ice_probe(struct pci_dev *pdev,
 
err = pcim_iomap_regions(pdev, BIT(ICE_BAR0), pci_name(pdev));
if (err) {
-   dev_err(>dev, "I/O map error %d\n", err);
+   dev_err(>dev, "BAR0 I/O map error %d\n", err);
return err;
}
 
@@ -5252,7 +5252,7 @@ static int ice_change_mtu(struct net_device *netdev, int 
new_mtu)
u8 count = 0;
 
if (new_mtu == netdev->mtu) {
-   netdev_warn(netdev, "mtu is already %d\n", netdev->mtu);
+   netdev_warn(netdev, "mtu is already %u\n", netdev->mtu);
return 0;
}
 
diff --git a/drivers/net/ethernet/intel/ice/ice_nvm.c 
b/drivers/net/ethernet/intel/ice/ice_nvm.c
index 92da0a626ce0..295a8cd87fc1 100644
--- a/drivers/net/ethernet/intel/ice/ice_nvm.c
+++ b/drivers/net/ethernet/intel/ice/ice_nvm.c
@@ -131,9 +131,8 @@ ice_read_sr_word_aq(struct ice_hw *hw, u16 offset, u16 
*data)
  *
  * This function will request NVM ownership.
  */
-static enum
-ice_status ice_acquire_nvm(struct ice_hw *hw,
-  enum ice_aq_res_access_type access)
+static enum ice_status
+ice_acquire_nvm(struct ice_hw *hw, enum ice_aq_res_access_type access)
 {
if (hw->nvm.blank_nvm_mode)
return 0;
-- 
2.17.1



[net 10/13] ice: Fix a few null pointer dereference issues

2018-08-23 Thread Jeff Kirsher
From: Anirudh Venkataramanan 

1) When ice_ena_msix_range() fails to reserve vectors, a devm_kfree()
   warning was seen in the error flow path. So check pf->irq_tracker
   before use in ice_clear_interrupt_scheme().

2) In ice_vsi_cfg(), check vsi->netdev before use.

3) In ice_get_link_status, check link_up before use.

Signed-off-by: Anirudh Venkataramanan 
Tested-by: Tony Brelinski 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/ice/ice_common.c |  2 +-
 drivers/net/ethernet/intel/ice/ice_main.c   | 17 ++---
 2 files changed, 11 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/intel/ice/ice_common.c 
b/drivers/net/ethernet/intel/ice/ice_common.c
index d5300b606d5a..ebd701ac9428 100644
--- a/drivers/net/ethernet/intel/ice/ice_common.c
+++ b/drivers/net/ethernet/intel/ice/ice_common.c
@@ -1483,7 +1483,7 @@ enum ice_status ice_get_link_status(struct ice_port_info 
*pi, bool *link_up)
struct ice_phy_info *phy_info;
enum ice_status status = 0;
 
-   if (!pi)
+   if (!pi || !link_up)
return ICE_ERR_PARAM;
 
phy_info = >phy;
diff --git a/drivers/net/ethernet/intel/ice/ice_main.c 
b/drivers/net/ethernet/intel/ice/ice_main.c
index 34be94a30a60..d5d83c8848f8 100644
--- a/drivers/net/ethernet/intel/ice/ice_main.c
+++ b/drivers/net/ethernet/intel/ice/ice_main.c
@@ -3257,8 +3257,10 @@ static void ice_clear_interrupt_scheme(struct ice_pf *pf)
if (test_bit(ICE_FLAG_MSIX_ENA, pf->flags))
ice_dis_msix(pf);
 
-   devm_kfree(>pdev->dev, pf->irq_tracker);
-   pf->irq_tracker = NULL;
+   if (pf->irq_tracker) {
+   devm_kfree(>pdev->dev, pf->irq_tracker);
+   pf->irq_tracker = NULL;
+   }
 }
 
 /**
@@ -4112,11 +4114,12 @@ static int ice_vsi_cfg(struct ice_vsi *vsi)
 {
int err;
 
-   ice_set_rx_mode(vsi->netdev);
-
-   err = ice_restore_vlan(vsi);
-   if (err)
-   return err;
+   if (vsi->netdev) {
+   ice_set_rx_mode(vsi->netdev);
+   err = ice_restore_vlan(vsi);
+   if (err)
+   return err;
+   }
 
err = ice_vsi_cfg_txqs(vsi);
if (!err)
-- 
2.17.1



[net 07/13] ice: Use order_base_2 to calculate higher power of 2

2018-08-23 Thread Jeff Kirsher
From: Jacob Keller 

Currently, we use a combination of ilog2 and is_power_of_2() to
calculate the next power of 2 for the qcount. This appears to be causing
a warning on some combinations of GCC and the Linux kernel:

MODPOST 1 modules
WARNING: "ilog2_NaN" [ice.ko] undefined!

This appears to because because GCC realizes that qcount could be zero
in some circumstances and thus attempts to link against the
intentionally undefined ___ilog2_NaN function.

The order_base_2 function is intentionally defined to return 0 when
passed 0 as an argument, and thus will be safe to use here.

This not only fixes the warning but makes the resulting code slightly
cleaner, and is really what we should have used originally.

Also update the comment to make it more clear that we are rounding up,
not just incrementing the ilog2 of qcount unconditionally.

Signed-off-by: Jacob Keller 
Signed-off-by: Anirudh Venkataramanan 
Tested-by: Tony Brelinski 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/ice/ice_main.c | 7 ++-
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/intel/ice/ice_main.c 
b/drivers/net/ethernet/intel/ice/ice_main.c
index f3ba4f76b6cb..3eff1d2d1543 100644
--- a/drivers/net/ethernet/intel/ice/ice_main.c
+++ b/drivers/net/ethernet/intel/ice/ice_main.c
@@ -1313,11 +1313,8 @@ static void ice_vsi_setup_q_map(struct ice_vsi *vsi, 
struct ice_vsi_ctx *ctxt)
qcount = numq_tc;
}
 
-   /* find higher power-of-2 of qcount */
-   pow = ilog2(qcount);
-
-   if (!is_power_of_2(qcount))
-   pow++;
+   /* find the (rounded up) power-of-2 of qcount */
+   pow = order_base_2(qcount);
 
for (i = 0; i < ICE_MAX_TRAFFIC_CLASS; i++) {
if (!(vsi->tc_cfg.ena_tc & BIT(i))) {
-- 
2.17.1



Re: [v3, net-next, 02/12] net: stmmac: Do not keep rearming the coalesce timer in stmmac_xmit

2018-08-23 Thread Martin Blumenstingl
Hi,

On Fri, Aug 17, 2018 at 9:32 AM Jerome Brunet  wrote:
>
> On Fri, 2018-05-18 at 14:55 +0100, Jose Abreu wrote:
> > This is cutting down performance. Once the timer is armed it should run
> > after the time expires for the first packet sent and not the last one.
> >
> > After this change, running iperf, the performance gain is +/- 24%.
>
> Hi Guys,
>
> Since v4.18, we are getting a serious regression on Amlogic based SoCs.
> I have tested this on amlogic's:
> * gxbb S905 p200 (Micrel KSZ9031 - 1GBps)
> * axg A113 s400 (Realtek RTL8211F - 1GBps)
>
> Both SoCs use the synopsys gmac with stmmac driver.
I can confirm this on Odroid-C1 (Meson8b SoC with RTL8211F RGMII PHY) as well

> I first noticed that running NFS root filesystem became unstable but I could 
> not
> understand why. Then, running a download as simple test with iperf3 (from an
> initramfs) will break the 'network' in matter of seconds.
I didn't run iperf, simply downloading the latest rootfs package
updates (on Arch Linux ARM) caused the network to break

> I don't know exactly what breaks but bisect clearly assign the blame to this
> change. Reverting the change solve this problem.
>
> I'll be happy to make more tests to help understand what is happening here.
if some latency is fine then I can also help testing

here's a bootlog excerpt with the info from the dwmac-meson8b driver
(used on all platforms listed above):
meson8b-dwmac c941.ethernet: PTP uses main clock
meson8b-dwmac c941.ethernet: User ID: 0x10, Synopsys ID: 0x37
meson8b-dwmac c941.ethernet: DWMAC1000
meson8b-dwmac c941.ethernet: DMA HW capability register supported
meson8b-dwmac c941.ethernet: RX Checksum Offload Engine supported
meson8b-dwmac c941.ethernet: COE Type 2
meson8b-dwmac c941.ethernet: TX Checksum insertion supported
meson8b-dwmac c941.ethernet: Wake-Up On Lan supported
meson8b-dwmac c941.ethernet: Normal descriptors
meson8b-dwmac c941.ethernet: Ring mode enabled
meson8b-dwmac c941.ethernet: Enable RX Mitigation via HW Watchdog Timer
...
meson8b-dwmac c941.ethernet eth0: device MAC address [...random
mac address...]
RTL8211F Gigabit Ethernet stmmac-0:00: attached PHY driver [RTL8211F
Gigabit Ethernet] (mii_bus:phy_addr=stmmac-0:00, irq=27)
...
meson8b-dwmac c941.ethernet eth0: No Safety Features support found
meson8b-dwmac c941.ethernet eth0: PTP not supported by HW
IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready


Regards
Martin


Re: [PATCH bpf] tools: bpftool: return from do_event_pipe() on bad arguments

2018-08-23 Thread Daniel Borkmann
On 08/23/2018 07:48 PM, Quentin Monnet wrote:
> 2018-08-23 20:35 UTC+0300 ~ Sergei Shtylyov
> 
>> Hello!
>>
>> On 08/23/2018 07:46 PM, Quentin Monnet wrote:
>>
>>> When command line parsing fails in the while loop in do_event_pipe()
>>> because the number of arguments is incorrect or because the keyword is
>>> unknown, an error message is displayed, but bpfool
>>
>>bp-who? ;-)
>>
>>> remains stucked in
>>
>>Stuck.
>>
>>> the loop. Make sure we exit the loop upon failure.
>>>
>>> Fixes: f412eed9dfde ("tools: bpftool: add simple perf event output reader")
>>> Signed-off-by: Quentin Monnet 
>>> Reviewed-by: Jakub Kicinski 
>> [...]
>>
>> MBR, Sergei
> 
> Thanks Sergei! The patch has been applied so I cannot fix these, but
> I'll make sure to give an additional pass to my future commit logs…

I fixed the two commit log typos up, thanks.


Re: [PATCH bpf] xsk: fix return value of xdp_umem_assign_dev()

2018-08-23 Thread Jakub Kicinski
On Mon, 20 Aug 2018 09:54:25 +0900, Prashant Bhole wrote:
> s/ENOTSUPP/EOPNOTSUPP/ in function umem_assign_dev().
> This function's return value is directly returned by xsk_bind().
> EOPNOTSUPP is bind()'s possible return value.
> 
> Fixes: f734607e819b ("xsk: refactor xdp_umem_assign_dev()")
> Signed-off-by: Prashant Bhole 

FWIW the refactoring commit just cleaned up the code, is it worth
submitting a patch to stable to correct 4.18 as well?


Re: [PATCH 0/2] net/sched: Add hardware specific counters to TC actions

2018-08-23 Thread Jakub Kicinski
On Mon, 20 Aug 2018 16:03:40 +0200, Eelco Chaudron wrote:
> On 17 Aug 2018, at 13:27, Jakub Kicinski wrote:
> > On Thu, 16 Aug 2018 14:02:44 +0200, Eelco Chaudron wrote:  
> >> On 11 Aug 2018, at 21:06, David Miller wrote:
> >>  
> >>> From: Jakub Kicinski 
> >>> Date: Thu, 9 Aug 2018 20:26:08 -0700
> >>>  
>  It is not immediately clear why this is needed.  The memory and
>  updating two sets of counters won't come for free, so perhaps a
>  stronger justification than troubleshooting is due? :S
> 
>  Netdev has counters for fallback vs forwarded traffic, so you'd 
>  know
>  that traffic hits the SW datapath, plus the rules which are in_hw
>  will
>  most likely not match as of today for flower (assuming 
>  correctness).  
> >>
> >> I strongly believe that these counters are a requirement for a mixed
> >> software/hardware (flow) based forwarding environment. The global
> >> counters will not help much here as you might have chosen to have
> >> certain traffic forwarded by software.
> >>
> >> These counters are probably the only option you have to figure out 
> >> why
> >> forwarding is not as fast as expected, and you want to blame the TC
> >> offload NIC.  
> >
> > The suggested debugging flow would be:
> >  (1) check the global counter for fallback are incrementing;
> >  (2) find a flow with high stats but no in_hw flag set.
> >
> > The in_hw indication should be sufficient in most cases (unless there
> > are shared blocks between netdevs of different ASICs...).  
> 
> I guess the aim is to find miss behaving hardware, i.e. having the in_hw 
> flag set, but flows still coming to the kernel.

For misbehaving hardware in_hw will not work indeed.  Whether we need
these extra always-on stats for such use case could be debated :)
 
>  I'm slightly concerned about potential performance impact, would 
>  you
>  be able to share some numbers for non-trivial number of flows (100k
>  active?)?  
> >>>
> >>> Agreed, features used for diagnostics cannot have a harmful penalty
> >>> for fast path performance.  
> >>
> >> Fast path performance is not affected as these counters are not
> >> incremented there. They are only incremented by the nic driver when 
> >> they
> >> gather their statistics from hardware.  
> >
> > Not by much, you are adding state to performance-critical structures,
> > though, for what is effectively debugging purposes.
> >
> > I was mostly talking about the HW offload stat updates (sorry for not
> > being clear).
> >
> > We can have some hundreds of thousands active offloaded flows, each of
> > them can have multiple actions, and stats have to be updated multiple
> > times per second and dumped probably around once a second, too.  On a
> > busy system the stats will get evicted from cache between each round.
> >
> > But I'm speculating let's see if I can get some numbers on it (if you
> > could get some too, that would be great!).  
> 
> I’ll try to measure some of this later this week/early next week.

I asked Louis to run some tests while I'm travelling, and he reports
that my worry about reporting the extra stats was unfounded.  Update
function does not show up in traces at all.  It seems under stress
(generated with stress-ng) the thread dumping the stats in userspace
(in OvS it would be the revalidator) actually consumes less CPU in
__gnet_stats_copy_basic (0.4% less for ~2.0% total).

Would this match with your results?  I'm not sure why dumping would be
faster with your change..

> >> However, the flow creation is effected, as this is where the extra
> >> memory gets allocated. I had done some 40K flow tests before and did 
> >> not
> >> see any noticeable change in flow insertion performance. As requested 
> >> by
> >> Jakub I did it again for 100K (and threw a Netronome blade in the mix
> >> ;). I used Marcelo’s test tool,
> >> https://github.com/marceloleitner/perf-flower.git.
> >>
> >> Here are the numbers (time in seconds) for 10 runs in sorted order:
> >>
> >> +-++
> >> | Base_kernel | Change_applied |
> >> +-++
> >> |5.684019 |   5.656388 |
> >> |5.699658 |   5.674974 |
> >> |5.725220 |   5.722107 |
> >> |5.739285 |   5.839855 |
> >> |5.748088 |   5.865238 |
> >> |5.766231 |   5.873913 |
> >> |5.842264 |   5.909259 |
> >> |5.902202 |   5.912685 |
> >> |5.905391 |   5.947138 |
> >> |6.032997 |   5.997779 |
> >> +-++
> >>
> >> I guess the deviation is in the userspace part, which is where in 
> >> real
> >> life flows get added anyway.
> >>
> >> Let me know if more is unclear.  


Re: [PATCH bpf] tools: bpftool: return from do_event_pipe() on bad arguments

2018-08-23 Thread Quentin Monnet
2018-08-23 20:35 UTC+0300 ~ Sergei Shtylyov

> Hello!
> 
> On 08/23/2018 07:46 PM, Quentin Monnet wrote:
> 
>> When command line parsing fails in the while loop in do_event_pipe()
>> because the number of arguments is incorrect or because the keyword is
>> unknown, an error message is displayed, but bpfool
> 
>bp-who? ;-)
> 
>> remains stucked in
> 
>Stuck.
> 
>> the loop. Make sure we exit the loop upon failure.
>>
>> Fixes: f412eed9dfde ("tools: bpftool: add simple perf event output reader")
>> Signed-off-by: Quentin Monnet 
>> Reviewed-by: Jakub Kicinski 
> [...]
> 
> MBR, Sergei

Thanks Sergei! The patch has been applied so I cannot fix these, but
I'll make sure to give an additional pass to my future commit logs…

Best,
Quentin


Re: [PATCH bpf] tools: bpftool: return from do_event_pipe() on bad arguments

2018-08-23 Thread Sergei Shtylyov
Hello!

On 08/23/2018 07:46 PM, Quentin Monnet wrote:

> When command line parsing fails in the while loop in do_event_pipe()
> because the number of arguments is incorrect or because the keyword is
> unknown, an error message is displayed, but bpfool

   bp-who? ;-)

> remains stucked in

   Stuck.

> the loop. Make sure we exit the loop upon failure.
> 
> Fixes: f412eed9dfde ("tools: bpftool: add simple perf event output reader")
> Signed-off-by: Quentin Monnet 
> Reviewed-by: Jakub Kicinski 
[...]

MBR, Sergei


Re: [PATCH bpf] tools: bpftool: return from do_event_pipe() on bad arguments

2018-08-23 Thread Daniel Borkmann
On 08/23/2018 06:46 PM, Quentin Monnet wrote:
> When command line parsing fails in the while loop in do_event_pipe()
> because the number of arguments is incorrect or because the keyword is
> unknown, an error message is displayed, but bpfool remains stucked in
> the loop. Make sure we exit the loop upon failure.
> 
> Fixes: f412eed9dfde ("tools: bpftool: add simple perf event output reader")
> Signed-off-by: Quentin Monnet 
> Reviewed-by: Jakub Kicinski 

Applied to bpf, thanks Quentin!


Re: [PATCH bpf] bpf: use per htab salt for bucket hash

2018-08-23 Thread Eduardo Valentin
On Wed, Aug 22, 2018 at 11:49:37PM +0200, Daniel Borkmann wrote:
> All BPF hash and LRU maps currently have a known and global seed
> we feed into jhash() which is 0. This is suboptimal, thus fix it
> by generating a random seed upon hashtab setup time which we can
> later on feed into jhash() on lookup, update and deletions.
> 
> Fixes: 0f8e4bd8a1fc8 ("bpf: add hashtable type of eBPF maps")
> Signed-off-by: Daniel Borkmann 
> Acked-by: Alexei Starovoitov 

Reviewed-by: Eduardo Valentin 


> ---
>  kernel/bpf/hashtab.c | 23 +--
>  1 file changed, 13 insertions(+), 10 deletions(-)
> 
> diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
> index 04b8eda..03cc59e 100644
> --- a/kernel/bpf/hashtab.c
> +++ b/kernel/bpf/hashtab.c
> @@ -15,6 +15,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include "percpu_freelist.h"
>  #include "bpf_lru_list.h"
> @@ -41,6 +42,7 @@ struct bpf_htab {
>   atomic_t count; /* number of elements in this hashtable */
>   u32 n_buckets;  /* number of hash buckets */
>   u32 elem_size;  /* size of each element in bytes */
> + u32 hashrnd;
>  };
>  
>  /* each htab element is struct htab_elem + key + value */
> @@ -371,6 +373,7 @@ static struct bpf_map *htab_map_alloc(union bpf_attr 
> *attr)
>   if (!htab->buckets)
>   goto free_htab;
>  
> + htab->hashrnd = get_random_int();
>   for (i = 0; i < htab->n_buckets; i++) {
>   INIT_HLIST_NULLS_HEAD(>buckets[i].head, i);
>   raw_spin_lock_init(>buckets[i].lock);
> @@ -402,9 +405,9 @@ static struct bpf_map *htab_map_alloc(union bpf_attr 
> *attr)
>   return ERR_PTR(err);
>  }
>  
> -static inline u32 htab_map_hash(const void *key, u32 key_len)
> +static inline u32 htab_map_hash(const void *key, u32 key_len, u32 hashrnd)
>  {
> - return jhash(key, key_len, 0);
> + return jhash(key, key_len, hashrnd);
>  }
>  
>  static inline struct bucket *__select_bucket(struct bpf_htab *htab, u32 hash)
> @@ -470,7 +473,7 @@ static void *__htab_map_lookup_elem(struct bpf_map *map, 
> void *key)
>  
>   key_size = map->key_size;
>  
> - hash = htab_map_hash(key, key_size);
> + hash = htab_map_hash(key, key_size, htab->hashrnd);
>  
>   head = select_bucket(htab, hash);
>  
> @@ -597,7 +600,7 @@ static int htab_map_get_next_key(struct bpf_map *map, 
> void *key, void *next_key)
>   if (!key)
>   goto find_first_elem;
>  
> - hash = htab_map_hash(key, key_size);
> + hash = htab_map_hash(key, key_size, htab->hashrnd);
>  
>   head = select_bucket(htab, hash);
>  
> @@ -824,7 +827,7 @@ static int htab_map_update_elem(struct bpf_map *map, void 
> *key, void *value,
>  
>   key_size = map->key_size;
>  
> - hash = htab_map_hash(key, key_size);
> + hash = htab_map_hash(key, key_size, htab->hashrnd);
>  
>   b = __select_bucket(htab, hash);
>   head = >head;
> @@ -880,7 +883,7 @@ static int htab_lru_map_update_elem(struct bpf_map *map, 
> void *key, void *value,
>  
>   key_size = map->key_size;
>  
> - hash = htab_map_hash(key, key_size);
> + hash = htab_map_hash(key, key_size, htab->hashrnd);
>  
>   b = __select_bucket(htab, hash);
>   head = >head;
> @@ -945,7 +948,7 @@ static int __htab_percpu_map_update_elem(struct bpf_map 
> *map, void *key,
>  
>   key_size = map->key_size;
>  
> - hash = htab_map_hash(key, key_size);
> + hash = htab_map_hash(key, key_size, htab->hashrnd);
>  
>   b = __select_bucket(htab, hash);
>   head = >head;
> @@ -998,7 +1001,7 @@ static int __htab_lru_percpu_map_update_elem(struct 
> bpf_map *map, void *key,
>  
>   key_size = map->key_size;
>  
> - hash = htab_map_hash(key, key_size);
> + hash = htab_map_hash(key, key_size, htab->hashrnd);
>  
>   b = __select_bucket(htab, hash);
>   head = >head;
> @@ -1071,7 +1074,7 @@ static int htab_map_delete_elem(struct bpf_map *map, 
> void *key)
>  
>   key_size = map->key_size;
>  
> - hash = htab_map_hash(key, key_size);
> + hash = htab_map_hash(key, key_size, htab->hashrnd);
>   b = __select_bucket(htab, hash);
>   head = >head;
>  
> @@ -1103,7 +1106,7 @@ static int htab_lru_map_delete_elem(struct bpf_map 
> *map, void *key)
>  
>   key_size = map->key_size;
>  
> - hash = htab_map_hash(key, key_size);
> + hash = htab_map_hash(key, key_size, htab->hashrnd);
>   b = __select_bucket(htab, hash);
>   head = >head;
>  
> -- 
> 2.9.5
> 
> 

-- 
All the best,
Eduardo Valentin


[PATCH bpf] tools: bpftool: return from do_event_pipe() on bad arguments

2018-08-23 Thread Quentin Monnet
When command line parsing fails in the while loop in do_event_pipe()
because the number of arguments is incorrect or because the keyword is
unknown, an error message is displayed, but bpfool remains stucked in
the loop. Make sure we exit the loop upon failure.

Fixes: f412eed9dfde ("tools: bpftool: add simple perf event output reader")
Signed-off-by: Quentin Monnet 
Reviewed-by: Jakub Kicinski 
---
 tools/bpf/bpftool/map_perf_ring.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/tools/bpf/bpftool/map_perf_ring.c 
b/tools/bpf/bpftool/map_perf_ring.c
index 1832100d1b27..6d41323be291 100644
--- a/tools/bpf/bpftool/map_perf_ring.c
+++ b/tools/bpf/bpftool/map_perf_ring.c
@@ -194,8 +194,10 @@ int do_event_pipe(int argc, char **argv)
}
 
while (argc) {
-   if (argc < 2)
+   if (argc < 2) {
BAD_ARG();
+   goto err_close_map;
+   }
 
if (is_prefix(*argv, "cpu")) {
char *endptr;
@@ -221,6 +223,7 @@ int do_event_pipe(int argc, char **argv)
NEXT_ARG();
} else {
BAD_ARG();
+   goto err_close_map;
}
 
do_all = false;
-- 
2.14.1



[PATCH net] vti6: remove !skb->ignore_df check from vti6_xmit()

2018-08-23 Thread Alexey Kodanev
Before the commit d6990976af7c ("vti6: fix PMTU caching and reporting
on xmit") '!skb->ignore_df' check was always true because the function
skb_scrub_packet() was called before it, resetting ignore_df to zero.

In the commit, skb_scrub_packet() was moved below, and now this check
can be false for the packet, e.g. when sending it in the two fragments,
this prevents successful PMTU updates in such case. The next attempts
to send the packet lead to the same tx error. Moreover, vti6 initial
MTU value relies on PMTU adjustments.

This issue can be reproduced with the following LTP test script:
udp_ipsec_vti.sh -6 -p ah -m tunnel -s 2000

Fixes: ccd740cbc6e0 ("vti6: Add pmtu handling to vti6_xmit.")
Signed-off-by: Alexey Kodanev 
---
Not sure about xfrmi_xmit2(), it has a similar check for ignore_df...

 net/ipv6/ip6_vti.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/ipv6/ip6_vti.c b/net/ipv6/ip6_vti.c
index 38dec9d..f48d196 100644
--- a/net/ipv6/ip6_vti.c
+++ b/net/ipv6/ip6_vti.c
@@ -481,7 +481,7 @@ static bool vti6_state_check(const struct xfrm_state *x,
}
 
mtu = dst_mtu(dst);
-   if (!skb->ignore_df && skb->len > mtu) {
+   if (skb->len > mtu) {
skb_dst_update_pmtu(skb, mtu);
 
if (skb->protocol == htons(ETH_P_IPV6)) {
-- 
1.8.3.1



Re: is "volatile" the cause of ifconfig flags not matching sysfs flags?

2018-08-23 Thread Robert P. J. Day
On Wed, 22 Aug 2018, Stephen Hemminger wrote:

> On Wed, 22 Aug 2018 18:32:36 -0400 (EDT)
> "Robert P. J. Day"  wrote:
>
> >   almost certainly another dumb question, but i was poking around the
> > sysfs, particularly /sys/class/net//*, to familiarize myself
> > with what i can glean (or set) re interfaces under /sys, and i noticed
> > "flags", but what i get there doesn't match what i get by running
> > ifconfig.
> >
> >   specifically, if i list the flags for my wireless interface under
> > /sys:
> >
> > $ cat flags
> > 0x1003
> > $
> >
> >   but with ifconfig:
> >
> > $ ifconfig wlp2s0
> > wlp2s0: flags=4163  mtu 1500
> >   
> >
> >   do those two "flags" values represent the same set of flags? and
> > does the obvious difference have to do with some of those flags being
> > "volatile" as dewscribed in include/uapi/linux/if.h? or am i just
> > totally misreading this?
> >
> > rday
> >
>
> sysfs reports netdevice->if_flags where as ifconfig is getting hex
> value from SIOCGIFFLAGS which does:
>   dev_get_flags(dev)
>
> The value in sysfs is more intended for internal debugging, where
> all the normal userspace API's return a more limited set of
> historical values.

  so the history aside, those values ultimately represent the same
flags?

rday

-- 


Robert P. J. Day Ottawa, Ontario, CANADA
  http://crashcourse.ca/dokuwiki

Twitter:   http://twitter.com/rpjday
LinkedIn:   http://ca.linkedin.com/in/rpjday



Re: [PATCH iproute2 1/3] testsuite: remove all temp files and implement make clean

2018-08-23 Thread Luca Boccassi
On Thu, 2018-08-23 at 09:07 +0200, Stefan Bader wrote:
> On 22.08.2018 20:09, Luca Boccassi wrote:
> > Some generated test files were not removed, including one
> > executable in
> > the testsuite/tools directory.
> > Ensure make clean from the top level directory works for the
> > testsuite
> > subdirs too, and that all the files are removed.
> > 
> > Signed-off-by: Luca Boccassi 
> > ---
> 
> Patch 1+2 look good to me and I would ack if that would count in any
> way.
> For patch 3 I only wonder whether that might re-use $PREFIX (which is
> defined as "sudo -E unshare -n"). Ok, the unshare part might be
> slight
> overkill, but maybe a little better in style. Not sure though, and it
> is high level whining...
> 
> -Stefan

Hi,

Yeah I thought about that, but as you noticed it would run it through
unshare so I changed it in the end. I don't mind either way - Stephen,
let me know which one you prefer.

-- 
Kind regards,
Luca Boccassi

signature.asc
Description: This is a digitally signed message part


[PATCH v2] net: macb: do not disable MDIO bus at open/close time

2018-08-23 Thread Anssi Hannula
macb_reset_hw() is called from macb_close() and indirectly from
macb_open(). macb_reset_hw() zeroes the NCR register, including the MPE
(Management Port Enable) bit.

This will prevent accessing any other PHYs for other Ethernet MACs on
the MDIO bus, which remains registered at macb_reset_hw() time, until
macb_init_hw() is called from macb_open() which sets the MPE bit again.

I.e. currently the MDIO bus has a short disruption at open time and is
disabled at close time until the interface is opened again.

Fix that by only touching the RE and TE bits when enabling and disabling
RX/TX.

v2: Make macb_init_hw() NCR write a single statement.

Fixes: 6c36a7074436 ("macb: Use generic PHY layer")
Signed-off-by: Anssi Hannula 
---
 drivers/net/ethernet/cadence/macb_main.c | 9 ++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/cadence/macb_main.c 
b/drivers/net/ethernet/cadence/macb_main.c
index dc09f9a8a49b..225a7c8bad2d 100644
--- a/drivers/net/ethernet/cadence/macb_main.c
+++ b/drivers/net/ethernet/cadence/macb_main.c
@@ -2028,14 +2028,17 @@ static void macb_reset_hw(struct macb *bp)
 {
struct macb_queue *queue;
unsigned int q;
+   u32 ctrl = macb_readl(bp, NCR);
 
/* Disable RX and TX (XXX: Should we halt the transmission
 * more gracefully?)
 */
-   macb_writel(bp, NCR, 0);
+   ctrl &= ~(MACB_BIT(RE) | MACB_BIT(TE));
 
/* Clear the stats registers (XXX: Update stats first?) */
-   macb_writel(bp, NCR, MACB_BIT(CLRSTAT));
+   ctrl |= MACB_BIT(CLRSTAT);
+
+   macb_writel(bp, NCR, ctrl);
 
/* Clear all status flags */
macb_writel(bp, TSR, -1);
@@ -2223,7 +2226,7 @@ static void macb_init_hw(struct macb *bp)
}
 
/* Enable TX and RX */
-   macb_writel(bp, NCR, MACB_BIT(RE) | MACB_BIT(TE) | MACB_BIT(MPE));
+   macb_writel(bp, NCR, macb_readl(bp, NCR) | MACB_BIT(RE) | MACB_BIT(TE));
 }
 
 /* The hash address register is 64 bits long and takes up two
-- 
2.17.1



Re: Failed to call bpf_l3_csum_replace() twice for IPIP tunnel

2018-08-23 Thread Daniel Borkmann
On 08/23/2018 08:39 AM, IMBRIUS AGER wrote:
> hello, I am trying to modify the src addr (both inner and outer) of IPIP 
> tunnel.
> 
> this is the testing code:
> 
> ===
> 
> void *data = (void *)(long)skb->data;
> void *data_end = (void *)(long)skb->data_end;
> 
> struct ethhdr *eth = data;
> struct iphdr *ip_outer = (void *)(eth + 1);
> 
> if (ip_outer->ihl != 5 || ip_outer + 1 > data_end)
> return;
> 
> struct iphdr *ip_inner = (void *)(ip4_outer + 1);
> 
> if (ip_inner->ihl != 5 || ip_inner + 1 > data_end)
> return;
> 
> src = 0x;
> 
> /* First I modify the inner ip */
> bpf_l3_csum_replace(skb, IP4_CSUM_OFF + sizeof(struct iphdr),
> ip_inner->saddr, src, sizeof(src));
> bpf_skb_store_bytes(skb, IP4_SRC_OFF + sizeof(struct iphdr), ,
> sizeof(src), 0);
> 
> /* Second I modify the outer ip */
> bpf_l3_csum_replace(skb, IP4_CSUM_OFF, ip_outer->saddr, src, sizeof(src));
> bpf_skb_store_bytes(skb, IP4_SRC_OFF, , sizeof(src), 0);

You need to reload skb->data pointer as otherwise the first l3_csum_replace
could have uncloned the skb and then ip_outer->saddr would point to invalid
mem. Other, better option is to use the bpf_csum_diff() helper and calc the
diff which then only needs to be fed into bpf_l3_csum_replace() once.

> 
> 
> I found that I could only modify one of the src addr (inner or outer).
> If both, the kernel always rejected the code at the first
> bpf_l3_csum_replace():
> 
> 
> 
> Prog section '__x' rejected: Permission denied (13)!
>  - Type: 3
>  - Instructions: 171 (0 over limit)
>  - License:  GPL
> 
> .
> 
> 96: (85) call bpf_l3_csum_replace#10
> 97: (61) r4 = *(u32 *)(r7 +0)
>  R0=inv(id=0) R6=ctx(id=0,off=0,imm=0)
> R7=map_value(id=0,off=0,ks=4,vs=4,imm=0) R8=inv(id=0) R9=inv(id=0)
> R10=fp0
> 98: (61) r3 = *(u32 *)(r9 +26)
> R9 invalid mem access 'inv'
> 
> Error fetching program/map!
> Unable to load program
> 
> 
> 
> I tried to validate the pointer again before the second modification.
> But nothing good has happened.
> 
> if (ip_outer->ihl != 5 || ip_outer + 1 > data_end)
> return;
> /* Second I modify the outer ip */
> bpf_l3_csum_replace(skb, IP4_CSUM_OFF, ip_outer->saddr, src, sizeof(src));
> bpf_skb_store_bytes(skb, IP4_SRC_OFF, , sizeof(src), 0);
> 
> 
> Any idea?
> Thanks very much
> 



Re: [PATCH iproute2 1/3] testsuite: remove all temp files and implement make clean

2018-08-23 Thread Stefan Bader
On 22.08.2018 20:09, Luca Boccassi wrote:
> Some generated test files were not removed, including one executable in
> the testsuite/tools directory.
> Ensure make clean from the top level directory works for the testsuite
> subdirs too, and that all the files are removed.
> 
> Signed-off-by: Luca Boccassi 
> ---

Patch 1+2 look good to me and I would ack if that would count in any way.
For patch 3 I only wonder whether that might re-use $PREFIX (which is
defined as "sudo -E unshare -n"). Ok, the unshare part might be slight
overkill, but maybe a little better in style. Not sure though, and it
is high level whining...

-Stefan

>  Makefile | 2 +-
>  testsuite/Makefile   | 3 +++
>  testsuite/tools/Makefile | 3 +++
>  3 files changed, 7 insertions(+), 1 deletion(-)
> 
> diff --git a/Makefile b/Makefile
> index 651d2a50..ea2f797c 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -96,7 +96,7 @@ snapshot:
>   > include/SNAPSHOT.h
>  
>  clean:
> - @for i in $(SUBDIRS); \
> + @for i in $(SUBDIRS) testsuite; \
>   do $(MAKE) $(MFLAGS) -C $$i clean; done
>  
>  clobber:
> diff --git a/testsuite/Makefile b/testsuite/Makefile
> index 8fcbc557..2acd0427 100644
> --- a/testsuite/Makefile
> +++ b/testsuite/Makefile
> @@ -43,6 +43,9 @@ alltests: $(TESTS)
>  clean:
>   @echo "Removing $(RESULTS_DIR) dir ..."
>   @rm -rf $(RESULTS_DIR)
> + @rm -f iproute2/iproute2-this
> + @rm -f tests/ip/link/dev_wo_vf_rate.nl
> + $(MAKE) -C tools clean
>  
>  distclean: clean
>   echo "Entering iproute2" && cd iproute2 && $(MAKE) distclean && cd ..;
> diff --git a/testsuite/tools/Makefile b/testsuite/tools/Makefile
> index f2cdc980..f0ce4ee2 100644
> --- a/testsuite/tools/Makefile
> +++ b/testsuite/tools/Makefile
> @@ -1,3 +1,6 @@
>  # SPDX-License-Identifier: GPL-2.0
>  generate_nlmsg: generate_nlmsg.c ../../lib/libnetlink.c
>   $(CC) -o $@ $^
> +
> +clean:
> + rm -f generate_nlmsg
> 




signature.asc
Description: OpenPGP digital signature


Failed to call bpf_l3_csum_replace() twice for IPIP tunnel

2018-08-23 Thread IMBRIUS AGER
hello, I am trying to modify the src addr (both inner and outer) of IPIP tunnel.

this is the testing code:

===

void *data = (void *)(long)skb->data;
void *data_end = (void *)(long)skb->data_end;

struct ethhdr *eth = data;
struct iphdr *ip_outer = (void *)(eth + 1);

if (ip_outer->ihl != 5 || ip_outer + 1 > data_end)
return;

struct iphdr *ip_inner = (void *)(ip4_outer + 1);

if (ip_inner->ihl != 5 || ip_inner + 1 > data_end)
return;

src = 0x;

/* First I modify the inner ip */
bpf_l3_csum_replace(skb, IP4_CSUM_OFF + sizeof(struct iphdr),
ip_inner->saddr, src, sizeof(src));
bpf_skb_store_bytes(skb, IP4_SRC_OFF + sizeof(struct iphdr), ,
sizeof(src), 0);

/* Second I modify the outer ip */
bpf_l3_csum_replace(skb, IP4_CSUM_OFF, ip_outer->saddr, src, sizeof(src));
bpf_skb_store_bytes(skb, IP4_SRC_OFF, , sizeof(src), 0);



I found that I could only modify one of the src addr (inner or outer).
If both, the kernel always rejected the code at the first
bpf_l3_csum_replace():



Prog section '__x' rejected: Permission denied (13)!
 - Type: 3
 - Instructions: 171 (0 over limit)
 - License:  GPL

.

96: (85) call bpf_l3_csum_replace#10
97: (61) r4 = *(u32 *)(r7 +0)
 R0=inv(id=0) R6=ctx(id=0,off=0,imm=0)
R7=map_value(id=0,off=0,ks=4,vs=4,imm=0) R8=inv(id=0) R9=inv(id=0)
R10=fp0
98: (61) r3 = *(u32 *)(r9 +26)
R9 invalid mem access 'inv'

Error fetching program/map!
Unable to load program



I tried to validate the pointer again before the second modification.
But nothing good has happened.

if (ip_outer->ihl != 5 || ip_outer + 1 > data_end)
return;
/* Second I modify the outer ip */
bpf_l3_csum_replace(skb, IP4_CSUM_OFF, ip_outer->saddr, src, sizeof(src));
bpf_skb_store_bytes(skb, IP4_SRC_OFF, , sizeof(src), 0);


Any idea?
Thanks very much