Re: pull request: bluetooth 2018-08-23

2018-08-22 Thread David Miller
From: Johan Hedberg 
Date: Thu, 23 Aug 2018 08:34:40 +0300

> Here are two important Bluetooth fixes for the MediaTek and RealTek HCI
> drivers.
> 
> Please let me know if there are any issues pulling, thanks.

Pulled, thank you.


pull request: bluetooth 2018-08-23

2018-08-22 Thread Johan Hedberg
Hi Dave,

Here are two important Bluetooth fixes for the MediaTek and RealTek HCI
drivers.

Please let me know if there are any issues pulling, thanks.

Johan

---
The following changes since commit ab08dcd724543896303eae7de6288242bbaff458:

  rhashtable: remove duplicated include from rhashtable.c (2018-08-20 19:18:50 -0700)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth.git for-upstream

for you to fetch changes up to addb3ffbca66954fb1d1791d2db2153c403f81af:

  Bluetooth: mediatek: Fix memory leak (2018-08-21 16:56:20 +0200)


Gustavo A. R. Silva (1):
  Bluetooth: mediatek: Fix memory leak

Hans de Goede (1):
  Bluetooth: Make BT_HCIUART_RTL configuration option depend on ACPI

 drivers/bluetooth/Kconfig | 1 +
 drivers/bluetooth/btmtkuart.c | 8 +---
 2 files changed, 6 insertions(+), 3 deletions(-)




Re: [PATCH 1/2] net: netsec: enable tx-irq during open callback

2018-08-22 Thread Jassi Brar
Hi Dave,
   This patch (1/2) seems to have fallen through the cracks. The other
one (2/2) you already picked up.
Thanks

On Mon, Apr 16, 2018 at 1:08 PM  wrote:
>
> From: Jassi Brar 
>
> Enable TX-irq as well during ndo_open() as we can not count upon
> RX to arrive early enough to trigger the napi. This patch is critical
> for installation over network.
>
> Fixes: 533dd11a12f6 ("net: socionext: Add Synquacer NetSec driver")
> Signed-off-by: Jassi Brar 
> ---
>  drivers/net/ethernet/socionext/netsec.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/net/ethernet/socionext/netsec.c 
> b/drivers/net/ethernet/socionext/netsec.c
> index f4c0b02..f6fe70e 100644
> --- a/drivers/net/ethernet/socionext/netsec.c
> +++ b/drivers/net/ethernet/socionext/netsec.c
> @@ -1313,8 +1313,8 @@ static int netsec_netdev_open(struct net_device *ndev)
> napi_enable(&priv->napi);
> netif_start_queue(ndev);
>
> -   /* Enable RX intr. */
> -   netsec_write(priv, NETSEC_REG_INTEN_SET, NETSEC_IRQ_RX);
> +   /* Enable TX+RX intr. */
> +   netsec_write(priv, NETSEC_REG_INTEN_SET, NETSEC_IRQ_RX | 
> NETSEC_IRQ_TX);
>
> return 0;
>  err3:
> --
> 2.7.4
>


Re: [Patch net 0/2] net: hns3: bug fix & optimization for HNS3 driver

2018-08-22 Thread David Miller
From: Huazhong Tan 
Date: Thu, 23 Aug 2018 11:37:14 +0800

> This patchset presents a bug fix found when CONFIG_ARM64_64K_PAGES
> is enabled and an optimization for the HNS3 driver.

Series applied, thank you.


Re: [PATCH bpf] bpf: use per htab salt for bucket hash

2018-08-22 Thread Song Liu
On Wed, Aug 22, 2018 at 2:49 PM, Daniel Borkmann  wrote:
> All BPF hash and LRU maps currently have a known and global seed
> we feed into jhash() which is 0. This is suboptimal, thus fix it
> by generating a random seed upon hashtab setup time which we can
> later on feed into jhash() on lookup, update and deletions.
>
> Fixes: 0f8e4bd8a1fc8 ("bpf: add hashtable type of eBPF maps")
> Signed-off-by: Daniel Borkmann 
> Acked-by: Alexei Starovoitov 

Acked-by: Song Liu 

> ---
>  kernel/bpf/hashtab.c | 23 +--
>  1 file changed, 13 insertions(+), 10 deletions(-)
>
> diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
> index 04b8eda..03cc59e 100644
> --- a/kernel/bpf/hashtab.c
> +++ b/kernel/bpf/hashtab.c
> @@ -15,6 +15,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include "percpu_freelist.h"
>  #include "bpf_lru_list.h"
> @@ -41,6 +42,7 @@ struct bpf_htab {
> atomic_t count; /* number of elements in this hashtable */
> u32 n_buckets;  /* number of hash buckets */
> u32 elem_size;  /* size of each element in bytes */
> +   u32 hashrnd;
>  };
>
>  /* each htab element is struct htab_elem + key + value */
> @@ -371,6 +373,7 @@ static struct bpf_map *htab_map_alloc(union bpf_attr 
> *attr)
> if (!htab->buckets)
> goto free_htab;
>
> +   htab->hashrnd = get_random_int();
> for (i = 0; i < htab->n_buckets; i++) {
> INIT_HLIST_NULLS_HEAD(&htab->buckets[i].head, i);
> raw_spin_lock_init(&htab->buckets[i].lock);
> @@ -402,9 +405,9 @@ static struct bpf_map *htab_map_alloc(union bpf_attr 
> *attr)
> return ERR_PTR(err);
>  }
>
> -static inline u32 htab_map_hash(const void *key, u32 key_len)
> +static inline u32 htab_map_hash(const void *key, u32 key_len, u32 hashrnd)
>  {
> -   return jhash(key, key_len, 0);
> +   return jhash(key, key_len, hashrnd);
>  }
>
>  static inline struct bucket *__select_bucket(struct bpf_htab *htab, u32 hash)
> @@ -470,7 +473,7 @@ static void *__htab_map_lookup_elem(struct bpf_map *map, 
> void *key)
>
> key_size = map->key_size;
>
> -   hash = htab_map_hash(key, key_size);
> +   hash = htab_map_hash(key, key_size, htab->hashrnd);
>
> head = select_bucket(htab, hash);
>
> @@ -597,7 +600,7 @@ static int htab_map_get_next_key(struct bpf_map *map, 
> void *key, void *next_key)
> if (!key)
> goto find_first_elem;
>
> -   hash = htab_map_hash(key, key_size);
> +   hash = htab_map_hash(key, key_size, htab->hashrnd);
>
> head = select_bucket(htab, hash);
>
> @@ -824,7 +827,7 @@ static int htab_map_update_elem(struct bpf_map *map, void 
> *key, void *value,
>
> key_size = map->key_size;
>
> -   hash = htab_map_hash(key, key_size);
> +   hash = htab_map_hash(key, key_size, htab->hashrnd);
>
> b = __select_bucket(htab, hash);
> head = &b->head;
> @@ -880,7 +883,7 @@ static int htab_lru_map_update_elem(struct bpf_map *map, 
> void *key, void *value,
>
> key_size = map->key_size;
>
> -   hash = htab_map_hash(key, key_size);
> +   hash = htab_map_hash(key, key_size, htab->hashrnd);
>
> b = __select_bucket(htab, hash);
> head = &b->head;
> @@ -945,7 +948,7 @@ static int __htab_percpu_map_update_elem(struct bpf_map 
> *map, void *key,
>
> key_size = map->key_size;
>
> -   hash = htab_map_hash(key, key_size);
> +   hash = htab_map_hash(key, key_size, htab->hashrnd);
>
> b = __select_bucket(htab, hash);
> head = &b->head;
> @@ -998,7 +1001,7 @@ static int __htab_lru_percpu_map_update_elem(struct 
> bpf_map *map, void *key,
>
> key_size = map->key_size;
>
> -   hash = htab_map_hash(key, key_size);
> +   hash = htab_map_hash(key, key_size, htab->hashrnd);
>
> b = __select_bucket(htab, hash);
> head = &b->head;
> @@ -1071,7 +1074,7 @@ static int htab_map_delete_elem(struct bpf_map *map, 
> void *key)
>
> key_size = map->key_size;
>
> -   hash = htab_map_hash(key, key_size);
> +   hash = htab_map_hash(key, key_size, htab->hashrnd);
> b = __select_bucket(htab, hash);
> head = &b->head;
>
> @@ -1103,7 +1106,7 @@ static int htab_lru_map_delete_elem(struct bpf_map 
> *map, void *key)
>
> key_size = map->key_size;
>
> -   hash = htab_map_hash(key, key_size);
> +   hash = htab_map_hash(key, key_size, htab->hashrnd);
> b = __select_bucket(htab, hash);
> head = &b->head;
>
> --
> 2.9.5
>


Re: [PATCH] net/ipv6: init ip6 anycast rt->dst.input as ip6_input

2018-08-22 Thread David Miller
From: Hangbin Liu 
Date: Thu, 23 Aug 2018 11:31:37 +0800

> Commit 6edb3c96a5f02 ("net/ipv6: Defer initialization of dst to data path")
> forgot to handle anycast route and init anycast rt->dst.input to ip6_forward.
> Fix it by setting anycast rt->dst.input back to ip6_input.
> 
> Fixes: 6edb3c96a5f02 ("net/ipv6: Defer initialization of dst to data path")
> Signed-off-by: Hangbin Liu 

Applied and queued up for -stable, thanks.


Re: [Patch net 0/4] net: hns: bug fixes & optimization for HNS driver

2018-08-22 Thread David Miller
From: Huazhong Tan 
Date: Thu, 23 Aug 2018 11:10:09 +0800

> This patchset presents some bug fixes found when
> CONFIG_ARM64_64K_PAGES is enabled and an optimization for the HNS driver.

Series applied, thank you.


Re: [PATCH net 0/3] tcp_bbr: PROBE_RTT minor bug fixes

2018-08-22 Thread David Miller
From: Kevin Yang 
Date: Wed, 22 Aug 2018 17:43:13 -0400

> From: "Kevin(Yudong) Yang" 
> 
> This series includes two minor bug fixes for the TCP BBR PROBE_RTT
> mechanism, and one preparatory patch:
> 
> (1) A preparatory patch to reorganize the PROBE_RTT logic by refactoring
> (into its own function) the code to exit PROBE_RTT, since the next
> patch will be using that code in a new context.
> 
> (2) Fix: When BBR restarts from idle and if BBR is in PROBE_RTT mode,
> BBR should check if it's time to exit PROBE_RTT. If yes, then BBR
> should exit PROBE_RTT mode and restore the cwnd to its full value.
> 
> (3) Fix: Apply the PROBE_RTT cwnd cap even if the count of fully-ACKed
> packets is 0.

Series applied, thank you.


Re: [PATCH net] ipv4: tcp: send zero IPID for RST and ACK sent in SYN-RECV and TIME-WAIT state

2018-08-22 Thread David Miller
From: Eric Dumazet 
Date: Wed, 22 Aug 2018 13:30:45 -0700

> tcp uses per-cpu (and per namespace) sockets (net->ipv4.tcp_sk) internally
> to send some control packets.
> 
> 1) RST packets, through tcp_v4_send_reset()
> 2) ACK packets in SYN-RECV and TIME-WAIT state, through tcp_v4_send_ack()
> 
> These packets assert IP_DF, and also use the hashed IP ident generator
> to provide an IPv4 ID number.
> 
> Geoff Alexander reported this could be used to build off-path attacks.
> 
> These packets should not be fragmented, since their size is smaller than
> IPV4_MIN_MTU. Only some tunneled paths could eventually have to fragment,
> regardless of inner IPID.
> 
> We really can use zero IPID, to address the flaw, and as a bonus,
> avoid a couple of atomic operations in ip_idents_reserve()
> 
> Signed-off-by: Eric Dumazet 
> Reported-by: Geoff Alexander 
> Tested-by: Geoff Alexander 

Applied and queued up for -stable.


Re: [Patch net] addrconf: reduce unnecessary atomic allocations

2018-08-22 Thread David Miller
From: Cong Wang 
Date: Wed, 22 Aug 2018 12:58:34 -0700

> All the 3 callers of addrconf_add_mroute() assert RTNL
> lock, they don't take any additional lock either, so
> it is safe to convert it to GFP_KERNEL.
> 
> Same for sit_add_v4_addrs().
> 
> Cc: David Ahern 
> Signed-off-by: Cong Wang 

Applied.


Re: [PATCH] net/ipv6: init ip6 anycast rt->dst.input as ip6_input

2018-08-22 Thread David Ahern
On 8/22/18 9:31 PM, Hangbin Liu wrote:
> Commit 6edb3c96a5f02 ("net/ipv6: Defer initialization of dst to data path")
> forgot to handle anycast route and init anycast rt->dst.input to ip6_forward.
> Fix it by setting anycast rt->dst.input back to ip6_input.
> 
> Fixes: 6edb3c96a5f02 ("net/ipv6: Defer initialization of dst to data path")
> Signed-off-by: Hangbin Liu 
> ---
>  net/ipv6/route.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 

Good catch.

Reviewed-by: David Ahern 



[Patch net 2/2] net: hns3: modify variable type in hns3_nic_reuse_page

2018-08-22 Thread Huazhong Tan
'truesize' is supposed to be u32, not int, so fix it.

Signed-off-by: Huazhong tan 
Signed-off-by: Salil Mehta 
---
 drivers/net/ethernet/hisilicon/hns3/hns3_enet.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c 
b/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c
index 3554dca..955c4ab 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c
@@ -2019,7 +2019,8 @@ static void hns3_nic_reuse_page(struct sk_buff *skb, int 
i,
struct hns3_desc_cb *desc_cb)
 {
struct hns3_desc *desc;
-   int truesize, size;
+   u32 truesize;
+   int size;
int last_offset;
bool twobufs;
 
-- 
1.9.1



[Patch net 0/2] net: hns3: bug fix & optimization for HNS3 driver

2018-08-22 Thread Huazhong Tan
This patchset presents a bug fix found when CONFIG_ARM64_64K_PAGES
is enabled and an optimization for the HNS3 driver.

Huazhong Tan (2):
  net: hns3: fix page_offset overflow when CONFIG_ARM64_64K_PAGES
  net: hns3: modify variable type in hns3_nic_reuse_page

 drivers/net/ethernet/hisilicon/hns3/hns3_enet.c | 3 ++-
 drivers/net/ethernet/hisilicon/hns3/hns3_enet.h | 6 +++---
 2 files changed, 5 insertions(+), 4 deletions(-)

-- 
1.9.1



[Patch net 1/2] net: hns3: fix page_offset overflow when CONFIG_ARM64_64K_PAGES

2018-08-22 Thread Huazhong Tan
When the config item "CONFIG_ARM64_64K_PAGES" is enabled, PAGE_SIZE is
65536 (64K). But the type of page_offset is u16, so it will
overflow. Change it to u32 when "CONFIG_ARM64_64K_PAGES" is enabled.
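
As a quick illustration of the wrap-around (a stand-alone user-space sketch;
the variable names only mirror the commit message and are not driver code):

#include <stdint.h>
#include <stdio.h>

int main(void)
{
	uint16_t page_offset = 0;	/* u16, as before this patch */
	uint32_t page_size = 65536;	/* 64K page with CONFIG_ARM64_64K_PAGES */

	page_offset += page_size;	/* 65536 does not fit in 16 bits */
	printf("page_offset = %u\n", page_offset);	/* prints 0 */
	return 0;
}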

Fixes: 76ad4f0ee747 ("net: hns3: Add support of HNS3 Ethernet Driver for hip08 
SoC")
Signed-off-by: Huazhong Tan 
Signed-off-by: Salil Mehta 
---
 drivers/net/ethernet/hisilicon/hns3/hns3_enet.h | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3_enet.h 
b/drivers/net/ethernet/hisilicon/hns3/hns3_enet.h
index a02a96a..cb450d7 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3_enet.h
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3_enet.h
@@ -284,11 +284,11 @@ struct hns3_desc_cb {
 
/* priv data for the desc, e.g. skb when use with ip stack*/
void *priv;
-   u16 page_offset;
-   u16 reuse_flag;
-
+   u32 page_offset;
u32 length; /* length of the buffer */
 
+   u16 reuse_flag;
+
/* desc type, used by the ring user to mark the type of the priv data */
u16 type;
 };
-- 
1.9.1



[PATCH] net/ipv6: init ip6 anycast rt->dst.input as ip6_input

2018-08-22 Thread Hangbin Liu
Commit 6edb3c96a5f02 ("net/ipv6: Defer initialization of dst to data path")
forgot to handle anycast route and init anycast rt->dst.input to ip6_forward.
Fix it by setting anycast rt->dst.input back to ip6_input.

Fixes: 6edb3c96a5f02 ("net/ipv6: Defer initialization of dst to data path")
Signed-off-by: Hangbin Liu 
---
 net/ipv6/route.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index 7208c16..c4ea13e 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -956,7 +956,7 @@ static void ip6_rt_init_dst(struct rt6_info *rt, struct 
fib6_info *ort)
rt->dst.error = 0;
rt->dst.output = ip6_output;
 
-   if (ort->fib6_type == RTN_LOCAL) {
+   if (ort->fib6_type == RTN_LOCAL || ort->fib6_type == RTN_ANYCAST) {
rt->dst.input = ip6_input;
} else if (ipv6_addr_type(&ort->fib6_dst.addr) & IPV6_ADDR_MULTICAST) {
rt->dst.input = ip6_mc_input;
-- 
2.5.5



[Patch net 0/4] net: hns: bug fixes & optimization for HNS driver

2018-08-22 Thread Huazhong Tan
This patchset presents some bug fixes found when CONFIG_ARM64_64K_PAGES
is enabled and an optimization for the HNS driver.

Huazhong Tan (4):
  net: hns: fix length and page_offset overflow when
CONFIG_ARM64_64K_PAGES
  net: hns: modify variable type in hns_nic_reuse_page
  net: hns: fix skb->truesize underestimation
  net: hns: use eth_get_headlen interface instead of hns_nic_get_headlen

 drivers/net/ethernet/hisilicon/hns/hnae.h |   6 +-
 drivers/net/ethernet/hisilicon/hns/hns_enet.c | 108 +-
 2 files changed, 7 insertions(+), 107 deletions(-)

-- 
1.9.1



[Patch net 4/4] net: hns: use eth_get_headlen interface instead of hns_nic_get_headlen

2018-08-22 Thread Huazhong Tan
Update hns to use eth_get_headlen instead of hns_nic_get_headlen, and
remove the now-redundant hns_nic_get_headlen.

Signed-off-by: Huazhong Tan 
Signed-off-by: Salil Mehta 
---
 drivers/net/ethernet/hisilicon/hns/hns_enet.c | 103 +-
 1 file changed, 1 insertion(+), 102 deletions(-)

diff --git a/drivers/net/ethernet/hisilicon/hns/hns_enet.c 
b/drivers/net/ethernet/hisilicon/hns/hns_enet.c
index 71bd3bf..02a0ba2 100644
--- a/drivers/net/ethernet/hisilicon/hns/hns_enet.c
+++ b/drivers/net/ethernet/hisilicon/hns/hns_enet.c
@@ -406,107 +406,6 @@ netdev_tx_t hns_nic_net_xmit_hw(struct net_device *ndev,
return NETDEV_TX_BUSY;
 }
 
-/**
- * hns_nic_get_headlen - determine size of header for RSC/LRO/GRO/FCOE
- * @data: pointer to the start of the headers
- * @max: total length of section to find headers in
- *
- * This function is meant to determine the length of headers that will
- * be recognized by hardware for LRO, GRO, and RSC offloads.  The main
- * motivation of doing this is to only perform one pull for IPv4 TCP
- * packets so that we can do basic things like calculating the gso_size
- * based on the average data per packet.
- **/
-static unsigned int hns_nic_get_headlen(unsigned char *data, u32 flag,
-   unsigned int max_size)
-{
-   unsigned char *network;
-   u8 hlen;
-
-   /* this should never happen, but better safe than sorry */
-   if (max_size < ETH_HLEN)
-   return max_size;
-
-   /* initialize network frame pointer */
-   network = data;
-
-   /* set first protocol and move network header forward */
-   network += ETH_HLEN;
-
-   /* handle any vlan tag if present */
-   if (hnae_get_field(flag, HNS_RXD_VLAN_M, HNS_RXD_VLAN_S)
-   == HNS_RX_FLAG_VLAN_PRESENT) {
-   if ((typeof(max_size))(network - data) > (max_size - VLAN_HLEN))
-   return max_size;
-
-   network += VLAN_HLEN;
-   }
-
-   /* handle L3 protocols */
-   if (hnae_get_field(flag, HNS_RXD_L3ID_M, HNS_RXD_L3ID_S)
-   == HNS_RX_FLAG_L3ID_IPV4) {
-   if ((typeof(max_size))(network - data) >
-   (max_size - sizeof(struct iphdr)))
-   return max_size;
-
-   /* access ihl as a u8 to avoid unaligned access on ia64 */
-   hlen = (network[0] & 0x0F) << 2;
-
-   /* verify hlen meets minimum size requirements */
-   if (hlen < sizeof(struct iphdr))
-   return network - data;
-
-   /* record next protocol if header is present */
-   } else if (hnae_get_field(flag, HNS_RXD_L3ID_M, HNS_RXD_L3ID_S)
-   == HNS_RX_FLAG_L3ID_IPV6) {
-   if ((typeof(max_size))(network - data) >
-   (max_size - sizeof(struct ipv6hdr)))
-   return max_size;
-
-   /* record next protocol */
-   hlen = sizeof(struct ipv6hdr);
-   } else {
-   return network - data;
-   }
-
-   /* relocate pointer to start of L4 header */
-   network += hlen;
-
-   /* finally sort out TCP/UDP */
-   if (hnae_get_field(flag, HNS_RXD_L4ID_M, HNS_RXD_L4ID_S)
-   == HNS_RX_FLAG_L4ID_TCP) {
-   if ((typeof(max_size))(network - data) >
-   (max_size - sizeof(struct tcphdr)))
-   return max_size;
-
-   /* access doff as a u8 to avoid unaligned access on ia64 */
-   hlen = (network[12] & 0xF0) >> 2;
-
-   /* verify hlen meets minimum size requirements */
-   if (hlen < sizeof(struct tcphdr))
-   return network - data;
-
-   network += hlen;
-   } else if (hnae_get_field(flag, HNS_RXD_L4ID_M, HNS_RXD_L4ID_S)
-   == HNS_RX_FLAG_L4ID_UDP) {
-   if ((typeof(max_size))(network - data) >
-   (max_size - sizeof(struct udphdr)))
-   return max_size;
-
-   network += sizeof(struct udphdr);
-   }
-
-   /* If everything has gone correctly network should be the
-* data section of the packet and will be the end of the header.
-* If not then it probably represents the end of the last recognized
-* header.
-*/
-   if ((typeof(max_size))(network - data) < max_size)
-   return network - data;
-   else
-   return max_size;
-}
-
 static void hns_nic_reuse_page(struct sk_buff *skb, int i,
   struct hnae_ring *ring, int pull_len,
   struct hnae_desc_cb *desc_cb)
@@ -696,7 +595,7 @@ static int hns_nic_poll_rx_skb(struct hns_nic_ring_data 
*ring_data,
} else {
ring->stats.seg_pkt_cnt++;
 
-   pull_len = hns_nic_get_headlen(va, bnum_flag, HNS_RX_HEAD_SIZE);
+   pull_len 

[Patch net 3/4] net: hns: fix skb->truesize underestimation

2018-08-22 Thread Huazhong Tan
skb->truesize is not meant to be tracking amount of used bytes in a skb,
but amount of reserved/consumed bytes in memory.

For instance, if we use a single byte in last page fragment, we have to
account the full size of the fragment.

So skb_add_rx_frag needs to account for the length of the entire buffer in
truesize.

Fixes: 9cbe9fd5214e ("net: hns: optimize XGE capability by reducing cpu usage")
Signed-off-by: Huazhong tan 
Signed-off-by: Salil Mehta 
---
 drivers/net/ethernet/hisilicon/hns/hns_enet.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/hisilicon/hns/hns_enet.c 
b/drivers/net/ethernet/hisilicon/hns/hns_enet.c
index c8c0b03..71bd3bf 100644
--- a/drivers/net/ethernet/hisilicon/hns/hns_enet.c
+++ b/drivers/net/ethernet/hisilicon/hns/hns_enet.c
@@ -531,7 +531,7 @@ static void hns_nic_reuse_page(struct sk_buff *skb, int i,
}
 
skb_add_rx_frag(skb, i, desc_cb->priv, desc_cb->page_offset + pull_len,
-   size - pull_len, truesize - pull_len);
+   size - pull_len, truesize);
 
 /* avoid re-using remote pages,flag default unreuse */
if (unlikely(page_to_nid(desc_cb->priv) != numa_node_id()))
-- 
1.9.1



[Patch net 2/4] net: hns: modify variable type in hns_nic_reuse_page

2018-08-22 Thread Huazhong Tan
'truesize' is supposed to be u32, not int, so fix it.

Signed-off-by: Huazhong tan 
Signed-off-by: Salil Mehta 
---
 drivers/net/ethernet/hisilicon/hns/hns_enet.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/hisilicon/hns/hns_enet.c 
b/drivers/net/ethernet/hisilicon/hns/hns_enet.c
index 9f2b552..c8c0b03 100644
--- a/drivers/net/ethernet/hisilicon/hns/hns_enet.c
+++ b/drivers/net/ethernet/hisilicon/hns/hns_enet.c
@@ -512,7 +512,8 @@ static void hns_nic_reuse_page(struct sk_buff *skb, int i,
   struct hnae_desc_cb *desc_cb)
 {
struct hnae_desc *desc;
-   int truesize, size;
+   u32 truesize;
+   int size;
int last_offset;
bool twobufs;
 
-- 
1.9.1



[Patch net 1/4] net: hns: fix length and page_offset overflow when CONFIG_ARM64_64K_PAGES

2018-08-22 Thread Huazhong Tan
When the config item "CONFIG_ARM64_64K_PAGES" is enabled, PAGE_SIZE
is 65536 (64K). But the types of length and page_offset are u16, so they
will overflow. Change them to u32.

Fixes: 6fe6611ff275 ("net: add Hisilicon Network Subsystem hnae framework 
support")
Signed-off-by: Huazhong Tan 
Signed-off-by: Salil Mehta 
---
 drivers/net/ethernet/hisilicon/hns/hnae.h | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/hisilicon/hns/hnae.h 
b/drivers/net/ethernet/hisilicon/hns/hnae.h
index fa5b30f..cad52bd 100644
--- a/drivers/net/ethernet/hisilicon/hns/hnae.h
+++ b/drivers/net/ethernet/hisilicon/hns/hnae.h
@@ -220,10 +220,10 @@ struct hnae_desc_cb {
 
/* priv data for the desc, e.g. skb when use with ip stack*/
void *priv;
-   u16 page_offset;
-   u16 reuse_flag;
+   u32 page_offset;
+   u32 length; /* length of the buffer */
 
-   u16 length; /* length of the buffer */
+   u16 reuse_flag;
 
/* desc type, used by the ring user to mark the type of the priv data */
u16 type;
-- 
1.9.1



[PATCHv3 iproute2 0/2] clang + misc changes

2018-08-22 Thread Mahesh Bandewar
From: Mahesh Bandewar 

The primary theme is to make clang compile the iproute2 package without
warnings. Along with this there are two other misc patches in the series.

First patch uses the preferred_family when operating with maddr feature.
Prior to this patch, it would always open an AF_INET socket irrespective
of the family that is preferred via command-line. 

Second patch mostly adds format attributes to make the clang compiler
happy so that it does not throw warnings.

Mahesh Bandewar (2):
  ipmaddr: use preferred_family when given
  iproute: make clang happy with iproute2 package

 include/json_writer.h |  3 +--
 ip/iplink_can.c   | 19 ---
 ip/ipmaddr.c  | 13 -
 lib/color.c   |  1 +
 lib/json_print.c  |  1 +
 lib/json_writer.c | 15 +--
 misc/ss.c |  3 ++-
 tc/m_ematch.c |  1 +
 tc/m_ematch.h |  1 +
 9 files changed, 32 insertions(+), 25 deletions(-)

-- 
2.18.0.1017.ga543ac7ca45-goog



[PATCHv3 iproute2 1/2] ipmaddr: use preferred_family when given

2018-08-22 Thread Mahesh Bandewar
From: Mahesh Bandewar 

When the socket is created, AF_INET is used irrespective of the family
given on the command line (-4, -6, or -0). This change
opens the socket with the preferred family.

Signed-off-by: Mahesh Bandewar 
---
 ip/ipmaddr.c | 13 -
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/ip/ipmaddr.c b/ip/ipmaddr.c
index a48499029e17..abf83784d0df 100644
--- a/ip/ipmaddr.c
+++ b/ip/ipmaddr.c
@@ -289,6 +289,7 @@ static int multiaddr_list(int argc, char **argv)
 static int multiaddr_modify(int cmd, int argc, char **argv)
 {
struct ifreq ifr = {};
+   int family;
int fd;
 
if (cmd == RTM_NEWADDR)
@@ -324,7 +325,17 @@ static int multiaddr_modify(int cmd, int argc, char **argv)
exit(-1);
}
 
-   fd = socket(AF_INET, SOCK_DGRAM, 0);
+   switch (preferred_family) {
+   case AF_INET6:
+   case AF_PACKET:
+   case AF_INET:
+   family = preferred_family;
+   break;
+   default:
+   family = AF_INET;
+   }
+
+   fd = socket(family, SOCK_DGRAM, 0);
if (fd < 0) {
perror("Cannot create socket");
exit(1);
-- 
2.18.0.1017.ga543ac7ca45-goog



[PATCHv3 iproute2 2/2] iproute: make clang happy

2018-08-22 Thread Mahesh Bandewar
From: Mahesh Bandewar 

These are primarily fixes for "string is not string literal" warnings
/ errors (with -Werror -Wformat-nonliteral). This should be a no-op
change. I had to replace a couple of print helper functions with the
code they call, as it was becoming harder to eliminate these warnings;
however, these helpers were used in only a couple of places, so no
major change as such.
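
For reference, a minimal stand-alone sketch of the attribute being added
(the wrapper name here is made up; it is not one of the iproute2 helpers):

#include <stdarg.h>
#include <stdio.h>

/* Argument 2 is a printf-style format string and the variadic args start
 * at position 3, so -Wformat/-Wformat-nonliteral can also check callers
 * of this wrapper.
 */
__attribute__((format(printf, 2, 3)))
static void log_msg(FILE *fp, const char *fmt, ...)
{
	va_list ap;

	va_start(ap, fmt);
	vfprintf(fp, fmt, ap);
	va_end(ap);
}

int main(void)
{
	log_msg(stderr, "value = %d\n", 42);
	return 0;
}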

Signed-off-by: Mahesh Bandewar 
---
 include/json_writer.h |  3 +--
 ip/iplink_can.c   | 19 ---
 lib/color.c   |  1 +
 lib/json_print.c  |  1 +
 lib/json_writer.c | 15 +--
 misc/ss.c |  3 ++-
 tc/m_ematch.c |  1 +
 tc/m_ematch.h |  1 +
 8 files changed, 20 insertions(+), 24 deletions(-)

diff --git a/include/json_writer.h b/include/json_writer.h
index 9ab88e1dbdd9..0c8831c1136d 100644
--- a/include/json_writer.h
+++ b/include/json_writer.h
@@ -29,6 +29,7 @@ void jsonw_pretty(json_writer_t *self, bool on);
 void jsonw_name(json_writer_t *self, const char *name);
 
 /* Add value  */
+__attribute__((format(printf, 2, 3)))
 void jsonw_printf(json_writer_t *self, const char *fmt, ...);
 void jsonw_string(json_writer_t *self, const char *value);
 void jsonw_bool(json_writer_t *self, bool value);
@@ -59,8 +60,6 @@ void jsonw_luint_field(json_writer_t *self, const char *prop,
unsigned long int num);
 void jsonw_lluint_field(json_writer_t *self, const char *prop,
unsigned long long int num);
-void jsonw_float_field_fmt(json_writer_t *self, const char *prop,
-  const char *fmt, double val);
 
 /* Collections */
 void jsonw_start_object(json_writer_t *self);
diff --git a/ip/iplink_can.c b/ip/iplink_can.c
index 587413da15c4..c0deeb1f1fcf 100644
--- a/ip/iplink_can.c
+++ b/ip/iplink_can.c
@@ -316,11 +316,14 @@ static void can_print_opt(struct link_util *lu, FILE *f, 
struct rtattr *tb[])
struct can_bittiming *bt = RTA_DATA(tb[IFLA_CAN_BITTIMING]);
 
if (is_json_context()) {
+   json_writer_t *jw;
+
open_json_object("bittiming");
print_int(PRINT_ANY, "bitrate", NULL, bt->bitrate);
-   jsonw_float_field_fmt(get_json_writer(),
- "sample_point", "%.3f",
- (float) bt->sample_point / 1000.);
+   jw = get_json_writer();
+   jsonw_name(jw, "sample_point");
+   jsonw_printf(jw, "%.3f",
+(float) bt->sample_point / 1000);
print_int(PRINT_ANY, "tq", NULL, bt->tq);
print_int(PRINT_ANY, "prop_seg", NULL, bt->prop_seg);
print_int(PRINT_ANY, "phase_seg1",
@@ -415,12 +418,14 @@ static void can_print_opt(struct link_util *lu, FILE *f, 
struct rtattr *tb[])
RTA_DATA(tb[IFLA_CAN_DATA_BITTIMING]);
 
if (is_json_context()) {
+   json_writer_t *jw;
+
open_json_object("data_bittiming");
print_int(PRINT_JSON, "bitrate", NULL, dbt->bitrate);
-   jsonw_float_field_fmt(get_json_writer(),
- "sample_point",
- "%.3f",
- (float) dbt->sample_point / 
1000.);
+   jw = get_json_writer();
+   jsonw_name(jw, "sample_point");
+   jsonw_printf(jw, "%.3f",
+(float) dbt->sample_point / 1000.);
print_int(PRINT_JSON, "tq", NULL, dbt->tq);
print_int(PRINT_JSON, "prop_seg", NULL, dbt->prop_seg);
print_int(PRINT_JSON, "phase_seg1",
diff --git a/lib/color.c b/lib/color.c
index eaf69e74d673..e5406294dfc4 100644
--- a/lib/color.c
+++ b/lib/color.c
@@ -132,6 +132,7 @@ void set_color_palette(void)
is_dark_bg = 1;
 }
 
+__attribute__((format(printf, 3, 4)))
 int color_fprintf(FILE *fp, enum color_attr attr, const char *fmt, ...)
 {
int ret = 0;
diff --git a/lib/json_print.c b/lib/json_print.c
index 5dc41bfabfd4..77902824a738 100644
--- a/lib/json_print.c
+++ b/lib/json_print.c
@@ -100,6 +100,7 @@ void close_json_array(enum output_type type, const char 
*str)
  * functions handling different types
  */
 #define _PRINT_FUNC(type_name, type)   \
+   __attribute__((format(printf, 4, 0)))   \
void print_color_##type_name(enum output_type t,\
 enum color_attr color, \
 const char *key,   \
diff --git a/lib/json_writer.c b/lib/json_writer.c
index 

Re: [bpf-next RFC 0/3] Introduce eBPF flow dissector

2018-08-22 Thread Petar Penkov
On Wed, Aug 22, 2018 at 12:28 AM, Daniel Borkmann  wrote:
> "On 08/22/2018 09:22 AM, Daniel Borkmann wrote:
>> On 08/22/2018 02:19 AM, Petar Penkov wrote:
>>> On Mon, Aug 20, 2018 at 1:52 PM, Alexei Starovoitov
>>>  wrote:
 On Thu, Aug 16, 2018 at 09:44:20AM -0700, Petar Penkov wrote:
> From: Petar Penkov 
>> [...]
> 3/ The BPF program cannot use direct packet access everywhere because it
> uses an offset, initially supplied by the flow dissector.  Because the
> initial value of this non-constant offset comes from outside of the
> program, the verifier does not know what its value is, and it cannot 
> verify
> that it is within packet bounds. Therefore, direct packet access programs
> get rejected.

 this part doesn't seem to match the code.
 direct packet access is allowed and usable even for fragmented skbs.
 in such case only linear part of skb is in "direct access".
>>>
>>> I am not sure I understand. What I meant was that I use bpf_skb_load_bytes
>>> rather than direct packet access because the offset at which I read headers,
>>> nhoff, depends on an initial value that cannot be statically verified - 
>>> namely
>>> what __skb_flow_dissect provides. Is there an alternative approach I should
>>> be taking here, and/or am I misunderstanding direct access?
>>
>> You can still use direct packet access with it, the only thing you would
>> need to make sure is that the initial offset is bounded (e.g. test if
>> larger than some const and then drop the packet, or '& ') so that
>> the verifier can make sure the alu op won't cause overflow, then you can
>> add this to pkt_data, and later on open an access range with the usual test
>> like pkt_data' +  > pkt_end.
>
> And for non-linear data, you could use the bpf_skb_pull_data() helper as
> we have in tc/BPF case 36bbef52c7eb ("bpf: direct packet write and access
> for helpers for clsact progs") to pull it into linear area and make it
> accessible for direct packet access.
>
>> Thanks,
>> Daniel

Thanks for the clarification! With direct packet access the flow
dissector in patch 2
is as fast as the in-kernel flow dissector when tested with the test in patch 3.

To bound the initial offset and use direct access I check if the
initial offset is larger
than 1500. This is sufficient for the verifier but I was wondering if there is a
better constant to use.
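
For reference, a rough sketch of the bounding pattern described above
(generic BPF-style parsing of an externally supplied offset; the 1500 cap
is just the example bound and the function name is made up):

#include <linux/bpf.h>
#include <linux/ip.h>

/* 'nhoff' stands in for the offset supplied from outside the program. */
static int parse_at(struct __sk_buff *skb, __u32 nhoff)
{
	void *data = (void *)(long)skb->data;
	void *data_end = (void *)(long)skb->data_end;
	struct iphdr *iph;

	if (nhoff > 1500)	/* bound the offset so the verifier */
		return 0;	/* can reason about data + nhoff    */

	iph = data + nhoff;
	if ((void *)(iph + 1) > data_end)	/* open the access range */
		return 0;

	return iph->protocol;	/* direct packet access is now allowed */
}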

Thanks once again for your feedback,
Petar


[PATCH next-queue 1/2] ixgbe: disallow ipsec tx offload when in sr-iov mode

2018-08-22 Thread Shannon Nelson
There seems to be a problem in the x540's internal switch wherein if SR/IOV
mode is enabled and an offloaded IPsec packet is sent to a local VF,
the packet is silently dropped.  This might never be a problem as it is
somewhat a corner case, but if someone happens to be using IPsec offload
from the PF to a VF that just happens to get migrated to the local box,
communication will mysteriously fail.

Not good.

A simple way to protect from this is to simply not allow any IPsec offloads
for outgoing packets when num_vfs != 0.  This doesn't help any offloads that
were created before SR/IOV was enabled, but we'll get to that later.

Signed-off-by: Shannon Nelson 
---
 drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c 
b/drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c
index 68395ab..24076b4 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c
@@ -697,6 +697,9 @@ static int ixgbe_ipsec_add_sa(struct xfrm_state *xs)
} else {
struct tx_sa tsa;
 
+   if (adapter->num_vfs)
+   return -EOPNOTSUPP;
+
/* find the first unused index */
ret = ixgbe_ipsec_find_empty_idx(ipsec, false);
if (ret < 0) {
-- 
2.7.4



[PATCH next-queue 2/2] ixgbe: fix the return value for unsupported VF offload

2018-08-22 Thread Shannon Nelson
When failing the request because we can't support that offload,
reporting EOPNOTSUPP makes much more sense than ENXIO.

Signed-off-by: Shannon Nelson 
---
 drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c 
b/drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c
index 24076b4..7890f4a 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c
@@ -898,7 +898,7 @@ int ixgbe_ipsec_vf_add_sa(struct ixgbe_adapter *adapter, 
u32 *msgbuf, u32 vf)
 * device, so block these requests for now.
 */
if (!(sam->flags & XFRM_OFFLOAD_INBOUND)) {
-   err = -ENXIO;
+   err = -EOPNOTSUPP;
goto err_out;
}
 
-- 
2.7.4



Re: is "volatile" the cause of ifconfig flags not matching sysfs flags?

2018-08-22 Thread Stephen Hemminger
On Wed, 22 Aug 2018 18:32:36 -0400 (EDT)
"Robert P. J. Day"  wrote:

>   almost certainly another dumb question, but i was poking around the
> sysfs, particularly /sys/class/net//*, to familiarize myself
> with what i can glean (or set) re interfaces under /sys, and i noticed
> "flags", but what i get there doesn't match what i get by running
> ifconfig.
> 
>   specifically, if i list the flags for my wireless interface under
> /sys:
> 
> $ cat flags
> 0x1003
> $
> 
>   but with ifconfig:
> 
> $ ifconfig wlp2s0
> wlp2s0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
> 
> 
>   do those two "flags" values represent the same set of flags? and
> does the obvious difference have to do with some of those flags being
> "volatile" as dewscribed in include/uapi/linux/if.h? or am i just
> totally misreading this?
> 
> rday
> 

sysfs reports netdevice->if_flags whereas ifconfig is getting the hex
value from SIOCGIFFLAGS, which does:
dev_get_flags(dev)

The value in sysfs is more intended for internal debugging, whereas the
normal userspace APIs return a more limited set of historical values.
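
A small stand-alone example of the ioctl path for comparison (it reads the
same value ifconfig prints; "wlp2s0" is just the interface from the question
above):

#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <net/if.h>

int main(void)
{
	struct ifreq ifr;
	int fd = socket(AF_INET, SOCK_DGRAM, 0);

	if (fd < 0)
		return 1;

	memset(&ifr, 0, sizeof(ifr));
	strncpy(ifr.ifr_name, "wlp2s0", IFNAMSIZ - 1);

	/* SIOCGIFFLAGS returns the filtered flags dev_get_flags() builds */
	if (ioctl(fd, SIOCGIFFLAGS, &ifr) == 0)
		printf("flags=%#x (%d)\n",
		       (unsigned short)ifr.ifr_flags, ifr.ifr_flags);

	close(fd);
	return 0;
}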


is "volatile" the cause of ifconfig flags not matching sysfs flags?

2018-08-22 Thread Robert P. J. Day


  almost certainly another dumb question, but i was poking around the
sysfs, particularly /sys/class/net//*, to familiarize myself
with what i can glean (or set) re interfaces under /sys, and i noticed
"flags", but what i get there doesn't match what i get by running
ifconfig.

  specifically, if i list the flags for my wireless interface under
/sys:

$ cat flags
0x1003
$

  but with ifconfig:

$ ifconfig wlp2s0
wlp2s0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500


  do those two "flags" values represent the same set of flags? and
does the obvious difference have to do with some of those flags being
"volatile" as dewscribed in include/uapi/linux/if.h? or am i just
totally misreading this?

rday

-- 


Robert P. J. Day Ottawa, Ontario, CANADA
  http://crashcourse.ca/dokuwiki

Twitter:   http://twitter.com/rpjday
LinkedIn:   http://ca.linkedin.com/in/rpjday



Re: [bpf PATCH 0/2] tls, sockmap, fixes for sk_wait_event

2018-08-22 Thread Daniel Borkmann
On 08/22/2018 05:37 PM, John Fastabend wrote:
> I have been testing ktls and sockmap lately and noticed that neither
> was handling sk_write_space events correctly. We need to ensure
> these events are pushed down to the lower layer in all cases to
> handle the case where the lower layer sendpage call has called
> sk_wait_event and needs to be woken up. Without this I see
> occasional stalls of sndtimeo length while we wait for the
> timeout value even though space is available.
> 
> Two fixes below. Thanks.
> 
> ---
> 
> John Fastabend (2):
>   tls: possible hang when do_tcp_sendpages hits sndbuf is full case
>   bpf: sockmap: write_space events need to be passed to TCP handler
> 
> 
>  kernel/bpf/sockmap.c |3 +++
>  net/tls/tls_main.c   |9 +++--
>  2 files changed, 10 insertions(+), 2 deletions(-)

Applied to bpf, thanks John!


[PATCH bpf] bpf: use per htab salt for bucket hash

2018-08-22 Thread Daniel Borkmann
All BPF hash and LRU maps currently have a known and global seed
we feed into jhash() which is 0. This is suboptimal, thus fix it
by generating a random seed upon hashtab setup time which we can
later on feed into jhash() on lookup, update and deletions.
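
The idea in a stand-alone sketch (toy hash and table, not the kernel code;
rand() stands in for get_random_int() and jhash() plays the toy_hash() role):

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

struct toy_htab {
	uint32_t n_buckets;
	uint32_t hashrnd;	/* per-table salt, picked once at setup */
};

/* Toy seeded hash, only for illustration. */
static uint32_t toy_hash(const void *key, uint32_t len, uint32_t seed)
{
	const unsigned char *p = key;
	uint32_t h = seed;

	while (len--)
		h = h * 31 + *p++;
	return h;
}

int main(void)
{
	struct toy_htab htab = { .n_buckets = 16 };
	int key = 42;

	srand((unsigned)time(NULL));
	htab.hashrnd = (uint32_t)rand();	/* stand-in for get_random_int() */

	/* Using the same salt on lookup, update and delete keeps a key in
	 * one bucket, while the key->bucket mapping differs per table.
	 */
	printf("bucket for key 42: %u\n",
	       toy_hash(&key, sizeof(key), htab.hashrnd) & (htab.n_buckets - 1));
	return 0;
}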

Fixes: 0f8e4bd8a1fc8 ("bpf: add hashtable type of eBPF maps")
Signed-off-by: Daniel Borkmann 
Acked-by: Alexei Starovoitov 
---
 kernel/bpf/hashtab.c | 23 +--
 1 file changed, 13 insertions(+), 10 deletions(-)

diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
index 04b8eda..03cc59e 100644
--- a/kernel/bpf/hashtab.c
+++ b/kernel/bpf/hashtab.c
@@ -15,6 +15,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include "percpu_freelist.h"
 #include "bpf_lru_list.h"
@@ -41,6 +42,7 @@ struct bpf_htab {
atomic_t count; /* number of elements in this hashtable */
u32 n_buckets;  /* number of hash buckets */
u32 elem_size;  /* size of each element in bytes */
+   u32 hashrnd;
 };
 
 /* each htab element is struct htab_elem + key + value */
@@ -371,6 +373,7 @@ static struct bpf_map *htab_map_alloc(union bpf_attr *attr)
if (!htab->buckets)
goto free_htab;
 
+   htab->hashrnd = get_random_int();
for (i = 0; i < htab->n_buckets; i++) {
INIT_HLIST_NULLS_HEAD(&htab->buckets[i].head, i);
raw_spin_lock_init(&htab->buckets[i].lock);
@@ -402,9 +405,9 @@ static struct bpf_map *htab_map_alloc(union bpf_attr *attr)
return ERR_PTR(err);
 }
 
-static inline u32 htab_map_hash(const void *key, u32 key_len)
+static inline u32 htab_map_hash(const void *key, u32 key_len, u32 hashrnd)
 {
-   return jhash(key, key_len, 0);
+   return jhash(key, key_len, hashrnd);
 }
 
 static inline struct bucket *__select_bucket(struct bpf_htab *htab, u32 hash)
@@ -470,7 +473,7 @@ static void *__htab_map_lookup_elem(struct bpf_map *map, 
void *key)
 
key_size = map->key_size;
 
-   hash = htab_map_hash(key, key_size);
+   hash = htab_map_hash(key, key_size, htab->hashrnd);
 
head = select_bucket(htab, hash);
 
@@ -597,7 +600,7 @@ static int htab_map_get_next_key(struct bpf_map *map, void 
*key, void *next_key)
if (!key)
goto find_first_elem;
 
-   hash = htab_map_hash(key, key_size);
+   hash = htab_map_hash(key, key_size, htab->hashrnd);
 
head = select_bucket(htab, hash);
 
@@ -824,7 +827,7 @@ static int htab_map_update_elem(struct bpf_map *map, void 
*key, void *value,
 
key_size = map->key_size;
 
-   hash = htab_map_hash(key, key_size);
+   hash = htab_map_hash(key, key_size, htab->hashrnd);
 
b = __select_bucket(htab, hash);
head = &b->head;
@@ -880,7 +883,7 @@ static int htab_lru_map_update_elem(struct bpf_map *map, 
void *key, void *value,
 
key_size = map->key_size;
 
-   hash = htab_map_hash(key, key_size);
+   hash = htab_map_hash(key, key_size, htab->hashrnd);
 
b = __select_bucket(htab, hash);
head = &b->head;
@@ -945,7 +948,7 @@ static int __htab_percpu_map_update_elem(struct bpf_map 
*map, void *key,
 
key_size = map->key_size;
 
-   hash = htab_map_hash(key, key_size);
+   hash = htab_map_hash(key, key_size, htab->hashrnd);
 
b = __select_bucket(htab, hash);
head = &b->head;
@@ -998,7 +1001,7 @@ static int __htab_lru_percpu_map_update_elem(struct 
bpf_map *map, void *key,
 
key_size = map->key_size;
 
-   hash = htab_map_hash(key, key_size);
+   hash = htab_map_hash(key, key_size, htab->hashrnd);
 
b = __select_bucket(htab, hash);
head = &b->head;
@@ -1071,7 +1074,7 @@ static int htab_map_delete_elem(struct bpf_map *map, void 
*key)
 
key_size = map->key_size;
 
-   hash = htab_map_hash(key, key_size);
+   hash = htab_map_hash(key, key_size, htab->hashrnd);
b = __select_bucket(htab, hash);
head = &b->head;
 
@@ -1103,7 +1106,7 @@ static int htab_lru_map_delete_elem(struct bpf_map *map, 
void *key)
 
key_size = map->key_size;
 
-   hash = htab_map_hash(key, key_size);
+   hash = htab_map_hash(key, key_size, htab->hashrnd);
b = __select_bucket(htab, hash);
head = &b->head;
 
-- 
2.9.5



[PATCH net 0/3] tcp_bbr: PROBE_RTT minor bug fixes

2018-08-22 Thread Kevin Yang
From: "Kevin(Yudong) Yang" 

This series includes two minor bug fixes for the TCP BBR PROBE_RTT
mechanism, and one preparatory patch:

(1) A preparatory patch to reorganize the PROBE_RTT logic by refactoring
(into its own function) the code to exit PROBE_RTT, since the next
patch will be using that code in a new context.

(2) Fix: When BBR restarts from idle and if BBR is in PROBE_RTT mode,
BBR should check if it's time to exit PROBE_RTT. If yes, then BBR
should exit PROBE_RTT mode and restore the cwnd to its full value.

(3) Fix: Apply the PROBE_RTT cwnd cap even if the count of fully-ACKed
packets is 0.

Kevin Yang (3):
  tcp_bbr: add bbr_check_probe_rtt_done() helper
  tcp_bbr: in restart from idle, see if we should exit PROBE_RTT
  tcp_bbr: apply PROBE_RTT cwnd cap even if acked==0

 net/ipv4/tcp_bbr.c | 42 --
 1 file changed, 24 insertions(+), 18 deletions(-)

-- 
2.18.0.1017.ga543ac7ca45-goog



[PATCH net 2/3] tcp_bbr: in restart from idle, see if we should exit PROBE_RTT

2018-08-22 Thread Kevin Yang
This patch fixes the case where BBR does not exit PROBE_RTT mode when
it restarts from idle. When BBR restarts from idle and if BBR is in
PROBE_RTT mode, BBR should check if it's time to exit PROBE_RTT. If
yes, then BBR should exit PROBE_RTT mode and restore the cwnd to its
full value.

Fixes: 0f8782ea1497 ("tcp_bbr: add BBR congestion control")
Signed-off-by: Kevin Yang 
Signed-off-by: Neal Cardwell 
Reviewed-by: Yuchung Cheng 
Reviewed-by: Soheil Hassas Yeganeh 
---
 net/ipv4/tcp_bbr.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/net/ipv4/tcp_bbr.c b/net/ipv4/tcp_bbr.c
index fd7bccf36a263..1d4bdd3b5e4d0 100644
--- a/net/ipv4/tcp_bbr.c
+++ b/net/ipv4/tcp_bbr.c
@@ -174,6 +174,8 @@ static const u32 bbr_lt_bw_diff = 4000 / 8;
 /* If we estimate we're policed, use lt_bw for this many round trips: */
 static const u32 bbr_lt_bw_max_rtts = 48;
 
+static void bbr_check_probe_rtt_done(struct sock *sk);
+
 /* Do we estimate that STARTUP filled the pipe? */
 static bool bbr_full_bw_reached(const struct sock *sk)
 {
@@ -308,6 +310,8 @@ static void bbr_cwnd_event(struct sock *sk, enum 
tcp_ca_event event)
 */
if (bbr->mode == BBR_PROBE_BW)
bbr_set_pacing_rate(sk, bbr_bw(sk), BBR_UNIT);
+   else if (bbr->mode == BBR_PROBE_RTT)
+   bbr_check_probe_rtt_done(sk);
}
 }
 
-- 
2.18.0.1017.ga543ac7ca45-goog



[PATCH net 3/3] tcp_bbr: apply PROBE_RTT cwnd cap even if acked==0

2018-08-22 Thread Kevin Yang
This commit fixes a corner case where TCP BBR would enter PROBE_RTT
mode but not reduce its cwnd. If a TCP receiver ACKed less than one
full segment, the number of delivered/acked packets was 0, so that
bbr_set_cwnd() would short-circuit and exit early, without cutting
cwnd to the value we want for PROBE_RTT.

The fix is to instead make sure that even when 0 full packets are
ACKed, we do apply all the appropriate caps, including the cap that
applies in PROBE_RTT mode.

Fixes: 0f8782ea1497 ("tcp_bbr: add BBR congestion control")
Signed-off-by: Kevin Yang 
Signed-off-by: Neal Cardwell 
Reviewed-by: Yuchung Cheng 
Reviewed-by: Soheil Hassas Yeganeh 
---
 net/ipv4/tcp_bbr.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/ipv4/tcp_bbr.c b/net/ipv4/tcp_bbr.c
index 1d4bdd3b5e4d0..02ff2dde96094 100644
--- a/net/ipv4/tcp_bbr.c
+++ b/net/ipv4/tcp_bbr.c
@@ -420,10 +420,10 @@ static void bbr_set_cwnd(struct sock *sk, const struct 
rate_sample *rs,
 {
struct tcp_sock *tp = tcp_sk(sk);
struct bbr *bbr = inet_csk_ca(sk);
-   u32 cwnd = 0, target_cwnd = 0;
+   u32 cwnd = tp->snd_cwnd, target_cwnd = 0;
 
if (!acked)
-   return;
+   goto done;  /* no packet fully ACKed; just apply caps */
 
if (bbr_set_cwnd_to_recover_or_restore(sk, rs, acked, &cwnd))
goto done;
-- 
2.18.0.1017.ga543ac7ca45-goog



[PATCH net 1/3] tcp_bbr: add bbr_check_probe_rtt_done() helper

2018-08-22 Thread Kevin Yang
This patch adds a helper function bbr_check_probe_rtt_done() to
  1. check the condition to see if bbr should exit probe_rtt mode;
  2. process the logic of exiting probe_rtt mode.

Fixes: 0f8782ea1497 ("tcp_bbr: add BBR congestion control")
Signed-off-by: Kevin Yang 
Signed-off-by: Neal Cardwell 
Signed-off-by: Yuchung Cheng 
Reviewed-by: Soheil Hassas Yeganeh 
---
 net/ipv4/tcp_bbr.c | 34 ++
 1 file changed, 18 insertions(+), 16 deletions(-)

diff --git a/net/ipv4/tcp_bbr.c b/net/ipv4/tcp_bbr.c
index 13d34427ca3dd..fd7bccf36a263 100644
--- a/net/ipv4/tcp_bbr.c
+++ b/net/ipv4/tcp_bbr.c
@@ -95,11 +95,10 @@ struct bbr {
u32 mode:3,  /* current bbr_mode in state machine */
prev_ca_state:3, /* CA state on previous ACK */
packet_conservation:1,  /* use packet conservation? */
-   restore_cwnd:1,  /* decided to revert cwnd to old value */
round_start:1,   /* start of packet-timed tx->ack round? */
idle_restart:1,  /* restarting after idle? */
probe_rtt_round_done:1,  /* a BBR_PROBE_RTT round at 4 pkts? */
-   unused:12,
+   unused:13,
lt_is_sampling:1,/* taking long-term ("LT") samples now? */
lt_rtt_cnt:7,/* round trips in long-term interval */
lt_use_bw:1; /* use lt_bw as our bw estimate? */
@@ -396,17 +395,11 @@ static bool bbr_set_cwnd_to_recover_or_restore(
cwnd = tcp_packets_in_flight(tp) + acked;
} else if (prev_state >= TCP_CA_Recovery && state < TCP_CA_Recovery) {
/* Exiting loss recovery; restore cwnd saved before recovery. */
-   bbr->restore_cwnd = 1;
+   cwnd = max(cwnd, bbr->prior_cwnd);
bbr->packet_conservation = 0;
}
bbr->prev_ca_state = state;
 
-   if (bbr->restore_cwnd) {
-   /* Restore cwnd after exiting loss recovery or PROBE_RTT. */
-   cwnd = max(cwnd, bbr->prior_cwnd);
-   bbr->restore_cwnd = 0;
-   }
-
if (bbr->packet_conservation) {
*new_cwnd = max(cwnd, tcp_packets_in_flight(tp) + acked);
return true;/* yes, using packet conservation */
@@ -748,6 +741,20 @@ static void bbr_check_drain(struct sock *sk, const struct 
rate_sample *rs)
bbr_reset_probe_bw_mode(sk);  /* we estimate queue is drained */
 }
 
+static void bbr_check_probe_rtt_done(struct sock *sk)
+{
+   struct tcp_sock *tp = tcp_sk(sk);
+   struct bbr *bbr = inet_csk_ca(sk);
+
+   if (!(bbr->probe_rtt_done_stamp &&
+ after(tcp_jiffies32, bbr->probe_rtt_done_stamp)))
+   return;
+
+   bbr->min_rtt_stamp = tcp_jiffies32;  /* wait a while until PROBE_RTT */
+   tp->snd_cwnd = max(tp->snd_cwnd, bbr->prior_cwnd);
+   bbr_reset_mode(sk);
+}
+
 /* The goal of PROBE_RTT mode is to have BBR flows cooperatively and
  * periodically drain the bottleneck queue, to converge to measure the true
  * min_rtt (unloaded propagation delay). This allows the flows to keep queues
@@ -806,12 +813,8 @@ static void bbr_update_min_rtt(struct sock *sk, const 
struct rate_sample *rs)
} else if (bbr->probe_rtt_done_stamp) {
if (bbr->round_start)
bbr->probe_rtt_round_done = 1;
-   if (bbr->probe_rtt_round_done &&
-   after(tcp_jiffies32, bbr->probe_rtt_done_stamp)) {
-   bbr->min_rtt_stamp = tcp_jiffies32;
-   bbr->restore_cwnd = 1;  /* snap to prior_cwnd */
-   bbr_reset_mode(sk);
-   }
+   if (bbr->probe_rtt_round_done)
+   bbr_check_probe_rtt_done(sk);
}
}
/* Restart after idle ends only once we process a new S/ACK for data */
@@ -862,7 +865,6 @@ static void bbr_init(struct sock *sk)
bbr->has_seen_rtt = 0;
bbr_init_pacing_rate_from_rtt(sk);
 
-   bbr->restore_cwnd = 0;
bbr->round_start = 0;
bbr->idle_restart = 0;
bbr->full_bw_reached = 0;
-- 
2.18.0.1017.ga543ac7ca45-goog



[PATCH ipsec-next] xfrm: allow driver to quietly refuse offload

2018-08-22 Thread Shannon Nelson
If the "offload" attribute is used to create an IPsec SA
and the .xdo_dev_state_add() fails, the SA creation fails.
However, if the "offload" attribute is used on a device that
doesn't offer it, the attribute is quietly ignored and the SA
is created without an offload.

Along the same line of that second case, it would be good to
have a way for the device to refuse to offload an SA without
failing the whole SA creation.  This patch adds that feature
by allowing the driver to return -EOPNOTSUPP as a signal that
the SA may be fine, it just can't be offloaded.

This allows the user a little more flexibility in requesting
offloads and not needing to know every detail at all times about
each specific NIC when trying to create SAs.

Signed-off-by: Shannon Nelson 
---

More specifically, this will help one user experience issue
with the coming ixgbevf IPsec offload.

 Documentation/networking/xfrm_device.txt | 4 
 net/xfrm/xfrm_device.c   | 6 +-
 2 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/Documentation/networking/xfrm_device.txt 
b/Documentation/networking/xfrm_device.txt
index 50c34ca..267f55b 100644
--- a/Documentation/networking/xfrm_device.txt
+++ b/Documentation/networking/xfrm_device.txt
@@ -68,6 +68,10 @@ and an indication of whether it is for Rx or Tx.  The driver 
should
- verify the algorithm is supported for offloads
- store the SA information (key, salt, target-ip, protocol, etc)
- enable the HW offload of the SA
+   - return status value:
+   0 success
+   -EOPNOTSUPP   offload not supported, try SW IPsec
+   other fail the request
 
 The driver can also set an offload_handle in the SA, an opaque void pointer
 that can be used to convey context into the fast-path offload requests.
diff --git a/net/xfrm/xfrm_device.c b/net/xfrm/xfrm_device.c
index 5611b75..3a1d9d6 100644
--- a/net/xfrm/xfrm_device.c
+++ b/net/xfrm/xfrm_device.c
@@ -192,9 +192,13 @@ int xfrm_dev_state_add(struct net *net, struct xfrm_state 
*x,
 
err = dev->xfrmdev_ops->xdo_dev_state_add(x);
if (err) {
+   xso->num_exthdrs = 0;
+   xso->flags = 0;
xso->dev = NULL;
dev_put(dev);
-   return err;
+
+   if (err != -EOPNOTSUPP)
+   return err;
}
 
return 0;
-- 
2.7.4



[PATCH net] ipv4: tcp: send zero IPID for RST and ACK sent in SYN-RECV and TIME-WAIT state

2018-08-22 Thread Eric Dumazet
tcp uses per-cpu (and per namespace) sockets (net->ipv4.tcp_sk) internally
to send some control packets.

1) RST packets, through tcp_v4_send_reset()
2) ACK packets in SYN-RECV and TIME-WAIT state, through tcp_v4_send_ack()

These packets assert IP_DF, and also use the hashed IP ident generator
to provide an IPv4 ID number.

Geoff Alexander reported this could be used to build off-path attacks.

These packets should not be fragmented, since their size is smaller than
IPV4_MIN_MTU. Only some tunneled paths could eventually have to fragment,
regardless of inner IPID.

We really can use zero IPID, to address the flaw, and as a bonus,
avoid a couple of atomic operations in ip_idents_reserve()

Signed-off-by: Eric Dumazet 
Reported-by: Geoff Alexander 
Tested-by: Geoff Alexander 
---
 net/ipv4/tcp_ipv4.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index 
9e041fa5c545367961f03fa8a9124aebbc1b6c69..44c09eddbb781c03da2417aaa925e360de01a6e9
 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -2517,6 +2517,12 @@ static int __net_init tcp_sk_init(struct net *net)
if (res)
goto fail;
sock_set_flag(sk, SOCK_USE_WRITE_QUEUE);
+
+   /* Please enforce IP_DF and IPID==0 for RST and
+* ACK sent in SYN-RECV and TIME-WAIT state.
+*/
+   inet_sk(sk)->pmtudisc = IP_PMTUDISC_DO;
+
*per_cpu_ptr(net->ipv4.tcp_sk, cpu) = sk;
}
 
-- 
2.18.0.1017.ga543ac7ca45-goog



Re: [Patch net] addrconf: reduce unnecessary atomic allocations

2018-08-22 Thread David Ahern
On 8/22/18 1:58 PM, Cong Wang wrote:
> All the 3 callers of addrconf_add_mroute() assert RTNL
> lock, they don't take any additional lock either, so
> it is safe to convert it to GFP_KERNEL.
> 
> Same for sit_add_v4_addrs().
> 
> Cc: David Ahern 
> Signed-off-by: Cong Wang 
> ---
>  net/ipv6/addrconf.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 

Not sure how I missed the double ASSERT_RTNL() check for
sit_add_v4_addrs. Thanks for following up.

Reviewed-by: David Ahern 



[Patch net] addrconf: reduce unnecessary atomic allocations

2018-08-22 Thread Cong Wang
All the 3 callers of addrconf_add_mroute() assert RTNL
lock, they don't take any additional lock either, so
it is safe to convert it to GFP_KERNEL.

Same for sit_add_v4_addrs().

Cc: David Ahern 
Signed-off-by: Cong Wang 
---
 net/ipv6/addrconf.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index 2fac4ad74867..d51a8c0b3372 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -2398,7 +2398,7 @@ static void addrconf_add_mroute(struct net_device *dev)
 
ipv6_addr_set(&cfg.fc_dst, htonl(0xFF000000), 0, 0, 0);
 
-   ip6_route_add(&cfg, GFP_ATOMIC, NULL);
+   ip6_route_add(&cfg, GFP_KERNEL, NULL);
 }
 
 static struct inet6_dev *addrconf_add_dev(struct net_device *dev)
@@ -3062,7 +3062,7 @@ static void sit_add_v4_addrs(struct inet6_dev *idev)
if (addr.s6_addr32[3]) {
add_addr(idev, &addr, plen, scope);
addrconf_prefix_route(&addr, plen, 0, idev->dev, 0, pflags,
- GFP_ATOMIC);
+ GFP_KERNEL);
return;
}
 
@@ -3087,7 +3087,7 @@ static void sit_add_v4_addrs(struct inet6_dev *idev)
 
add_addr(idev, &addr, plen, flag);
addrconf_prefix_route(&addr, plen, 0, idev->dev,
- 0, pflags, GFP_ATOMIC);
+ 0, pflags, GFP_KERNEL);
}
}
}
-- 
2.14.4



[Patch net] net_sched: fix a compiler warning for tcf_exts_for_each_action()

2018-08-22 Thread Cong Wang
When CONFIG_NET_CLS_ACT=n, tcf_exts_for_each_action()
is a nop, which leaves its parameters unused. Shut up
the compiler warning by casting them to void.
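
The pattern in isolation (a stand-alone sketch, not the pkt_cls.h macro
itself):

#include <stdio.h>

/* When the feature is compiled out, the macro must still "use" its
 * arguments, otherwise unused-variable warnings trigger at call sites.
 * Casting to void in the for-init clause is the usual no-op way to do that.
 */
#define for_each_action(i, a) \
	for ((void)(i), (void)(a); 0; )

int main(void)
{
	int i = 0;
	const char *a = "gact";

	for_each_action(i, a)
		printf("loop body never runs\n");

	printf("compiled without warnings\n");
	return 0;
}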

Fixes: 244cd96adb5f ("net_sched: remove list_head from tc_action")
Reported-by: Stephen Rothwell 
Signed-off-by: Cong Wang 
---
 include/net/pkt_cls.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/net/pkt_cls.h b/include/net/pkt_cls.h
index c17d51865469..9ec471ffaa5d 100644
--- a/include/net/pkt_cls.h
+++ b/include/net/pkt_cls.h
@@ -303,7 +303,7 @@ static inline void tcf_exts_put_net(struct tcf_exts *exts)
for (i = 0; i < TCA_ACT_MAX_PRIO && ((a) = (exts)->actions[i]); i++)
 #else
 #define tcf_exts_for_each_action(i, a, exts) \
-   for (; 0; )
+   for ((void)i, (void)a; 0; )
 #endif
 
 static inline void
-- 
2.14.4



[RFC PATCH v2 bpf-next 2/2] bpf/verifier: display non-spill stack slot types in print_verifier_state

2018-08-22 Thread Edward Cree
If a stack slot does not hold a spilled register (STACK_SPILL), then each
 of its eight bytes could potentially have a different slot_type.  This
 information can be important for debugging, and previously we either did
 not print anything for the stack slot, or just printed fp-X=0 in the case
 where its first byte was STACK_ZERO.
Instead, print eight characters with either 0 (STACK_ZERO), m (STACK_MISC)
 or ? (STACK_INVALID) for any stack slot which is neither STACK_SPILL nor
 entirely STACK_INVALID.

Signed-off-by: Edward Cree 
---
 kernel/bpf/verifier.c | 32 +---
 1 file changed, 25 insertions(+), 7 deletions(-)

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index b11d45916fff..2f4b52cf864c 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -263,6 +263,13 @@ static const char * const reg_type_str[] = {
[PTR_TO_PACKET_END] = "pkt_end",
 };
 
+static char slot_type_char[] = {
+   [STACK_INVALID] = '?',
+   [STACK_SPILL]   = 'r',
+   [STACK_MISC]= 'm',
+   [STACK_ZERO]= '0',
+};
+
 static void print_liveness(struct bpf_verifier_env *env,
   enum bpf_reg_liveness live)
 {
@@ -349,15 +356,26 @@ static void print_verifier_state(struct bpf_verifier_env 
*env,
}
}
for (i = 0; i < state->allocated_stack / BPF_REG_SIZE; i++) {
-   if (state->stack[i].slot_type[0] == STACK_SPILL) {
-   verbose(env, " fp%d",
-   (-i - 1) * BPF_REG_SIZE);
-   print_liveness(env, state->stack[i].spilled_ptr.live);
+   char types_buf[BPF_REG_SIZE + 1];
+   bool valid = false;
+   int j;
+
+   for (j = 0; j < BPF_REG_SIZE; j++) {
+   if (state->stack[i].slot_type[j] != STACK_INVALID)
+   valid = true;
+   types_buf[j] = slot_type_char[
+   state->stack[i].slot_type[j]];
+   }
+   types_buf[BPF_REG_SIZE] = 0;
+   if (!valid)
+   continue;
+   verbose(env, " fp%d", (-i - 1) * BPF_REG_SIZE);
+   print_liveness(env, state->stack[i].spilled_ptr.live);
+   if (state->stack[i].slot_type[0] == STACK_SPILL)
verbose(env, "=%s",
reg_type_str[state->stack[i].spilled_ptr.type]);
-   }
-   if (state->stack[i].slot_type[0] == STACK_ZERO)
-   verbose(env, " fp%d=0", (-i - 1) * BPF_REG_SIZE);
+   else
+   verbose(env, "=%s", types_buf);
}
verbose(env, "\n");
 }


[RFC PATCH v2 bpf-next 1/2] bpf/verifier: per-register parent pointers

2018-08-22 Thread Edward Cree
By giving each register its own liveness chain, we elide the skip_callee()
 logic.  Instead, each register's parent is the state it inherits from;
 both check_func_call() and prepare_func_exit() automatically connect
 reg states to the correct chain since when they copy the reg state across
 (r1-r5 into the callee as args, and r0 out as the return value) they also
 copy the parent pointer.

Signed-off-by: Edward Cree 
---
 include/linux/bpf_verifier.h |   8 +-
 kernel/bpf/verifier.c| 184 +++
 2 files changed, 47 insertions(+), 145 deletions(-)

diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
index 38b04f559ad3..b42b60a83e19 100644
--- a/include/linux/bpf_verifier.h
+++ b/include/linux/bpf_verifier.h
@@ -41,6 +41,7 @@ enum bpf_reg_liveness {
 };
 
 struct bpf_reg_state {
+   /* Ordering of fields matters.  See states_equal() */
enum bpf_reg_type type;
union {
/* valid when type == PTR_TO_PACKET */
@@ -59,7 +60,6 @@ struct bpf_reg_state {
 * came from, when one is tested for != NULL.
 */
u32 id;
-   /* Ordering of fields matters.  See states_equal() */
/* For scalar types (SCALAR_VALUE), this represents our knowledge of
 * the actual value.
 * For pointer types, this represents the variable part of the offset
@@ -76,15 +76,15 @@ struct bpf_reg_state {
s64 smax_value; /* maximum possible (s64)value */
u64 umin_value; /* minimum possible (u64)value */
u64 umax_value; /* maximum possible (u64)value */
+   /* parentage chain for liveness checking */
+   struct bpf_reg_state *parent;
/* Inside the callee two registers can be both PTR_TO_STACK like
 * R1=fp-8 and R2=fp-8, but one of them points to this function stack
 * while another to the caller's stack. To differentiate them 'frameno'
 * is used which is an index in bpf_verifier_state->frame[] array
 * pointing to bpf_func_state.
-* This field must be second to last, for states_equal() reasons.
 */
u32 frameno;
-   /* This field must be last, for states_equal() reasons. */
enum bpf_reg_liveness live;
 };
 
@@ -107,7 +107,6 @@ struct bpf_stack_state {
  */
 struct bpf_func_state {
struct bpf_reg_state regs[MAX_BPF_REG];
-   struct bpf_verifier_state *parent;
/* index of call instruction that called into this func */
int callsite;
/* stack frame number of this function state from pov of
@@ -129,7 +128,6 @@ struct bpf_func_state {
 struct bpf_verifier_state {
/* call stack tracking */
struct bpf_func_state *frame[MAX_CALL_FRAMES];
-   struct bpf_verifier_state *parent;
u32 curframe;
 };
 
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index ca90679a7fe5..b11d45916fff 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -380,9 +380,9 @@ static int copy_stack_state(struct bpf_func_state *dst,
 /* do_check() starts with zero-sized stack in struct bpf_verifier_state to
  * make it consume minimal amount of memory. check_stack_write() access from
  * the program calls into realloc_func_state() to grow the stack size.
- * Note there is a non-zero 'parent' pointer inside bpf_verifier_state
- * which this function copies over. It points to previous bpf_verifier_state
- * which is never reallocated
+ * Note there is a non-zero parent pointer inside each reg of 
bpf_verifier_state
+ * which this function copies over. It points to corresponding reg in previous
+ * bpf_verifier_state which is never reallocated
  */
 static int realloc_func_state(struct bpf_func_state *state, int size,
  bool copy_old)
@@ -466,7 +466,6 @@ static int copy_verifier_state(struct bpf_verifier_state 
*dst_state,
dst_state->frame[i] = NULL;
}
dst_state->curframe = src->curframe;
-   dst_state->parent = src->parent;
for (i = 0; i <= src->curframe; i++) {
dst = dst_state->frame[i];
if (!dst) {
@@ -732,6 +731,7 @@ static void init_reg_state(struct bpf_verifier_env *env,
for (i = 0; i < MAX_BPF_REG; i++) {
mark_reg_not_init(env, regs, i);
regs[i].live = REG_LIVE_NONE;
+   regs[i].parent = NULL;
}
 
/* frame pointer */
@@ -876,74 +876,21 @@ static int check_subprogs(struct bpf_verifier_env *env)
return 0;
 }
 
-static
-struct bpf_verifier_state *skip_callee(struct bpf_verifier_env *env,
-  const struct bpf_verifier_state *state,
-  struct bpf_verifier_state *parent,
-  u32 regno)
-{
-   struct bpf_verifier_state *tmp = NULL;
-
-   /* 'parent' could be a state of caller and
-* 'state' could be a state of callee. In such case
-* parent->curframe < 

[RFC PATCH v2 bpf-next 0/2] verifier liveness simplification

2018-08-22 Thread Edward Cree
The first patch is a simplification of register liveness tracking by using
 a separate parentage chain for each register and stack slot, thus avoiding
 the need for logic to handle callee-saved registers when applying read
 marks.  In the future this idea may be extended to form use-def chains.
The second patch adds information about misc/zero data on the stack to the
 state dumps emitted to the log at various points; this information was
 found essential in debugging the first patch, and may be useful elsewhere.
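
As a rough illustration of the per-register parentage idea (heavily
simplified; not the verifier's actual data structures or code), each
register in an explored state points at the same register in the state it
was forked from, and a read walks that chain until it reaches a state that
wrote the register:

    enum { LIVE_NONE = 0, LIVE_READ = 1, LIVE_WRITTEN = 2 };

    struct reg_state {
            struct reg_state *parent;  /* same register, in the parent state */
            int live;
    };

    /* Propagate a read mark up the chain; a write in the current state
     * screens the read from older states, so the walk stops there. */
    static void mark_reg_read(struct reg_state *reg)
    {
            while (reg->parent) {
                    if (reg->live & LIVE_WRITTEN)
                            break;
                    reg->parent->live |= LIVE_READ;
                    reg = reg->parent;
            }
    }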

Edward Cree (2):
  bpf/verifier: per-register parent pointers
  bpf/verifier: display non-spill stack slot types in
print_verifier_state

 include/linux/bpf_verifier.h |   8 +-
 kernel/bpf/verifier.c| 216 ++-
 2 files changed, 72 insertions(+), 152 deletions(-)



Re: [bpf PATCH 0/2] tls, sockmap, fixes for sk_wait_event

2018-08-22 Thread Alexei Starovoitov
On Wed, Aug 22, 2018 at 08:37:27AM -0700, John Fastabend wrote:
> I have been testing ktls and sockmap lately and noticed that neither
> was handling sk_write_space events correctly. We need to ensure
> these events are pushed down to the lower layer in all cases to
> handle the case where the lower layer sendpage call has called
> sk_wait_event and needs to be woken up. Without this I see
> occosional stalls of sndtimeo length while we wait for the
> timeout value even though space is available.
> 
> Two fixes below. Thanks.

for the set
Acked-by: Alexei Starovoitov 



Re: [bpf PATCH 1/2] tls: possible hang when do_tcp_sendpages hits sndbuf is full case

2018-08-22 Thread Dave Watson
On 08/22/18 08:37 AM, John Fastabend wrote:
> Currently, the lower protocols sk_write_space handler is not called if
> TLS is sending a scatterlist via  tls_push_sg. However, normally
> tls_push_sg calls do_tcp_sendpage, which may be under memory pressure,
> that in turn may trigger a wait via sk_wait_event. Typically, this
> happens when the in-flight bytes exceed the sdnbuf size. In the normal
> case when enough ACKs are received sk_write_space() will be called and
> the sk_wait_event will be woken up allowing it to send more data
> and/or return to the user.
> 
> But, in the TLS case because the sk_write_space() handler does not
> wake up the events the above send will wait until the sndtimeo is
> exceeded. By default this is MAX_SCHEDULE_TIMEOUT so it look like a
> hang to the user (especially this impatient user). To fix this pass
> the sk_write_space event to the lower layers sk_write_space event
> which in the TCP case will wake any pending events.
> 
> I observed the above while integrating sockmap and ktls. It
> initially appeared as test_sockmap (modified to use ktls) occasionally
> hanging. To reliably reproduce this reduce the sndbuf size and stress
> the tls layer by sending many 1B sends. This results in every byte
> needing a header and each byte individually being sent to the crypto
> layer.
> 
> Signed-off-by: John Fastabend 

Super, thanks!

Acked-by: Dave Watson 


[PATCH iproute2 3/3] testsuite: run dmesg with sudo

2018-08-22 Thread Luca Boccassi
Some distributions like Debian nowadays restrict the dmesg command to
root-only. Run it with sudo in the testsuite.

Signed-off-by: Luca Boccassi 
---
 testsuite/Makefile | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/testsuite/Makefile b/testsuite/Makefile
index 5e269877..ef45d5a7 100644
--- a/testsuite/Makefile
+++ b/testsuite/Makefile
@@ -79,5 +79,5 @@ endif
echo "PASS"; \
fi; \
rm "$$TMP_ERR" "$$TMP_OUT"; \
-   dmesg > $(RESULTS_DIR)/$@.$$o.dmesg; \
+   sudo dmesg > $(RESULTS_DIR)/$@.$$o.dmesg; \
done
-- 
2.18.0



[PATCH iproute2 2/3] testsuite: let make compile build the netlink helper

2018-08-22 Thread Luca Boccassi
The generate_nlmsg binary is required but make -C testsuite compile
does not build it. Add the necessary includes and C*FLAGS to the tools
Makefile and have the compile target build it.

Signed-off-by: Luca Boccassi 
---
 testsuite/Makefile   | 1 +
 testsuite/tools/Makefile | 4 +++-
 2 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/testsuite/Makefile b/testsuite/Makefile
index 2acd0427..5e269877 100644
--- a/testsuite/Makefile
+++ b/testsuite/Makefile
@@ -32,6 +32,7 @@ configure:
 
 compile: configure
echo "Entering iproute2" && cd iproute2 && $(MAKE) && cd ..;
+   $(MAKE) -C tools
 
 listtests:
@for t in $(TESTS); do \
diff --git a/testsuite/tools/Makefile b/testsuite/tools/Makefile
index f0ce4ee2..c936af71 100644
--- a/testsuite/tools/Makefile
+++ b/testsuite/tools/Makefile
@@ -1,6 +1,8 @@
 # SPDX-License-Identifier: GPL-2.0
+include ../../config.mk
+
 generate_nlmsg: generate_nlmsg.c ../../lib/libnetlink.c
-   $(CC) -o $@ $^
+   $(CC) $(CPPFLAGS) $(CFLAGS) $(LDLIBS) $(EXTRA_CFLAGS) -I../../include 
-include../../include/uapi/linux/netlink.h -o $@ $^
 
 clean:
rm -f generate_nlmsg
-- 
2.18.0



[PATCH iproute2 1/3] testsuite: remove all temp files and implement make clean

2018-08-22 Thread Luca Boccassi
Some generated test files were not removed, including one executable in
the testsuite/tools directory.
Ensure make clean from the top level directory works for the testsuite
subdirs too, and that all the files are removed.

Signed-off-by: Luca Boccassi 
---
 Makefile | 2 +-
 testsuite/Makefile   | 3 +++
 testsuite/tools/Makefile | 3 +++
 3 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/Makefile b/Makefile
index 651d2a50..ea2f797c 100644
--- a/Makefile
+++ b/Makefile
@@ -96,7 +96,7 @@ snapshot:
> include/SNAPSHOT.h
 
 clean:
-   @for i in $(SUBDIRS); \
+   @for i in $(SUBDIRS) testsuite; \
do $(MAKE) $(MFLAGS) -C $$i clean; done
 
 clobber:
diff --git a/testsuite/Makefile b/testsuite/Makefile
index 8fcbc557..2acd0427 100644
--- a/testsuite/Makefile
+++ b/testsuite/Makefile
@@ -43,6 +43,9 @@ alltests: $(TESTS)
 clean:
@echo "Removing $(RESULTS_DIR) dir ..."
@rm -rf $(RESULTS_DIR)
+   @rm -f iproute2/iproute2-this
+   @rm -f tests/ip/link/dev_wo_vf_rate.nl
+   $(MAKE) -C tools clean
 
 distclean: clean
echo "Entering iproute2" && cd iproute2 && $(MAKE) distclean && cd ..;
diff --git a/testsuite/tools/Makefile b/testsuite/tools/Makefile
index f2cdc980..f0ce4ee2 100644
--- a/testsuite/tools/Makefile
+++ b/testsuite/tools/Makefile
@@ -1,3 +1,6 @@
 # SPDX-License-Identifier: GPL-2.0
 generate_nlmsg: generate_nlmsg.c ../../lib/libnetlink.c
$(CC) -o $@ $^
+
+clean:
+   rm -f generate_nlmsg
-- 
2.18.0



[PATCH bpf] bpf, sockmap: fix sock hash count in alloc_sock_hash_elem

2018-08-22 Thread Daniel Borkmann
When we try to allocate a new sock hash entry and the allocation
fails, then sock hash map fails to reduce the map element counter,
meaning we keep accounting this element although it was never used.
Fix it by dropping the element counter on error.

Fixes: 81110384441a ("bpf: sockmap, add hash map support")
Signed-off-by: Daniel Borkmann 
Acked-by: John Fastabend 
---
 kernel/bpf/sockmap.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/kernel/bpf/sockmap.c b/kernel/bpf/sockmap.c
index 60ceb0e..40c6ef9 100644
--- a/kernel/bpf/sockmap.c
+++ b/kernel/bpf/sockmap.c
@@ -2269,8 +2269,10 @@ static struct htab_elem *alloc_sock_hash_elem(struct 
bpf_htab *htab,
}
l_new = kmalloc_node(htab->elem_size, GFP_ATOMIC | __GFP_NOWARN,
 htab->map.numa_node);
-   if (!l_new)
+   if (!l_new) {
+   atomic_dec(&htab->count);
return ERR_PTR(-ENOMEM);
+   }
 
memcpy(l_new->key, key, key_size);
l_new->sk = sk;
-- 
2.9.5



Re: ixgbe hangs when XDP_TX is enabled

2018-08-22 Thread Jeff Kirsher
On Tue, 2018-08-21 at 11:13 -0700, Alexander Duyck wrote:
> On Tue, Aug 21, 2018 at 9:59 AM Nikita V. Shirokov <
> tehn...@tehnerd.com> wrote:
> > 
> > On Tue, Aug 21, 2018 at 08:58:15AM -0700, Alexander Duyck wrote:
> > > On Mon, Aug 20, 2018 at 12:32 PM Nikita V. Shirokov <
> > > tehn...@tehnerd.com> wrote:
> > > > 
> > > > we are getting such errors:
> > > > 
> > > > [  408.737313] ixgbe :03:00.0 eth0: Detected Tx Unit Hang
> > > > (XDP)
> > > >   Tx Queue <46>
> > > >   TDH, TDT <0>, <2>
> > > >   next_to_use  <2>
> > > >   next_to_clean<0>
> > > > tx_buffer_info[next_to_clean]
> > > >   time_stamp   <0>
> > > >   jiffies  <1000197c0>
> > > > [  408.804438] ixgbe :03:00.0 eth0: tx hang 1 detected on
> > > > queue 46, resetting adapter
> > > > [  408.804440] ixgbe :03:00.0 eth0: initiating reset due to
> > > > tx timeout
> > > > [  408.817679] ixgbe :03:00.0 eth0: Reset adapter
> > > > [  408.866091] ixgbe :03:00.0 eth0: TXDCTL.ENABLE for one
> > > > or more queues not cleared within the polling period
> > > > [  409.345289] ixgbe :03:00.0 eth0: detected SFP+: 3
> > > > [  409.497232] ixgbe :03:00.0 eth0: NIC Link is Up 10 Gbps,
> > > > Flow Control: RX/TX
> > > > 
> > > > while running XDP prog on ixgbe nic.
> > > > right now i'm seing this on bpfnext kernel
> > > > (latest commit from Wed Aug 15 15:04:25 2018 -0700 ;
> > > > 9a76aba02a37718242d7cdc294f0a3901928aa57)
> > > > 
> > > > looks like this is the same issue as reported by Brenden in
> > > > https://www.spinics.net/lists/netdev/msg439438.html
> > > > 
> > > > --
> > > > Nikita V. Shirokov
> > > 
> > > Could you provide some additional information about your setup.
> > > Specifically useful would be "ethtool -i", "ethtool -l", and
> > > lspci
> > > -vvv info for your device. The total number of CPUs on the system
> > > would be useful to know as well. In addition could you try
> > > reproducing
> > 
> > sure:
> > 
> > ethtool -l eth0
> > Channel parameters for eth0:
> > Pre-set maximums:
> > RX: 0
> > TX: 0
> > Other:  1
> > Combined:   63
> > Current hardware settings:
> > RX: 0
> > TX: 0
> > Other:  1
> > Combined:   48
> > 
> > # ethtool -i eth0
> > driver: ixgbe
> > version: 5.1.0-k
> > firmware-version: 0x86f1
> > expansion-rom-version:
> > bus-info: :03:00.0
> > supports-statistics: yes
> > supports-test: yes
> > supports-eeprom-access: yes
> > supports-register-dump: yes
> > supports-priv-flags: yes
> > 
> > 
> > # nproc
> > 48
> > 
> > lspci:
> > 
> > 03:00.0 Ethernet controller: Intel Corporation 82599ES 10-Gigabit
> > SFI/SFP+ Network Connection (rev 01)
> >  Subsystem: Intel Corporation Device 000d
> >  Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV-
> > VGASnoop- ParErr+ Stepping- SERR+ FastB2B- DisINTx+
> >  Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast
> > >TAbort- SERR-  >  Latency: 0, Cache Line Size: 32 bytes
> >  Interrupt: pin A routed to IRQ 30
> >  NUMA node: 0
> >  Region 0: Memory at c7d0 (64-bit, non-prefetchable)
> > [size=1M]
> >  Region 2: I/O ports at 6000 [size=32]
> >  Region 4: Memory at c7e8 (64-bit, non-prefetchable)
> > [size=16K]
> >  Expansion ROM at c7e0 [disabled] [size=512K]
> >  Capabilities: [40] Power Management version 3
> >  Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA
> > PME(D0+,D1-,D2-,D3hot+,D3cold+)
> >  Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=1
> > PME-
> >  Capabilities: [50] MSI: Enable- Count=1/1 Maskable+ 64bit+
> >  Address:   Data: 
> >  Masking:   Pending: 
> >  Capabilities: [70] MSI-X: Enable+ Count=64 Masked-
> >  Vector table: BAR=4 offset=
> >  PBA: BAR=4 offset=2000
> >  Capabilities: [a0] Express (v2) Endpoint, MSI 00
> >  DevCap: MaxPayload 512 bytes, PhantFunc 0, Latency
> > L0s <512ns, L1 <64us
> >  ExtTag- AttnBtn- AttnInd- PwrInd- RBE+
> > FLReset+ SlotPowerLimit 0.000W
> >  DevCtl: Report errors: Correctable+ Non-Fatal+
> > Fatal+ Unsupported+
> >  RlxdOrd- ExtTag- PhantFunc- AuxPwr-
> > NoSnoop+ FLReset-
> >  MaxPayload 256 bytes, MaxReadReq 512 bytes
> >  DevSta: CorrErr+ UncorrErr- FatalErr- UnsuppReq+
> > AuxPwr+ TransPend+
> >  LnkCap: Port #2, Speed 5GT/s, Width x8, ASPM L0s,
> > Exit Latency L0s unlimited, L1 <8us
> >  ClockPM- Surprise- LLActRep- BwNot-
> > ASPMOptComp-
> >  LnkCtl: ASPM Disabled; RCB 64 bytes Disabled-
> > CommClk+
> >  

[bpf PATCH 2/2] bpf: sockmap: write_space events need to be passed to TCP handler

2018-08-22 Thread John Fastabend
When the sockmap code is using the stream parser it also handles the
write space events, in order to cover the case where (a) the verdict
redirects an skb to another socket and (b) the sockmap then sends the
skb but, due to memory constraints (or other EAGAIN errors), needs to
do a retry.

But the initial code missed a third case, where skb_send_sock_locked()
triggers an sk_wait_event(). A typical case would be when the sndbuf
size is exceeded. If this happens, then because we do not pass the
write_space event to the lower layers we never wake up the waiter and
it sleeps for sndtimeo, which, as noted in the ktls fix, may be rather
large and look like a hang to the user.

To reproduce the best test is to reduce the sndbuf size and send
1B data chunks to stress the memory handling.

To fix this pass the event from the upper layer to the lower layer.

Signed-off-by: John Fastabend 
---
 kernel/bpf/sockmap.c |3 +++
 1 file changed, 3 insertions(+)

diff --git a/kernel/bpf/sockmap.c b/kernel/bpf/sockmap.c
index 98e621a..1d092f3 100644
--- a/kernel/bpf/sockmap.c
+++ b/kernel/bpf/sockmap.c
@@ -1427,12 +1427,15 @@ static void smap_tx_work(struct work_struct *w)
 static void smap_write_space(struct sock *sk)
 {
struct smap_psock *psock;
+   void (*write_space)(struct sock *sk);
 
rcu_read_lock();
psock = smap_psock_sk(sk);
if (likely(psock && test_bit(SMAP_TX_RUNNING, &psock->state)))
schedule_work(&psock->tx_work);
+   write_space = psock->save_write_space;
rcu_read_unlock();
+   write_space(sk);
 }
 
 static void smap_stop_sock(struct smap_psock *psock, struct sock *sk)



[bpf PATCH 0/2] tls, sockmap, fixes for sk_wait_event

2018-08-22 Thread John Fastabend
I have been testing ktls and sockmap lately and noticed that neither
was handling sk_write_space events correctly. We need to ensure
these events are pushed down to the lower layer in all cases to
handle the case where the lower layer sendpage call has called
sk_wait_event and needs to be woken up. Without this I see
occasional stalls of sndtimeo length while we wait for the
timeout value even though space is available.

Two fixes below. Thanks.
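
The shape of both fixes is the same and is easy to see in isolation: the
upper layer keeps a pointer to the socket's original sk_write_space
handler and calls it from its own override, so whoever is sleeping in
sk_wait_event() underneath still gets woken. A rough sketch (the names
are made up, not the actual sockmap/tls symbols):

    #include <net/sock.h>

    struct my_ctx {
            void (*saved_write_space)(struct sock *sk);
    };

    /* saved at setup time, e.g.:
     *   ctx->saved_write_space = sk->sk_write_space;
     *   sk->sk_write_space     = my_write_space;
     */
    static void my_write_space(struct sock *sk)
    {
            struct my_ctx *ctx = my_ctx_of(sk);     /* hypothetical lookup */

            /* upper layer's own bookkeeping / tx scheduling goes here */

            /* always chain to the original handler so a lower layer
             * blocked in sk_wait_event() is woken as well */
            ctx->saved_write_space(sk);
    }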

---

John Fastabend (2):
  tls: possible hang when do_tcp_sendpages hits sndbuf is full case
  bpf: sockmap: write_space events need to be passed to TCP handler


 kernel/bpf/sockmap.c |3 +++
 net/tls/tls_main.c   |9 +++--
 2 files changed, 10 insertions(+), 2 deletions(-)

--
Signature


[bpf PATCH 1/2] tls: possible hang when do_tcp_sendpages hits sndbuf is full case

2018-08-22 Thread John Fastabend
Currently, the lower protocol's sk_write_space handler is not called if
TLS is sending a scatterlist via tls_push_sg. However, normally
tls_push_sg calls do_tcp_sendpage, which may be under memory pressure
and in turn may trigger a wait via sk_wait_event. Typically, this
happens when the in-flight bytes exceed the sndbuf size. In the normal
case when enough ACKs are received sk_write_space() will be called and
the sk_wait_event will be woken up allowing it to send more data
and/or return to the user.

But, in the TLS case, because the sk_write_space() handler does not
wake up the events, the above send will wait until the sndtimeo is
exceeded. By default this is MAX_SCHEDULE_TIMEOUT, so it looks like a
hang to the user (especially this impatient user). To fix this pass
the sk_write_space event to the lower layer's sk_write_space handler,
which in the TCP case will wake any pending events.

I observed the above while integrating sockmap and ktls. It
initially appeared as test_sockmap (modified to use ktls) occasionally
hanging. To reliably reproduce this reduce the sndbuf size and stress
the tls layer by sending many 1B sends. This results in every byte
needing a header and each byte individually being sent to the crypto
layer.

Signed-off-by: John Fastabend 
---
 net/tls/tls_main.c |9 +++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/net/tls/tls_main.c b/net/tls/tls_main.c
index 93c0c22..180b664 100644
--- a/net/tls/tls_main.c
+++ b/net/tls/tls_main.c
@@ -213,9 +213,14 @@ static void tls_write_space(struct sock *sk)
 {
struct tls_context *ctx = tls_get_ctx(sk);
 
-   /* We are already sending pages, ignore notification */
-   if (ctx->in_tcp_sendpages)
+   /* If in_tcp_sendpages call lower protocol write space handler
+* to ensure we wake up any waiting operations there. For example
+* if do_tcp_sendpages where to call sk_wait_event.
+*/
+   if (ctx->in_tcp_sendpages) {
+   ctx->sk_write_space(sk);
return;
+   }
 
if (!sk->sk_write_pending && tls_is_pending_closed_record(ctx)) {
gfp_t sk_allocation = sk->sk_allocation;



Re: [PATCH] net: macb: do not disable MDIO bus at open/close time

2018-08-22 Thread Claudiu Beznea



On 20.08.2018 17:55, Anssi Hannula wrote:
> macb_reset_hw() is called from macb_close() and indirectly from
> macb_open(). macb_reset_hw() zeroes the NCR register, including the MPE
> (Management Port Enable) bit.
> 
> This will prevent accessing any other PHYs for other Ethernet MACs on
> the MDIO bus, which remains registered at macb_reset_hw() time, until
> macb_init_hw() is called from macb_open() which sets the MPE bit again.
> 
> I.e. currently the MDIO bus has a short disruption at open time and is
> disabled at close time until the interface is opened again.
> 
> Fix that by only touching the RE and TE bits when enabling and disabling
> RX/TX.
> 
> Fixes: 6c36a7074436 ("macb: Use generic PHY layer")
> Signed-off-by: Anssi Hannula 
> ---
> 
> Claudiu Beznea wrote:
>> On 10.08.2018 09:22, Anssi Hannula wrote:
>>>
>>> macb_reset_hw() is called in init path too,
>>
>> I only see it in macb_close() and macb_open() called from macb_init_hw().
> 
> Yeah, macb_init_hw() is what I meant :)
> 
> 
>  drivers/net/ethernet/cadence/macb_main.c | 12 +---
>  1 file changed, 9 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/net/ethernet/cadence/macb_main.c 
> b/drivers/net/ethernet/cadence/macb_main.c
> index dc09f9a8a49b..6501e9b3785a 100644
> --- a/drivers/net/ethernet/cadence/macb_main.c
> +++ b/drivers/net/ethernet/cadence/macb_main.c
> @@ -2028,14 +2028,17 @@ static void macb_reset_hw(struct macb *bp)
>  {
>   struct macb_queue *queue;
>   unsigned int q;
> + u32 ctrl = macb_readl(bp, NCR);
>  
>   /* Disable RX and TX (XXX: Should we halt the transmission
>* more gracefully?)
>*/
> - macb_writel(bp, NCR, 0);
> + ctrl &= ~(MACB_BIT(RE) | MACB_BIT(TE));
>  
>   /* Clear the stats registers (XXX: Update stats first?) */
> - macb_writel(bp, NCR, MACB_BIT(CLRSTAT));
> + ctrl |= MACB_BIT(CLRSTAT);
> +
> + macb_writel(bp, NCR, ctrl);
>  
>   /* Clear all status flags */
>   macb_writel(bp, TSR, -1);
> @@ -2170,6 +2173,7 @@ static void macb_init_hw(struct macb *bp)
>   unsigned int q;
>  
>   u32 config;
> + u32 ctrl;
>  
>   macb_reset_hw(bp);
>   macb_set_hwaddr(bp);
> @@ -2223,7 +2227,9 @@ static void macb_init_hw(struct macb *bp)
>   }
>  
>   /* Enable TX and RX */
> - macb_writel(bp, NCR, MACB_BIT(RE) | MACB_BIT(TE) | MACB_BIT(MPE));
> + ctrl = macb_readl(bp, NCR);
> + ctrl |= MACB_BIT(RE) | MACB_BIT(TE);
> + macb_writel(bp, NCR, ctrl);

I would keep it as:

macb_writel(bp, NCR, macb_readl(bp, NCR) | MACB_BIT(RE) | MACB_BIT(TE));

>  }
>  
>  /* The hash address register is 64 bits long and takes up two
> 


[PATCH lora-next 3/3] net: lora: sx125x sx1301: allow radio to register as a clk provider

2018-08-22 Thread Ben Whitten
From: Ben Whitten 

The 32M clock is driven by the radio. Previously we simply enabled it
based on the radio number, but now we can use the clk framework to
request that the clock is started when we need it.

The 32M clock produced from the radio is really a gated version of
tcxo which is a fixed clock provided by hardware, and isn't captured
in this patch.

The sx1301 brings the clock up prior to calibration once the radios
have probed themselves.

A sample dts showing the clk link:
sx1301: sx1301@0 {
...
clocks = < 0>;
clock-names = "clk32m";

radio-spi {
radio0: radio-a@0 {
...
};

radio1: radio-b@1 {
#clock-cells = <0>;
clock-output-names = "clk32m";
};
};
};
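
For reference, the consumer side of the binding above would look roughly
like this in the sx1301 probe path (a sketch only, error handling trimmed
and taken out of context; the sx1301 change in this series is
authoritative):

    #include <linux/clk.h>

    struct clk *clk32m;
    int ret;

    clk32m = devm_clk_get(dev, "clk32m");   /* matches clock-names above */
    if (IS_ERR(clk32m))
            return PTR_ERR(clk32m);

    ret = clk_prepare_enable(clk32m);       /* ends up in the radio's clkout enable op */
    if (ret)
            return ret;

    /* ... run calibration ...; clk_disable_unprepare(clk32m) when finished */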

Signed-off-by: Ben Whitten 
---
 drivers/net/lora/sx125x.c | 112 ++
 drivers/net/lora/sx1301.c |  13 ++
 drivers/net/lora/sx1301.h |   2 +
 3 files changed, 119 insertions(+), 8 deletions(-)

diff --git a/drivers/net/lora/sx125x.c b/drivers/net/lora/sx125x.c
index b5517e4..5989157 100644
--- a/drivers/net/lora/sx125x.c
+++ b/drivers/net/lora/sx125x.c
@@ -9,6 +9,8 @@
  * Copyright (c) 2013 Semtech-Cycleo
  */
 
+#include 
+#include 
 #include 
 #include 
 #include 
@@ -42,10 +44,16 @@ static const struct reg_field sx125x_regmap_fields[] = {
 };
 
 struct sx125x_priv {
+   struct clk  *clkout;
+   struct clk_hw   clkout_hw;
+
+   struct device   *dev;
struct regmap   *regmap;
struct regmap_field 
*regmap_fields[ARRAY_SIZE(sx125x_regmap_fields)];
 };
 
+#define to_clkout(_hw) container_of(_hw, struct sx125x_priv, clkout_hw)
+
 static struct regmap_config __maybe_unused sx125x_regmap_config = {
.reg_bits = 8,
.val_bits = 8,
@@ -64,6 +72,96 @@ static int sx125x_field_write(struct sx125x_priv *priv,
return regmap_field_write(priv->regmap_fields[field_id], val);
 }
 
+static int sx125x_field_read(struct sx125x_priv *priv,
+   enum sx125x_fields field_id, unsigned int *val)
+{
+   return regmap_field_read(priv->regmap_fields[field_id], val);
+}
+
+static int sx125x_clkout_enable(struct clk_hw *hw)
+{
+   struct sx125x_priv *priv = to_clkout(hw);
+
+   dev_info(priv->dev, "enabling clkout\n");
+   return sx125x_field_write(priv, F_CLK_OUT, 1);
+}
+
+static void sx125x_clkout_disable(struct clk_hw *hw)
+{
+   struct sx125x_priv *priv = to_clkout(hw);
+   int ret;
+
+   dev_info(priv->dev, "disabling clkout\n");
+   ret = sx125x_field_write(priv, F_CLK_OUT, 0);
+   if (ret)
+   dev_err(priv->dev, "error disabling clkout\n");
+}
+
+static int sx125x_clkout_is_enabled(struct clk_hw *hw)
+{
+   struct sx125x_priv *priv = to_clkout(hw);
+   unsigned int enabled;
+   int ret;
+
+   ret = sx125x_field_read(priv, F_CLK_OUT, &enabled);
+   if (ret) {
+   dev_err(priv->dev, "error reading clk enable\n");
+   return 0;
+   }
+   return enabled;
+}
+
+static const struct clk_ops sx125x_clkout_ops = {
+   .enable = sx125x_clkout_enable,
+   .disable = sx125x_clkout_disable,
+   .is_enabled = sx125x_clkout_is_enabled,
+};
+
+static int sx125x_register_clock_provider(struct sx125x_priv *priv)
+{
+   struct device *dev = priv->dev;
+   struct clk_init_data init;
+   const char *parent;
+   int ret;
+
+   /* Disable CLKOUT */
+   ret = sx125x_field_write(priv, F_CLK_OUT, 0);
+   if (ret) {
+   dev_err(dev, "unable to disable clkout\n");
+   return ret;
+   }
+
+   /* Register clock provider if expected in DTB */
+   if (!of_find_property(dev->of_node, "#clock-cells", NULL))
+   return 0;
+
+   dev_info(dev, "registering clkout\n");
+
+   parent = of_clk_get_parent_name(dev->of_node, 0);
+   if (!parent) {
+   dev_err(dev, "Unable to find parent clk\n");
+   return -ENODEV;
+   }
+
+   init.ops = &sx125x_clkout_ops;
+   init.flags = CLK_IS_BASIC;
+   init.parent_names = &parent;
+   init.num_parents = 1;
+   priv->clkout_hw.init = &init;
+
+   of_property_read_string_index(dev->of_node, "clock-output-names", 0,
+   &init.name);
+
+   priv->clkout = devm_clk_register(dev, &priv->clkout_hw);
+   if (IS_ERR(priv->clkout)) {
+   dev_err(dev, "failed to register clkout\n");
+   return PTR_ERR(priv->clkout);
+   }
+   ret = of_clk_add_hw_provider(dev->of_node, of_clk_hw_simple_get,
+   &priv->clkout_hw);
+   return ret;
+}
+
 static int __maybe_unused sx125x_regmap_probe(struct device *dev, struct 
regmap *regmap, unsigned int radio)
 {
struct sx125x_priv *priv;
@@ -76,6 +174,7 @@ 

[PATCH lora-next 1/3] net: lora: sx1301: convert to using regmap fields for bit ops

2018-08-22 Thread Ben Whitten
From: Ben Whitten 

We convert to using regmap fields to allow bit-level access to the
registers, with regmap handling the read-modify-write.
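
For anyone not familiar with the API, the general shape of a regmap_field
conversion looks like this (illustrative register/bit numbers, not taken
from the sx1301):

    #include <linux/regmap.h>

    /* describe each field once: register, lsb, msb */
    static const struct reg_field demo_fields[] = {
            [0] = REG_FIELD(0x10, 3, 3),            /* a single-bit enable */
    };

    struct regmap_field *fld;
    unsigned int val;

    fld = devm_regmap_field_alloc(dev, regmap, demo_fields[0]);
    if (IS_ERR(fld))
            return PTR_ERR(fld);

    /* regmap performs the read-modify-write and the bit shifting */
    regmap_field_write(fld, 1);
    regmap_field_read(fld, &val);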

Signed-off-by: Ben Whitten 
---
 drivers/net/lora/sx1301.c | 240 +-
 drivers/net/lora/sx1301.h |  46 +
 2 files changed, 113 insertions(+), 173 deletions(-)

diff --git a/drivers/net/lora/sx1301.c b/drivers/net/lora/sx1301.c
index 971d234..8aad331 100644
--- a/drivers/net/lora/sx1301.c
+++ b/drivers/net/lora/sx1301.c
@@ -24,27 +24,6 @@
 
 #include "sx1301.h"
 
-#define REG_PAGE_RESET_SOFT_RESET  BIT(7)
-
-#define REG_16_GLOBAL_EN   BIT(3)
-
-#define REG_17_CLK32M_EN   BIT(0)
-
-#define REG_0_105_FORCE_HOST_RADIO_CTRLBIT(1)
-#define REG_0_105_FORCE_HOST_FE_CTRL   BIT(2)
-#define REG_0_105_FORCE_DEC_FILTER_GAINBIT(3)
-
-#define REG_0_MCU_RST_0BIT(0)
-#define REG_0_MCU_RST_1BIT(1)
-#define REG_0_MCU_SELECT_MUX_0 BIT(2)
-#define REG_0_MCU_SELECT_MUX_1 BIT(3)
-
-#define REG_2_43_RADIO_A_ENBIT(0)
-#define REG_2_43_RADIO_B_ENBIT(1)
-#define REG_2_43_RADIO_RST BIT(2)
-
-#define REG_EMERGENCY_FORCE_HOST_CTRL  BIT(0)
-
 static const struct regmap_range_cfg sx1301_regmap_ranges[] = {
{
.name = "Pages",
@@ -74,6 +53,12 @@ static struct regmap_config sx1301_regmap_config = {
.max_register = SX1301_MAX_REGISTER,
 };
 
+static int sx1301_field_write(struct sx1301_priv *priv,
+   enum sx1301_fields field_id, u8 val)
+{
+   return regmap_field_write(priv->regmap_fields[field_id], val);
+}
+
 static int sx1301_read_burst(struct sx1301_priv *priv, u8 reg, u8 *val, size_t 
len)
 {
u8 addr = reg & 0x7f;
@@ -91,11 +76,6 @@ static int sx1301_write_burst(struct sx1301_priv *priv, u8 
reg, const u8 *val, s
return spi_sync_transfer(priv->spi, xfr, 2);
 }
 
-static int sx1301_soft_reset(struct sx1301_priv *priv)
-{
-   return regmap_write(priv->regmap, SX1301_PAGE, 
REG_PAGE_RESET_SOFT_RESET);
-}
-
 static int sx1301_agc_ram_read(struct sx1301_priv *priv, u8 addr, unsigned int 
*val)
 {
int ret;
@@ -137,7 +117,7 @@ static int sx1301_arb_ram_read(struct sx1301_priv *priv, u8 
addr, unsigned int *
 static int sx1301_load_firmware(struct sx1301_priv *priv, int mcu, const 
struct firmware *fw)
 {
u8 *buf;
-   u8 rst, select_mux;
+   enum sx1301_fields rst, select_mux;
unsigned int val;
int ret;
 
@@ -148,29 +128,26 @@ static int sx1301_load_firmware(struct sx1301_priv *priv, 
int mcu, const struct
 
switch (mcu) {
case 0:
-   rst = REG_0_MCU_RST_0;
-   select_mux = REG_0_MCU_SELECT_MUX_0;
+   rst = F_MCU_RST_0;
+   select_mux = F_MCU_SELECT_MUX_0;
break;
case 1:
-   rst = REG_0_MCU_RST_1;
-   select_mux = REG_0_MCU_SELECT_MUX_1;
+   rst = F_MCU_RST_1;
+   select_mux = F_MCU_SELECT_MUX_1;
break;
default:
return -EINVAL;
}
 
-   ret = regmap_read(priv->regmap, SX1301_MCU_CTRL, &val);
+   ret = sx1301_field_write(priv, rst, 1);
if (ret) {
-   dev_err(priv->dev, "MCU read failed\n");
+   dev_err(priv->dev, "MCU reset failed\n");
return ret;
}
 
-   val |= rst;
-   val &= ~select_mux;
-
-   ret = regmap_write(priv->regmap, SX1301_MCU_CTRL, val);
+   ret = sx1301_field_write(priv, select_mux, 0);
if (ret) {
-   dev_err(priv->dev, "MCU reset / select mux write failed\n");
+   dev_err(priv->dev, "MCU RAM select mux failed\n");
return ret;
}
 
@@ -211,17 +188,9 @@ static int sx1301_load_firmware(struct sx1301_priv *priv, 
int mcu, const struct
 
kfree(buf);
 
-   ret = regmap_read(priv->regmap, SX1301_MCU_CTRL, &val);
+   ret = sx1301_field_write(priv, select_mux, 1);
if (ret) {
-   dev_err(priv->dev, "MCU read (1) failed\n");
-   return ret;
-   }
-
-   val |= select_mux;
-
-   ret = regmap_write(priv->regmap, SX1301_MCU_CTRL, val);
-   if (ret) {
-   dev_err(priv->dev, "MCU reset / select mux write (1) failed\n");
+   dev_err(priv->dev, "MCU RAM release mux failed\n");
return ret;
}
 
@@ -247,17 +216,9 @@ static int sx1301_agc_calibrate(struct sx1301_priv *priv)
return ret;
}
 
-   ret = regmap_read(priv->regmap, SX1301_FORCE_CTRL, &val);
+   ret = sx1301_field_write(priv, F_FORCE_HOST_RADIO_CTRL, 0);
if (ret) {
-   dev_err(priv->dev, "0|105 read failed\n");
-   return ret;
-   }
-
-   val &= ~REG_0_105_FORCE_HOST_RADIO_CTRL;
-
-   ret = regmap_write(priv->regmap, SX1301_FORCE_CTRL, val);
-   if 

[PATCH lora-next 2/3] net: lora: sx125x: convert to regmap fields

2018-08-22 Thread Ben Whitten
From: Ben Whitten 

We convert to using regmap fields to allow regmap to take care of
read-modify-writes and bit shifting for offset fields.

Signed-off-by: Ben Whitten 
---
 drivers/net/lora/sx125x.c | 59 ---
 1 file changed, 51 insertions(+), 8 deletions(-)

diff --git a/drivers/net/lora/sx125x.c b/drivers/net/lora/sx125x.c
index 3476a59..b5517e4 100644
--- a/drivers/net/lora/sx125x.c
+++ b/drivers/net/lora/sx125x.c
@@ -25,11 +25,25 @@
 
 #include "sx125x.h"
 
-#define REG_CLK_SELECT_TX_DAC_CLK_SELECT_CLK_INBIT(0)
-#define REG_CLK_SELECT_CLK_OUT BIT(1)
+enum sx125x_fields {
+   F_CLK_OUT,
+   F_TX_DAC_CLK_SEL,
+   F_SX1257_XOSC_GM_STARTUP,
+   F_SX1257_XOSC_DISABLE_CORE,
+};
+
+static const struct reg_field sx125x_regmap_fields[] = {
+   /* CLK_SELECT */
+   [F_CLK_OUT]= REG_FIELD(SX125X_CLK_SELECT, 1, 1),
+   [F_TX_DAC_CLK_SEL] = REG_FIELD(SX125X_CLK_SELECT, 0, 0),
+   /* XOSC */ /* TODO maybe make this dynamic */
+   [F_SX1257_XOSC_GM_STARTUP]  = REG_FIELD(SX1257_XOSC, 0, 3),
+   [F_SX1257_XOSC_DISABLE_CORE]  = REG_FIELD(SX1257_XOSC, 5, 5),
+};
 
 struct sx125x_priv {
struct regmap   *regmap;
+   struct regmap_field 
*regmap_fields[ARRAY_SIZE(sx125x_regmap_fields)];
 };
 
 static struct regmap_config __maybe_unused sx125x_regmap_config = {
@@ -44,11 +58,18 @@ static struct regmap_config __maybe_unused 
sx125x_regmap_config = {
.max_register = SX125X_MAX_REGISTER,
 };
 
+static int sx125x_field_write(struct sx125x_priv *priv,
+   enum sx125x_fields field_id, u8 val)
+{
+   return regmap_field_write(priv->regmap_fields[field_id], val);
+}
+
 static int __maybe_unused sx125x_regmap_probe(struct device *dev, struct 
regmap *regmap, unsigned int radio)
 {
struct sx125x_priv *priv;
unsigned int val;
int ret;
+   int i;
 
priv = devm_kzalloc(dev, sizeof(*priv), GFP_KERNEL);
if (!priv)
@@ -56,6 +77,18 @@ static int __maybe_unused sx125x_regmap_probe(struct device 
*dev, struct regmap
 
dev_set_drvdata(dev, priv);
priv->regmap = regmap;
+   for (i = 0; i < ARRAY_SIZE(sx125x_regmap_fields); i++) {
+   const struct reg_field *reg_fields = sx125x_regmap_fields;
+
+   priv->regmap_fields[i] = devm_regmap_field_alloc(dev,
+   priv->regmap,
+   reg_fields[i]);
+   if (IS_ERR(priv->regmap_fields[i])) {
+   ret = PTR_ERR(priv->regmap_fields[i]);
+   dev_err(dev, "Cannot allocate regmap field: %d\n", ret);
+   return ret;
+   }
+   }
 
if (false) {
ret = regmap_read(priv->regmap, SX1255_VERSION, );
@@ -66,24 +99,34 @@ static int __maybe_unused sx125x_regmap_probe(struct device 
*dev, struct regmap
dev_info(dev, "SX125x version: %02x\n", val);
}
 
-   val = REG_CLK_SELECT_TX_DAC_CLK_SELECT_CLK_IN;
if (radio == 1) { /* HACK */
-   val |= REG_CLK_SELECT_CLK_OUT;
+   ret = sx125x_field_write(priv, F_CLK_OUT, 1);
+   if (ret) {
+   dev_err(dev, "enabling clock output failed\n");
+   return ret;
+   }
+
dev_info(dev, "enabling clock output\n");
}
 
-   ret = regmap_write(priv->regmap, SX125X_CLK_SELECT, val);
+   ret = sx125x_field_write(priv, F_TX_DAC_CLK_SEL, 1);
if (ret) {
-   dev_err(dev, "clk write failed\n");
+   dev_err(dev, "clock select failed\n");
return ret;
}
 
dev_dbg(dev, "clk written\n");
 
if (true) {
-   ret = regmap_write(priv->regmap, SX1257_XOSC, 13 + 2 * 16);
+   ret = sx125x_field_write(priv, F_SX1257_XOSC_DISABLE_CORE, 1);
+   if (ret) {
+   dev_err(dev, "xosc disable failed\n");
+   return ret;
+   }
+
+   ret = sx125x_field_write(priv, F_SX1257_XOSC_GM_STARTUP, 13);
if (ret) {
-   dev_err(dev, "xosc write failed\n");
+   dev_err(dev, "xosc startup adjust failed\n");
return ret;
}
}
-- 
2.7.4



Re: [PATCH] testsuite: Handle large number of kernel options

2018-08-22 Thread Luca Boccassi
On Wed, 2018-08-22 at 10:31 +0200, Stefan Bader wrote:
> Once there are more than a certain number of kernel config options
> set (this happened for us with kernel 4.17), the method of passing
> those as command line arguments exceeds the maximum number of
> arguments the shell supports. This causes the whole testsuite to
> fail.
> Instead, create a temporary file and modify its contents so that
> the config option variables are exported. Then this file can be
> sourced in before running the tests.
> 
> Signed-off-by: Stefan Bader 
> ---
>  testsuite/Makefile | 16 +++-
>  1 file changed, 11 insertions(+), 5 deletions(-)
> 
> diff --git a/testsuite/Makefile b/testsuite/Makefile
> index 8fcbc55..f9f3b19 100644

Acked-by: Luca Boccassi 

Looks good to me, thanks.

-- 
Kind regards,
Luca Boccassi

signature.asc
Description: This is a digitally signed message part


Re: [endianness bug] cxgb4: mk_act_open_req() buggers ->{local,peer}_ip on big-endian hosts

2018-08-22 Thread Ganesh Goudar
Hi Al,

All the issues you have mentioned make sense to me; I will fix them
and try to have them tested on a big-endian machine.
I got all the patches from net-endian.chelsio and they all look good,
but I am yet to go through
(struct cxgb4_next_header .match_val/.match_mask/mask should be net-endian).

Regarding "le64_to_cpu(*src)", I think we have not tested our VF driver on
a big-endian machine; I will address this as well.

Thanks
Ganesh


[PATCH] testsuite: Handle large number of kernel options

2018-08-22 Thread Stefan Bader
Once there are more than a certain number of kernel config options
set (this happened for us with kernel 4.17), the method of passing
those as command line arguments exceeds the maximum number of
arguments the shell supports. This causes the whole testsuite to
fail.
Instead, create a temporary file and modify its contents so that
the config option variables are exported. Then this file can be
sourced in before running the tests.

Signed-off-by: Stefan Bader 
---
 testsuite/Makefile | 16 +++-
 1 file changed, 11 insertions(+), 5 deletions(-)

diff --git a/testsuite/Makefile b/testsuite/Makefile
index 8fcbc55..f9f3b19 100644
--- a/testsuite/Makefile
+++ b/testsuite/Makefile
@@ -14,15 +14,13 @@ TESTS_DIR := $(dir $(TESTS))
 
 IPVERS := $(filter-out iproute2/Makefile,$(wildcard iproute2/*))
 
+KENVFN := $(shell mktemp /tmp/tc_testkenv.XX)
 ifneq (,$(wildcard /proc/config.gz))
-   KENV := $(shell cat /proc/config.gz | gunzip | grep ^CONFIG)
+   KCPATH := /proc/config.gz
 else
 KVER := $(shell uname -r)
 KCPATHS := /lib/modules/$(KVER)/config /boot/config-$(KVER)
 KCPATH := $(firstword $(wildcard $(KCPATHS)))
-ifneq (,$(KCPATH))
-   KENV := $(shell cat ${KCPATH} | grep ^CONFIG)
-endif
 endif
 
 .PHONY: compile listtests alltests configure $(TESTS)
@@ -59,14 +57,22 @@ endif
mkdir -p $(RESULTS_DIR)/$$d; \
done

+   @if [ "$(KCPATH)" = "/proc/config.gz" ]; then \
+   gunzip -c $(KCPATH) >$(KENVFN); \
+   elif [ "$(KCPATH)" != "" ]; then \
+   cat $(KCPATH) >$(KENVFN); \
+   fi
+   @sed -i -e 's/^CONFIG_/export CONFIG_/' $(KENVFN)
+
@for i in $(IPVERS); do \
o=`echo $$i | sed -e 's/iproute2\///'`; \
echo -n "Running $@ [$$o/`uname -r`]: "; \
TMP_ERR=`mktemp /tmp/tc_testsuite.XX`; \
TMP_OUT=`mktemp /tmp/tc_testsuite.XX`; \
+   . $(KENVFN); \
STD_ERR="$$TMP_ERR" STD_OUT="$$TMP_OUT" \
TC="$$i/tc/tc" IP="$$i/ip/ip" SS=$$i/misc/ss DEV="$(DEV)" 
IPVER="$@" SNAME="$$i" \
-   ERRF="$(RESULTS_DIR)/$@.$$o.err" $(KENV) $(PREFIX) tests/$@ > 
$(RESULTS_DIR)/$@.$$o.out; \
+   ERRF="$(RESULTS_DIR)/$@.$$o.err" $(PREFIX) tests/$@ > 
$(RESULTS_DIR)/$@.$$o.out; \
if [ "$$?" = "127" ]; then \
echo "SKIPPED"; \
elif [ -e "$(RESULTS_DIR)/$@.$$o.err" ]; then \
-- 
2.7.4



Re: [bpf-next RFC 0/3] Introduce eBPF flow dissector

2018-08-22 Thread Daniel Borkmann
"On 08/22/2018 09:22 AM, Daniel Borkmann wrote:
> On 08/22/2018 02:19 AM, Petar Penkov wrote:
>> On Mon, Aug 20, 2018 at 1:52 PM, Alexei Starovoitov
>>  wrote:
>>> On Thu, Aug 16, 2018 at 09:44:20AM -0700, Petar Penkov wrote:
 From: Petar Penkov 
> [...]
 3/ The BPF program cannot use direct packet access everywhere because it
 uses an offset, initially supplied by the flow dissector.  Because the
 initial value of this non-constant offset comes from outside of the
 program, the verifier does not know what its value is, and it cannot verify
 that it is within packet bounds. Therefore, direct packet access programs
 get rejected.
>>>
>>> this part doesn't seem to match the code.
>>> direct packet access is allowed and usable even for fragmented skbs.
>>> in such case only linear part of skb is in "direct access".
>>
>> I am not sure I understand. What I meant was that I use bpf_skb_load_bytes
>> rather than direct packet access because the offset at which I read headers,
>> nhoff, depends on an initial value that cannot be statically verified - 
>> namely
>> what __skb_flow_dissect provides. Is there an alternative approach I should
>> be taking here, and/or am I misunderstanding direct access?
> 
> You can still use direct packet access with it, the only thing you would
> need to make sure is that the initial offset is bounded (e.g. test if
> larger than some const and then drop the packet, or '& ') so that
> the verifier can make sure the alu op won't cause overflow, then you can
> add this to pkt_data, and later on open an access range with the usual test
> like pkt_data' +  > pkt_end.

And for non-linear data, you could use the bpf_skb_pull_data() helper as
we have in tc/BPF case 36bbef52c7eb ("bpf: direct packet write and access
for helpers for clsact progs") to pull it into linear area and make it
accessible for direct packet access.

> Thanks,
> Daniel


Re: virtio_net failover and initramfs

2018-08-22 Thread Siwei Liu
On Wed, Aug 22, 2018 at 12:23 AM, Harald Hoyer  wrote:
> On 22.08.2018 09:17, Siwei Liu wrote:
>> On Tue, Aug 21, 2018 at 6:44 AM, Harald Hoyer  wrote:
>>> On 17.08.2018 21:09, Samudrala, Sridhar wrote:
 On 8/17/2018 2:56 AM, Harald Hoyer wrote:
> On 17.08.2018 11:51, Harald Hoyer wrote:
>> On 16.08.2018 00:17, Siwei Liu wrote:
>>> On Wed, Aug 15, 2018 at 12:05 PM, Samudrala, Sridhar
>>>  wrote:
 On 8/14/2018 5:03 PM, Siwei Liu wrote:
> Are we sure all userspace apps skip and ignore slave interfaces by
> just looking at "IFLA_MASTER" attribute?
>
> When STANDBY is enabled on virtio-net, a failover master interface
> will appear, which automatically enslaves the virtio device. But it is
> found out that iSCSI (or any network boot) cannot boot strap over the
> new failover interface together with a standby virtio (without any VF
> or PT device in place).
>
> Dracut (initramfs) ends up with timeout and dropping into emergency 
> shell:
>
> [  228.170425] dracut-initqueue[377]: Warning: dracut-initqueue
> timeout - starting timeout scripts
> [  228.171788] dracut-initqueue[377]: Warning: Could not boot.
>Starting Dracut Emergency Shell...
> Generating "/run/initramfs/rdsosreport.txt"
> Entering emergency mode. Exit the shell to continue.
> Type "journalctl" to view system logs.
> You might want to save "/run/initramfs/rdsosreport.txt" to a USB 
> stick or
> /boot
> after mounting them and attach it to a bug report.
> dracut:/# ip l sh
> 1: lo:  mtu 65536 qdisc noqueue state UNKNOWN
> mode DEFAULT group default qlen 1000
>   link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
> 2: eth0:  mtu 1500 qdisc noqueue
> state UP mode DEFAULT group default qlen 1000
>   link/ether 9a:46:22:ae:33:54 brd ff:ff:ff:ff:ff:ff\
> 3: eth1:  mtu 1500 qdisc pfifo_fast
> master eth0 state UP mode DEFAULT group default qlen 1000
>   link/ether 9a:46:22:ae:33:54 brd ff:ff:ff:ff:ff:ff
> dracut:/#
>
> If changing dracut code to ignore eth1 (with IFLA_MASTER attr),
> network boot starts to work.

 Does dracut by default tries to use all the interfaces that are UP?

>>> Yes. The specific dracut cmdline of our case is "ip=dhcp
>>> netroot=iscsi:... ", but it's not specific to iscsi boot. And because
>>> of same MAC address for failover and standby, while dracut tries to
>>> run DHCP on all interfaces that are up it eventually gets same route
>>> for each interface. Those conflict route entries kill off the network
>>> connection.
>>>
> The reason is that dracut has its own means to differentiate virtual
> interfaces for network boot: it does not look at IFLA_MASTER and
> ignores slave interfaces. Instead, users have to provide explicit
> option e.g. bond=eth0,eth1 in the boot line, then dracut would know
> the config and ignore the slave interfaces.

 Isn't it possible to specify the interface that should be used for 
 network
 boot?
>>> As I understand it, one can only specify interface name for running
>>> DHCP but not select interface for network boot.  We want DHCP to run
>>> on every NIC that is up (excluding the enslaved interfaces), and only
>>> one of them can get a route entry to the network boot server (ie.g.
>>> iSCSI target).
>>>

> However, with automatic creation of failover interface that assumption
> is no longer true. Can we change dracut to ignore all slave interface
> by checking  IFLA_MASTER? I don't think so. It has a large impact to
> existing configs.

 What is the issue with checking for IFLA_MASTER? I guess this is used 
 with
 team/bonding setups.
>>> That should be discussed within and determined by the dracut
>>> community. But the current dracut code doesn't check IFLA_MASTER for
>>> team or bonding specifically. I guess this change might have broader
>>> impact to existing userspace that might be already relying on the
>>> current behaviour.
>>>
>>> Thanks,
>>> -Siwei
>> Is there a sysfs flag for IFF_SLAVE? Or any "ip" output I can use to 
>> detect, that it is a IFF_SLAVE?
>>
> Oh, it's the other way around.. dracut should ignore "master" (eth1).
 In the above example eth0 is the net_failover device and eth1 is the lower 
 virtio_net device.
 "ip" output of eth1 shows "master eth0". It indicates that eth0 is its 
 upper/master device.
 This information can also be obtained via sysfs too. 
 /sys/class/net/eth1/upper_eth0
>
> Can the master enslave the "eth0", if it is 

Re: virtio_net failover and initramfs

2018-08-22 Thread Harald Hoyer
On 22.08.2018 09:17, Siwei Liu wrote:
> On Tue, Aug 21, 2018 at 6:44 AM, Harald Hoyer  wrote:
>> On 17.08.2018 21:09, Samudrala, Sridhar wrote:
>>> On 8/17/2018 2:56 AM, Harald Hoyer wrote:
 On 17.08.2018 11:51, Harald Hoyer wrote:
> On 16.08.2018 00:17, Siwei Liu wrote:
>> On Wed, Aug 15, 2018 at 12:05 PM, Samudrala, Sridhar
>>  wrote:
>>> On 8/14/2018 5:03 PM, Siwei Liu wrote:
 Are we sure all userspace apps skip and ignore slave interfaces by
 just looking at "IFLA_MASTER" attribute?

 When STANDBY is enabled on virtio-net, a failover master interface
 will appear, which automatically enslaves the virtio device. But it is
 found out that iSCSI (or any network boot) cannot boot strap over the
 new failover interface together with a standby virtio (without any VF
 or PT device in place).

 Dracut (initramfs) ends up with timeout and dropping into emergency 
 shell:

 [  228.170425] dracut-initqueue[377]: Warning: dracut-initqueue
 timeout - starting timeout scripts
 [  228.171788] dracut-initqueue[377]: Warning: Could not boot.
Starting Dracut Emergency Shell...
 Generating "/run/initramfs/rdsosreport.txt"
 Entering emergency mode. Exit the shell to continue.
 Type "journalctl" to view system logs.
 You might want to save "/run/initramfs/rdsosreport.txt" to a USB stick 
 or
 /boot
 after mounting them and attach it to a bug report.
 dracut:/# ip l sh
 1: lo:  mtu 65536 qdisc noqueue state UNKNOWN
 mode DEFAULT group default qlen 1000
   link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
 2: eth0:  mtu 1500 qdisc noqueue
 state UP mode DEFAULT group default qlen 1000
   link/ether 9a:46:22:ae:33:54 brd ff:ff:ff:ff:ff:ff\
 3: eth1:  mtu 1500 qdisc pfifo_fast
 master eth0 state UP mode DEFAULT group default qlen 1000
   link/ether 9a:46:22:ae:33:54 brd ff:ff:ff:ff:ff:ff
 dracut:/#

 If changing dracut code to ignore eth1 (with IFLA_MASTER attr),
 network boot starts to work.
>>>
>>> Does dracut by default tries to use all the interfaces that are UP?
>>>
>> Yes. The specific dracut cmdline of our case is "ip=dhcp
>> netroot=iscsi:... ", but it's not specific to iscsi boot. And because
>> of same MAC address for failover and standby, while dracut tries to
>> run DHCP on all interfaces that are up it eventually gets same route
>> for each interface. Those conflict route entries kill off the network
>> connection.
>>
 The reason is that dracut has its own means to differentiate virtual
 interfaces for network boot: it does not look at IFLA_MASTER and
 ignores slave interfaces. Instead, users have to provide explicit
 option e.g. bond=eth0,eth1 in the boot line, then dracut would know
 the config and ignore the slave interfaces.
>>>
>>> Isn't it possible to specify the interface that should be used for 
>>> network
>>> boot?
>> As I understand it, one can only specify interface name for running
>> DHCP but not select interface for network boot.  We want DHCP to run
>> on every NIC that is up (excluding the enslaved interfaces), and only
>> one of them can get a route entry to the network boot server (ie.g.
>> iSCSI target).
>>
>>>
 However, with automatic creation of failover interface that assumption
 is no longer true. Can we change dracut to ignore all slave interface
 by checking  IFLA_MASTER? I don't think so. It has a large impact to
 existing configs.
>>>
>>> What is the issue with checking for IFLA_MASTER? I guess this is used 
>>> with
>>> team/bonding setups.
>> That should be discussed within and determined by the dracut
>> community. But the current dracut code doesn't check IFLA_MASTER for
>> team or bonding specifically. I guess this change might have broader
>> impact to existing userspace that might be already relying on the
>> current behaviour.
>>
>> Thanks,
>> -Siwei
> Is there a sysfs flag for IFF_SLAVE? Or any "ip" output I can use to 
> detect, that it is a IFF_SLAVE?
>
 Oh, it's the other way around.. dracut should ignore "master" (eth1).
>>> In the above example eth0 is the net_failover device and eth1 is the lower 
>>> virtio_net device.
>>> "ip" output of eth1 shows "master eth0". It indicates that eth0 is its 
>>> upper/master device.
>>> This information can also be obtained via sysfs too. 
>>> /sys/class/net/eth1/upper_eth0

 Can the master enslave the "eth0", if it is already "UP" and busy later on?
>>> eth0 is the master/failover device and eth1 gets registered as its slave 
>>> via NETDEV_REGISTER event.
>>> dracut 

Re: [bpf-next RFC 0/3] Introduce eBPF flow dissector

2018-08-22 Thread Daniel Borkmann
On 08/22/2018 02:19 AM, Petar Penkov wrote:
> On Mon, Aug 20, 2018 at 1:52 PM, Alexei Starovoitov
>  wrote:
>> On Thu, Aug 16, 2018 at 09:44:20AM -0700, Petar Penkov wrote:
>>> From: Petar Penkov 
[...]
>>> 3/ The BPF program cannot use direct packet access everywhere because it
>>> uses an offset, initially supplied by the flow dissector.  Because the
>>> initial value of this non-constant offset comes from outside of the
>>> program, the verifier does not know what its value is, and it cannot verify
>>> that it is within packet bounds. Therefore, direct packet access programs
>>> get rejected.
>>
>> this part doesn't seem to match the code.
>> direct packet access is allowed and usable even for fragmented skbs.
>> in such case only linear part of skb is in "direct access".
> 
> I am not sure I understand. What I meant was that I use bpf_skb_load_bytes
> rather than direct packet access because the offset at which I read headers,
> nhoff, depends on an initial value that cannot be statically verified - namely
> what __skb_flow_dissect provides. Is there an alternative approach I should
> be taking here, and/or am I misunderstanding direct access?

You can still use direct packet access with it, the only thing you would
need to make sure is that the initial offset is bounded (e.g. test if
larger than some const and then drop the packet, or '& ') so that
the verifier can make sure the alu op won't cause overflow, then you can
add this to pkt_data, and later on open an access range with the usual test
like pkt_data' +  > pkt_end.

Thanks,
Daniel
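
A concrete (and purely illustrative) shape of the bounding pattern
described above -- the constant, header type and return values are
placeholders, not something taken from the RFC series:

    #include <linux/bpf.h>
    #include <linux/ip.h>

    static int parse_at(struct __sk_buff *skb, __u32 nhoff)
    {
            void *data     = (void *)(long)skb->data;
            void *data_end = (void *)(long)skb->data_end;
            struct iphdr *iph;

            if (nhoff > 1024)                    /* bound the unknown offset */
                    return 0;
            iph = data + nhoff;                  /* bounded add onto pkt_data */
            if ((void *)(iph + 1) > data_end)    /* open the access range */
                    return 0;
            return iph->protocol;                /* direct access now allowed */
    }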


Re: virtio_net failover and initramfs

2018-08-22 Thread Siwei Liu
On Tue, Aug 21, 2018 at 6:44 AM, Harald Hoyer  wrote:
> On 17.08.2018 21:09, Samudrala, Sridhar wrote:
>> On 8/17/2018 2:56 AM, Harald Hoyer wrote:
>>> On 17.08.2018 11:51, Harald Hoyer wrote:
 On 16.08.2018 00:17, Siwei Liu wrote:
> On Wed, Aug 15, 2018 at 12:05 PM, Samudrala, Sridhar
>  wrote:
>> On 8/14/2018 5:03 PM, Siwei Liu wrote:
>>> Are we sure all userspace apps skip and ignore slave interfaces by
>>> just looking at "IFLA_MASTER" attribute?
>>>
>>> When STANDBY is enabled on virtio-net, a failover master interface
>>> will appear, which automatically enslaves the virtio device. But it is
>>> found out that iSCSI (or any network boot) cannot boot strap over the
>>> new failover interface together with a standby virtio (without any VF
>>> or PT device in place).
>>>
>>> Dracut (initramfs) ends up with timeout and dropping into emergency 
>>> shell:
>>>
>>> [  228.170425] dracut-initqueue[377]: Warning: dracut-initqueue
>>> timeout - starting timeout scripts
>>> [  228.171788] dracut-initqueue[377]: Warning: Could not boot.
>>>Starting Dracut Emergency Shell...
>>> Generating "/run/initramfs/rdsosreport.txt"
>>> Entering emergency mode. Exit the shell to continue.
>>> Type "journalctl" to view system logs.
>>> You might want to save "/run/initramfs/rdsosreport.txt" to a USB stick 
>>> or
>>> /boot
>>> after mounting them and attach it to a bug report.
>>> dracut:/# ip l sh
>>> 1: lo:  mtu 65536 qdisc noqueue state UNKNOWN
>>> mode DEFAULT group default qlen 1000
>>>   link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
>>> 2: eth0:  mtu 1500 qdisc noqueue
>>> state UP mode DEFAULT group default qlen 1000
>>>   link/ether 9a:46:22:ae:33:54 brd ff:ff:ff:ff:ff:ff\
>>> 3: eth1:  mtu 1500 qdisc pfifo_fast
>>> master eth0 state UP mode DEFAULT group default qlen 1000
>>>   link/ether 9a:46:22:ae:33:54 brd ff:ff:ff:ff:ff:ff
>>> dracut:/#
>>>
>>> If changing dracut code to ignore eth1 (with IFLA_MASTER attr),
>>> network boot starts to work.
>>
>> Does dracut by default tries to use all the interfaces that are UP?
>>
> Yes. The specific dracut cmdline of our case is "ip=dhcp
> netroot=iscsi:... ", but it's not specific to iscsi boot. And because
> of same MAC address for failover and standby, while dracut tries to
> run DHCP on all interfaces that are up it eventually gets same route
> for each interface. Those conflict route entries kill off the network
> connection.
>
>>> The reason is that dracut has its own means to differentiate virtual
>>> interfaces for network boot: it does not look at IFLA_MASTER and
>>> ignores slave interfaces. Instead, users have to provide explicit
>>> option e.g. bond=eth0,eth1 in the boot line, then dracut would know
>>> the config and ignore the slave interfaces.
>>
>> Isn't it possible to specify the interface that should be used for 
>> network
>> boot?
> As I understand it, one can only specify interface name for running
> DHCP but not select interface for network boot.  We want DHCP to run
> on every NIC that is up (excluding the enslaved interfaces), and only
> one of them can get a route entry to the network boot server (ie.g.
> iSCSI target).
>
>>
>>> However, with automatic creation of failover interface that assumption
>>> is no longer true. Can we change dracut to ignore all slave interface
>>> by checking  IFLA_MASTER? I don't think so. It has a large impact to
>>> existing configs.
>>
>> What is the issue with checking for IFLA_MASTER? I guess this is used 
>> with
>> team/bonding setups.
> That should be discussed within and determined by the dracut
> community. But the current dracut code doesn't check IFLA_MASTER for
> team or bonding specifically. I guess this change might have broader
> impact to existing userspace that might be already relying on the
> current behaviour.
>
> Thanks,
> -Siwei
 Is there a sysfs flag for IFF_SLAVE? Or any "ip" output I can use to 
 detect, that it is a IFF_SLAVE?

>>> Oh, it's the other way around.. dracut should ignore "master" (eth1).
>> In the above example eth0 is the net_failover device and eth1 is the lower 
>> virtio_net device.
>> "ip" output of eth1 shows "master eth0". It indicates that eth0 is its 
>> upper/master device.
>> This information can also be obtained via sysfs too. 
>> /sys/class/net/eth1/upper_eth0
>>>
>>> Can the master enslave the "eth0", if it is already "UP" and busy later on?
>> eth0 is the master/failover device and eth1 gets registered as its slave via 
>> NETDEV_REGISTER event.
>> dracut should ignore eth1 in this setup.
>
>
> Care to test, if that fixes your case?
> https://github.com/dracutdevs/dracut/pull/450/files

Sorry, I 

Re: Experimental fix for MSI-X issue on r8169

2018-08-22 Thread Steve Dodd
On 22 August 2018 at 05:24, David Miller  wrote:
> From: Jian-Hong Pan 

>> [   56.462464] r8169 :02:00.0: MSI-X entry: context resume:
>>    
>  ...
>> uh!  The MSI-X entry seems missed after resume on this laptop!
>
> Yeah, having all of the MSI-X entry values be all-1's is not a good
> sign.

I'm slightly confused as to why my machine doesn't even get to
printing the debugging message on resume..?

S.


Re: [PATCH] datapath.c: fix missing return value check of nla_nest_start()

2018-08-22 Thread Pravin Shelar
On Tue, Aug 21, 2018 at 4:38 PM David Miller  wrote:
>
> From: Pravin Shelar 
> Date: Tue, 21 Aug 2018 15:38:28 -0700
>
> > On Fri, Aug 17, 2018 at 1:15 AM Jiecheng Wu  wrote:
> >>
> >> Function queue_userspace_packet() defined in net/openvswitch/datapath.c 
> >> calls nla_nest_start() to allocate memory for struct nlattr which is 
> >> dereferenced immediately. As nla_nest_start() may return NULL on failure, 
> >> this code piece may cause NULL pointer dereference bug.
> >> ---
> >>  net/openvswitch/datapath.c | 4 
> >>  1 file changed, 4 insertions(+)
> >>
> >> diff --git a/net/openvswitch/datapath.c b/net/openvswitch/datapath.c
> >> index 0f5ce77..ff4457d 100644
> >> --- a/net/openvswitch/datapath.c
> >> +++ b/net/openvswitch/datapath.c
> >> @@ -460,6 +460,8 @@ static int queue_userspace_packet(struct datapath *dp, 
> >> struct sk_buff *skb,
> >>
> >> if (upcall_info->egress_tun_info) {
> >> nla = nla_nest_start(user_skb, 
> >> OVS_PACKET_ATTR_EGRESS_TUN_KEY);
> >> +   if (!nla)
> >> +   return -EMSGSIZE;
> > It is not possible, since user_skb is allocated to accommodate all
> > netlink attributes.
>
> Pravin, common practice is to always check nla_*() return values even if the
> SKB is allocated with "enough space".
>
> Those calculations can have bugs, and these checks are therefore helpful to
> avoid crashes and memory corruption in such cases.
>
OK, in that case this patch needs proper error handling.


Re: Experimental fix for MSI-X issue on r8169

2018-08-22 Thread Heiner Kallweit
On 22.08.2018 06:24, David Miller wrote:
> From: Jian-Hong Pan 
> Date: Wed, 22 Aug 2018 11:01:02 +0800
> 
>  ...
>> [   56.462464] r8169 :02:00.0: MSI-X entry: context resume:
>>    
>  ...
>> uh!  The MSI-X entry seems missed after resume on this laptop!
> 
> Yeah, having all of the MSI-X entry values be all-1's is not a good
> sign.
> 
All-1's seems to indicate that PCI access to the MSI-X table
BAR/region fails. Since falling back to MSI helps, accessing
the other BAR/region with the memory-mapped registers evidently still works.
I'll check with Realtek whether this symptom rings any bell.


> But this is quite a curious set of debugging traces we now have.
> 
> In the working case, the vector number in the DATA field seems
> to change, which suggests that something is assigning new values
> and programming them into these fields at resume time.
> 
> But in the failing cases, all of the values are garbage.
> 
> I would expect, given what the working trace looks like, that in the
> failing case some values would be wrong and the DATA value would have
> some new yet valid value.  But that is not what we are seeing here.
> 
> Weird.
> 
>