Re: [RFC PATCH v2 net-next 3/7] tcp: Merge tx_flags/tskey/txstamp_ack in tcp_shifted_skb

2016-04-18 Thread Soheil Hassas Yeganeh
On Mon, Apr 18, 2016 at 6:46 PM, Martin KaFai Lau  wrote:
> After receiving sacks, tcp_shifted_skb() will collapse
> skbs if possible.  tx_flags/tskey/txstamp_ack also has
> to be merged in this case.
>
> This patch resues the tcp_skb_collapse_tstamp() to handle
> them.
>
> BPF Output Before:
> ~
> 
>
> BPF Output After:
> ~
> <...>-2024  [007] d.s.88.644374: : ee_data:14599
>
> Packetdrill Script:
> ~
> +0 `sysctl -q -w net.ipv4.tcp_min_tso_segs=10`
> +0 `sysctl -q -w net.ipv4.tcp_no_metrics_save=1`
> +0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
> +0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
> +0 bind(3, ..., ...) = 0
> +0 listen(3, 1) = 0
>
> 0.100 < S 0:0(0) win 32792 
> 0.100 > S. 0:0(0) ack 1 
> 0.200 < . 1:1(0) ack 1 win 257
> 0.200 accept(3, ..., ...) = 4
> +0 setsockopt(4, SOL_TCP, TCP_NODELAY, [1], 4) = 0
>
> 0.200 write(4, ..., 1460) = 1460
> +0 setsockopt(4, SOL_SOCKET, 37, [2688], 4) = 0
> 0.200 write(4, ..., 13140) = 13140
> +0 setsockopt(4, SOL_SOCKET, 37, [2176], 4) = 0
>
> 0.200 > P. 1:1461(1460) ack 1
> 0.200 > . 1461:8761(7300) ack 1
> 0.200 > P. 8761:14601(5840) ack 1
>
> 0.300 < . 1:1(0) ack 1 win 257 
> 0.300 > P. 1:1461(1460) ack 1
> 0.400 < . 1:1(0) ack 14601 win 257
>
> 0.400 close(4) = 0
> 0.400 > F. 14601:14601(0) ack 1
> 0.500 < F. 1:1(0) ack 14602 win 257
> 0.500 > . 14602:14602(0) ack 2
>
> Signed-off-by: Martin KaFai Lau 
> Cc: Eric Dumazet 
> Cc: Neal Cardwell 
> Cc: Soheil Hassas Yeganeh 

Acked-by: Soheil Hassas Yeganeh 

> Cc: Willem de Bruijn 
> Cc: Yuchung Cheng 
> ---
>  include/net/tcp.h | 2 ++
>  net/ipv4/tcp_input.c  | 1 +
>  net/ipv4/tcp_output.c | 4 ++--
>  3 files changed, 5 insertions(+), 2 deletions(-)
>
> diff --git a/include/net/tcp.h b/include/net/tcp.h
> index fd40f8c..c0ef054 100644
> --- a/include/net/tcp.h
> +++ b/include/net/tcp.h
> @@ -557,6 +557,8 @@ void tcp_send_ack(struct sock *sk);
>  void tcp_send_delayed_ack(struct sock *sk);
>  void tcp_send_loss_probe(struct sock *sk);
>  bool tcp_schedule_loss_probe(struct sock *sk);
> +void tcp_skb_collapse_tstamp(struct sk_buff *skb,
> +const struct sk_buff *next_skb);
>
>  /* tcp_input.c */
>  void tcp_resume_early_retransmit(struct sock *sk);
> diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
> index 5e45a9c..75e8336 100644
> --- a/net/ipv4/tcp_input.c
> +++ b/net/ipv4/tcp_input.c
> @@ -1309,6 +1309,7 @@ static bool tcp_shifted_skb(struct sock *sk, struct 
> sk_buff *skb,
> if (skb == tcp_highest_sack(sk))
> tcp_advance_highest_sack(sk, skb);
>
> +   tcp_skb_collapse_tstamp(prev, skb);
> tcp_unlink_write_queue(skb, sk);
> sk_wmem_free_skb(sk, skb);
>
> diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
> index 889ed96..d21a78f 100644
> --- a/net/ipv4/tcp_output.c
> +++ b/net/ipv4/tcp_output.c
> @@ -2443,8 +2443,8 @@ u32 __tcp_select_window(struct sock *sk)
> return window;
>  }
>
> -static void tcp_skb_collapse_tstamp(struct sk_buff *skb,
> -   const struct sk_buff *next_skb)
> +void tcp_skb_collapse_tstamp(struct sk_buff *skb,
> +const struct sk_buff *next_skb)
>  {
> const struct skb_shared_info *next_shinfo = skb_shinfo(next_skb);
>

nice, thanks for the fix!

> --
> 2.5.1
>


Re: [RFC PATCH v2 net-next 2/7] tcp: Merge tx_flags/tskey/txstamp_ack in tcp_collapse_retrans

2016-04-18 Thread Soheil Hassas Yeganeh
On Mon, Apr 18, 2016 at 6:46 PM, Martin KaFai Lau  wrote:
> If two skbs are merged/collapsed during retransmission, the current
> logic does not merge the tx_flags, tskey and txstamp_ack.  The end
> result is the SCM_TSTAMP_ACK timestamp could be missing for a
> packet that the end-user has specifically turned on
> SOF_TIMESTAMPING_TX_ACK (e.g. by cmsg).
>
> The patch:
> 1. Merge the tx_flags and txstamp_ack
> 2. Overwrite the tskey with the later skb (next_skb)
>
> BPF Output Before:
> ~~
> 
>
> BPF Output After:
> ~~
> packetdrill-2092  [001] d.s.   453.998486: : ee_data:1459
>
> Packetdrill Script:
> ~~
> +0 `sysctl -q -w net.ipv4.tcp_min_tso_segs=10`
> +0 `sysctl -q -w net.ipv4.tcp_no_metrics_save=1`
> +0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
> +0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
> +0 bind(3, ..., ...) = 0
> +0 listen(3, 1) = 0
>
> 0.100 < S 0:0(0) win 32792 
> 0.100 > S. 0:0(0) ack 1 
> 0.200 < . 1:1(0) ack 1 win 257
> 0.200 accept(3, ..., ...) = 4
> +0 setsockopt(4, SOL_TCP, TCP_NODELAY, [1], 4) = 0
>
> 0.200 write(4, ..., 730) = 730
> +0 setsockopt(4, SOL_SOCKET, 37, [2688], 4) = 0
> 0.200 write(4, ..., 730) = 730
> +0 setsockopt(4, SOL_SOCKET, 37, [2176], 4) = 0
> 0.200 write(4, ..., 11680) = 11680
>
> 0.200 > P. 1:731(730) ack 1
> 0.200 > P. 731:1461(730) ack 1
> 0.200 > . 1461:8761(7300) ack 1
> 0.200 > P. 8761:13141(4380) ack 1
>
> 0.300 < . 1:1(0) ack 1 win 257 
> 0.300 < . 1:1(0) ack 1 win 257 
> 0.300 < . 1:1(0) ack 1 win 257 
> 0.300 > P. 1:1461(1460) ack 1
> 0.400 < . 1:1(0) ack 13141 win 257
>
> 0.400 close(4) = 0
> 0.400 > F. 13141:13141(0) ack 1
> 0.500 < F. 1:1(0) ack 13142 win 257
> 0.500 > . 13142:13142(0) ack 2
>
> Signed-off-by: Martin KaFai Lau 
> Cc: Eric Dumazet 
> Cc: Neal Cardwell 
> Cc: Soheil Hassas Yeganeh 

Cc:  Soheil Hassas Yeganeh 

> Cc: Willem de Bruijn 
> Cc: Yuchung Cheng 
> ---
>  net/ipv4/tcp_output.c | 18 ++
>  1 file changed, 18 insertions(+)
>
> diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
> index 0527ce9..889ed96 100644
> --- a/net/ipv4/tcp_output.c
> +++ b/net/ipv4/tcp_output.c
> @@ -2443,6 +2443,22 @@ u32 __tcp_select_window(struct sock *sk)
> return window;
>  }
>
> +static void tcp_skb_collapse_tstamp(struct sk_buff *skb,
> +   const struct sk_buff *next_skb)
> +{
> +   const struct skb_shared_info *next_shinfo = skb_shinfo(next_skb);
> +
> +   if (unlikely(next_shinfo->tx_flags & SKBTX_ANY_TSTAMP)) {
> +   struct skb_shared_info *shinfo = skb_shinfo(skb);
> +   u8 tsflags = next_shinfo->tx_flags & SKBTX_ANY_TSTAMP;

nit: maybe move this local variable out of the if block?

  tsflags = ...
  if (unlikely(tsflags)) { ... }

> +
> +   shinfo->tx_flags |= tsflags;
> +   shinfo->tskey = next_shinfo->tskey;
> +   TCP_SKB_CB(skb)->txstamp_ack =
> +   !!(shinfo->tx_flags & SKBTX_ACK_TSTAMP);

Maybe we can skip a conditional jump here (because of !!), by simply
using the cached bit in next_skb:
TCP_SKB_CB(skb)->txstamp_ack = TCP_SKB_CB(next_skb)->txstamp_ack;

> +   }
> +}
> +
>  /* Collapses two adjacent SKB's during retransmission. */
>  static void tcp_collapse_retrans(struct sock *sk, struct sk_buff *skb)
>  {
> @@ -2486,6 +2502,8 @@ static void tcp_collapse_retrans(struct sock *sk, 
> struct sk_buff *skb)
>
> tcp_adjust_pcount(sk, next_skb, tcp_skb_pcount(next_skb));
>
> +   tcp_skb_collapse_tstamp(skb, next_skb);
> +
> sk_wmem_free_skb(sk, next_skb);
>  }

Really nice fixes! thanks.

> --
> 2.5.1
>


Re: [RFC PATCH v2 net-next 1/7] tcp: Carry txstamp_ack in tcp_fragment_tstamp

2016-04-18 Thread Soheil Hassas Yeganeh
On Mon, Apr 18, 2016 at 6:46 PM, Martin KaFai Lau  wrote:
> When a tcp skb is sliced into two smaller skbs (e.g. in
> tcp_fragment() and tso_fragment()),  it does not carry
> the txstamp_ack bit to the newly created skb if it is needed.
> The end result is a timestamping event (SCM_TSTAMP_ACK) will
> be missing from the sk->sk_error_queue.
>
> This patch carries this bit to the new skb2 (if needed)
> in tcp_fragment_tstamp().
>
> BPF Output Before:
> ~~
> 
>
> BPF Output After:
> ~~
> <...>-2050  [000] d.s.   100.928763: : ee_data:14599
>
> Packetdrill Script:
> ~~
> +0 `sysctl -q -w net.ipv4.tcp_min_tso_segs=10`
> +0 `sysctl -q -w net.ipv4.tcp_no_metrics_save=1`
> +0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
> +0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
> +0 bind(3, ..., ...) = 0
> +0 listen(3, 1) = 0
>
> 0.100 < S 0:0(0) win 32792 
> 0.100 > S. 0:0(0) ack 1 
> 0.200 < . 1:1(0) ack 1 win 257
> 0.200 accept(3, ..., ...) = 4
> +0 setsockopt(4, SOL_TCP, TCP_NODELAY, [1], 4) = 0
>
> +0 setsockopt(4, SOL_SOCKET, 37, [2688], 4) = 0
> 0.200 write(4, ..., 14600) = 14600
> +0 setsockopt(4, SOL_SOCKET, 37, [2176], 4) = 0
>
> 0.200 > . 1:7301(7300) ack 1
> 0.200 > P. 7301:14601(7300) ack 1
>
> 0.300 < . 1:1(0) ack 14601 win 257
>
> 0.300 close(4) = 0
> 0.300 > F. 14601:14601(0) ack 1
> 0.400 < F. 1:1(0) ack 16062 win 257
> 0.400 > . 14602:14602(0) ack 2
>
> Signed-off-by: Martin KaFai Lau 
> Cc: Eric Dumazet 
> Cc: Neal Cardwell 
> Cc: Soheil Hassas Yeganeh 
Cc:  Soheil Hassas Yeganeh 
Acked-by: Soheil Hassas Yeganeh 

> Cc: Willem de Bruijn 
> Cc: Yuchung Cheng 
> ---
>  net/ipv4/tcp_output.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
> index 6451b83..0527ce9 100644
> --- a/net/ipv4/tcp_output.c
> +++ b/net/ipv4/tcp_output.c
> @@ -1123,6 +1123,8 @@ static void tcp_fragment_tstamp(struct sk_buff *skb, 
> struct sk_buff *skb2)
> shinfo->tx_flags &= ~tsflags;
> shinfo2->tx_flags |= tsflags;
> swap(shinfo->tskey, shinfo2->tskey);
> +   TCP_SKB_CB(skb2)->txstamp_ack = TCP_SKB_CB(skb)->txstamp_ack;
> +   TCP_SKB_CB(skb)->txstamp_ack = 0;
> }
>  }

Thanks for the fixes! I was going to submit similar patches. :-)

Could you please submit the timestamping patches separately as non RFCs? Thanks!

> --
> 2.5.1
>


[PATCH v3 00/18] wcn36xx fixes

2016-04-18 Thread Bjorn Andersson
The bulk of the following patches have been sitting in Eugene's Github tree for
quite some time. They fix various issues existing in the mainline drivers, so
they should be merged there too.

Also included are two new fixes, of my own; the important one being the
reordering of deletion of the bss, as this crashes the firmware on the
Dragonbaord 410c (apq8016 with pronto & wcn3620).

Lastly is a patch that adds a bunch of new capabilities found in the downstream
driver.

Changes since v2:
- Restore BEACON_TEMPLATE_SIZE to not break UPDATE_PROBE_RSP_TEMPLATE_REQ
- Added patch to correct WCN36XX_HAL_RMV_BSSKEY_RSP decoder
- Added patch with missing capabilities from downstream

Changes since v1:
- Reorder patch 6 and 7 to not break the build temporarily
- Inline fix from Jason Mobarak in the TIM PVM padding

Bjorn Andersson (3):
  wcn36xx: Delete BSS before idling link
  wcn36xx: Correct remove bss key response encoding
  wcn36xx: Fill in capability list

Pontus Fuchs (15):
  wcn36xx: Clean up wcn36xx_smd_send_beacon
  wcn36xx: Pad TIM PVM if needed
  wcn36xx: Add helper macros to cast vif to private vif and vice versa
  wcn36xx: Use consistent name for private vif
  wcn36xx: Use define for invalid index and fix typo
  wcn36xx: Add helper macros to cast sta to priv
  wcn36xx: Fetch private sta data from sta entry instead of from vif
  wcn36xx: Remove sta pointer in private vif struct
  wcn36xx: Parse trigger_ba response properly
  wcn36xx: Copy all members in config_sta v1 conversion
  wcn36xx: Use allocated self sta index instead of hard coded
  wcn36xx: Clear encrypt_type when deleting bss key
  wcn36xx: Track association state
  wcn36xx: Implement multicast filtering
  wcn36xx: Use correct command struct for EXIT_BMPS_REQ

 drivers/net/wireless/ath/wcn36xx/debug.c   |  12 +-
 drivers/net/wireless/ath/wcn36xx/hal.h |  55 ++-
 drivers/net/wireless/ath/wcn36xx/main.c| 133 +
 drivers/net/wireless/ath/wcn36xx/pmc.c |   4 +-
 drivers/net/wireless/ath/wcn36xx/smd.c | 224 +++--
 drivers/net/wireless/ath/wcn36xx/smd.h |  12 +-
 drivers/net/wireless/ath/wcn36xx/txrx.c|   8 +-
 drivers/net/wireless/ath/wcn36xx/wcn36xx.h |  20 ++-
 8 files changed, 336 insertions(+), 132 deletions(-)

-- 
2.5.0



[PATCH v3 01/18] wcn36xx: Clean up wcn36xx_smd_send_beacon

2016-04-18 Thread Bjorn Andersson
From: Pontus Fuchs 

Needed for coming improvements. No functional changes.

Signed-off-by: Pontus Fuchs 
[bjorn: restored BEACON_TEMPLATE_SIZE define to 0x180]
Signed-off-by: Bjorn Andersson 
---

Changes since v2:
- Subtract sizeof(beacon_length) instead of modifying BEACON_TEMPLATE_SIZE,
  which is used in other places as well.

 drivers/net/wireless/ath/wcn36xx/hal.h |  5 -
 drivers/net/wireless/ath/wcn36xx/smd.c | 12 +---
 2 files changed, 9 insertions(+), 8 deletions(-)

diff --git a/drivers/net/wireless/ath/wcn36xx/hal.h 
b/drivers/net/wireless/ath/wcn36xx/hal.h
index b947de0fb2e5..d713204f755d 100644
--- a/drivers/net/wireless/ath/wcn36xx/hal.h
+++ b/drivers/net/wireless/ath/wcn36xx/hal.h
@@ -2884,11 +2884,14 @@ struct update_beacon_rsp_msg {
 struct wcn36xx_hal_send_beacon_req_msg {
struct wcn36xx_hal_msg_header header;
 
+   /* length of the template + 6. Only qcom knows why */
+   u32 beacon_length6;
+
/* length of the template. */
u32 beacon_length;
 
/* Beacon data. */
-   u8 beacon[BEACON_TEMPLATE_SIZE];
+   u8 beacon[BEACON_TEMPLATE_SIZE - sizeof(u32)];
 
u8 bssid[ETH_ALEN];
 
diff --git a/drivers/net/wireless/ath/wcn36xx/smd.c 
b/drivers/net/wireless/ath/wcn36xx/smd.c
index 74f56a81ad9a..ff3ed2461a69 100644
--- a/drivers/net/wireless/ath/wcn36xx/smd.c
+++ b/drivers/net/wireless/ath/wcn36xx/smd.c
@@ -1380,19 +1380,17 @@ int wcn36xx_smd_send_beacon(struct wcn36xx *wcn, struct 
ieee80211_vif *vif,
mutex_lock(>hal_mutex);
INIT_HAL_MSG(msg_body, WCN36XX_HAL_SEND_BEACON_REQ);
 
-   /* TODO need to find out why this is needed? */
-   msg_body.beacon_length = skb_beacon->len + 6;
+   msg_body.beacon_length = skb_beacon->len;
+   /* TODO need to find out why + 6 is needed */
+   msg_body.beacon_length6 = msg_body.beacon_length + 6;
 
-   if (BEACON_TEMPLATE_SIZE > msg_body.beacon_length) {
-   memcpy(_body.beacon, _beacon->len, sizeof(u32));
-   memcpy(&(msg_body.beacon[4]), skb_beacon->data,
-  skb_beacon->len);
-   } else {
+   if (msg_body.beacon_length > BEACON_TEMPLATE_SIZE) {
wcn36xx_err("Beacon is to big: beacon size=%d\n",
  msg_body.beacon_length);
ret = -ENOMEM;
goto out;
}
+   memcpy(msg_body.beacon, skb_beacon->data, skb_beacon->len);
memcpy(msg_body.bssid, vif->addr, ETH_ALEN);
 
/* TODO need to find out why this is needed? */
-- 
2.5.0



[PATCH v3 02/18] wcn36xx: Pad TIM PVM if needed

2016-04-18 Thread Bjorn Andersson
From: Pontus Fuchs 

The wcn36xx FW expects a fixed size TIM PVM in the beacon template. If
supplied with a shorter than expected PVM it will overwrite the IE
following the TIM.

Squashed with fix from Jason Mobarak :
Patch "wcn36xx: Pad TIM PVM if needed" has caused a regression in mesh
beaconing.  The field tim_off is always 0 for mesh mode, and thus
pvm_len (referring to the TIM length field) and pad are both incorrectly
calculated.  Thus, msg_body.beacon_length is incorrectly calculated for
mesh mode. Fix this.

Signed-off-by: Pontus Fuchs 
Signed-off-by: Jason Mobarak 
[bjorn: squashed in Jason's fixup]
Signed-off-by: Bjorn Andersson 
---

Changes since v1:
- Merged in Jason's patch for skipping the padding in mesh mode.

 drivers/net/wireless/ath/wcn36xx/hal.h |  3 +++
 drivers/net/wireless/ath/wcn36xx/smd.c | 27 +--
 2 files changed, 28 insertions(+), 2 deletions(-)

diff --git a/drivers/net/wireless/ath/wcn36xx/hal.h 
b/drivers/net/wireless/ath/wcn36xx/hal.h
index d713204f755d..3af16cba3d12 100644
--- a/drivers/net/wireless/ath/wcn36xx/hal.h
+++ b/drivers/net/wireless/ath/wcn36xx/hal.h
@@ -54,6 +54,9 @@
 /* Default Beacon template size */
 #define BEACON_TEMPLATE_SIZE 0x180
 
+/* Minimum PVM size that the FW expects. See comment in smd.c for details. */
+#define TIM_MIN_PVM_SIZE 6
+
 /* Param Change Bitmap sent to HAL */
 #define PARAM_BCN_INTERVAL_CHANGED  (1 << 0)
 #define PARAM_SHORT_PREAMBLE_CHANGED (1 << 1)
diff --git a/drivers/net/wireless/ath/wcn36xx/smd.c 
b/drivers/net/wireless/ath/wcn36xx/smd.c
index ff3ed2461a69..089a7e445cd6 100644
--- a/drivers/net/wireless/ath/wcn36xx/smd.c
+++ b/drivers/net/wireless/ath/wcn36xx/smd.c
@@ -1375,12 +1375,19 @@ int wcn36xx_smd_send_beacon(struct wcn36xx *wcn, struct 
ieee80211_vif *vif,
u16 p2p_off)
 {
struct wcn36xx_hal_send_beacon_req_msg msg_body;
-   int ret = 0;
+   int ret = 0, pad, pvm_len;
 
mutex_lock(>hal_mutex);
INIT_HAL_MSG(msg_body, WCN36XX_HAL_SEND_BEACON_REQ);
 
-   msg_body.beacon_length = skb_beacon->len;
+   pvm_len = skb_beacon->data[tim_off + 1] - 3;
+   pad = TIM_MIN_PVM_SIZE - pvm_len;
+
+   /* Padding is irrelevant to mesh mode since tim_off is always 0. */
+   if (vif->type == NL80211_IFTYPE_MESH_POINT)
+   pad = 0;
+
+   msg_body.beacon_length = skb_beacon->len + pad;
/* TODO need to find out why + 6 is needed */
msg_body.beacon_length6 = msg_body.beacon_length + 6;
 
@@ -1393,6 +1400,22 @@ int wcn36xx_smd_send_beacon(struct wcn36xx *wcn, struct 
ieee80211_vif *vif,
memcpy(msg_body.beacon, skb_beacon->data, skb_beacon->len);
memcpy(msg_body.bssid, vif->addr, ETH_ALEN);
 
+   if (pad > 0) {
+   /*
+* The wcn36xx FW has a fixed size for the PVM in the TIM. If
+* given the beacon template from mac80211 with a PVM shorter
+* than the FW expectes it will overwrite the data after the
+* TIM.
+*/
+   wcn36xx_dbg(WCN36XX_DBG_HAL, "Pad TIM PVM. %d bytes at %d\n",
+   pad, pvm_len);
+   memmove(_body.beacon[tim_off + 5 + pvm_len + pad],
+   _body.beacon[tim_off + 5 + pvm_len],
+   skb_beacon->len - (tim_off + 5 + pvm_len));
+   memset(_body.beacon[tim_off + 5 + pvm_len], 0, pad);
+   msg_body.beacon[tim_off + 1] += pad;
+   }
+
/* TODO need to find out why this is needed? */
if (vif->type == NL80211_IFTYPE_MESH_POINT)
/* mesh beacon don't need this, so push further down */
-- 
2.5.0



[PATCH v3 05/18] wcn36xx: Use define for invalid index and fix typo

2016-04-18 Thread Bjorn Andersson
From: Pontus Fuchs 

Signed-off-by: Pontus Fuchs 
Signed-off-by: Bjorn Andersson 
---
 drivers/net/wireless/ath/wcn36xx/hal.h  | 2 +-
 drivers/net/wireless/ath/wcn36xx/main.c | 4 ++--
 drivers/net/wireless/ath/wcn36xx/smd.c  | 2 +-
 3 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/net/wireless/ath/wcn36xx/hal.h 
b/drivers/net/wireless/ath/wcn36xx/hal.h
index 3af16cba3d12..433d9801a0ae 100644
--- a/drivers/net/wireless/ath/wcn36xx/hal.h
+++ b/drivers/net/wireless/ath/wcn36xx/hal.h
@@ -48,7 +48,7 @@
 
 #define WCN36XX_HAL_IPV4_ADDR_LEN   4
 
-#define WALN_HAL_STA_INVALID_IDX 0xFF
+#define WCN36XX_HAL_STA_INVALID_IDX 0xFF
 #define WCN36XX_HAL_BSS_INVALID_IDX 0xFF
 
 /* Default Beacon template size */
diff --git a/drivers/net/wireless/ath/wcn36xx/main.c 
b/drivers/net/wireless/ath/wcn36xx/main.c
index 62cb9ffd854c..4781b5e8deb3 100644
--- a/drivers/net/wireless/ath/wcn36xx/main.c
+++ b/drivers/net/wireless/ath/wcn36xx/main.c
@@ -618,7 +618,7 @@ static void wcn36xx_bss_info_changed(struct ieee80211_hw 
*hw,
 
if (!is_zero_ether_addr(bss_conf->bssid)) {
vif_priv->is_joining = true;
-   vif_priv->bss_index = 0xff;
+   vif_priv->bss_index = WCN36XX_HAL_BSS_INVALID_IDX;
wcn36xx_smd_join(wcn, bss_conf->bssid,
 vif->addr, WCN36XX_HW_CHANNEL(wcn));
wcn36xx_smd_config_bss(wcn, vif, NULL,
@@ -711,7 +711,7 @@ static void wcn36xx_bss_info_changed(struct ieee80211_hw 
*hw,
 
if (bss_conf->enable_beacon) {
vif_priv->dtim_period = bss_conf->dtim_period;
-   vif_priv->bss_index = 0xff;
+   vif_priv->bss_index = WCN36XX_HAL_BSS_INVALID_IDX;
wcn36xx_smd_config_bss(wcn, vif, NULL,
   vif->addr, false);
skb = ieee80211_beacon_get_tim(hw, vif, _off,
diff --git a/drivers/net/wireless/ath/wcn36xx/smd.c 
b/drivers/net/wireless/ath/wcn36xx/smd.c
index 170440ed5d85..6d4aa9250ca8 100644
--- a/drivers/net/wireless/ath/wcn36xx/smd.c
+++ b/drivers/net/wireless/ath/wcn36xx/smd.c
@@ -197,7 +197,7 @@ static void wcn36xx_smd_set_sta_params(struct wcn36xx *wcn,
vif->type == NL80211_IFTYPE_AP ||
vif->type == NL80211_IFTYPE_MESH_POINT) {
sta_params->type = 1;
-   sta_params->sta_index = 0xFF;
+   sta_params->sta_index = WCN36XX_HAL_STA_INVALID_IDX;
} else {
sta_params->type = 0;
sta_params->sta_index = 1;
-- 
2.5.0



[PATCH v3 03/18] wcn36xx: Add helper macros to cast vif to private vif and vice versa

2016-04-18 Thread Bjorn Andersson
From: Pontus Fuchs 

Makes the code a little easier to read.

Signed-off-by: Pontus Fuchs 
Signed-off-by: Bjorn Andersson 
---
 drivers/net/wireless/ath/wcn36xx/debug.c   | 12 +++-
 drivers/net/wireless/ath/wcn36xx/main.c| 16 +++-
 drivers/net/wireless/ath/wcn36xx/pmc.c |  4 ++--
 drivers/net/wireless/ath/wcn36xx/smd.c | 24 ++--
 drivers/net/wireless/ath/wcn36xx/txrx.c|  8 ++--
 drivers/net/wireless/ath/wcn36xx/wcn36xx.h | 12 
 6 files changed, 36 insertions(+), 40 deletions(-)

diff --git a/drivers/net/wireless/ath/wcn36xx/debug.c 
b/drivers/net/wireless/ath/wcn36xx/debug.c
index ef44a2da644d..2a6bb62e785c 100644
--- a/drivers/net/wireless/ath/wcn36xx/debug.c
+++ b/drivers/net/wireless/ath/wcn36xx/debug.c
@@ -33,9 +33,7 @@ static ssize_t read_file_bool_bmps(struct file *file, char 
__user *user_buf,
char buf[3];
 
list_for_each_entry(vif_priv, >vif_list, list) {
-   vif = container_of((void *)vif_priv,
-  struct ieee80211_vif,
-  drv_priv);
+   vif = wcn36xx_priv_to_vif(vif_priv);
if (NL80211_IFTYPE_STATION == vif->type) {
if (vif_priv->pw_state == WCN36XX_BMPS)
buf[0] = '1';
@@ -70,9 +68,7 @@ static ssize_t write_file_bool_bmps(struct file *file,
case 'Y':
case '1':
list_for_each_entry(vif_priv, >vif_list, list) {
-   vif = container_of((void *)vif_priv,
-  struct ieee80211_vif,
-  drv_priv);
+   vif = wcn36xx_priv_to_vif(vif_priv);
if (NL80211_IFTYPE_STATION == vif->type) {
wcn36xx_enable_keep_alive_null_packet(wcn, vif);
wcn36xx_pmc_enter_bmps_state(wcn, vif);
@@ -83,9 +79,7 @@ static ssize_t write_file_bool_bmps(struct file *file,
case 'N':
case '0':
list_for_each_entry(vif_priv, >vif_list, list) {
-   vif = container_of((void *)vif_priv,
-  struct ieee80211_vif,
-  drv_priv);
+   vif = wcn36xx_priv_to_vif(vif_priv);
if (NL80211_IFTYPE_STATION == vif->type)
wcn36xx_pmc_exit_bmps_state(wcn, vif);
}
diff --git a/drivers/net/wireless/ath/wcn36xx/main.c 
b/drivers/net/wireless/ath/wcn36xx/main.c
index a27279c2c695..62cb9ffd854c 100644
--- a/drivers/net/wireless/ath/wcn36xx/main.c
+++ b/drivers/net/wireless/ath/wcn36xx/main.c
@@ -346,9 +346,7 @@ static int wcn36xx_config(struct ieee80211_hw *hw, u32 
changed)
wcn36xx_dbg(WCN36XX_DBG_MAC, "wcn36xx_config channel 
switch=%d\n",
ch);
list_for_each_entry(tmp, >vif_list, list) {
-   vif = container_of((void *)tmp,
-  struct ieee80211_vif,
-  drv_priv);
+   vif = wcn36xx_priv_to_vif(tmp);
wcn36xx_smd_switch_channel(wcn, vif, ch);
}
}
@@ -387,7 +385,7 @@ static int wcn36xx_set_key(struct ieee80211_hw *hw, enum 
set_key_cmd cmd,
   struct ieee80211_key_conf *key_conf)
 {
struct wcn36xx *wcn = hw->priv;
-   struct wcn36xx_vif *vif_priv = (struct wcn36xx_vif *)vif->drv_priv;
+   struct wcn36xx_vif *vif_priv = wcn36xx_vif_to_priv(vif);
struct wcn36xx_sta *sta_priv = vif_priv->sta;
int ret = 0;
u8 key[WLAN_MAX_KEY_LEN];
@@ -590,7 +588,7 @@ static void wcn36xx_bss_info_changed(struct ieee80211_hw 
*hw,
struct sk_buff *skb = NULL;
u16 tim_off, tim_len;
enum wcn36xx_hal_link_state link_state;
-   struct wcn36xx_vif *vif_priv = (struct wcn36xx_vif *)vif->drv_priv;
+   struct wcn36xx_vif *vif_priv = wcn36xx_vif_to_priv(vif);
 
wcn36xx_dbg(WCN36XX_DBG_MAC, "mac bss info changed vif %p changed 
0x%08x\n",
vif, changed);
@@ -757,7 +755,7 @@ static void wcn36xx_remove_interface(struct ieee80211_hw 
*hw,
 struct ieee80211_vif *vif)
 {
struct wcn36xx *wcn = hw->priv;
-   struct wcn36xx_vif *vif_priv = (struct wcn36xx_vif *)vif->drv_priv;
+   struct wcn36xx_vif *vif_priv = wcn36xx_vif_to_priv(vif);
wcn36xx_dbg(WCN36XX_DBG_MAC, "mac remove interface vif %p\n", vif);
 
list_del(_priv->list);
@@ -768,7 +766,7 @@ static int wcn36xx_add_interface(struct ieee80211_hw *hw,
 struct ieee80211_vif *vif)
 {
struct wcn36xx *wcn = hw->priv;
-   struct 

Re: [PATCH] net: w5100: don't build spi driver without w5100

2016-04-18 Thread David Miller
From: Arnd Bergmann 
Date: Mon, 18 Apr 2016 23:58:30 +0200

> The w5100-spi driver front-end only makes sense when the w5100
> core driver is enabled, not for a configuration that only has w5300:
 ...
> This adds an appropriate Kconfig dependency.
> 
> Signed-off-by: Arnd Bergmann 
> Fixes: 630cf09751fe ("net: w5100: support SPI interface mode")

Applied, thanks a lot.


[PATCH v3 06/18] wcn36xx: Add helper macros to cast sta to priv

2016-04-18 Thread Bjorn Andersson
From: Pontus Fuchs 

While poking at this I also change two related things. I rename one
variable to make the names consistent. I also move one assignment of
priv_sta to the declaration to save a few lines.

Signed-off-by: Pontus Fuchs 
Signed-off-by: Bjorn Andersson 
---
 drivers/net/wireless/ath/wcn36xx/main.c| 14 ++
 drivers/net/wireless/ath/wcn36xx/smd.c | 12 ++--
 drivers/net/wireless/ath/wcn36xx/wcn36xx.h |  6 ++
 3 files changed, 18 insertions(+), 14 deletions(-)

diff --git a/drivers/net/wireless/ath/wcn36xx/main.c 
b/drivers/net/wireless/ath/wcn36xx/main.c
index 4781b5e8deb3..30f015d3a9e6 100644
--- a/drivers/net/wireless/ath/wcn36xx/main.c
+++ b/drivers/net/wireless/ath/wcn36xx/main.c
@@ -373,7 +373,7 @@ static void wcn36xx_tx(struct ieee80211_hw *hw,
struct wcn36xx_sta *sta_priv = NULL;
 
if (control->sta)
-   sta_priv = (struct wcn36xx_sta *)control->sta->drv_priv;
+   sta_priv = wcn36xx_sta_to_priv(control->sta);
 
if (wcn36xx_start_tx(wcn, sta_priv, skb))
ieee80211_free_txskb(wcn->hw, skb);
@@ -518,7 +518,7 @@ static void wcn36xx_update_allowed_rates(struct 
ieee80211_sta *sta,
 {
int i, size;
u16 *rates_table;
-   struct wcn36xx_sta *sta_priv = (struct wcn36xx_sta *)sta->drv_priv;
+   struct wcn36xx_sta *sta_priv = wcn36xx_sta_to_priv(sta);
u32 rates = sta->supp_rates[band];
 
memset(_priv->supported_rates, 0,
@@ -661,7 +661,7 @@ static void wcn36xx_bss_info_changed(struct ieee80211_hw 
*hw,
rcu_read_unlock();
goto out;
}
-   sta_priv = (struct wcn36xx_sta *)sta->drv_priv;
+   sta_priv = wcn36xx_sta_to_priv(sta);
 
wcn36xx_update_allowed_rates(sta, WCN36XX_BAND(wcn));
 
@@ -791,7 +791,7 @@ static int wcn36xx_sta_add(struct ieee80211_hw *hw, struct 
ieee80211_vif *vif,
 {
struct wcn36xx *wcn = hw->priv;
struct wcn36xx_vif *vif_priv = wcn36xx_vif_to_priv(vif);
-   struct wcn36xx_sta *sta_priv = (struct wcn36xx_sta *)sta->drv_priv;
+   struct wcn36xx_sta *sta_priv = wcn36xx_sta_to_priv(sta);
wcn36xx_dbg(WCN36XX_DBG_MAC, "mac sta add vif %p sta %pM\n",
vif, sta->addr);
 
@@ -816,7 +816,7 @@ static int wcn36xx_sta_remove(struct ieee80211_hw *hw,
 {
struct wcn36xx *wcn = hw->priv;
struct wcn36xx_vif *vif_priv = wcn36xx_vif_to_priv(vif);
-   struct wcn36xx_sta *sta_priv = (struct wcn36xx_sta *)sta->drv_priv;
+   struct wcn36xx_sta *sta_priv = wcn36xx_sta_to_priv(sta);
 
wcn36xx_dbg(WCN36XX_DBG_MAC, "mac sta remove vif %p sta %pM index %d\n",
vif, sta->addr, sta_priv->sta_index);
@@ -858,7 +858,7 @@ static int wcn36xx_ampdu_action(struct ieee80211_hw *hw,
struct ieee80211_ampdu_params *params)
 {
struct wcn36xx *wcn = hw->priv;
-   struct wcn36xx_sta *sta_priv = NULL;
+   struct wcn36xx_sta *sta_priv = wcn36xx_sta_to_priv(params->sta);
struct ieee80211_sta *sta = params->sta;
enum ieee80211_ampdu_mlme_action action = params->action;
u16 tid = params->tid;
@@ -867,8 +867,6 @@ static int wcn36xx_ampdu_action(struct ieee80211_hw *hw,
wcn36xx_dbg(WCN36XX_DBG_MAC, "mac ampdu action action %d tid %d\n",
action, tid);
 
-   sta_priv = (struct wcn36xx_sta *)sta->drv_priv;
-
switch (action) {
case IEEE80211_AMPDU_RX_START:
sta_priv->tid = tid;
diff --git a/drivers/net/wireless/ath/wcn36xx/smd.c 
b/drivers/net/wireless/ath/wcn36xx/smd.c
index 6d4aa9250ca8..ff56138528b6 100644
--- a/drivers/net/wireless/ath/wcn36xx/smd.c
+++ b/drivers/net/wireless/ath/wcn36xx/smd.c
@@ -192,7 +192,7 @@ static void wcn36xx_smd_set_sta_params(struct wcn36xx *wcn,
struct wcn36xx_hal_config_sta_params *sta_params)
 {
struct wcn36xx_vif *vif_priv = wcn36xx_vif_to_priv(vif);
-   struct wcn36xx_sta *priv_sta = NULL;
+   struct wcn36xx_sta *sta_priv = NULL;
if (vif->type == NL80211_IFTYPE_ADHOC ||
vif->type == NL80211_IFTYPE_AP ||
vif->type == NL80211_IFTYPE_MESH_POINT) {
@@ -228,17 +228,17 @@ static void wcn36xx_smd_set_sta_params(struct wcn36xx 
*wcn,
sta_params->p2p = 0;
 
if (sta) {
-   priv_sta = (struct wcn36xx_sta *)sta->drv_priv;
+   sta_priv = wcn36xx_sta_to_priv(sta);
if (NL80211_IFTYPE_STATION == vif->type)
memcpy(_params->bssid, sta->addr, ETH_ALEN);
else
memcpy(_params->mac, sta->addr, ETH_ALEN);
sta_params->wmm_enabled = sta->wme;
sta_params->max_sp_len = sta->max_sp;
-   sta_params->aid = priv_sta->aid;
+

[PATCH v3 08/18] wcn36xx: Remove sta pointer in private vif struct

2016-04-18 Thread Bjorn Andersson
From: Pontus Fuchs 

This does not work with multiple sta's in a vif.

Signed-off-by: Pontus Fuchs 
Signed-off-by: Bjorn Andersson 
---
 drivers/net/wireless/ath/wcn36xx/main.c|  3 ---
 drivers/net/wireless/ath/wcn36xx/smd.c | 28 +++-
 drivers/net/wireless/ath/wcn36xx/wcn36xx.h |  1 -
 3 files changed, 15 insertions(+), 17 deletions(-)

diff --git a/drivers/net/wireless/ath/wcn36xx/main.c 
b/drivers/net/wireless/ath/wcn36xx/main.c
index a23738deb5b3..7c06ca9fdd2c 100644
--- a/drivers/net/wireless/ath/wcn36xx/main.c
+++ b/drivers/net/wireless/ath/wcn36xx/main.c
@@ -796,7 +796,6 @@ static int wcn36xx_sta_add(struct ieee80211_hw *hw, struct 
ieee80211_vif *vif,
vif, sta->addr);
 
spin_lock_init(_priv->ampdu_lock);
-   vif_priv->sta = sta_priv;
sta_priv->vif = vif_priv;
/*
 * For STA mode HW will be configured on BSS_CHANGED_ASSOC because
@@ -815,14 +814,12 @@ static int wcn36xx_sta_remove(struct ieee80211_hw *hw,
  struct ieee80211_sta *sta)
 {
struct wcn36xx *wcn = hw->priv;
-   struct wcn36xx_vif *vif_priv = wcn36xx_vif_to_priv(vif);
struct wcn36xx_sta *sta_priv = wcn36xx_sta_to_priv(sta);
 
wcn36xx_dbg(WCN36XX_DBG_MAC, "mac sta remove vif %p sta %pM index %d\n",
vif, sta->addr, sta_priv->sta_index);
 
wcn36xx_smd_delete_sta(wcn, sta_priv->sta_index);
-   vif_priv->sta = NULL;
sta_priv->vif = NULL;
return 0;
 }
diff --git a/drivers/net/wireless/ath/wcn36xx/smd.c 
b/drivers/net/wireless/ath/wcn36xx/smd.c
index ff56138528b6..76c6856ed932 100644
--- a/drivers/net/wireless/ath/wcn36xx/smd.c
+++ b/drivers/net/wireless/ath/wcn36xx/smd.c
@@ -1170,6 +1170,7 @@ static int wcn36xx_smd_config_bss_v1(struct wcn36xx *wcn,
 
 static int wcn36xx_smd_config_bss_rsp(struct wcn36xx *wcn,
  struct ieee80211_vif *vif,
+ struct ieee80211_sta *sta,
  void *buf,
  size_t len)
 {
@@ -1200,9 +1201,10 @@ static int wcn36xx_smd_config_bss_rsp(struct wcn36xx 
*wcn,
 
vif_priv->bss_index = params->bss_index;
 
-   if (vif_priv->sta) {
-   vif_priv->sta->bss_sta_index =  params->bss_sta_index;
-   vif_priv->sta->bss_dpu_desc_index = params->dpu_desc_index;
+   if (sta) {
+   struct wcn36xx_sta *sta_priv = wcn36xx_sta_to_priv(sta);
+   sta_priv->bss_sta_index = params->bss_sta_index;
+   sta_priv->bss_dpu_desc_index = params->dpu_desc_index;
}
 
vif_priv->self_ucast_dpu_sign = params->ucast_dpu_signature;
@@ -1329,6 +1331,7 @@ int wcn36xx_smd_config_bss(struct wcn36xx *wcn, struct 
ieee80211_vif *vif,
}
ret = wcn36xx_smd_config_bss_rsp(wcn,
 vif,
+sta,
 wcn->hal_buf,
 wcn->hal_rsp_len);
if (ret) {
@@ -2058,25 +2061,24 @@ static int wcn36xx_smd_delete_sta_context_ind(struct 
wcn36xx *wcn,
 {
struct wcn36xx_hal_delete_sta_context_ind_msg *rsp = buf;
struct wcn36xx_vif *tmp;
-   struct ieee80211_sta *sta = NULL;
+   struct ieee80211_sta *sta;
 
if (len != sizeof(*rsp)) {
wcn36xx_warn("Corrupted delete sta indication\n");
return -EIO;
}
 
+   wcn36xx_dbg(WCN36XX_DBG_HAL, "delete station indication %pM index %d\n",
+   rsp->addr2, rsp->sta_id);
+
list_for_each_entry(tmp, >vif_list, list) {
-   if (sta && (tmp->sta->sta_index == rsp->sta_id)) {
-   sta = container_of((void *)tmp->sta,
-struct ieee80211_sta,
-drv_priv);
-   wcn36xx_dbg(WCN36XX_DBG_HAL,
-   "delete station indication %pM index %d\n",
-   rsp->addr2,
-   rsp->sta_id);
+   rcu_read_lock();
+   sta = ieee80211_find_sta(wcn36xx_priv_to_vif(tmp), rsp->addr2);
+   if (sta)
ieee80211_report_low_ack(sta, 0);
+   rcu_read_unlock();
+   if (sta)
return 0;
-   }
}
 
wcn36xx_warn("STA with addr %pM and index %d not found\n",
diff --git a/drivers/net/wireless/ath/wcn36xx/wcn36xx.h 
b/drivers/net/wireless/ath/wcn36xx/wcn36xx.h
index c368a34c8de7..54000db0af1a 100644
--- a/drivers/net/wireless/ath/wcn36xx/wcn36xx.h
+++ b/drivers/net/wireless/ath/wcn36xx/wcn36xx.h
@@ -125,7 +125,6 @@ struct wcn36xx_platform_ctrl_ops {
  */
 struct wcn36xx_vif {

[PATCH v3 07/18] wcn36xx: Fetch private sta data from sta entry instead of from vif

2016-04-18 Thread Bjorn Andersson
From: Pontus Fuchs 

For consistency with other code.

Signed-off-by: Pontus Fuchs 
Signed-off-by: Bjorn Andersson 
---

Changes since v1:
- Reordered after the now previous patch, to make wcn36xx_sta_to_priv()
  available before we use it

 drivers/net/wireless/ath/wcn36xx/main.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/wireless/ath/wcn36xx/main.c 
b/drivers/net/wireless/ath/wcn36xx/main.c
index 30f015d3a9e6..a23738deb5b3 100644
--- a/drivers/net/wireless/ath/wcn36xx/main.c
+++ b/drivers/net/wireless/ath/wcn36xx/main.c
@@ -386,7 +386,7 @@ static int wcn36xx_set_key(struct ieee80211_hw *hw, enum 
set_key_cmd cmd,
 {
struct wcn36xx *wcn = hw->priv;
struct wcn36xx_vif *vif_priv = wcn36xx_vif_to_priv(vif);
-   struct wcn36xx_sta *sta_priv = vif_priv->sta;
+   struct wcn36xx_sta *sta_priv = wcn36xx_sta_to_priv(sta);
int ret = 0;
u8 key[WLAN_MAX_KEY_LEN];
 
-- 
2.5.0



[PATCH v3 13/18] wcn36xx: Track association state

2016-04-18 Thread Bjorn Andersson
From: Pontus Fuchs 

Knowing the association state is needed for mc filtering.

Signed-off-by: Pontus Fuchs 
Signed-off-by: Bjorn Andersson 
---
 drivers/net/wireless/ath/wcn36xx/main.c| 2 ++
 drivers/net/wireless/ath/wcn36xx/wcn36xx.h | 1 +
 2 files changed, 3 insertions(+)

diff --git a/drivers/net/wireless/ath/wcn36xx/main.c 
b/drivers/net/wireless/ath/wcn36xx/main.c
index f9c77de94583..253cece1b660 100644
--- a/drivers/net/wireless/ath/wcn36xx/main.c
+++ b/drivers/net/wireless/ath/wcn36xx/main.c
@@ -655,6 +655,7 @@ static void wcn36xx_bss_info_changed(struct ieee80211_hw 
*hw,
 vif->addr,
 bss_conf->aid);
 
+   vif_priv->sta_assoc = true;
rcu_read_lock();
sta = ieee80211_find_sta(vif, bss_conf->bssid);
if (!sta) {
@@ -686,6 +687,7 @@ static void wcn36xx_bss_info_changed(struct ieee80211_hw 
*hw,
bss_conf->bssid,
vif->addr,
bss_conf->aid);
+   vif_priv->sta_assoc = false;
wcn36xx_smd_set_link_st(wcn,
bss_conf->bssid,
vif->addr,
diff --git a/drivers/net/wireless/ath/wcn36xx/wcn36xx.h 
b/drivers/net/wireless/ath/wcn36xx/wcn36xx.h
index 54000db0af1a..7433d67a5929 100644
--- a/drivers/net/wireless/ath/wcn36xx/wcn36xx.h
+++ b/drivers/net/wireless/ath/wcn36xx/wcn36xx.h
@@ -128,6 +128,7 @@ struct wcn36xx_vif {
u8 dtim_period;
enum ani_ed_type encrypt_type;
bool is_joining;
+   bool sta_assoc;
struct wcn36xx_hal_mac_ssid ssid;
 
/* Power management */
-- 
2.5.0



[PATCH v3 09/18] wcn36xx: Parse trigger_ba response properly

2016-04-18 Thread Bjorn Andersson
From: Pontus Fuchs 

This message does not follow the canonical format and needs it's own
parser.

Signed-off-by: Pontus Fuchs 
Signed-off-by: Bjorn Andersson 
---
 drivers/net/wireless/ath/wcn36xx/smd.c | 14 --
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/drivers/net/wireless/ath/wcn36xx/smd.c 
b/drivers/net/wireless/ath/wcn36xx/smd.c
index 76c6856ed932..7f315d098f52 100644
--- a/drivers/net/wireless/ath/wcn36xx/smd.c
+++ b/drivers/net/wireless/ath/wcn36xx/smd.c
@@ -1968,6 +1968,17 @@ out:
return ret;
 }
 
+static int wcn36xx_smd_trigger_ba_rsp(void *buf, int len)
+{
+   struct wcn36xx_hal_trigger_ba_rsp_msg *rsp;
+
+   if (len < sizeof(*rsp))
+   return -EINVAL;
+
+   rsp = (struct wcn36xx_hal_trigger_ba_rsp_msg *) buf;
+   return rsp->status;
+}
+
 int wcn36xx_smd_trigger_ba(struct wcn36xx *wcn, u8 sta_index)
 {
struct wcn36xx_hal_trigger_ba_req_msg msg_body;
@@ -1992,8 +2003,7 @@ int wcn36xx_smd_trigger_ba(struct wcn36xx *wcn, u8 
sta_index)
wcn36xx_err("Sending hal_trigger_ba failed\n");
goto out;
}
-   ret = wcn36xx_smd_rsp_status_check_v2(wcn, wcn->hal_buf,
-   wcn->hal_rsp_len);
+   ret = wcn36xx_smd_trigger_ba_rsp(wcn->hal_buf, wcn->hal_rsp_len);
if (ret) {
wcn36xx_err("hal_trigger_ba response failed err=%d\n", ret);
goto out;
-- 
2.5.0



[PATCH v3 14/18] wcn36xx: Implement multicast filtering

2016-04-18 Thread Bjorn Andersson
From: Pontus Fuchs 

Pass the multicast list to FW.

This patch also adds a way to build the smd command in place. This is
needed because the MC list command is too big for the stack.

Signed-off-by: Pontus Fuchs 
[bjorn: dropped FIF_PROMISC_IN_BSS usage]
Signed-off-by: Bjorn Andersson 
---
 drivers/net/wireless/ath/wcn36xx/hal.h  |  6 ++--
 drivers/net/wireless/ath/wcn36xx/main.c | 50 ++--
 drivers/net/wireless/ath/wcn36xx/smd.c  | 51 +
 drivers/net/wireless/ath/wcn36xx/smd.h  |  3 ++
 4 files changed, 104 insertions(+), 6 deletions(-)

diff --git a/drivers/net/wireless/ath/wcn36xx/hal.h 
b/drivers/net/wireless/ath/wcn36xx/hal.h
index 433d9801a0ae..ec64c47f918b 100644
--- a/drivers/net/wireless/ath/wcn36xx/hal.h
+++ b/drivers/net/wireless/ath/wcn36xx/hal.h
@@ -4267,9 +4267,9 @@ struct wcn36xx_hal_rcv_flt_mc_addr_list_type {
u8 data_offset;
 
u32 mc_addr_count;
-   u8 mc_addr[ETH_ALEN][WCN36XX_HAL_MAX_NUM_MULTICAST_ADDRESS];
+   u8 mc_addr[WCN36XX_HAL_MAX_NUM_MULTICAST_ADDRESS][ETH_ALEN];
u8 bss_index;
-};
+} __packed;
 
 struct wcn36xx_hal_set_pkt_filter_rsp_msg {
struct wcn36xx_hal_msg_header header;
@@ -4323,7 +4323,7 @@ struct wcn36xx_hal_rcv_flt_pkt_clear_rsp_msg {
 struct wcn36xx_hal_rcv_flt_pkt_set_mc_list_req_msg {
struct wcn36xx_hal_msg_header header;
struct wcn36xx_hal_rcv_flt_mc_addr_list_type mc_addr_list;
-};
+} __packed;
 
 struct wcn36xx_hal_rcv_flt_pkt_set_mc_list_rsp_msg {
struct wcn36xx_hal_msg_header header;
diff --git a/drivers/net/wireless/ath/wcn36xx/main.c 
b/drivers/net/wireless/ath/wcn36xx/main.c
index 253cece1b660..c0ba7b0775b3 100644
--- a/drivers/net/wireless/ath/wcn36xx/main.c
+++ b/drivers/net/wireless/ath/wcn36xx/main.c
@@ -287,6 +287,7 @@ static int wcn36xx_start(struct ieee80211_hw *hw)
}
 
wcn36xx_detect_chip_version(wcn);
+   wcn36xx_smd_update_cfg(wcn, WCN36XX_HAL_CFG_ENABLE_MC_ADDR_LIST, 1);
 
/* DMA channel initialization */
ret = wcn36xx_dxe_init(wcn);
@@ -354,15 +355,57 @@ static int wcn36xx_config(struct ieee80211_hw *hw, u32 
changed)
return 0;
 }
 
-#define WCN36XX_SUPPORTED_FILTERS (0)
-
 static void wcn36xx_configure_filter(struct ieee80211_hw *hw,
 unsigned int changed,
 unsigned int *total, u64 multicast)
 {
+   struct wcn36xx_hal_rcv_flt_mc_addr_list_type *fp;
+   struct wcn36xx *wcn = hw->priv;
+   struct wcn36xx_vif *tmp;
+   struct ieee80211_vif *vif = NULL;
+
wcn36xx_dbg(WCN36XX_DBG_MAC, "mac configure filter\n");
 
-   *total &= WCN36XX_SUPPORTED_FILTERS;
+   *total &= FIF_ALLMULTI;
+
+   fp = (void *)(unsigned long)multicast;
+   list_for_each_entry(tmp, >vif_list, list) {
+   vif = wcn36xx_priv_to_vif(tmp);
+
+   /* FW handles MC filtering only when connected as STA */
+   if (*total & FIF_ALLMULTI)
+   wcn36xx_smd_set_mc_list(wcn, vif, NULL);
+   else if (NL80211_IFTYPE_STATION == vif->type && tmp->sta_assoc)
+   wcn36xx_smd_set_mc_list(wcn, vif, fp);
+   }
+   kfree(fp);
+}
+
+static u64 wcn36xx_prepare_multicast(struct ieee80211_hw *hw,
+struct netdev_hw_addr_list *mc_list)
+{
+   struct wcn36xx_hal_rcv_flt_mc_addr_list_type *fp;
+   struct netdev_hw_addr *ha;
+
+   wcn36xx_dbg(WCN36XX_DBG_MAC, "mac prepare multicast list\n");
+   fp = kzalloc(sizeof(*fp), GFP_ATOMIC);
+   if (!fp) {
+   wcn36xx_err("Out of memory setting filters.\n");
+   return 0;
+   }
+
+   fp->mc_addr_count = 0;
+   /* update multicast filtering parameters */
+   if (netdev_hw_addr_list_count(mc_list) <=
+   WCN36XX_HAL_MAX_NUM_MULTICAST_ADDRESS) {
+   netdev_hw_addr_list_for_each(ha, mc_list) {
+   memcpy(fp->mc_addr[fp->mc_addr_count],
+   ha->addr, ETH_ALEN);
+   fp->mc_addr_count++;
+   }
+   }
+
+   return (u64)(unsigned long)fp;
 }
 
 static void wcn36xx_tx(struct ieee80211_hw *hw,
@@ -920,6 +963,7 @@ static const struct ieee80211_ops wcn36xx_ops = {
.resume = wcn36xx_resume,
 #endif
.config = wcn36xx_config,
+   .prepare_multicast  = wcn36xx_prepare_multicast,
.configure_filter   = wcn36xx_configure_filter,
.tx = wcn36xx_tx,
.set_key= wcn36xx_set_key,
diff --git a/drivers/net/wireless/ath/wcn36xx/smd.c 
b/drivers/net/wireless/ath/wcn36xx/smd.c
index e0d5631657c1..b1bdc229e560 100644
--- a/drivers/net/wireless/ath/wcn36xx/smd.c
+++ b/drivers/net/wireless/ath/wcn36xx/smd.c
@@ -271,6 +271,16 @@ out:
 

[PATCH v3 11/18] wcn36xx: Use allocated self sta index instead of hard coded

2016-04-18 Thread Bjorn Andersson
From: Pontus Fuchs 

Signed-off-by: Pontus Fuchs 
Signed-off-by: Bjorn Andersson 
---
 drivers/net/wireless/ath/wcn36xx/smd.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/wireless/ath/wcn36xx/smd.c 
b/drivers/net/wireless/ath/wcn36xx/smd.c
index ebb446272d21..e0d5631657c1 100644
--- a/drivers/net/wireless/ath/wcn36xx/smd.c
+++ b/drivers/net/wireless/ath/wcn36xx/smd.c
@@ -200,7 +200,7 @@ static void wcn36xx_smd_set_sta_params(struct wcn36xx *wcn,
sta_params->sta_index = WCN36XX_HAL_STA_INVALID_IDX;
} else {
sta_params->type = 0;
-   sta_params->sta_index = 1;
+   sta_params->sta_index = vif_priv->self_sta_index;
}
 
sta_params->listen_interval = WCN36XX_LISTEN_INTERVAL(wcn);
-- 
2.5.0



Re: [PATCH net-next v5] rtnetlink: add new RTM_GETSTATS message to dump link stats

2016-04-18 Thread David Miller
From: Eric Dumazet 
Date: Mon, 18 Apr 2016 21:32:04 -0700

> On Mon, 2016-04-18 at 23:41 -0400, David Miller wrote:
>>  
>> +/* Add a zero length NOP attribute so that the nla_data()
>> + * of the IFLA_STATS64 will be 64-bit aligned.
>> + */
>> +#ifndef HAVE_EFFICIENT_UNALIGNED_ACCESS
>> +attr = nla_reserve(skb, IFLA_PAD, 0);
>> +if (!attr)
>> +return -EMSGSIZE;
>> +#endif
> 
> You must do this only if current skb->data alignment is not correct.

I'll put an assertion there if it makes you happy. :-)



[PATCH v3 15/18] wcn36xx: Use correct command struct for EXIT_BMPS_REQ

2016-04-18 Thread Bjorn Andersson
From: Pontus Fuchs 

EXIT_BMPS_REQ was using the command struct for ENTER_BMPS_REQ. I
spotted this when looking at command dumps.

Signed-off-by: Pontus Fuchs 
Signed-off-by: Bjorn Andersson 
---
 drivers/net/wireless/ath/wcn36xx/smd.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/wireless/ath/wcn36xx/smd.c 
b/drivers/net/wireless/ath/wcn36xx/smd.c
index b1bdc229e560..c15501c06eb2 100644
--- a/drivers/net/wireless/ath/wcn36xx/smd.c
+++ b/drivers/net/wireless/ath/wcn36xx/smd.c
@@ -1690,7 +1690,7 @@ out:
 
 int wcn36xx_smd_exit_bmps(struct wcn36xx *wcn, struct ieee80211_vif *vif)
 {
-   struct wcn36xx_hal_enter_bmps_req_msg msg_body;
+   struct wcn36xx_hal_exit_bmps_req_msg msg_body;
struct wcn36xx_vif *vif_priv = wcn36xx_vif_to_priv(vif);
int ret = 0;
 
-- 
2.5.0



[PATCH v3 04/18] wcn36xx: Use consistent name for private vif

2016-04-18 Thread Bjorn Andersson
From: Pontus Fuchs 

Some code used priv_vif and some used vif_priv. Convert all to vif_priv
for consistency.

Signed-off-by: Pontus Fuchs 
Signed-off-by: Bjorn Andersson 
---
 drivers/net/wireless/ath/wcn36xx/smd.c | 28 ++--
 1 file changed, 14 insertions(+), 14 deletions(-)

diff --git a/drivers/net/wireless/ath/wcn36xx/smd.c 
b/drivers/net/wireless/ath/wcn36xx/smd.c
index cc1b3b7a4ff9..170440ed5d85 100644
--- a/drivers/net/wireless/ath/wcn36xx/smd.c
+++ b/drivers/net/wireless/ath/wcn36xx/smd.c
@@ -191,7 +191,7 @@ static void wcn36xx_smd_set_sta_params(struct wcn36xx *wcn,
struct ieee80211_sta *sta,
struct wcn36xx_hal_config_sta_params *sta_params)
 {
-   struct wcn36xx_vif *priv_vif = wcn36xx_vif_to_priv(vif);
+   struct wcn36xx_vif *vif_priv = wcn36xx_vif_to_priv(vif);
struct wcn36xx_sta *priv_sta = NULL;
if (vif->type == NL80211_IFTYPE_ADHOC ||
vif->type == NL80211_IFTYPE_AP ||
@@ -215,7 +215,7 @@ static void wcn36xx_smd_set_sta_params(struct wcn36xx *wcn,
else
memcpy(_params->bssid, vif->addr, ETH_ALEN);
 
-   sta_params->encrypt_type = priv_vif->encrypt_type;
+   sta_params->encrypt_type = vif_priv->encrypt_type;
sta_params->short_preamble_supported = true;
 
sta_params->rifs_mode = 0;
@@ -224,7 +224,7 @@ static void wcn36xx_smd_set_sta_params(struct wcn36xx *wcn,
sta_params->uapsd = 0;
sta_params->mimo_ps = WCN36XX_HAL_HT_MIMO_PS_STATIC;
sta_params->max_ampdu_duration = 0;
-   sta_params->bssid_index = priv_vif->bss_index;
+   sta_params->bssid_index = vif_priv->bss_index;
sta_params->p2p = 0;
 
if (sta) {
@@ -726,7 +726,7 @@ static int wcn36xx_smd_add_sta_self_rsp(struct wcn36xx *wcn,
size_t len)
 {
struct wcn36xx_hal_add_sta_self_rsp_msg *rsp;
-   struct wcn36xx_vif *priv_vif = wcn36xx_vif_to_priv(vif);
+   struct wcn36xx_vif *vif_priv = wcn36xx_vif_to_priv(vif);
 
if (len < sizeof(*rsp))
return -EINVAL;
@@ -743,8 +743,8 @@ static int wcn36xx_smd_add_sta_self_rsp(struct wcn36xx *wcn,
"hal add sta self status %d self_sta_index %d dpu_index 
%d\n",
rsp->status, rsp->self_sta_index, rsp->dpu_index);
 
-   priv_vif->self_sta_index = rsp->self_sta_index;
-   priv_vif->self_dpu_desc_index = rsp->dpu_index;
+   vif_priv->self_sta_index = rsp->self_sta_index;
+   vif_priv->self_dpu_desc_index = rsp->dpu_index;
 
return 0;
 }
@@ -1175,7 +1175,7 @@ static int wcn36xx_smd_config_bss_rsp(struct wcn36xx *wcn,
 {
struct wcn36xx_hal_config_bss_rsp_msg *rsp;
struct wcn36xx_hal_config_bss_rsp_params *params;
-   struct wcn36xx_vif *priv_vif = wcn36xx_vif_to_priv(vif);
+   struct wcn36xx_vif *vif_priv = wcn36xx_vif_to_priv(vif);
 
if (len < sizeof(*rsp))
return -EINVAL;
@@ -1198,14 +1198,14 @@ static int wcn36xx_smd_config_bss_rsp(struct wcn36xx 
*wcn,
params->bss_bcast_sta_idx, params->mac,
params->tx_mgmt_power, params->ucast_dpu_signature);
 
-   priv_vif->bss_index = params->bss_index;
+   vif_priv->bss_index = params->bss_index;
 
-   if (priv_vif->sta) {
-   priv_vif->sta->bss_sta_index =  params->bss_sta_index;
-   priv_vif->sta->bss_dpu_desc_index = params->dpu_desc_index;
+   if (vif_priv->sta) {
+   vif_priv->sta->bss_sta_index =  params->bss_sta_index;
+   vif_priv->sta->bss_dpu_desc_index = params->dpu_desc_index;
}
 
-   priv_vif->self_ucast_dpu_sign = params->ucast_dpu_signature;
+   vif_priv->self_ucast_dpu_sign = params->ucast_dpu_signature;
 
return 0;
 }
@@ -1343,13 +1343,13 @@ out:
 int wcn36xx_smd_delete_bss(struct wcn36xx *wcn, struct ieee80211_vif *vif)
 {
struct wcn36xx_hal_delete_bss_req_msg msg_body;
-   struct wcn36xx_vif *priv_vif = wcn36xx_vif_to_priv(vif);
+   struct wcn36xx_vif *vif_priv = wcn36xx_vif_to_priv(vif);
int ret = 0;
 
mutex_lock(>hal_mutex);
INIT_HAL_MSG(msg_body, WCN36XX_HAL_DELETE_BSS_REQ);
 
-   msg_body.bss_index = priv_vif->bss_index;
+   msg_body.bss_index = vif_priv->bss_index;
 
PREPARE_HAL_BUF(wcn->hal_buf, msg_body);
 
-- 
2.5.0



[PATCH v3 10/18] wcn36xx: Copy all members in config_sta v1 conversion

2016-04-18 Thread Bjorn Andersson
From: Pontus Fuchs 

When converting to version 1 of the config_sta struct not all
members where copied. This fixes the problem of multicast frames
not being delivered on an encrypted network.

Signed-off-by: Pontus Fuchs 
Signed-off-by: Bjorn Andersson 
---
 drivers/net/wireless/ath/wcn36xx/smd.c | 19 +--
 1 file changed, 17 insertions(+), 2 deletions(-)

diff --git a/drivers/net/wireless/ath/wcn36xx/smd.c 
b/drivers/net/wireless/ath/wcn36xx/smd.c
index 7f315d098f52..ebb446272d21 100644
--- a/drivers/net/wireless/ath/wcn36xx/smd.c
+++ b/drivers/net/wireless/ath/wcn36xx/smd.c
@@ -949,17 +949,32 @@ static void wcn36xx_smd_convert_sta_to_v1(struct wcn36xx 
*wcn,
memcpy(>mac, orig->mac, ETH_ALEN);
v1->aid = orig->aid;
v1->type = orig->type;
+   v1->short_preamble_supported = orig->short_preamble_supported;
v1->listen_interval = orig->listen_interval;
+   v1->wmm_enabled = orig->wmm_enabled;
v1->ht_capable = orig->ht_capable;
-
+   v1->tx_channel_width_set = orig->tx_channel_width_set;
+   v1->rifs_mode = orig->rifs_mode;
+   v1->lsig_txop_protection = orig->lsig_txop_protection;
v1->max_ampdu_size = orig->max_ampdu_size;
v1->max_ampdu_density = orig->max_ampdu_density;
v1->sgi_40mhz = orig->sgi_40mhz;
v1->sgi_20Mhz = orig->sgi_20Mhz;
-
+   v1->rmf = orig->rmf;
+   v1->encrypt_type = orig->encrypt_type;
+   v1->action = orig->action;
+   v1->uapsd = orig->uapsd;
+   v1->max_sp_len = orig->max_sp_len;
+   v1->green_field_capable = orig->green_field_capable;
+   v1->mimo_ps = orig->mimo_ps;
+   v1->delayed_ba_support = orig->delayed_ba_support;
+   v1->max_ampdu_duration = orig->max_ampdu_duration;
+   v1->dsss_cck_mode_40mhz = orig->dsss_cck_mode_40mhz;
memcpy(>supported_rates, >supported_rates,
   sizeof(orig->supported_rates));
v1->sta_index = orig->sta_index;
+   v1->bssid_index = orig->bssid_index;
+   v1->p2p = orig->p2p;
 }
 
 static int wcn36xx_smd_config_sta_rsp(struct wcn36xx *wcn,
-- 
2.5.0



[PATCH v3 17/18] wcn36xx: Correct remove bss key response encoding

2016-04-18 Thread Bjorn Andersson
The WCN36XX_HAL_RMV_BSSKEY_RSP carries a single u32 with "status", so we
can use the standard status check function for decoding the result.

This is the last user of the v2 status checker, so remove the struct and
helper function.

Signed-off-by: Bjorn Andersson 
---

Changes since v2:
- Added this patch to the series

 drivers/net/wireless/ath/wcn36xx/smd.c | 19 +--
 drivers/net/wireless/ath/wcn36xx/smd.h |  9 -
 2 files changed, 1 insertion(+), 27 deletions(-)

diff --git a/drivers/net/wireless/ath/wcn36xx/smd.c 
b/drivers/net/wireless/ath/wcn36xx/smd.c
index c15501c06eb2..5f6ca3124bd8 100644
--- a/drivers/net/wireless/ath/wcn36xx/smd.c
+++ b/drivers/net/wireless/ath/wcn36xx/smd.c
@@ -312,22 +312,6 @@ static int wcn36xx_smd_rsp_status_check(void *buf, size_t 
len)
return 0;
 }
 
-static int wcn36xx_smd_rsp_status_check_v2(struct wcn36xx *wcn, void *buf,
-size_t len)
-{
-   struct wcn36xx_fw_msg_status_rsp_v2 *rsp;
-
-   if (len < sizeof(struct wcn36xx_hal_msg_header) + sizeof(*rsp))
-   return wcn36xx_smd_rsp_status_check(buf, len);
-
-   rsp = buf + sizeof(struct wcn36xx_hal_msg_header);
-
-   if (WCN36XX_FW_MSG_RESULT_SUCCESS != rsp->status)
-   return rsp->status;
-
-   return 0;
-}
-
 int wcn36xx_smd_load_nv(struct wcn36xx *wcn)
 {
struct nv_data *nv_d;
@@ -1647,8 +1631,7 @@ int wcn36xx_smd_remove_bsskey(struct wcn36xx *wcn,
wcn36xx_err("Sending hal_remove_bsskey failed\n");
goto out;
}
-   ret = wcn36xx_smd_rsp_status_check_v2(wcn, wcn->hal_buf,
- wcn->hal_rsp_len);
+   ret = wcn36xx_smd_rsp_status_check(wcn->hal_buf, wcn->hal_rsp_len);
if (ret) {
wcn36xx_err("hal_remove_bsskey response failed err=%d\n", ret);
goto out;
diff --git a/drivers/net/wireless/ath/wcn36xx/smd.h 
b/drivers/net/wireless/ath/wcn36xx/smd.h
index c1b76d75cf85..d74d781f4c8d 100644
--- a/drivers/net/wireless/ath/wcn36xx/smd.h
+++ b/drivers/net/wireless/ath/wcn36xx/smd.h
@@ -44,15 +44,6 @@ struct wcn36xx_fw_msg_status_rsp {
u32 status;
 } __packed;
 
-/* wcn3620 returns this for tigger_ba */
-
-struct wcn36xx_fw_msg_status_rsp_v2 {
-   u8  bss_id[6];
-   u32 status __packed;
-   u16 count_following_candidates __packed;
-   /* candidate list follows */
-};
-
 struct wcn36xx_hal_ind_msg {
struct list_head list;
u8 *msg;
-- 
2.5.0



[PATCH v3 16/18] wcn36xx: Delete BSS before idling link

2016-04-18 Thread Bjorn Andersson
When disabling the beacon we must delete the bss before idling the link.

Signed-off-by: Bjorn Andersson 
---

Changes since v2:
- Added this patch to the series

 drivers/net/wireless/ath/wcn36xx/main.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/wireless/ath/wcn36xx/main.c 
b/drivers/net/wireless/ath/wcn36xx/main.c
index c0ba7b0775b3..680217506b3d 100644
--- a/drivers/net/wireless/ath/wcn36xx/main.c
+++ b/drivers/net/wireless/ath/wcn36xx/main.c
@@ -779,9 +779,9 @@ static void wcn36xx_bss_info_changed(struct ieee80211_hw 
*hw,
wcn36xx_smd_set_link_st(wcn, vif->addr, vif->addr,
link_state);
} else {
+   wcn36xx_smd_delete_bss(wcn, vif);
wcn36xx_smd_set_link_st(wcn, vif->addr, vif->addr,
WCN36XX_HAL_LINK_IDLE_STATE);
-   wcn36xx_smd_delete_bss(wcn, vif);
}
}
 out:
-- 
2.5.0



[PATCH v3 18/18] wcn36xx: Fill in capability list

2016-04-18 Thread Bjorn Andersson
Fill in the capability list with more values from the downstream driver.

Signed-off-by: Bjorn Andersson 
---

Changes since v2:
- Added this patch to the series

 drivers/net/wireless/ath/wcn36xx/hal.h  | 39 
 drivers/net/wireless/ath/wcn36xx/main.c | 40 -
 2 files changed, 78 insertions(+), 1 deletion(-)

diff --git a/drivers/net/wireless/ath/wcn36xx/hal.h 
b/drivers/net/wireless/ath/wcn36xx/hal.h
index ec64c47f918b..658bfb8baabe 100644
--- a/drivers/net/wireless/ath/wcn36xx/hal.h
+++ b/drivers/net/wireless/ath/wcn36xx/hal.h
@@ -4389,6 +4389,45 @@ enum place_holder_in_cap_bitmap {
RTT = 20,
RATECTRL = 21,
WOW = 22,
+   WLAN_ROAM_SCAN_OFFLOAD = 23,
+   SPECULATIVE_PS_POLL = 24,
+   SCAN_SCH = 25,
+   IBSS_HEARTBEAT_OFFLOAD = 26,
+   WLAN_SCAN_OFFLOAD = 27,
+   WLAN_PERIODIC_TX_PTRN = 28,
+   ADVANCE_TDLS = 29,
+   BATCH_SCAN = 30,
+   FW_IN_TX_PATH = 31,
+   EXTENDED_NSOFFLOAD_SLOT = 32,
+   CH_SWITCH_V1 = 33,
+   HT40_OBSS_SCAN = 34,
+   UPDATE_CHANNEL_LIST = 35,
+   WLAN_MCADDR_FLT = 36,
+   WLAN_CH144 = 37,
+   NAN = 38,
+   TDLS_SCAN_COEXISTENCE = 39,
+   LINK_LAYER_STATS_MEAS = 40,
+   MU_MIMO = 41,
+   EXTENDED_SCAN = 42,
+   DYNAMIC_WMM_PS = 43,
+   MAC_SPOOFED_SCAN = 44,
+   BMU_ERROR_GENERIC_RECOVERY = 45,
+   DISA = 46,
+   FW_STATS = 47,
+   WPS_PRBRSP_TMPL = 48,
+   BCN_IE_FLT_DELTA = 49,
+   TDLS_OFF_CHANNEL = 51,
+   RTT3 = 52,
+   MGMT_FRAME_LOGGING = 53,
+   ENHANCED_TXBD_COMPLETION = 54,
+   LOGGING_ENHANCEMENT = 55,
+   EXT_SCAN_ENHANCED = 56,
+   MEMORY_DUMP_SUPPORTED = 57,
+   PER_PKT_STATS_SUPPORTED = 58,
+   EXT_LL_STAT = 60,
+   WIFI_CONFIG = 61,
+   ANTENNA_DIVERSITY_SELECTION = 62,
+
MAX_FEATURE_SUPPORTED = 128,
 };
 
diff --git a/drivers/net/wireless/ath/wcn36xx/main.c 
b/drivers/net/wireless/ath/wcn36xx/main.c
index 680217506b3d..fe81b2a7c8d9 100644
--- a/drivers/net/wireless/ath/wcn36xx/main.c
+++ b/drivers/net/wireless/ath/wcn36xx/main.c
@@ -201,7 +201,45 @@ static const char * const wcn36xx_caps_names[] = {
"BCN_FILTER",   /* 19 */
"RTT",  /* 20 */
"RATECTRL", /* 21 */
-   "WOW"   /* 22 */
+   "WOW",  /* 22 */
+   "WLAN_ROAM_SCAN_OFFLOAD",   /* 23 */
+   "SPECULATIVE_PS_POLL",  /* 24 */
+   "SCAN_SCH", /* 25 */
+   "IBSS_HEARTBEAT_OFFLOAD",   /* 26 */
+   "WLAN_SCAN_OFFLOAD",/* 27 */
+   "WLAN_PERIODIC_TX_PTRN",/* 28 */
+   "ADVANCE_TDLS", /* 29 */
+   "BATCH_SCAN",   /* 30 */
+   "FW_IN_TX_PATH",/* 31 */
+   "EXTENDED_NSOFFLOAD_SLOT",  /* 32 */
+   "CH_SWITCH_V1", /* 33 */
+   "HT40_OBSS_SCAN",   /* 34 */
+   "UPDATE_CHANNEL_LIST",  /* 35 */
+   "WLAN_MCADDR_FLT",  /* 36 */
+   "WLAN_CH144",   /* 37 */
+   "NAN",  /* 38 */
+   "TDLS_SCAN_COEXISTENCE",/* 39 */
+   "LINK_LAYER_STATS_MEAS",/* 40 */
+   "MU_MIMO",  /* 41 */
+   "EXTENDED_SCAN",/* 42 */
+   "DYNAMIC_WMM_PS",   /* 43 */
+   "MAC_SPOOFED_SCAN", /* 44 */
+   "BMU_ERROR_GENERIC_RECOVERY",   /* 45 */
+   "DISA", /* 46 */
+   "FW_STATS", /* 47 */
+   "WPS_PRBRSP_TMPL",  /* 48 */
+   "BCN_IE_FLT_DELTA", /* 49 */
+   "TDLS_OFF_CHANNEL", /* 51 */
+   "RTT3", /* 52 */
+   "MGMT_FRAME_LOGGING",   /* 53 */
+   "ENHANCED_TXBD_COMPLETION", /* 54 */
+   "LOGGING_ENHANCEMENT",  /* 55 */
+   "EXT_SCAN_ENHANCED",/* 56 */
+   "MEMORY_DUMP_SUPPORTED",/* 57 */
+   "PER_PKT_STATS_SUPPORTED",  /* 58 */
+   "EXT_LL_STAT",  /* 60 */
+   "WIFI_CONFIG",  /* 61 */
+   "ANTENNA_DIVERSITY_SELECTION",  /* 62 */
 };
 
 static const char *wcn36xx_get_cap_name(enum place_holder_in_cap_bitmap x)
-- 
2.5.0



[PATCH v3 12/18] wcn36xx: Clear encrypt_type when deleting bss key

2016-04-18 Thread Bjorn Andersson
From: Pontus Fuchs 

This fixes a problem connecting to an open network after being
connected to an encrypted network.

Signed-off-by: Pontus Fuchs 
Signed-off-by: Bjorn Andersson 
---
 drivers/net/wireless/ath/wcn36xx/main.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/net/wireless/ath/wcn36xx/main.c 
b/drivers/net/wireless/ath/wcn36xx/main.c
index 7c06ca9fdd2c..f9c77de94583 100644
--- a/drivers/net/wireless/ath/wcn36xx/main.c
+++ b/drivers/net/wireless/ath/wcn36xx/main.c
@@ -471,6 +471,7 @@ static int wcn36xx_set_key(struct ieee80211_hw *hw, enum 
set_key_cmd cmd,
break;
case DISABLE_KEY:
if (!(IEEE80211_KEY_FLAG_PAIRWISE & key_conf->flags)) {
+   vif_priv->encrypt_type = WCN36XX_HAL_ED_NONE;
wcn36xx_smd_remove_bsskey(wcn,
vif_priv->encrypt_type,
key_conf->keyidx);
@@ -626,6 +627,7 @@ static void wcn36xx_bss_info_changed(struct ieee80211_hw 
*hw,
} else {
vif_priv->is_joining = false;
wcn36xx_smd_delete_bss(wcn, vif);
+   vif_priv->encrypt_type = WCN36XX_HAL_ED_NONE;
}
}
 
-- 
2.5.0



Re: [PATCH net-next v5] rtnetlink: add new RTM_GETSTATS message to dump link stats

2016-04-18 Thread Roopa Prabhu
On 4/18/16, 8:41 PM, David Miller wrote:
> From: Roopa Prabhu 
> Date: Mon, 18 Apr 2016 14:10:19 -0700
>
>> This patch adds a new RTM_GETSTATS message to query link stats via
>> netlink from the kernel. RTM_NEWLINK also dumps stats today, but
>> RTM_NEWLINK returns a lot more than just stats and is expensive in
>> some cases when frequent polling for stats from userspace is a
>> common operation.
> I'm holding off on this until we sort out the 64-bit netlink
> attribute alignment issue.
sure,
>
> Meanwhile, I'll some kind of a fix into the tree for the
> rtnl_fill_stats() change so that it doesn't cause unaligned
> accesses.
>
> I just tested out a clever idea, where for architectures where
> unaligned accesses is a problem, we insert a zero length NOP attribute
> before the 64-bit stats.  This makes it properly aligned.  A quick
> hack patch just passed testing on my sparc64 box, but I'll go over it
> some more.
>
> diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
> index bb3a90b..5ffdcb3 100644
> --- a/include/uapi/linux/if_link.h
> +++ b/include/uapi/linux/if_link.h
> @@ -155,6 +155,7 @@ enum {
>   IFLA_PROTO_DOWN,
>   IFLA_GSO_MAX_SEGS,
>   IFLA_GSO_MAX_SIZE,
> + IFLA_PAD,
>   __IFLA_MAX
>  };
>  
> diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
> index a7a3d34..b192576 100644
> --- a/net/core/rtnetlink.c
> +++ b/net/core/rtnetlink.c
> @@ -1052,6 +1052,15 @@ static noinline_for_stack int rtnl_fill_stats(struct 
> sk_buff *skb,
>   struct rtnl_link_stats64 *sp;
>   struct nlattr *attr;
>  
> + /* Add a zero length NOP attribute so that the nla_data()
> +  * of the IFLA_STATS64 will be 64-bit aligned.
> +  */
> +#ifndef HAVE_EFFICIENT_UNALIGNED_ACCESS
> + attr = nla_reserve(skb, IFLA_PAD, 0);
> + if (!attr)
> + return -EMSGSIZE;
> +#endif
> +
>   attr = nla_reserve(skb, IFLA_STATS64,
>  sizeof(struct rtnl_link_stats64));
>   if (!attr)
>
that is cleaver :)

thanks!






Re: [PATCH net-next v5] rtnetlink: add new RTM_GETSTATS message to dump link stats

2016-04-18 Thread Eric Dumazet
On Mon, 2016-04-18 at 23:41 -0400, David Miller wrote:
>  
> + /* Add a zero length NOP attribute so that the nla_data()
> +  * of the IFLA_STATS64 will be 64-bit aligned.
> +  */
> +#ifndef HAVE_EFFICIENT_UNALIGNED_ACCESS
> + attr = nla_reserve(skb, IFLA_PAD, 0);
> + if (!attr)
> + return -EMSGSIZE;
> +#endif

You must do this only if current skb->data alignment is not correct.

(IFLA_ALIAS for example could shift your expectations)

Also you probably want to change if_nlmsg_size() to add
 + nla_total_size(0) /* IFLA_PAD */






Re: [PATCH net-next v5] rtnetlink: add new RTM_GETSTATS message to dump link stats

2016-04-18 Thread Eric Dumazet
On Mon, 2016-04-18 at 23:41 -0400, David Miller wrote:
> From: Roopa Prabhu 
> Date: Mon, 18 Apr 2016 14:10:19 -0700
> 
> > This patch adds a new RTM_GETSTATS message to query link stats via
> > netlink from the kernel. RTM_NEWLINK also dumps stats today, but
> > RTM_NEWLINK returns a lot more than just stats and is expensive in
> > some cases when frequent polling for stats from userspace is a
> > common operation.
> 
> I'm holding off on this until we sort out the 64-bit netlink
> attribute alignment issue.
> 
> Meanwhile, I'll some kind of a fix into the tree for the
> rtnl_fill_stats() change so that it doesn't cause unaligned
> accesses.

Also it looks like my previous feedback for sctp_get_sctp_info() has
been ignored.

http://www.spinics.net/lists/netdev/msg372523.html >

info is not guaranteed to be aligned on 8 bytes.

You need to use put_unaligned()

Check commit ff5d749772018 ("tcp: beware of alignments in
tcp_get_info()") for details.






Re: [PATCH net-next v5] rtnetlink: add new RTM_GETSTATS message to dump link stats

2016-04-18 Thread David Miller
From: David Miller 
Date: Mon, 18 Apr 2016 21:48:51 -0400 (EDT)

> I think the time has probably come to have a new netlink attribute
> format that doesn't have this multi-decade old problem.
 ...

Scratch this whole idea, I think the padding attribute works a lot
better and is backwards compatible with every properly written
netlink application.


Re: [PATCH net-next v5] rtnetlink: add new RTM_GETSTATS message to dump link stats

2016-04-18 Thread David Miller
From: Roopa Prabhu 
Date: Mon, 18 Apr 2016 19:40:08 -0700

> On 4/18/16, 7:22 PM, Eric Dumazet wrote:
>> On Mon, 2016-04-18 at 21:48 -0400, David Miller wrote:
>>
>>> And anyways, I get unaligned accesses without Roopa's changes :-/
>>>
>>> davem@patience:~$ ip l l
>>> [3391066.656729] Kernel unaligned access at TPC[7d6c14] 
>>> loopback_get_stats64+0x74/0xa0
>>> [3391066.672020] Kernel unaligned access at TPC[7d6c18] 
>>> loopback_get_stats64+0x78/0xa0
>>> [3391066.687282] Kernel unaligned access at TPC[7d6c1c] 
>>> loopback_get_stats64+0x7c/0xa0
>>> [3391066.702573] Kernel unaligned access at TPC[7d6c20] 
>>> loopback_get_stats64+0x80/0xa0
>>> [3391066.717858] Kernel unaligned access at TPC[8609dc] 
>>> dev_get_stats+0x3c/0xe0
>> Yes, rtnl_fill_stats() probably has the same mistake.
>>
>> commit 550bce59baf3f3059cd4ae1e268f08f2d2cb1d5c
>> Author: Roopa Prabhu 
>> Date:   Fri Apr 15 20:36:25 2016 -0700
>>
>> rtnetlink: rtnl_fill_stats: avoid an unnecssary stats copy
>> 
>> This patch passes netlink attr data ptr directly to dev_get_stats
>> thus elimiating a stats copy.
>> 
>> Suggested-by: David Miller 
>> Signed-off-by: Roopa Prabhu 
>> Signed-off-by: David S. Miller 
>>
>>
>>
> David, if you revert the one in rtnl_fill_stats, i will take care of the 
> dev_get_stats in RTM_GETSTATS in v6.

Don't need to revert, see my idea with the IFLA_PAD attribute. :-)



Re: [PATCH net-next v5] rtnetlink: add new RTM_GETSTATS message to dump link stats

2016-04-18 Thread David Miller
From: Roopa Prabhu 
Date: Mon, 18 Apr 2016 14:10:19 -0700

> This patch adds a new RTM_GETSTATS message to query link stats via
> netlink from the kernel. RTM_NEWLINK also dumps stats today, but
> RTM_NEWLINK returns a lot more than just stats and is expensive in
> some cases when frequent polling for stats from userspace is a
> common operation.

I'm holding off on this until we sort out the 64-bit netlink
attribute alignment issue.

Meanwhile, I'll some kind of a fix into the tree for the
rtnl_fill_stats() change so that it doesn't cause unaligned
accesses.

I just tested out a clever idea, where for architectures where
unaligned accesses is a problem, we insert a zero length NOP attribute
before the 64-bit stats.  This makes it properly aligned.  A quick
hack patch just passed testing on my sparc64 box, but I'll go over it
some more.

diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
index bb3a90b..5ffdcb3 100644
--- a/include/uapi/linux/if_link.h
+++ b/include/uapi/linux/if_link.h
@@ -155,6 +155,7 @@ enum {
IFLA_PROTO_DOWN,
IFLA_GSO_MAX_SEGS,
IFLA_GSO_MAX_SIZE,
+   IFLA_PAD,
__IFLA_MAX
 };
 
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index a7a3d34..b192576 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -1052,6 +1052,15 @@ static noinline_for_stack int rtnl_fill_stats(struct 
sk_buff *skb,
struct rtnl_link_stats64 *sp;
struct nlattr *attr;
 
+   /* Add a zero length NOP attribute so that the nla_data()
+* of the IFLA_STATS64 will be 64-bit aligned.
+*/
+#ifndef HAVE_EFFICIENT_UNALIGNED_ACCESS
+   attr = nla_reserve(skb, IFLA_PAD, 0);
+   if (!attr)
+   return -EMSGSIZE;
+#endif
+
attr = nla_reserve(skb, IFLA_STATS64,
   sizeof(struct rtnl_link_stats64));
if (!attr)



Re: [RFC PATCH v2 net-next 4/7] tcp: Make use of MSG_EOR flag in tcp_sendmsg

2016-04-18 Thread Eric Dumazet
On Mon, 2016-04-18 at 20:18 -0700, Martin KaFai Lau wrote:
> On Mon, Apr 18, 2016 at 07:50:41PM -0700, Eric Dumazet wrote:
> > I believe it is slightly wrong (to do the goto new_segment if there is
> > no data to send)
> Aha. Thanks for pointing it out.
> 
> >
> > I would instead use this fast path, doing the test _when_ we already
> > have an skb to test for.
> The v1 was doing a check in the loop but the feedback was, instead
> of doing this unlikely(test) repeatedly in the loop, do it before
> entering the loop and do a goto new_segment if needed.
> 
> I agree that doing it in the loop is easier to follow/read
> and checking TCP_SKB_CB(skb)->eor is cheaper than my v1.
> I will respin with your suggestion.

I do not remember the suggestion. It is better to do the test in the
loop, as you could have multiple threads doing a sendmsg() on the same
socket, and eventually sleeping in sk_stream_wait_memory()

A bit convoluted, but who knows what can be done by some applications ;)





Re: [RFC PATCH v2 net-next 4/7] tcp: Make use of MSG_EOR flag in tcp_sendmsg

2016-04-18 Thread Martin KaFai Lau
On Mon, Apr 18, 2016 at 07:50:41PM -0700, Eric Dumazet wrote:
> I believe it is slightly wrong (to do the goto new_segment if there is
> no data to send)
Aha. Thanks for pointing it out.

>
> I would instead use this fast path, doing the test _when_ we already
> have an skb to test for.
The v1 was doing a check in the loop but the feedback was, instead
of doing this unlikely(test) repeatedly in the loop, do it before
entering the loop and do a goto new_segment if needed.

I agree that doing it in the loop is easier to follow/read
and checking TCP_SKB_CB(skb)->eor is cheaper than my v1.
I will respin with your suggestion.

> diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
> index 6451b83d81e9..acfbff81ef47 100644
> --- a/net/ipv4/tcp_output.c
> +++ b/net/ipv4/tcp_output.c
> @@ -1171,6 +1171,8 @@ int tcp_fragment(struct sock *sk, struct sk_buff *skb, 
> u32 len,
>   TCP_SKB_CB(skb)->tcp_flags = flags & ~(TCPHDR_FIN | TCPHDR_PSH);
>   TCP_SKB_CB(buff)->tcp_flags = flags;
>   TCP_SKB_CB(buff)->sacked = TCP_SKB_CB(skb)->sacked;
> + TCP_SKB_CB(buff)->eor = TCP_SKB_CB(skb)->eor;
> + TCP_SKB_CB(skb)->eor = 0;
>
>   if (!skb_shinfo(skb)->nr_frags && skb->ip_summed != CHECKSUM_PARTIAL) {
>   /* Copy and checksum data tail into the new buffer. */
> @@ -1730,6 +1732,8 @@ static int tso_fragment(struct sock *sk, struct sk_buff 
> *skb, unsigned int len,
>
>   /* This packet was never sent out yet, so no SACK bits. */
>   TCP_SKB_CB(buff)->sacked = 0;
> + TCP_SKB_CB(buff)->eor = TCP_SKB_CB(skb)->eor;
> + TCP_SKB_CB(skb)->eor = 0;
>
>   buff->ip_summed = skb->ip_summed = CHECKSUM_PARTIAL;
>   skb_split(skb, buff, len);
> @@ -2471,6 +2475,7 @@ static void tcp_collapse_retrans(struct sock *sk, 
> struct sk_buff *skb)
>
>   /* Merge over control information. This moves PSH/FIN etc. over */
>   TCP_SKB_CB(skb)->tcp_flags |= TCP_SKB_CB(next_skb)->tcp_flags;
> + TCP_SKB_CB(skb)->eor = TCP_SKB_CB(next_skb)->eor;
>
>   /* All done, get rid of second SKB and account for it so
>* packet counting does not break.
> @@ -2502,7 +2507,8 @@ static bool tcp_can_collapse(const struct sock *sk, 
> const struct sk_buff *skb)
>   /* Some heurestics for collapsing over SACK'd could be invented */
>   if (TCP_SKB_CB(skb)->sacked & TCPCB_SACKED_ACKED)
>   return false;
> -
> + if (TCP_SKB_CB(skb)->eor)
> + return false;
>   return true;
>  }
Thanks for this diff.  It confirms that I probably understand your last
suggestion correctly.  I also have similar diff for the sacks handling.


[PATCH net-next] perf, bpf: minimize the size of perf_trace_() tracepoint handler

2016-04-18 Thread Alexei Starovoitov
move trace_call_bpf() into helper function to minimize the size
of perf_trace_*() tracepoint handlers.
text   data bss dechex  filename
105416795526646 2945024 190133491221ee5 vmlinux_before
105094225526646 2945024 18981092121a0e4 vmlinux_after

It may seem that perf_fetch_caller_regs() can also be moved,
but that is incorrect, since ip/sp will be wrong.

bpf+tracepoint performance is not affected, since
perf_swevent_put_recursion_context() is now inlined.
export_symbol_gpl can also be dropped.

No measurable change in normal perf tracepoints.

Suggested-by: Steven Rostedt 
Signed-off-by: Alexei Starovoitov 
---
 include/linux/trace_events.h |  5 +
 include/trace/perf.h | 13 +++--
 kernel/events/core.c | 20 +++-
 3 files changed, 27 insertions(+), 11 deletions(-)

diff --git a/include/linux/trace_events.h b/include/linux/trace_events.h
index fe6441203b59..222f6aa0418f 100644
--- a/include/linux/trace_events.h
+++ b/include/linux/trace_events.h
@@ -609,6 +609,11 @@ extern void ftrace_profile_free_filter(struct perf_event 
*event);
 void perf_trace_buf_update(void *record, u16 type);
 void *perf_trace_buf_alloc(int size, struct pt_regs **regs, int *rctxp);
 
+void perf_trace_run_bpf_submit(void *raw_data, int size, int rctx,
+  struct trace_event_call *call, u64 count,
+  struct pt_regs *regs, struct hlist_head *head,
+  struct task_struct *task);
+
 static inline void
 perf_trace_buf_submit(void *raw_data, int size, int rctx, u16 type,
   u64 count, struct pt_regs *regs, void *head,
diff --git a/include/trace/perf.h b/include/trace/perf.h
index a182306eefd7..88de5c205e86 100644
--- a/include/trace/perf.h
+++ b/include/trace/perf.h
@@ -64,16 +64,9 @@ perf_trace_##call(void *__data, proto)   
\
\
{ assign; } \
\
-   if (prog) { \
-   *(struct pt_regs **)entry = __regs; \
-   if (!trace_call_bpf(prog, entry) || hlist_empty(head)) { \
-   perf_swevent_put_recursion_context(rctx);   \
-   return; \
-   }   \
-   }   \
-   perf_trace_buf_submit(entry, __entry_size, rctx,\
- event_call->event.type, __count, __regs,  \
- head, __task);\
+   perf_trace_run_bpf_submit(entry, __entry_size, rctx,\
+ event_call, __count, __regs,  \
+ head, __task);\
 }
 
 /*
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 5056abffef27..9eb23dc27462 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -6741,7 +6741,6 @@ void perf_swevent_put_recursion_context(int rctx)
 
put_recursion_context(swhash->recursion, rctx);
 }
-EXPORT_SYMBOL_GPL(perf_swevent_put_recursion_context);
 
 void ___perf_sw_event(u32 event_id, u64 nr, struct pt_regs *regs, u64 addr)
 {
@@ -6998,6 +6997,25 @@ static int perf_tp_event_match(struct perf_event *event,
return 1;
 }
 
+void perf_trace_run_bpf_submit(void *raw_data, int size, int rctx,
+  struct trace_event_call *call, u64 count,
+  struct pt_regs *regs, struct hlist_head *head,
+  struct task_struct *task)
+{
+   struct bpf_prog *prog = call->prog;
+
+   if (prog) {
+   *(struct pt_regs **)raw_data = regs;
+   if (!trace_call_bpf(prog, raw_data) || hlist_empty(head)) {
+   perf_swevent_put_recursion_context(rctx);
+   return;
+   }
+   }
+   perf_tp_event(call->event.type, count, raw_data, size, regs, head,
+ rctx, task);
+}
+EXPORT_SYMBOL_GPL(perf_trace_run_bpf_submit);
+
 void perf_tp_event(u16 event_type, u64 count, void *record, int entry_size,
   struct pt_regs *regs, struct hlist_head *head, int rctx,
   struct task_struct *task)
-- 
2.8.0



Re: [PATCH net-next 2/8] perf, bpf: allow bpf programs attach to tracepoints

2016-04-18 Thread Steven Rostedt
On Mon, 18 Apr 2016 18:15:04 -0700
Alexei Starovoitov  wrote:

> On 4/18/16 3:16 PM, Steven Rostedt wrote:

> Yes. That what I referred to in below 'a struct to pass args'...
> But, fine, will try to optimize the size further.
> Frankly much bigger .text savings will come from combining
> trace_event_raw_event_*() with perf_trace_*()
> Especially if you're ok with copying tp args into perf's percpu
> buffer first and then copying into ftrace's ring buffer.
> Then we can half the number of such auto-generated functions.

I'm only fine with that when we filter. Otherwise we just lost all the
benefits of zero copy in the first place.

> 
> >> Passing more args or creating a struct to pass args only going to
> >> hurt performance without much reduction in .text size.
> >> tinyfication folks will disable tracepoints anyway.
> >> Note that the most common case is bpf returning 0 and not even
> >> calling perf_trace_buf_submit() which is already slow due
> >> to so many args passed on stack.
> >> This stuff is called million times a second, so every instruction
> >> counts.  
> >
> > Note, that doesn't matter if you are bloating the kernel for the 99.9%
> > of those that don't use bpf.
> >
> > Please remember this! Us tracing folks are second class citizens! If
> > there's a way to speed up tracing by 10%, but in doing so we cause
> > mainline to be hurt by over 1%, we shouldn't be doing it. Tracing and
> > hooks on tracepoints are really not used by many people. Don't fall
> > into Linus's category of "my code is the most important". That's
> > especially true for tracing.  
> 
> tracing was indeed not used that often in the past, but
> bpf+tracing completely changed the picture. It's no longer just
> debugging. It is the first class citizen that runs 24/7 in production
> and its performance and lowest overhead are crucial.

Still, 99.9% of users don't use it.

-- Steve



Re: [RFC PATCH v2 net-next 4/7] tcp: Make use of MSG_EOR flag in tcp_sendmsg

2016-04-18 Thread Eric Dumazet
On Mon, 2016-04-18 at 19:27 -0700, Martin KaFai Lau wrote:
> On Mon, Apr 18, 2016 at 05:06:57PM -0700, Eric Dumazet wrote:
> > It should only be a request from user space to ask TCP to not aggregate
> > stuff on future sendpage()/sendmsg() on the skb carrying this new flag.
> >
> How about something like this.  Please advise if tcp_sendmsg_noappend can
> be simpler.
> 
> diff --git a/include/net/tcp.h b/include/net/tcp.h
> index c0ef054..ac31798 100644
> --- a/include/net/tcp.h
> +++ b/include/net/tcp.h
> @@ -762,7 +762,8 @@ struct tcp_skb_cb {
> 
>   __u8ip_dsfield; /* IPv4 tos or IPv6 dsfield */
>   __u8txstamp_ack:1,  /* Record TX timestamp for ack? */
> - unused:7;
> + eor:1,  /* Is skb MSG_EOR marked */
> + unused:6;
>   __u32   ack_seq;/* Sequence number ACK'd*/
>   union {
>   struct inet_skb_parmh4;
> diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> index 4d73858..12772be 100644
> --- a/net/ipv4/tcp.c
> +++ b/net/ipv4/tcp.c
> @@ -874,6 +874,13 @@ static int tcp_send_mss(struct sock *sk, int *size_goal, 
> int flags)
>   return mss_now;
>  }
> 
> +static bool tcp_sendmsg_noappend(const struct sock *sk)
> +{
> + const struct sk_buff *skb = tcp_write_queue_tail(sk);
> +
> + return (!skb || TCP_SKB_CB(skb)->eor);
> +}
> +
>  static ssize_t do_tcp_sendpages(struct sock *sk, struct page *page, int 
> offset,
>   size_t size, int flags)
>  {
> @@ -903,6 +910,9 @@ static ssize_t do_tcp_sendpages(struct sock *sk, struct 
> page *page, int offset,
>   if (sk->sk_err || (sk->sk_shutdown & SEND_SHUTDOWN))
>   goto out_err;
> 
> + if (tcp_sendmsg_noappend(sk))
> + goto new_segment;
> +
>   while (size > 0) {
>   struct sk_buff *skb = tcp_write_queue_tail(sk);
>   int copy, i;
> @@ -960,6 +970,7 @@ new_segment:
>   size -= copy;
>   if (!size) {
>   tcp_tx_timestamp(sk, sk->sk_tsflags, skb);
> + TCP_SKB_CB(skb)->eor = !!(flags & MSG_EOR);
>   goto out;
>   }
> 
> @@ -1145,6 +1156,9 @@ int tcp_sendmsg(struct sock *sk, struct msghdr *msg, 
> size_t size)
> 
>   sg = !!(sk->sk_route_caps & NETIF_F_SG);
> 
> + if (tcp_sendmsg_noappend(sk))
> + goto new_segment;
> +
>   while (msg_data_left(msg)) {
>   int copy = 0;
>   int max = size_goal;
> @@ -1250,6 +1264,7 @@ new_segment:
>   copied += copy;
>   if (!msg_data_left(msg)) {
>   tcp_tx_timestamp(sk, sockc.tsflags, skb);
> + TCP_SKB_CB(skb)->eor = !!(flags & MSG_EOR);
>   goto out;
>   }

I believe it is slightly wrong (to do the goto new_segment if there is
no data to send)

I would instead use this fast path, doing the test _when_ we already
have an skb to test for.

diff --git a/include/net/tcp.h b/include/net/tcp.h
index fd40f8c64d5f..015851634195 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -760,7 +760,8 @@ struct tcp_skb_cb {
 
__u8ip_dsfield; /* IPv4 tos or IPv6 dsfield */
__u8txstamp_ack:1,  /* Record TX timestamp for ack? */
-   unused:7;
+   eor:1,  /* Is skb MSG_EOR marked */
+   unused:6;
__u32   ack_seq;/* Sequence number ACK'd*/
union {
struct inet_skb_parmh4;
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 4d73858991af..1944c19885ab 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -908,7 +908,9 @@ static ssize_t do_tcp_sendpages(struct sock *sk, struct 
page *page, int offset,
int copy, i;
bool can_coalesce;
 
-   if (!tcp_send_head(sk) || (copy = size_goal - skb->len) <= 0) {
+   if (!tcp_send_head(sk) ||
+   (copy = size_goal - skb->len) <= 0 ||
+   TCP_SKB_CB(skb)->eor) {
 new_segment:
if (!sk_stream_memory_free(sk))
goto wait_for_sndbuf;
@@ -960,6 +962,7 @@ new_segment:
size -= copy;
if (!size) {
tcp_tx_timestamp(sk, sk->sk_tsflags, skb);
+   TCP_SKB_CB(skb)->eor = !!(flags & MSG_EOR);
goto out;
}
 
@@ -1156,7 +1159,7 @@ int tcp_sendmsg(struct sock *sk, struct msghdr *msg, 
size_t size)
copy = max - skb->len;
}
 
-   if (copy <= 0) {
+   if (copy <= 0 || TCP_SKB_CB(skb)->eor) {
 new_segment:
/* Allocate new segment. If the interface is SG,
 * allocate skb fitting to single page.
@@ -1250,6 +1253,7 @@ 

Re: [PATCH net-next v5] rtnetlink: add new RTM_GETSTATS message to dump link stats

2016-04-18 Thread Roopa Prabhu
On 4/18/16, 7:22 PM, Eric Dumazet wrote:
> On Mon, 2016-04-18 at 21:48 -0400, David Miller wrote:
>
>> And anyways, I get unaligned accesses without Roopa's changes :-/
>>
>> davem@patience:~$ ip l l
>> [3391066.656729] Kernel unaligned access at TPC[7d6c14] 
>> loopback_get_stats64+0x74/0xa0
>> [3391066.672020] Kernel unaligned access at TPC[7d6c18] 
>> loopback_get_stats64+0x78/0xa0
>> [3391066.687282] Kernel unaligned access at TPC[7d6c1c] 
>> loopback_get_stats64+0x7c/0xa0
>> [3391066.702573] Kernel unaligned access at TPC[7d6c20] 
>> loopback_get_stats64+0x80/0xa0
>> [3391066.717858] Kernel unaligned access at TPC[8609dc] 
>> dev_get_stats+0x3c/0xe0
> Yes, rtnl_fill_stats() probably has the same mistake.
>
> commit 550bce59baf3f3059cd4ae1e268f08f2d2cb1d5c
> Author: Roopa Prabhu 
> Date:   Fri Apr 15 20:36:25 2016 -0700
>
> rtnetlink: rtnl_fill_stats: avoid an unnecssary stats copy
> 
> This patch passes netlink attr data ptr directly to dev_get_stats
> thus elimiating a stats copy.
> 
> Suggested-by: David Miller 
> Signed-off-by: Roopa Prabhu 
> Signed-off-by: David S. Miller 
>
>
>
David, if you revert the one in rtnl_fill_stats, i will take care of the 
dev_get_stats in RTM_GETSTATS in v6.

thanks,
Roopa




Re: [PATCH] bpf: avoid warning for wrong pointer cast

2016-04-18 Thread Fengguang Wu
Hi Alexei,

On Sat, Apr 16, 2016 at 05:47:42PM -0700, Alexei Starovoitov wrote:
> On Sat, Apr 16, 2016 at 10:29:33PM +0200, Arnd Bergmann wrote:
> > Two new functions in bpf contain a cast from a 'u64' to a
> > pointer. This works on 64-bit architectures but causes a warning
> > on all 32-bit architectures:
> > 
> > kernel/trace/bpf_trace.c: In function 'bpf_perf_event_output_tp':
> > kernel/trace/bpf_trace.c:350:13: error: cast to pointer from integer of 
> > different size [-Werror=int-to-pointer-cast]
> >   u64 ctx = *(long *)r1;
> > 
> > This changes the cast to first convert the u64 argument into a uintptr_t,
> > which is guaranteed to be the same size as a pointer.
> > 
> > Signed-off-by: Arnd Bergmann 
> > Fixes: 9940d67c93b5 ("bpf: support bpf_get_stackid() and 
> > bpf_perf_event_output() in tracepoint programs")
> 
> Thanks.
> Acked-by: Alexei Starovoitov 
> 
> I guess I started to rely on 0-day build-bot too much.
> This patch has been in my tree for 2+ weeks and then in net-next and
> I didn't receive a single email from build-bot about this warning,
> though I do receive them for my other work-in-progress stuff. Odd.
> Fengguang, any idea why build-bot sometimes silent?

Sorry I went off for some time.. Philip, would you help have a check?

Thanks,
Fengguang


Re: [PATCH net-next v5] rtnetlink: add new RTM_GETSTATS message to dump link stats

2016-04-18 Thread roopa
On 4/18/16, 5:57 PM, David Miller wrote:
> From: Eric Dumazet 
> Date: Mon, 18 Apr 2016 14:35:56 -0700
>
>> On Mon, 2016-04-18 at 14:10 -0700, Roopa Prabhu wrote:
>>
>>> +   if (filter_mask & IFLA_STATS_FILTER_BIT(IFLA_STATS_LINK_64)) {
>>> +   struct rtnl_link_stats64 *sp;
>>> +
>>> +   attr = nla_reserve(skb, IFLA_STATS_LINK_64,
>>> +  sizeof(struct rtnl_link_stats64));
>>> +   if (!attr)
>>> +   goto nla_put_failure;
>>> +
>>> +   sp = nla_data(attr);
>> Are you sure we have a guarantee that sp is aligned to u64 fields ?
>>
>> x86 does not care, but some arches would have a potential misalign
>> access here.
> I'll do some testing on sparc and deal with any fallout.
Thanks David. I will be happy to respin if you see problems.


Re: [RFC PATCH v2 net-next 4/7] tcp: Make use of MSG_EOR flag in tcp_sendmsg

2016-04-18 Thread Martin KaFai Lau
On Mon, Apr 18, 2016 at 05:06:57PM -0700, Eric Dumazet wrote:
> It should only be a request from user space to ask TCP to not aggregate
> stuff on future sendpage()/sendmsg() on the skb carrying this new flag.
>
How about something like this.  Please advise if tcp_sendmsg_noappend can
be simpler.

diff --git a/include/net/tcp.h b/include/net/tcp.h
index c0ef054..ac31798 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -762,7 +762,8 @@ struct tcp_skb_cb {

__u8ip_dsfield; /* IPv4 tos or IPv6 dsfield */
__u8txstamp_ack:1,  /* Record TX timestamp for ack? */
-   unused:7;
+   eor:1,  /* Is skb MSG_EOR marked */
+   unused:6;
__u32   ack_seq;/* Sequence number ACK'd*/
union {
struct inet_skb_parmh4;
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 4d73858..12772be 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -874,6 +874,13 @@ static int tcp_send_mss(struct sock *sk, int *size_goal, 
int flags)
return mss_now;
 }

+static bool tcp_sendmsg_noappend(const struct sock *sk)
+{
+   const struct sk_buff *skb = tcp_write_queue_tail(sk);
+
+   return (!skb || TCP_SKB_CB(skb)->eor);
+}
+
 static ssize_t do_tcp_sendpages(struct sock *sk, struct page *page, int offset,
size_t size, int flags)
 {
@@ -903,6 +910,9 @@ static ssize_t do_tcp_sendpages(struct sock *sk, struct 
page *page, int offset,
if (sk->sk_err || (sk->sk_shutdown & SEND_SHUTDOWN))
goto out_err;

+   if (tcp_sendmsg_noappend(sk))
+   goto new_segment;
+
while (size > 0) {
struct sk_buff *skb = tcp_write_queue_tail(sk);
int copy, i;
@@ -960,6 +970,7 @@ new_segment:
size -= copy;
if (!size) {
tcp_tx_timestamp(sk, sk->sk_tsflags, skb);
+   TCP_SKB_CB(skb)->eor = !!(flags & MSG_EOR);
goto out;
}

@@ -1145,6 +1156,9 @@ int tcp_sendmsg(struct sock *sk, struct msghdr *msg, 
size_t size)

sg = !!(sk->sk_route_caps & NETIF_F_SG);

+   if (tcp_sendmsg_noappend(sk))
+   goto new_segment;
+
while (msg_data_left(msg)) {
int copy = 0;
int max = size_goal;
@@ -1250,6 +1264,7 @@ new_segment:
copied += copy;
if (!msg_data_left(msg)) {
tcp_tx_timestamp(sk, sockc.tsflags, skb);
+   TCP_SKB_CB(skb)->eor = !!(flags & MSG_EOR);
goto out;
}

--
2.5.1


Re: [PATCH net-next v5] rtnetlink: add new RTM_GETSTATS message to dump link stats

2016-04-18 Thread Eric Dumazet
On Mon, 2016-04-18 at 21:48 -0400, David Miller wrote:

> 
> And anyways, I get unaligned accesses without Roopa's changes :-/
> 
> davem@patience:~$ ip l l
> [3391066.656729] Kernel unaligned access at TPC[7d6c14] 
> loopback_get_stats64+0x74/0xa0
> [3391066.672020] Kernel unaligned access at TPC[7d6c18] 
> loopback_get_stats64+0x78/0xa0
> [3391066.687282] Kernel unaligned access at TPC[7d6c1c] 
> loopback_get_stats64+0x7c/0xa0
> [3391066.702573] Kernel unaligned access at TPC[7d6c20] 
> loopback_get_stats64+0x80/0xa0
> [3391066.717858] Kernel unaligned access at TPC[8609dc] 
> dev_get_stats+0x3c/0xe0

Yes, rtnl_fill_stats() probably has the same mistake.

commit 550bce59baf3f3059cd4ae1e268f08f2d2cb1d5c
Author: Roopa Prabhu 
Date:   Fri Apr 15 20:36:25 2016 -0700

rtnetlink: rtnl_fill_stats: avoid an unnecssary stats copy

This patch passes netlink attr data ptr directly to dev_get_stats
thus elimiating a stats copy.

Suggested-by: David Miller 
Signed-off-by: Roopa Prabhu 
Signed-off-by: David S. Miller 





Re: [PATCH net-next v5] rtnetlink: add new RTM_GETSTATS message to dump link stats

2016-04-18 Thread David Miller
From: David Miller 
Date: Mon, 18 Apr 2016 20:57:55 -0400 (EDT)

> From: Eric Dumazet 
> Date: Mon, 18 Apr 2016 14:35:56 -0700
> 
>> On Mon, 2016-04-18 at 14:10 -0700, Roopa Prabhu wrote:
>> 
>>> +   if (filter_mask & IFLA_STATS_FILTER_BIT(IFLA_STATS_LINK_64)) {
>>> +   struct rtnl_link_stats64 *sp;
>>> +
>>> +   attr = nla_reserve(skb, IFLA_STATS_LINK_64,
>>> +  sizeof(struct rtnl_link_stats64));
>>> +   if (!attr)
>>> +   goto nla_put_failure;
>>> +
>>> +   sp = nla_data(attr);
>> 
>> Are you sure we have a guarantee that sp is aligned to u64 fields ?
>> 
>> x86 does not care, but some arches would have a potential misalign
>> access here.
> 
> I'll do some testing on sparc and deal with any fallout.

Just thinking out loud before I start testing, yeah I think you are
right.  nlmsghdr is 64-bit aligned, but the nlattr header is 32-bit
which will thus make the attribute data area not be aligned.

I think the time has probably come to have a new netlink attribute
format that doesn't have this multi-decade old problem.

There are a lot of bits left in nla_type, we can use one to indicate
that the nlattr struct is another 4 bytes in length in order to
archieve proper alignment of the payload data.

+struct nlattr64 {
+   __u16   nla_len;
+   __u16   nla_type;
+   __u32   nla_pad;
+};
 ...
+#define NLA_F_64BIT_ALIGNED(1 << 13)
-#define NLA_TYPE_MASK  ~(NLA_F_NESTED | NLA_F_NET_BYTEORDER)
+#define NLA_TYPE_MASK  ~(NLA_F_NESTED | NLA_F_NET_BYTEORDER |
  NLA_F_64BIT_ALIGNED)
 ...
#define NLA64_ALIGNTO   8
#define NLA64_ALIGN(len)(((len) + NLA64_ALIGNTO - 1) & ~(NLA64_ALIGNTO 
- 1))
#define NLA64_HDRLEN((int) NLA64_ALIGN(sizeof(struct nlattr64)))

We're going to need some new nla64_*() helpers and code added to some
of the existing ones to test that new bit.

For example, nla_data would now be:

static inline void *nla_data(const struct nlattr *nla)
{
if (nla->nla_type & NLA_F_64BIT_ALIGNED)
return (char *) nla + NLA64_HDRLEN;
else
return (char *) nla + NLA_HDRLEN;
}

nla_len would be:

static inline int nla_len(const struct nlattr *nla)
{
int hdrlen = NLA_HDRLEN;

if (nla->nla_type & NLA_F_64BIT_ALIGNED)
hdrlen = NLA64_hdrlen;
return nla->nla_len - hdrlen;
}

etc. etc.

And anyways, I get unaligned accesses without Roopa's changes :-/

davem@patience:~$ ip l l
[3391066.656729] Kernel unaligned access at TPC[7d6c14] 
loopback_get_stats64+0x74/0xa0
[3391066.672020] Kernel unaligned access at TPC[7d6c18] 
loopback_get_stats64+0x78/0xa0
[3391066.687282] Kernel unaligned access at TPC[7d6c1c] 
loopback_get_stats64+0x7c/0xa0
[3391066.702573] Kernel unaligned access at TPC[7d6c20] 
loopback_get_stats64+0x80/0xa0
[3391066.717858] Kernel unaligned access at TPC[8609dc] dev_get_stats+0x3c/0xe0


Re: [PATCH net-next 2/8] perf, bpf: allow bpf programs attach to tracepoints

2016-04-18 Thread Alexei Starovoitov

On 4/18/16 3:16 PM, Steven Rostedt wrote:

On Mon, 18 Apr 2016 14:43:07 -0700
Alexei Starovoitov  wrote:



I was worried about this too, but single 'if' and two calls
(as in commit 98b5c2c65c295) is a better way, since it's faster, cleaner
and doesn't need to refactor the whole perf_trace_buf_submit() to pass
extra event_call argument to it.
perf_trace_buf_submit() is already ugly with 8 arguments!


Right, but I solved that in ftrace by creating an on-stack descriptor
that can be passed by a single parameter. See the "fbuffer" in the
trace_event_raw_event* code.


Yes. That what I referred to in below 'a struct to pass args'...
But, fine, will try to optimize the size further.
Frankly much bigger .text savings will come from combining
trace_event_raw_event_*() with perf_trace_*()
Especially if you're ok with copying tp args into perf's percpu
buffer first and then copying into ftrace's ring buffer.
Then we can half the number of such auto-generated functions.


Passing more args or creating a struct to pass args only going to
hurt performance without much reduction in .text size.
tinyfication folks will disable tracepoints anyway.
Note that the most common case is bpf returning 0 and not even
calling perf_trace_buf_submit() which is already slow due
to so many args passed on stack.
This stuff is called million times a second, so every instruction
counts.


Note, that doesn't matter if you are bloating the kernel for the 99.9%
of those that don't use bpf.

Please remember this! Us tracing folks are second class citizens! If
there's a way to speed up tracing by 10%, but in doing so we cause
mainline to be hurt by over 1%, we shouldn't be doing it. Tracing and
hooks on tracepoints are really not used by many people. Don't fall
into Linus's category of "my code is the most important". That's
especially true for tracing.


tracing was indeed not used that often in the past, but
bpf+tracing completely changed the picture. It's no longer just
debugging. It is the first class citizen that runs 24/7 in production
and its performance and lowest overhead are crucial.



Re: [PATCH net-next v5] rtnetlink: add new RTM_GETSTATS message to dump link stats

2016-04-18 Thread David Miller
From: Eric Dumazet 
Date: Mon, 18 Apr 2016 14:35:56 -0700

> On Mon, 2016-04-18 at 14:10 -0700, Roopa Prabhu wrote:
> 
>> +if (filter_mask & IFLA_STATS_FILTER_BIT(IFLA_STATS_LINK_64)) {
>> +struct rtnl_link_stats64 *sp;
>> +
>> +attr = nla_reserve(skb, IFLA_STATS_LINK_64,
>> +   sizeof(struct rtnl_link_stats64));
>> +if (!attr)
>> +goto nla_put_failure;
>> +
>> +sp = nla_data(attr);
> 
> Are you sure we have a guarantee that sp is aligned to u64 fields ?
> 
> x86 does not care, but some arches would have a potential misalign
> access here.

I'll do some testing on sparc and deal with any fallout.


Re: [RFC PATCH v2 net-next 4/7] tcp: Make use of MSG_EOR flag in tcp_sendmsg

2016-04-18 Thread Eric Dumazet
On Mon, 2016-04-18 at 16:43 -0700, ka...@fb.com wrote:

> >
> > netperf could then get an option to set this MSG_EOR ;)
> Not sure how it is related.  Can you share how netperf can
> benefit from MSG_EOR in TCP tests without any of the
> SOF_TIMESTAMPING_TX_RECORD_MASK.

Simply setting MSG_EOR would be orthogonal to other timestamping stuff.

Maybe the application does not _want_ to be notified when skb is sent or
acknowledged, but would like some kind of "tcpdump awareness" or
something about burst control, who knows...

It should only be a request from user space to ask TCP to not aggregate
stuff on future sendpage()/sendmsg() on the skb carrying this new flag.

We already have other flags to ask for timestamping stuff, and they
could be used at the same time.

If the stack needs to be changed to properly fragment skb (or
aggregating them at retransmit), this is a separate concern.

Note that you do not need to automatically assert MSG_BOR (Begin of
Request) : MSG_EOR should really control the fact that last byte sent
marks the skb as being a non candidate for aggregation.

This would keep tcp_sendmsg() reasonnably fast.
Your tcp_sendmsg_noappend() is quite expensive :(




Re: [RFC PATCH v2 net-next 4/7] tcp: Make use of MSG_EOR flag in tcp_sendmsg

2016-04-18 Thread kafai
On Mon, Apr 18, 2016 at 04:18:13PM -0700, Eric Dumazet wrote:
> On Mon, 2016-04-18 at 15:46 -0700, Martin KaFai Lau wrote:
> > This patch allows the user process to use MSG_EOR during
> > tcp_sendmsg to tell the kernel that it is the last byte
> > of an application response message.
> >
> > It is currently useful when the end-user has turned on any bit of the
> > SOF_TIMESTAMPING_TX_RECORD_MASK (either by setsockopt or cmsg).
> > The kernel will then mark the newly added tcb->eor_info bit so
> > that the shinfo->tskey will not be overwritten (i.e. lost) in
> > the later skb append/collapse operation.
> >
> > With selective SOF_TIMESTAMPING_TX_ACK (by cmsg) and MSG_EOR (this
> > patch), the user application can specially tell which outgoing byte
> > it wants to track its ACK and ask the kernel not to lose this
> > tracking info in the later skb append/collapse action.
> >
> > This patch handles the append case in tcp_sendmsg.  The later
> > patches will handle the collapse during retransmission and
> > skb slicing in tcp_fragment()/tso_fragment().
> >
> > One of our use case is at the webserver.  The webserver tracks
> > the HTTP2 response latency by measuring when the webserver sends
> > the first byte to the socket till the TCP ACK of the last byte
> > is received.  In the cases where we don't have client side
> > measurement, measuring from the server side is the only option.
> > In the cases we have the client side measurement, the server side
> > data can also be used to justify/cross-check-with the client
> > side data.
> >
> > Signed-off-by: Martin KaFai Lau 
> > Cc: Eric Dumazet 
> > Cc: Neal Cardwell 
> > Cc: Soheil Hassas Yeganeh 
> > Cc: Willem de Bruijn 
> > Cc: Yuchung Cheng 
> > ---
>
> MSG_EOR should not depend on SKBTX_ANY_TSTAMP
>
> Really, simply using send(fd, ..., len, MSG_EOR) should instruct TCP to
> mark the cooked skb as a non candidate for future coalescing.
It was one of my earlier local attempt.  There are cases that coalescing
will not lose the tskey, so I trashed it.

If we mark eor only based on MSG_EOR, we can still do checks on
prev_skb's tskey and next_skb's tskey before coalescing two skbs
or
you meant simply don't coalesce if the prev_skb has eor marked?

>
> netperf could then get an option to set this MSG_EOR ;)
Not sure how it is related.  Can you share how netperf can
benefit from MSG_EOR in TCP tests without any of the
SOF_TIMESTAMPING_TX_RECORD_MASK.


Re: [RFC PATCH v2 net-next 4/7] tcp: Make use of MSG_EOR flag in tcp_sendmsg

2016-04-18 Thread Eric Dumazet
On Mon, 2016-04-18 at 15:46 -0700, Martin KaFai Lau wrote:
> This patch allows the user process to use MSG_EOR during
> tcp_sendmsg to tell the kernel that it is the last byte
> of an application response message.
> 
> It is currently useful when the end-user has turned on any bit of the
> SOF_TIMESTAMPING_TX_RECORD_MASK (either by setsockopt or cmsg).
> The kernel will then mark the newly added tcb->eor_info bit so
> that the shinfo->tskey will not be overwritten (i.e. lost) in
> the later skb append/collapse operation.
> 
> With selective SOF_TIMESTAMPING_TX_ACK (by cmsg) and MSG_EOR (this
> patch), the user application can specially tell which outgoing byte
> it wants to track its ACK and ask the kernel not to lose this
> tracking info in the later skb append/collapse action.
> 
> This patch handles the append case in tcp_sendmsg.  The later
> patches will handle the collapse during retransmission and
> skb slicing in tcp_fragment()/tso_fragment().
> 
> One of our use case is at the webserver.  The webserver tracks
> the HTTP2 response latency by measuring when the webserver sends
> the first byte to the socket till the TCP ACK of the last byte
> is received.  In the cases where we don't have client side
> measurement, measuring from the server side is the only option.
> In the cases we have the client side measurement, the server side
> data can also be used to justify/cross-check-with the client
> side data.
> 
> Signed-off-by: Martin KaFai Lau 
> Cc: Eric Dumazet 
> Cc: Neal Cardwell 
> Cc: Soheil Hassas Yeganeh 
> Cc: Willem de Bruijn 
> Cc: Yuchung Cheng 
> ---

MSG_EOR should not depend on SKBTX_ANY_TSTAMP

Really, simply using send(fd, ..., len, MSG_EOR) should instruct TCP to
mark the cooked skb as a non candidate for future coalescing.

netperf could then get an option to set this MSG_EOR ;)

I believe Soheil was working on such simple alternative ?




[RFC PATCH v2 net-next 2/7] tcp: Merge tx_flags/tskey/txstamp_ack in tcp_collapse_retrans

2016-04-18 Thread Martin KaFai Lau
If two skbs are merged/collapsed during retransmission, the current
logic does not merge the tx_flags, tskey and txstamp_ack.  The end
result is the SCM_TSTAMP_ACK timestamp could be missing for a
packet that the end-user has specifically turned on
SOF_TIMESTAMPING_TX_ACK (e.g. by cmsg).

The patch:
1. Merge the tx_flags and txstamp_ack
2. Overwrite the tskey with the later skb (next_skb)

BPF Output Before:
~~


BPF Output After:
~~
packetdrill-2092  [001] d.s.   453.998486: : ee_data:1459

Packetdrill Script:
~~
+0 `sysctl -q -w net.ipv4.tcp_min_tso_segs=10`
+0 `sysctl -q -w net.ipv4.tcp_no_metrics_save=1`
+0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
+0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
+0 bind(3, ..., ...) = 0
+0 listen(3, 1) = 0

0.100 < S 0:0(0) win 32792 
0.100 > S. 0:0(0) ack 1 
0.200 < . 1:1(0) ack 1 win 257
0.200 accept(3, ..., ...) = 4
+0 setsockopt(4, SOL_TCP, TCP_NODELAY, [1], 4) = 0

0.200 write(4, ..., 730) = 730
+0 setsockopt(4, SOL_SOCKET, 37, [2688], 4) = 0
0.200 write(4, ..., 730) = 730
+0 setsockopt(4, SOL_SOCKET, 37, [2176], 4) = 0
0.200 write(4, ..., 11680) = 11680

0.200 > P. 1:731(730) ack 1
0.200 > P. 731:1461(730) ack 1
0.200 > . 1461:8761(7300) ack 1
0.200 > P. 8761:13141(4380) ack 1

0.300 < . 1:1(0) ack 1 win 257 
0.300 < . 1:1(0) ack 1 win 257 
0.300 < . 1:1(0) ack 1 win 257 
0.300 > P. 1:1461(1460) ack 1
0.400 < . 1:1(0) ack 13141 win 257

0.400 close(4) = 0
0.400 > F. 13141:13141(0) ack 1
0.500 < F. 1:1(0) ack 13142 win 257
0.500 > . 13142:13142(0) ack 2

Signed-off-by: Martin KaFai Lau 
Cc: Eric Dumazet 
Cc: Neal Cardwell 
Cc: Soheil Hassas Yeganeh 
Cc: Willem de Bruijn 
Cc: Yuchung Cheng 
---
 net/ipv4/tcp_output.c | 18 ++
 1 file changed, 18 insertions(+)

diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 0527ce9..889ed96 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -2443,6 +2443,22 @@ u32 __tcp_select_window(struct sock *sk)
return window;
 }
 
+static void tcp_skb_collapse_tstamp(struct sk_buff *skb,
+   const struct sk_buff *next_skb)
+{
+   const struct skb_shared_info *next_shinfo = skb_shinfo(next_skb);
+
+   if (unlikely(next_shinfo->tx_flags & SKBTX_ANY_TSTAMP)) {
+   struct skb_shared_info *shinfo = skb_shinfo(skb);
+   u8 tsflags = next_shinfo->tx_flags & SKBTX_ANY_TSTAMP;
+
+   shinfo->tx_flags |= tsflags;
+   shinfo->tskey = next_shinfo->tskey;
+   TCP_SKB_CB(skb)->txstamp_ack =
+   !!(shinfo->tx_flags & SKBTX_ACK_TSTAMP);
+   }
+}
+
 /* Collapses two adjacent SKB's during retransmission. */
 static void tcp_collapse_retrans(struct sock *sk, struct sk_buff *skb)
 {
@@ -2486,6 +2502,8 @@ static void tcp_collapse_retrans(struct sock *sk, struct 
sk_buff *skb)
 
tcp_adjust_pcount(sk, next_skb, tcp_skb_pcount(next_skb));
 
+   tcp_skb_collapse_tstamp(skb, next_skb);
+
sk_wmem_free_skb(sk, next_skb);
 }
 
-- 
2.5.1



[RFC PATCH v2 net-next 1/7] tcp: Carry txstamp_ack in tcp_fragment_tstamp

2016-04-18 Thread Martin KaFai Lau
When a tcp skb is sliced into two smaller skbs (e.g. in
tcp_fragment() and tso_fragment()),  it does not carry
the txstamp_ack bit to the newly created skb if it is needed.
The end result is a timestamping event (SCM_TSTAMP_ACK) will
be missing from the sk->sk_error_queue.

This patch carries this bit to the new skb2 (if needed)
in tcp_fragment_tstamp().

BPF Output Before:
~~


BPF Output After:
~~
<...>-2050  [000] d.s.   100.928763: : ee_data:14599

Packetdrill Script:
~~
+0 `sysctl -q -w net.ipv4.tcp_min_tso_segs=10`
+0 `sysctl -q -w net.ipv4.tcp_no_metrics_save=1`
+0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
+0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
+0 bind(3, ..., ...) = 0
+0 listen(3, 1) = 0

0.100 < S 0:0(0) win 32792 
0.100 > S. 0:0(0) ack 1 
0.200 < . 1:1(0) ack 1 win 257
0.200 accept(3, ..., ...) = 4
+0 setsockopt(4, SOL_TCP, TCP_NODELAY, [1], 4) = 0

+0 setsockopt(4, SOL_SOCKET, 37, [2688], 4) = 0
0.200 write(4, ..., 14600) = 14600
+0 setsockopt(4, SOL_SOCKET, 37, [2176], 4) = 0

0.200 > . 1:7301(7300) ack 1
0.200 > P. 7301:14601(7300) ack 1

0.300 < . 1:1(0) ack 14601 win 257

0.300 close(4) = 0
0.300 > F. 14601:14601(0) ack 1
0.400 < F. 1:1(0) ack 16062 win 257
0.400 > . 14602:14602(0) ack 2

Signed-off-by: Martin KaFai Lau 
Cc: Eric Dumazet 
Cc: Neal Cardwell 
Cc: Soheil Hassas Yeganeh 
Cc: Willem de Bruijn 
Cc: Yuchung Cheng 
---
 net/ipv4/tcp_output.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 6451b83..0527ce9 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -1123,6 +1123,8 @@ static void tcp_fragment_tstamp(struct sk_buff *skb, 
struct sk_buff *skb2)
shinfo->tx_flags &= ~tsflags;
shinfo2->tx_flags |= tsflags;
swap(shinfo->tskey, shinfo2->tskey);
+   TCP_SKB_CB(skb2)->txstamp_ack = TCP_SKB_CB(skb)->txstamp_ack;
+   TCP_SKB_CB(skb)->txstamp_ack = 0;
}
 }
 
-- 
2.5.1



[RFC PATCH v2 net-next 6/7] tcp: Carry eor_info in tcp_fragment_tstamp() and tcp_skb_collapse_tstamp()

2016-04-18 Thread Martin KaFai Lau
Like txstamp_ack bit, if needed, the eor_info bit should also be carried
to the new skb2 when splitting a skb
or
to the prev skb from the next_skb when collapsing skbs.

Signed-off-by: Martin KaFai Lau 
Cc: Eric Dumazet 
Cc: Neal Cardwell 
Cc: Soheil Hassas Yeganeh 
Cc: Willem de Bruijn 
Cc: Yuchung Cheng 
---
 net/ipv4/tcp_output.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index d21a78f..e71336c 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -1125,6 +1125,8 @@ static void tcp_fragment_tstamp(struct sk_buff *skb, 
struct sk_buff *skb2)
swap(shinfo->tskey, shinfo2->tskey);
TCP_SKB_CB(skb2)->txstamp_ack = TCP_SKB_CB(skb)->txstamp_ack;
TCP_SKB_CB(skb)->txstamp_ack = 0;
+   TCP_SKB_CB(skb2)->eor_info = TCP_SKB_CB(skb)->eor_info;
+   TCP_SKB_CB(skb)->eor_info = 0;
}
 }
 
@@ -2456,6 +2458,8 @@ void tcp_skb_collapse_tstamp(struct sk_buff *skb,
shinfo->tskey = next_shinfo->tskey;
TCP_SKB_CB(skb)->txstamp_ack =
!!(shinfo->tx_flags & SKBTX_ACK_TSTAMP);
+   if (TCP_SKB_CB(next_skb)->eor_info)
+   TCP_SKB_CB(skb)->eor_info = 1;
}
 }
 
-- 
2.5.1



[RFC PATCH v2 net-next 5/7] tcp: Make use of MSG_EOR in tcp_sendpage

2016-04-18 Thread Martin KaFai Lau
It reuses the tcp_sendmsg_noappend() to decide if a new_segment
is needed before entering the loop.  More checks could be added
later for the tcp_sendpage case to decide if a new_segment is
needed immediately.

Signed-off-by: Martin KaFai Lau 
Cc: Eric Dumazet 
Cc: Neal Cardwell 
Cc: Soheil Hassas Yeganeh 
Cc: Willem de Bruijn 
Cc: Yuchung Cheng 
---
 net/ipv4/tcp.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 2918f42..6bb33b8 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -913,6 +913,9 @@ static ssize_t do_tcp_sendpages(struct sock *sk, struct 
page *page, int offset,
if (sk->sk_err || (sk->sk_shutdown & SEND_SHUTDOWN))
goto out_err;
 
+   if (tcp_sendmsg_noappend(sk, sk->sk_tsflags))
+   goto new_segment;
+
while (size > 0) {
struct sk_buff *skb = tcp_write_queue_tail(sk);
int copy, i;
@@ -969,7 +972,7 @@ new_segment:
offset += copy;
size -= copy;
if (!size) {
-   tcp_tx_timestamp(sk, sk->sk_tsflags, skb, 0);
+   tcp_tx_timestamp(sk, sk->sk_tsflags, skb, flags);
goto out;
}
 
-- 
2.5.1



[RFC PATCH v2 net-next 3/7] tcp: Merge tx_flags/tskey/txstamp_ack in tcp_shifted_skb

2016-04-18 Thread Martin KaFai Lau
After receiving sacks, tcp_shifted_skb() will collapse
skbs if possible.  tx_flags/tskey/txstamp_ack also has
to be merged in this case.

This patch resues the tcp_skb_collapse_tstamp() to handle
them.

BPF Output Before:
~


BPF Output After:
~
<...>-2024  [007] d.s.88.644374: : ee_data:14599

Packetdrill Script:
~
+0 `sysctl -q -w net.ipv4.tcp_min_tso_segs=10`
+0 `sysctl -q -w net.ipv4.tcp_no_metrics_save=1`
+0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
+0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
+0 bind(3, ..., ...) = 0
+0 listen(3, 1) = 0

0.100 < S 0:0(0) win 32792 
0.100 > S. 0:0(0) ack 1 
0.200 < . 1:1(0) ack 1 win 257
0.200 accept(3, ..., ...) = 4
+0 setsockopt(4, SOL_TCP, TCP_NODELAY, [1], 4) = 0

0.200 write(4, ..., 1460) = 1460
+0 setsockopt(4, SOL_SOCKET, 37, [2688], 4) = 0
0.200 write(4, ..., 13140) = 13140
+0 setsockopt(4, SOL_SOCKET, 37, [2176], 4) = 0

0.200 > P. 1:1461(1460) ack 1
0.200 > . 1461:8761(7300) ack 1
0.200 > P. 8761:14601(5840) ack 1

0.300 < . 1:1(0) ack 1 win 257 
0.300 > P. 1:1461(1460) ack 1
0.400 < . 1:1(0) ack 14601 win 257

0.400 close(4) = 0
0.400 > F. 14601:14601(0) ack 1
0.500 < F. 1:1(0) ack 14602 win 257
0.500 > . 14602:14602(0) ack 2

Signed-off-by: Martin KaFai Lau 
Cc: Eric Dumazet 
Cc: Neal Cardwell 
Cc: Soheil Hassas Yeganeh 
Cc: Willem de Bruijn 
Cc: Yuchung Cheng 
---
 include/net/tcp.h | 2 ++
 net/ipv4/tcp_input.c  | 1 +
 net/ipv4/tcp_output.c | 4 ++--
 3 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/include/net/tcp.h b/include/net/tcp.h
index fd40f8c..c0ef054 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -557,6 +557,8 @@ void tcp_send_ack(struct sock *sk);
 void tcp_send_delayed_ack(struct sock *sk);
 void tcp_send_loss_probe(struct sock *sk);
 bool tcp_schedule_loss_probe(struct sock *sk);
+void tcp_skb_collapse_tstamp(struct sk_buff *skb,
+const struct sk_buff *next_skb);
 
 /* tcp_input.c */
 void tcp_resume_early_retransmit(struct sock *sk);
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 5e45a9c..75e8336 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -1309,6 +1309,7 @@ static bool tcp_shifted_skb(struct sock *sk, struct 
sk_buff *skb,
if (skb == tcp_highest_sack(sk))
tcp_advance_highest_sack(sk, skb);
 
+   tcp_skb_collapse_tstamp(prev, skb);
tcp_unlink_write_queue(skb, sk);
sk_wmem_free_skb(sk, skb);
 
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 889ed96..d21a78f 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -2443,8 +2443,8 @@ u32 __tcp_select_window(struct sock *sk)
return window;
 }
 
-static void tcp_skb_collapse_tstamp(struct sk_buff *skb,
-   const struct sk_buff *next_skb)
+void tcp_skb_collapse_tstamp(struct sk_buff *skb,
+const struct sk_buff *next_skb)
 {
const struct skb_shared_info *next_shinfo = skb_shinfo(next_skb);
 
-- 
2.5.1



[RFC PATCH v2 net-next 4/7] tcp: Make use of MSG_EOR flag in tcp_sendmsg

2016-04-18 Thread Martin KaFai Lau
This patch allows the user process to use MSG_EOR during
tcp_sendmsg to tell the kernel that it is the last byte
of an application response message.

It is currently useful when the end-user has turned on any bit of the
SOF_TIMESTAMPING_TX_RECORD_MASK (either by setsockopt or cmsg).
The kernel will then mark the newly added tcb->eor_info bit so
that the shinfo->tskey will not be overwritten (i.e. lost) in
the later skb append/collapse operation.

With selective SOF_TIMESTAMPING_TX_ACK (by cmsg) and MSG_EOR (this
patch), the user application can specially tell which outgoing byte
it wants to track its ACK and ask the kernel not to lose this
tracking info in the later skb append/collapse action.

This patch handles the append case in tcp_sendmsg.  The later
patches will handle the collapse during retransmission and
skb slicing in tcp_fragment()/tso_fragment().

One of our use case is at the webserver.  The webserver tracks
the HTTP2 response latency by measuring when the webserver sends
the first byte to the socket till the TCP ACK of the last byte
is received.  In the cases where we don't have client side
measurement, measuring from the server side is the only option.
In the cases we have the client side measurement, the server side
data can also be used to justify/cross-check-with the client
side data.

Signed-off-by: Martin KaFai Lau 
Cc: Eric Dumazet 
Cc: Neal Cardwell 
Cc: Soheil Hassas Yeganeh 
Cc: Willem de Bruijn 
Cc: Yuchung Cheng 
---
 include/net/tcp.h |  5 -
 net/ipv4/tcp.c| 21 +
 2 files changed, 21 insertions(+), 5 deletions(-)

diff --git a/include/net/tcp.h b/include/net/tcp.h
index c0ef054..f3c5dcb 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -762,7 +762,10 @@ struct tcp_skb_cb {
 
__u8ip_dsfield; /* IPv4 tos or IPv6 dsfield */
__u8txstamp_ack:1,  /* Record TX timestamp for ack? */
-   unused:7;
+   eor_info:1, /* Any EOR marked info that prevents
+* skbs from merging.
+*/
+   unused:6;
__u32   ack_seq;/* Sequence number ACK'd*/
union {
struct inet_skb_parmh4;
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 4d73858..2918f42 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -428,15 +428,18 @@ void tcp_init_sock(struct sock *sk)
 }
 EXPORT_SYMBOL(tcp_init_sock);
 
-static void tcp_tx_timestamp(struct sock *sk, u16 tsflags, struct sk_buff *skb)
+static void tcp_tx_timestamp(struct sock *sk, u16 tsflags, struct sk_buff *skb,
+int flags)
 {
if (sk->sk_tsflags || tsflags) {
struct skb_shared_info *shinfo = skb_shinfo(skb);
struct tcp_skb_cb *tcb = TCP_SKB_CB(skb);
 
sock_tx_timestamp(sk, tsflags, >tx_flags);
-   if (shinfo->tx_flags & SKBTX_ANY_TSTAMP)
+   if (shinfo->tx_flags & SKBTX_ANY_TSTAMP) {
shinfo->tskey = TCP_SKB_CB(skb)->seq + skb->len - 1;
+   tcb->eor_info = !!(flags & MSG_EOR);
+   }
tcb->txstamp_ack = !!(shinfo->tx_flags & SKBTX_ACK_TSTAMP);
}
 }
@@ -874,6 +877,13 @@ static int tcp_send_mss(struct sock *sk, int *size_goal, 
int flags)
return mss_now;
 }
 
+static bool tcp_sendmsg_noappend(struct sock *sk, u16 tx_tsflags)
+{
+   return unlikely((tx_tsflags & SOF_TIMESTAMPING_TX_RECORD_MASK) &&
+   tcp_send_head(sk) &&
+   TCP_SKB_CB(tcp_write_queue_tail(sk))->eor_info);
+}
+
 static ssize_t do_tcp_sendpages(struct sock *sk, struct page *page, int offset,
size_t size, int flags)
 {
@@ -959,7 +969,7 @@ new_segment:
offset += copy;
size -= copy;
if (!size) {
-   tcp_tx_timestamp(sk, sk->sk_tsflags, skb);
+   tcp_tx_timestamp(sk, sk->sk_tsflags, skb, 0);
goto out;
}
 
@@ -1145,6 +1155,9 @@ int tcp_sendmsg(struct sock *sk, struct msghdr *msg, 
size_t size)
 
sg = !!(sk->sk_route_caps & NETIF_F_SG);
 
+   if (tcp_sendmsg_noappend(sk, sockc.tsflags))
+   goto new_segment;
+
while (msg_data_left(msg)) {
int copy = 0;
int max = size_goal;
@@ -1249,7 +1262,7 @@ new_segment:
 
copied += copy;
if (!msg_data_left(msg)) {
-   tcp_tx_timestamp(sk, sockc.tsflags, skb);
+   tcp_tx_timestamp(sk, sockc.tsflags, skb, flags);
goto out;
}
 
-- 
2.5.1



[RFC PATCH v2 net-next 7/7] tcp: Avoid losing eor_info when collapsing skbs

2016-04-18 Thread Martin KaFai Lau
When collapsing skbs during tcp_collapse_retrans and tcp_shift_skb_data,
this patch is to avoid collapsing to a prev skb that has eor_info mark
and the next_skb also has the tskey set (i.e. the prev skb will lose
a eor marked tskey info).

Signed-off-by: Martin KaFai Lau 
Cc: Eric Dumazet 
Cc: Neal Cardwell 
Cc: Soheil Hassas Yeganeh 
Cc: Willem de Bruijn 
Cc: Yuchung Cheng 
---
 include/net/tcp.h | 6 ++
 net/ipv4/tcp_input.c  | 3 +++
 net/ipv4/tcp_output.c | 7 +--
 3 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/include/net/tcp.h b/include/net/tcp.h
index f3c5dcb..56a287a 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -812,6 +812,12 @@ static inline int tcp_skb_mss(const struct sk_buff *skb)
return TCP_SKB_CB(skb)->tcp_gso_size;
 }
 
+static inline bool  tcp_skb_can_collapse_eor(const struct sk_buff *skb,
+const struct sk_buff *next_skb)
+{
+   return !(TCP_SKB_CB(skb)->eor_info && skb_shinfo(next_skb)->tskey);
+}
+
 /* Events passed to congestion control interface */
 enum tcp_ca_event {
CA_EVENT_TX_START,  /* first transmit when no packets in flight */
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 75e8336..e60922d 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -1368,6 +1368,9 @@ static struct sk_buff *tcp_shift_skb_data(struct sock 
*sk, struct sk_buff *skb,
if ((TCP_SKB_CB(prev)->sacked & TCPCB_TAGBITS) != TCPCB_SACKED_ACKED)
goto fallback;
 
+   if (!tcp_skb_can_collapse_eor(prev, skb))
+   goto fallback;
+
in_sack = !after(start_seq, TCP_SKB_CB(skb)->seq) &&
  !before(end_seq, TCP_SKB_CB(skb)->end_seq);
 
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index e71336c..1ae21f1 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -2512,7 +2512,8 @@ static void tcp_collapse_retrans(struct sock *sk, struct 
sk_buff *skb)
 }
 
 /* Check if coalescing SKBs is legal. */
-static bool tcp_can_collapse(const struct sock *sk, const struct sk_buff *skb)
+static bool tcp_can_collapse(const struct sock *sk, const struct sk_buff *to,
+const struct sk_buff *skb)
 {
if (tcp_skb_pcount(skb) > 1)
return false;
@@ -2526,6 +2527,8 @@ static bool tcp_can_collapse(const struct sock *sk, const 
struct sk_buff *skb)
/* Some heurestics for collapsing over SACK'd could be invented */
if (TCP_SKB_CB(skb)->sacked & TCPCB_SACKED_ACKED)
return false;
+   if (!tcp_skb_can_collapse_eor(to, skb))
+   return false;
 
return true;
 }
@@ -2546,7 +2549,7 @@ static void tcp_retrans_try_collapse(struct sock *sk, 
struct sk_buff *to,
return;
 
tcp_for_write_queue_from_safe(skb, tmp, sk) {
-   if (!tcp_can_collapse(sk, skb))
+   if (!tcp_can_collapse(sk, to, skb))
break;
 
space -= skb->len;
-- 
2.5.1



[RFC PATCH v2 net-next 0/7] tcp: Make use of MSG_EOR in tcp_sendmsg

2016-04-18 Thread Martin KaFai Lau
v2:
~ Rework based on the recent work
  "add TX timestamping via cmsg" by
  Soheil Hassas Yeganeh 
~ This version takes the MSG_EOR bit as a signal of
  end-of-response-message and leave the selective
  timestamping job to the cmsg
~ Changes based on the v1 feedback (like avoid
  unlikely check in a loop and adding tcp_sendpage
  support)
~ The first 3 patches are bug fixes.  The fixes in this
  series depend on the newly introduced txstamp_ack in
  net-next.  I will make relevant patches against net after
  getting some feedback.
~ The test results are based on the recently posted net fix:
  "tcp: Fix SOF_TIMESTAMPING_TX_ACK when handling dup acks"
~ Due to the lacking cmsg support in packetdrill (or I just
  could not find it), a BPF prog is used to kprobe
  to sock_queue_err_skb() and print out the value of
  serr->ee.ee_data.  The BPF prog (run-able from bcc) is
  attached at the end.

Some of the following is stolen from a commit message of
the following patch to serve as a high level description of
the objective in this series:

This patchset allows the user process to use MSG_EOR during
tcp_sendmsg to tell the kernel that it is the last byte
of an application response message.

It is currently useful when the end-user has turned on any bit of the
SOF_TIMESTAMPING_TX_RECORD_MASK (either by setsockopt or cmsg).
The kernel will then mark the newly added tcb->eor_info bit so
that the shinfo->tskey will not be overwritten (i.e. lost) in
the later skb append/collapse operation.

Each skb can only track one tskey (which is the seq number of the
last byte of the message).   To allow tracking the last byte of
multiple response messages (e.g. HTTP2), this patch takes an
approach by not appending to the previous skb during tcp_sendmsg
if this previous skb's eor information (only shinfo->tskey for now)
will be overwritten.  A similar case also happens during
retransmission.

This approach avoids introducing another list to track the tskey.
The downside is that it will have less tso benefit and/or more
outgoing packets.  Practically, due to the amount of measurement
data generated, sampling is usually used in production. (i.e. not
every connection is tracked).

One of our use case is at the webserver.  The webserver tracks
the HTTP2 response latency by measuring when the webserver sends
the first byte to the socket till the TCP ACK of the last byte
is received.  In the cases where we don't have client side
measurement, measuring from the server side is the only option.
In the cases we have the client side measurement, the server side
data can also be used to justify/cross-check-with the client
side data.

The TCP PRR paper [1] also measures a similar metrics:
"The TCP latency of a HTTP response when the server sends the first
byte until it receives the acknowledgment (ACK) for the last byte."

[1] Proportional Rate Reduction for TCP:
http://static.googleusercontent.com/media/research.google.com/en//pubs/archive/37486.pdf

BPF prog used for testing:
~
#!/usr/bin/env python

from __future__ import print_function
from bcc import BPF

bpf_text = """
#include 
#include 
#include 
#include 

#ifdef memset
#undef memset
#endif

int trace_err_skb(struct pt_regs *ctx)
{
struct sk_buff *skb = (struct sk_buff *)ctx->si;
struct sock *sk = (struct sock *)ctx->di;
struct sock_exterr_skb *serr;
u32 ee_data = 0;

if (!sk || !skb)
return 0;

serr = SKB_EXT_ERR(skb);
bpf_probe_read(_data, sizeof(ee_data), >ee.ee_data);
bpf_trace_printk("ee_data:%u\\n", ee_data);

return 0;
};
"""

b = BPF(text=bpf_text)
b.attach_kprobe(event="sock_queue_err_skb", fn_name="trace_err_skb")
print("Attached to kprobe")
b.trace_print()



Re: [PATCH] net: phy: Ensure the state machine is called when phy is UP

2016-04-18 Thread Alexandre Belloni
On 18/04/2016 at 15:17:58 -0700, Florian Fainelli wrote :
> Yes, seems like it, how about adding this:
> 
> diff --git a/drivers/net/ethernet/cadence/macb.c
> b/drivers/net/ethernet/cadence/macb.c
> index 98b99149ce0b..21096dfb0e83 100644
> --- a/drivers/net/ethernet/cadence/macb.c
> +++ b/drivers/net/ethernet/cadence/macb.c
> @@ -440,7 +440,7 @@ static int macb_mii_init(struct macb *bp)
> snprintf(bp->mii_bus->id, MII_BUS_ID_SIZE, "%s-%x",
>  bp->pdev->name, bp->pdev->id);
> bp->mii_bus->priv = bp;
> -   bp->mii_bus->parent = >dev->dev;
> +   bp->mii_bus->parent = >pdev->dev;
> pdata = dev_get_platdata(>pdev->dev);
> 
> dev_set_drvdata(>dev->dev, bp->mii_bus);

Works fine.

But still, this doesn't solve the phy issue ;)


-- 
Alexandre Belloni, Free Electrons
Embedded Linux, Kernel and Android engineering
http://free-electrons.com


[PATCH net] tcp: Fix SOF_TIMESTAMPING_TX_ACK when handling dup acks

2016-04-18 Thread Martin KaFai Lau
Assuming SOF_TIMESTAMPING_TX_ACK is on. When dup acks are received,
it could incorrectly think that a skb has already
been acked and queue a SCM_TSTAMP_ACK cmsg to the
sk->sk_error_queue.

In tcp_ack_tstamp(), it checks
'between(shinfo->tskey, prior_snd_una, tcp_sk(sk)->snd_una - 1)'.
If prior_snd_una == tcp_sk(sk)->snd_una like the following packetdrill
script, between() returns true but the tskey is actually not acked.
e.g. try between(3, 2, 1).

The fix is to replace between() with one before() and one !before().
By doing this, the -1 offset on the tcp_sk(sk)->snd_una can also be
removed.

A packetdrill script is used to reproduce the dup ack scenario.
Due to the lacking cmsg support in packetdrill (may be I
cannot find it),  a BPF prog is used to kprobe to
sock_queue_err_skb() and print out the value of
serr->ee.ee_data.

Both the packetdrill and the bcc BPF script is attached at the end of
this commit message.

BPF Output Before Fix:
~~
  <...>-2056  [001] d.s.   433.927987: : ee_data:1459  #incorrect
packetdrill-2056  [001] d.s.   433.929563: : ee_data:1459  #incorrect
packetdrill-2056  [001] d.s.   433.930765: : ee_data:1459  #incorrect
packetdrill-2056  [001] d.s.   434.028177: : ee_data:1459
packetdrill-2056  [001] d.s.   434.029686: : ee_data:14599

BPF Output After Fix:
~~
  <...>-2049  [000] d.s.   113.517039: : ee_data:1459
  <...>-2049  [000] d.s.   113.517253: : ee_data:14599

BCC BPF Script:
~~
#!/usr/bin/env python

from __future__ import print_function
from bcc import BPF

bpf_text = """
#include 
#include 
#include 
#include 

#ifdef memset
#undef memset
#endif

int trace_err_skb(struct pt_regs *ctx)
{
struct sk_buff *skb = (struct sk_buff *)ctx->si;
struct sock *sk = (struct sock *)ctx->di;
struct sock_exterr_skb *serr;
u32 ee_data = 0;

if (!sk || !skb)
return 0;

serr = SKB_EXT_ERR(skb);
bpf_probe_read(_data, sizeof(ee_data), >ee.ee_data);
bpf_trace_printk("ee_data:%u\\n", ee_data);

return 0;
};
"""

b = BPF(text=bpf_text)
b.attach_kprobe(event="sock_queue_err_skb", fn_name="trace_err_skb")
print("Attached to kprobe")
b.trace_print()

Packetdrill Script:
~~
+0 `sysctl -q -w net.ipv4.tcp_min_tso_segs=10`
+0 `sysctl -q -w net.ipv4.tcp_no_metrics_save=1`
+0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
+0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
+0 bind(3, ..., ...) = 0
+0 listen(3, 1) = 0

0.100 < S 0:0(0) win 32792 
0.100 > S. 0:0(0) ack 1 
0.200 < . 1:1(0) ack 1 win 257
0.200 accept(3, ..., ...) = 4
+0 setsockopt(4, SOL_TCP, TCP_NODELAY, [1], 4) = 0

+0 setsockopt(4, SOL_SOCKET, 37, [2688], 4) = 0
0.200 write(4, ..., 1460) = 1460
0.200 write(4, ..., 13140) = 13140

0.200 > P. 1:1461(1460) ack 1
0.200 > . 1461:8761(7300) ack 1
0.200 > P. 8761:14601(5840) ack 1

0.300 < . 1:1(0) ack 1 win 257 
0.300 < . 1:1(0) ack 1 win 257 
0.300 < . 1:1(0) ack 1 win 257 
0.300 > P. 1:1461(1460) ack 1
0.400 < . 1:1(0) ack 14601 win 257

0.400 close(4) = 0
0.400 > F. 14601:14601(0) ack 1
0.500 < F. 1:1(0) ack 14602 win 257
0.500 > . 14602:14602(0) ack 2

Signed-off-by: Martin KaFai Lau 
Cc: Eric Dumazet 
Cc: Neal Cardwell 
Cc: Soheil Hassas Yeganeh 
Cc: Willem de Bruijn 
Cc: Yuchung Cheng 
---
 net/ipv4/tcp_input.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index e6e65f7..0edb071 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -3098,7 +3098,8 @@ static void tcp_ack_tstamp(struct sock *sk, struct 
sk_buff *skb,
 
shinfo = skb_shinfo(skb);
if ((shinfo->tx_flags & SKBTX_ACK_TSTAMP) &&
-   between(shinfo->tskey, prior_snd_una, tcp_sk(sk)->snd_una - 1))
+   !before(shinfo->tskey, prior_snd_una) &&
+   before(shinfo->tskey, tcp_sk(sk)->snd_una))
__skb_tstamp_tx(skb, NULL, sk, SCM_TSTAMP_ACK);
 }
 
-- 
2.5.1



[PATCH net-next] net: dsa: remove tag_protocol from dsa_switch

2016-04-18 Thread Vivien Didelot
Having the tag protocol in dsa_switch_driver for setup time and in
dsa_switch_tree for runtime is enough. Remove dsa_switch's one.

Signed-off-by: Vivien Didelot 
---
 include/net/dsa.h | 5 -
 net/dsa/dsa.c | 5 ++---
 2 files changed, 2 insertions(+), 8 deletions(-)

diff --git a/include/net/dsa.h b/include/net/dsa.h
index c4bc42b..2d280ab 100644
--- a/include/net/dsa.h
+++ b/include/net/dsa.h
@@ -136,11 +136,6 @@ struct dsa_switch {
void *priv;
 
/*
-* Tagging protocol understood by this switch
-*/
-   enum dsa_tag_protocol   tag_protocol;
-
-   /*
 * Configuration data for this switch.
 */
struct dsa_chip_data*pd;
diff --git a/net/dsa/dsa.c b/net/dsa/dsa.c
index efa612f..d61ceed 100644
--- a/net/dsa/dsa.c
+++ b/net/dsa/dsa.c
@@ -267,7 +267,7 @@ static int dsa_switch_setup_one(struct dsa_switch *ds, 
struct device *parent)
 * switch.
 */
if (dst->cpu_switch == index) {
-   switch (ds->tag_protocol) {
+   switch (drv->tag_protocol) {
 #ifdef CONFIG_NET_DSA_TAG_DSA
case DSA_TAG_PROTO_DSA:
dst->rcv = dsa_netdev_ops.rcv;
@@ -295,7 +295,7 @@ static int dsa_switch_setup_one(struct dsa_switch *ds, 
struct device *parent)
goto out;
}
 
-   dst->tag_protocol = ds->tag_protocol;
+   dst->tag_protocol = drv->tag_protocol;
}
 
/*
@@ -411,7 +411,6 @@ dsa_switch_setup(struct dsa_switch_tree *dst, int index,
ds->pd = pd;
ds->drv = drv;
ds->priv = priv;
-   ds->tag_protocol = drv->tag_protocol;
ds->master_dev = host_dev;
 
ret = dsa_switch_setup_one(ds, parent);
-- 
2.8.0



Re: [PATCH] net: phy: Ensure the state machine is called when phy is UP

2016-04-18 Thread Florian Fainelli
On 18/04/16 15:14, Alexandre Belloni wrote:
> On 15/04/2016 at 15:23:39 -0700, Florian Fainelli wrote :
>> On 15/04/16 15:17, Alexandre Belloni wrote:
>>> On 16/04/2016 at 00:05:08 +0200, Andrew Lunn wrote :
> Trace without my patch:
> libphy: MACB_mii_bus: probed
> macb f802.ethernet eth0: Cadence GEM rev 0x00020120 at 0xf802 irq 
> 27 (fc:c2:3d:0c:6e:05)
> Micrel KSZ8081 or KSZ8091 f802.etherne:01: attached PHY driver 
> [Micrel KSZ8081 or KSZ8091] (mii_bus:phy_addr=f802.etherne:01, 
> irq=171)
> Micrel KSZ8081 or KSZ8091 f802.etherne:01: PHY state change READY -> 
> READY
> [...]
> Micrel KSZ8081 or KSZ8091 f802.etherne:01: PHY state change READY -> 
> READY

 Are there some state changes before this? How is it getting to state
 READY? It would expect it to start in DOWN, from when the phy device
 was created in phy_device_create().

>>>
>>> No other changes. I forgot to mention that this is when booting with a
>>> cable plugged in. Unplugging and replugging the cable makes the link
>>> detection work fine even without the patch.
>>
>> OK, so the last hunk of the change in d5c3d84657db ("net: phy: Avoid
>> polling PHY with PHY_IGNORE_INTERRUPTS"):
>>
>> -   queue_delayed_work(system_power_efficient_wq, >state_queue,
>> -  PHY_STATE_TIME * HZ);
>> +   /* Only re-schedule a PHY state machine change if we are polling the
>> +* PHY, if PHY_IGNORE_INTERRUPT is set, then we will be moving
>> +* between states from phy_mac_interrupt()
>> +*/
>> +   if (phydev->irq == PHY_POLL)
>> +   queue_delayed_work(system_power_efficient_wq,
>> >state_queue,
>> +  PHY_STATE_TIME * HZ);
>>
>>
>> is presumably what broke for you, right?
>>
>> Could you also give this patch a spin and see if it works better with
>> it? The macb driver does something racy with how the MDIO and PHY are
>> probe wrt. registering the netdev, that needs fixing too.
>>
>> diff --git a/drivers/net/ethernet/cadence/macb.c
>> b/drivers/net/ethernet/cadence/macb.c
>> index eec3200ade4a..98b99149ce0b 100644
>> --- a/drivers/net/ethernet/cadence/macb.c
>> +++ b/drivers/net/ethernet/cadence/macb.c
>> @@ -3005,28 +3005,36 @@ static int macb_probe(struct platform_device *pdev)
>> if (err)
>> goto err_out_free_netdev;
>>
>> +   err = macb_mii_init(bp);
>> +   if (err)
>> +   goto err_out_free_netdev;
>> +
>> +   phydev = bp->phy_dev;
>> +   phy_attached_info(phydev);
>> +
>> +   netif_carrier_off(dev);
>> +
>> err = register_netdev(dev);
>> if (err) {
>> dev_err(>dev, "Cannot register net device,
>> aborting.\n");
>> goto err_out_unregister_netdev;
>> }
>>
>> -   err = macb_mii_init(bp);
>> -   if (err)
>> -   goto err_out_unregister_netdev;
>> -
>> -   netif_carrier_off(dev);
>> -
>> netdev_info(dev, "Cadence %s rev 0x%08x at 0x%08lx irq %d (%pM)\n",
>> macb_is_gem(bp) ? "GEM" : "MACB", macb_readl(bp, MID),
>> dev->base_addr, dev->irq, dev->dev_addr);
>>
>> -   phydev = bp->phy_dev;
>> -   phy_attached_info(phydev);
>> -
>> return 0;
>>
>>  err_out_unregister_netdev:
>> +   phy_disconnect(bp->phy_dev);
>> +   mdiobus_unregister(bp->mii_bus);
>> +   mdiobus_free(bp->mii_bus);
>> +
>> +   /* Shutdown the PHY if there is a GPIO reset */
>> +   if (bp->reset_gpio)
>> +   gpiod_set_value(bp->reset_gpio, 0);
>> +
>> unregister_netdev(dev);
>>
>>  err_out_free_netdev:
>>
> 
> Well, this fails with:
> 
> [2.78] [ cut here ]
> [2.78] WARNING: CPU: 0 PID: 1 at lib/kobject.c:597 
> kobject_get+0x6c/0xa0
> [2.79] kobject: '(null)' (df532280): is not initialized, yet 
> kobject_get() is being called.
> [2.80] Modules linked in:
> [2.80] CPU: 0 PID: 1 Comm: swapper Not tainted 4.6.0-rc1+ #46
> [2.81] Hardware name: Atmel SAMA5
> [2.81] [] (unwind_backtrace) from [] 
> (show_stack+0x10/0x14)
> [2.82] [] (show_stack) from [] (__warn+0xe4/0xfc)
> [2.83] [] (__warn) from [] 
> (warn_slowpath_fmt+0x38/0x48)
> [2.84] [] (warn_slowpath_fmt) from [] 
> (kobject_get+0x6c/0xa0)
> [2.84] [] (kobject_get) from [] 
> (device_add+0xac/0x56c)
> [2.85] [] (device_add) from [] 
> (__mdiobus_register+0x8c/0x198)
> [2.86] [] (__mdiobus_register) from [] 
> (of_mdiobus_register+0x20/0x184)
> [2.87] [] (of_mdiobus_register) from [] 
> (macb_probe+0x488/0x898)
> [2.88] [] (macb_probe) from [] 
> (platform_drv_probe+0x4c/0xb0)
> [2.88] [] (platform_drv_probe) from [] 
> (driver_probe_device+0x214/0x2c0)
> [2.89] [] (driver_probe_device) from [] 
> (__driver_attach+0xb8/0xbc)
> [2.90] [] 

Re: [PATCH net-next 2/8] perf, bpf: allow bpf programs attach to tracepoints

2016-04-18 Thread Steven Rostedt
On Mon, 18 Apr 2016 14:43:07 -0700
Alexei Starovoitov  wrote:


> I was worried about this too, but single 'if' and two calls
> (as in commit 98b5c2c65c295) is a better way, since it's faster, cleaner
> and doesn't need to refactor the whole perf_trace_buf_submit() to pass
> extra event_call argument to it.
> perf_trace_buf_submit() is already ugly with 8 arguments!

Right, but I solved that in ftrace by creating an on-stack descriptor
that can be passed by a single parameter. See the "fbuffer" in the
trace_event_raw_event* code.


> Passing more args or creating a struct to pass args only going to
> hurt performance without much reduction in .text size.
> tinyfication folks will disable tracepoints anyway.
> Note that the most common case is bpf returning 0 and not even
> calling perf_trace_buf_submit() which is already slow due
> to so many args passed on stack.
> This stuff is called million times a second, so every instruction
> counts.

Note, that doesn't matter if you are bloating the kernel for the 99.9%
of those that don't use bpf.

Please remember this! Us tracing folks are second class citizens! If
there's a way to speed up tracing by 10%, but in doing so we cause
mainline to be hurt by over 1%, we shouldn't be doing it. Tracing and
hooks on tracepoints are really not used by many people. Don't fall
into Linus's category of "my code is the most important". That's
especially true for tracing.

These macros causes bloat. There's been many complaints about it.
There's a way around it that isn't that bad (with the descriptor), we
should be using it.

-- Steve


Re: [PATCH] net: phy: Ensure the state machine is called when phy is UP

2016-04-18 Thread Alexandre Belloni
On 15/04/2016 at 15:23:39 -0700, Florian Fainelli wrote :
> On 15/04/16 15:17, Alexandre Belloni wrote:
> > On 16/04/2016 at 00:05:08 +0200, Andrew Lunn wrote :
> >>> Trace without my patch:
> >>> libphy: MACB_mii_bus: probed
> >>> macb f802.ethernet eth0: Cadence GEM rev 0x00020120 at 0xf802 irq 
> >>> 27 (fc:c2:3d:0c:6e:05)
> >>> Micrel KSZ8081 or KSZ8091 f802.etherne:01: attached PHY driver 
> >>> [Micrel KSZ8081 or KSZ8091] (mii_bus:phy_addr=f802.etherne:01, 
> >>> irq=171)
> >>> Micrel KSZ8081 or KSZ8091 f802.etherne:01: PHY state change READY -> 
> >>> READY
> >>> [...]
> >>> Micrel KSZ8081 or KSZ8091 f802.etherne:01: PHY state change READY -> 
> >>> READY
> >>
> >> Are there some state changes before this? How is it getting to state
> >> READY? It would expect it to start in DOWN, from when the phy device
> >> was created in phy_device_create().
> >>
> > 
> > No other changes. I forgot to mention that this is when booting with a
> > cable plugged in. Unplugging and replugging the cable makes the link
> > detection work fine even without the patch.
> 
> OK, so the last hunk of the change in d5c3d84657db ("net: phy: Avoid
> polling PHY with PHY_IGNORE_INTERRUPTS"):
> 
> -   queue_delayed_work(system_power_efficient_wq, >state_queue,
> -  PHY_STATE_TIME * HZ);
> +   /* Only re-schedule a PHY state machine change if we are polling the
> +* PHY, if PHY_IGNORE_INTERRUPT is set, then we will be moving
> +* between states from phy_mac_interrupt()
> +*/
> +   if (phydev->irq == PHY_POLL)
> +   queue_delayed_work(system_power_efficient_wq,
> >state_queue,
> +  PHY_STATE_TIME * HZ);
> 
> 
> is presumably what broke for you, right?
> 
> Could you also give this patch a spin and see if it works better with
> it? The macb driver does something racy with how the MDIO and PHY are
> probe wrt. registering the netdev, that needs fixing too.
> 
> diff --git a/drivers/net/ethernet/cadence/macb.c
> b/drivers/net/ethernet/cadence/macb.c
> index eec3200ade4a..98b99149ce0b 100644
> --- a/drivers/net/ethernet/cadence/macb.c
> +++ b/drivers/net/ethernet/cadence/macb.c
> @@ -3005,28 +3005,36 @@ static int macb_probe(struct platform_device *pdev)
> if (err)
> goto err_out_free_netdev;
> 
> +   err = macb_mii_init(bp);
> +   if (err)
> +   goto err_out_free_netdev;
> +
> +   phydev = bp->phy_dev;
> +   phy_attached_info(phydev);
> +
> +   netif_carrier_off(dev);
> +
> err = register_netdev(dev);
> if (err) {
> dev_err(>dev, "Cannot register net device,
> aborting.\n");
> goto err_out_unregister_netdev;
> }
> 
> -   err = macb_mii_init(bp);
> -   if (err)
> -   goto err_out_unregister_netdev;
> -
> -   netif_carrier_off(dev);
> -
> netdev_info(dev, "Cadence %s rev 0x%08x at 0x%08lx irq %d (%pM)\n",
> macb_is_gem(bp) ? "GEM" : "MACB", macb_readl(bp, MID),
> dev->base_addr, dev->irq, dev->dev_addr);
> 
> -   phydev = bp->phy_dev;
> -   phy_attached_info(phydev);
> -
> return 0;
> 
>  err_out_unregister_netdev:
> +   phy_disconnect(bp->phy_dev);
> +   mdiobus_unregister(bp->mii_bus);
> +   mdiobus_free(bp->mii_bus);
> +
> +   /* Shutdown the PHY if there is a GPIO reset */
> +   if (bp->reset_gpio)
> +   gpiod_set_value(bp->reset_gpio, 0);
> +
> unregister_netdev(dev);
> 
>  err_out_free_netdev:
> 

Well, this fails with:

[2.78] [ cut here ]
[2.78] WARNING: CPU: 0 PID: 1 at lib/kobject.c:597 kobject_get+0x6c/0xa0
[2.79] kobject: '(null)' (df532280): is not initialized, yet 
kobject_get() is being called.
[2.80] Modules linked in:
[2.80] CPU: 0 PID: 1 Comm: swapper Not tainted 4.6.0-rc1+ #46
[2.81] Hardware name: Atmel SAMA5
[2.81] [] (unwind_backtrace) from [] 
(show_stack+0x10/0x14)
[2.82] [] (show_stack) from [] (__warn+0xe4/0xfc)
[2.83] [] (__warn) from [] 
(warn_slowpath_fmt+0x38/0x48)
[2.84] [] (warn_slowpath_fmt) from [] 
(kobject_get+0x6c/0xa0)
[2.84] [] (kobject_get) from [] 
(device_add+0xac/0x56c)
[2.85] [] (device_add) from [] 
(__mdiobus_register+0x8c/0x198)
[2.86] [] (__mdiobus_register) from [] 
(of_mdiobus_register+0x20/0x184)
[2.87] [] (of_mdiobus_register) from [] 
(macb_probe+0x488/0x898)
[2.88] [] (macb_probe) from [] 
(platform_drv_probe+0x4c/0xb0)
[2.88] [] (platform_drv_probe) from [] 
(driver_probe_device+0x214/0x2c0)
[2.89] [] (driver_probe_device) from [] 
(__driver_attach+0xb8/0xbc)
[2.90] [] (__driver_attach) from [] 
(bus_for_each_dev+0x68/0x9c)
[2.91] [] (bus_for_each_dev) from [] 
(bus_add_driver+0x1a0/0x218)
[2.92] [] (bus_add_driver) from [] 

[PATCH] rtl8xxxu: hide unused tables

2016-04-18 Thread Arnd Bergmann
The references to some arrays in the rtl8xxxu driver were moved inside
of an #ifdef, but the symbols remain outside, resulting in build warnings:

rtl8xxxu/rtl8xxxu.c:1506:33: error: 'rtl8188ru_radioa_1t_highpa_table' defined 
but not used
rtl8xxxu/rtl8xxxu.c:1431:33: error: 'rtl8192cu_radioa_1t_init_table' defined 
but not used
rtl8xxxu/rtl8xxxu.c:1407:33: error: 'rtl8192cu_radiob_2t_init_table' defined 
but not used
rtl8xxxu/rtl8xxxu.c:1332:33: error: 'rtl8192cu_radioa_2t_init_table' defined 
but not used
rtl8xxxu/rtl8xxxu.c:239:35: error: 'rtl8192c_power_base' defined but not used
rtl8xxxu/rtl8xxxu.c:217:35: error: 'rtl8188r_power_base' defined but not used

This adds an extra #ifdef around them to shut up the warnings.

Signed-off-by: Arnd Bergmann 
Fixes: 2fc0b8e5a17d ("rtl8xxxu: Add TX power base values for gen1 parts")
Fixes: 4062b8ffec36 ("rtl8xxxu: Move PHY RF init into device specific 
functions")
---
 drivers/net/wireless/realtek/rtl8xxxu/rtl8xxxu.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/drivers/net/wireless/realtek/rtl8xxxu/rtl8xxxu.c 
b/drivers/net/wireless/realtek/rtl8xxxu/rtl8xxxu.c
index 928ca56f751c..0ba84b5fe0d6 100644
--- a/drivers/net/wireless/realtek/rtl8xxxu/rtl8xxxu.c
+++ b/drivers/net/wireless/realtek/rtl8xxxu/rtl8xxxu.c
@@ -214,6 +214,7 @@ static struct rtl8xxxu_reg8val rtl8192e_mac_init_table[] = {
{0x, 0xff},
 };
 
+#ifdef CONFIG_RTL8XXXU_UNTESTED
 static struct rtl8xxxu_power_base rtl8188r_power_base = {
.reg_0e00 = 0x06080808,
.reg_0e04 = 0x00040406,
@@ -257,6 +258,7 @@ static struct rtl8xxxu_power_base rtl8192c_power_base = {
.reg_084c = 0x0b0c0d0e,
.reg_0868 = 0x01030509,
 };
+#endif
 
 static struct rtl8xxxu_power_base rtl8723a_power_base = {
.reg_0e00 = 0x0a0c0c0c,
@@ -1329,6 +1331,7 @@ static struct rtl8xxxu_rfregval 
rtl8723bu_radioa_1t_init_table[] = {
{0xff, 0x}
 };
 
+#ifdef CONFIG_RTL8XXXU_UNTESTED
 static struct rtl8xxxu_rfregval rtl8192cu_radioa_2t_init_table[] = {
{0x00, 0x00030159}, {0x01, 0x00031284},
{0x02, 0x00098000}, {0x03, 0x00018c63},
@@ -1577,6 +1580,7 @@ static struct rtl8xxxu_rfregval 
rtl8188ru_radioa_1t_highpa_table[] = {
{0x00, 0x00030159},
{0xff, 0x}
 };
+#endif
 
 static struct rtl8xxxu_rfregval rtl8192eu_radioa_init_table[] = {
{0x7f, 0x0082}, {0x81, 0x0003fc00},
-- 
2.7.0



[PATCH] net: w5100: don't build spi driver without w5100

2016-04-18 Thread Arnd Bergmann
The w5100-spi driver front-end only makes sense when the w5100
core driver is enabled, not for a configuration that only has w5300:

drivers/net/built-in.o: In function `w5100_spi_remove':
drivers/net/ethernet/wiznet/w5100-spi.c:277: undefined reference to 
`w5100_remove'
drivers/net/built-in.o: In function `w5100_spi_probe':
drivers/net/ethernet/wiznet/w5100-spi.c:272: undefined reference to 
`w5100_probe'
drivers/net/built-in.o: In function `w5200_spi_init':
drivers/net/ethernet/wiznet/w5100-spi.c:125: undefined reference to 
`w5100_ops_priv'
drivers/net/built-in.o: In function `w5200_spi_readbulk':
drivers/net/ethernet/wiznet/w5100-spi.c:125: undefined reference to 
`w5100_ops_priv'
drivers/net/built-in.o: In function `w5200_spi_writebulk':
drivers/net/ethernet/wiznet/w5100-spi.c:125: undefined reference to 
`w5100_ops_priv'
drivers/net/built-in.o:(.data+0x3ed1c): undefined reference to `w5100_pm_ops'

This adds an appropriate Kconfig dependency.

Signed-off-by: Arnd Bergmann 
Fixes: 630cf09751fe ("net: w5100: support SPI interface mode")
---
 drivers/net/ethernet/wiznet/Kconfig | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/wiznet/Kconfig 
b/drivers/net/ethernet/wiznet/Kconfig
index 1f15376e9856..f3385a1999a2 100644
--- a/drivers/net/ethernet/wiznet/Kconfig
+++ b/drivers/net/ethernet/wiznet/Kconfig
@@ -71,7 +71,7 @@ endchoice
 
 config WIZNET_W5100_SPI
tristate "WIZnet W5100/W5200 Ethernet support for SPI mode"
-   depends on WIZNET_BUS_ANY
+   depends on WIZNET_BUS_ANY && WIZNET_W5100
depends on SPI
---help---
  In SPI mode host system accesses registers using SPI protocol
-- 
2.7.0



[PATCHv2 net] openvswitch: Orphan skbs before IPv6 defrag

2016-04-18 Thread Joe Stringer
This is the IPv6 counterpart to commit 8282f27449bf ("inet: frag: Always
orphan skbs inside ip_defrag()").

Prior to commit 029f7f3b8701 ("netfilter: ipv6: nf_defrag: avoid/free
clone operations"), ipv6 fragments sent to nf_ct_frag6_gather() would be
cloned (implicitly orphaning) prior to queueing for reassembly. As such,
when the IPv6 message is eventually reassembled, the skb->sk for all
fragments would be NULL. After that commit was introduced, rather than
cloning, the original skbs were queued directly without orphaning. The
end result is that all frags except for the first and last may have a
socket attached.

This commit explicitly orphans such skbs during nf_ct_frag6_gather() to
prevent BUG_ON(skb->sk) during a later call to ip6_fragment().

kernel BUG at net/ipv6/ip6_output.c:631!
[...]
Call Trace:
 
 [] ? __lock_acquire+0x927/0x20a0
 [] ? do_output.isra.28+0x1b0/0x1b0 [openvswitch]
 [] ? __lock_is_held+0x52/0x70
 [] ovs_fragment+0x1f7/0x280 [openvswitch]
 [] ? mark_held_locks+0x75/0xa0
 [] ? _raw_spin_unlock_irqrestore+0x36/0x50
 [] ? dst_discard_out+0x20/0x20
 [] ? dst_ifdown+0x80/0x80
 [] do_output.isra.28+0xf3/0x1b0 [openvswitch]
 [] do_execute_actions+0x709/0x12c0 [openvswitch]
 [] ? ovs_flow_stats_update+0x74/0x1e0 [openvswitch]
 [] ? ovs_flow_stats_update+0xa1/0x1e0 [openvswitch]
 [] ? _raw_spin_unlock+0x27/0x40
 [] ovs_execute_actions+0x45/0x120 [openvswitch]
 [] ovs_dp_process_packet+0x85/0x150 [openvswitch]
 [] ? _raw_spin_unlock+0x27/0x40
 [] ovs_execute_actions+0xc4/0x120 [openvswitch]
 [] ovs_dp_process_packet+0x85/0x150 [openvswitch]
 [] ? key_extract+0x442/0xc10 [openvswitch]
 [] ovs_vport_receive+0x5d/0xb0 [openvswitch]
 [] ? __lock_acquire+0x927/0x20a0
 [] ? __lock_acquire+0x927/0x20a0
 [] ? __lock_acquire+0x927/0x20a0
 [] ? _raw_spin_unlock_irqrestore+0x36/0x50
 [] internal_dev_xmit+0x6d/0x150 [openvswitch]
 [] ? internal_dev_xmit+0x5/0x150 [openvswitch]
 [] dev_hard_start_xmit+0x2df/0x660
 [] ? validate_xmit_skb.isra.105.part.106+0x1a/0x2b0
 [] __dev_queue_xmit+0x8f5/0x950
 [] ? __dev_queue_xmit+0x50/0x950
 [] ? mark_held_locks+0x75/0xa0
 [] dev_queue_xmit+0x10/0x20
 [] neigh_resolve_output+0x178/0x220
 [] ? ip6_finish_output2+0x219/0x7b0
 [] ip6_finish_output2+0x219/0x7b0
 [] ? ip6_finish_output2+0x65/0x7b0
 [] ? ip_idents_reserve+0x6b/0x80
 [] ? ip6_fragment+0x93f/0xc50
 [] ip6_fragment+0xba1/0xc50
 [] ? ip6_flush_pending_frames+0x40/0x40
 [] ip6_finish_output+0xcb/0x1d0
 [] ip6_output+0x5f/0x1a0
 [] ? ip6_fragment+0xc50/0xc50
 [] ip6_local_out+0x3d/0x80
 [] ip6_send_skb+0x2f/0xc0
 [] ip6_push_pending_frames+0x4d/0x50
 [] icmpv6_push_pending_frames+0xac/0xe0
 [] icmpv6_echo_reply+0x42e/0x500
 [] icmpv6_rcv+0x4cf/0x580
 [] ip6_input_finish+0x1a7/0x690
 [] ? ip6_input_finish+0x5/0x690
 [] ip6_input+0x30/0xa0
 [] ? ip6_rcv_finish+0x1a0/0x1a0
 [] ip6_rcv_finish+0x4e/0x1a0
 [] ipv6_rcv+0x45f/0x7c0
 [] ? ipv6_rcv+0x36/0x7c0
 [] ? ip6_make_skb+0x1c0/0x1c0
 [] __netif_receive_skb_core+0x229/0xb80
 [] ? mark_held_locks+0x75/0xa0
 [] ? process_backlog+0x6f/0x230
 [] __netif_receive_skb+0x16/0x70
 [] process_backlog+0x78/0x230
 [] ? process_backlog+0xdd/0x230
 [] net_rx_action+0x203/0x480
 [] ? mark_held_locks+0x75/0xa0
 [] __do_softirq+0xde/0x49f
 [] ? ip6_finish_output2+0x228/0x7b0
 [] do_softirq_own_stack+0x1c/0x30
 
 [] do_softirq.part.18+0x3b/0x40
 [] __local_bh_enable_ip+0xb6/0xc0
 [] ip6_finish_output2+0x251/0x7b0
 [] ? ip6_fragment+0xba1/0xc50
 [] ? ip_idents_reserve+0x6b/0x80
 [] ? ip6_fragment+0x93f/0xc50
 [] ip6_fragment+0xba1/0xc50
 [] ? ip6_flush_pending_frames+0x40/0x40
 [] ip6_finish_output+0xcb/0x1d0
 [] ip6_output+0x5f/0x1a0
 [] ? ip6_fragment+0xc50/0xc50
 [] ip6_local_out+0x3d/0x80
 [] ip6_send_skb+0x2f/0xc0
 [] ip6_push_pending_frames+0x4d/0x50
 [] rawv6_sendmsg+0xa28/0xe30
 [] ? inet_sendmsg+0xc7/0x1d0
 [] inet_sendmsg+0x106/0x1d0
 [] ? inet_sendmsg+0x5/0x1d0
 [] sock_sendmsg+0x38/0x50
 [] SYSC_sendto+0xf6/0x170
 [] ? trace_hardirqs_on_thunk+0x1b/0x1d
 [] SyS_sendto+0xe/0x10
 [] entry_SYSCALL_64_fastpath+0x18/0xa8
Code: 06 48 83 3f 00 75 26 48 8b 87 d8 00 00 00 2b 87 d0 00 00 00 48 39 d0 72 
14 8b 87 e4 00 00 00 83 f8 01 75 09 48 83 7f 18 00 74 9a <0f> 0b 41 8b 86 cc 00 
00 00 49 8#
RIP  [] ip6_fragment+0x73a/0xc50
 RSP 

Fixes: 029f7f3b8701 ("netfilter: ipv6: nf_defrag: avoid/free clone
operations")
Reported-by: Daniele Di Proietto 
Signed-off-by: Joe Stringer 
---
 net/openvswitch/conntrack.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c
index 1b9d286756be..b5fea1101faa 100644
--- a/net/openvswitch/conntrack.c
+++ b/net/openvswitch/conntrack.c
@@ -367,6 +367,7 @@ static int handle_fragments(struct net *net, struct 
sw_flow_key *key,
} else if (key->eth.type == htons(ETH_P_IPV6)) {
enum ip6_defrag_users user = IP6_DEFRAG_CONNTRACK_IN + zone;
 
+   skb_orphan(skb);
memset(IP6CB(skb), 0, sizeof(struct 

Re: [PATCH nf] netfilter: ipv6: Orphan skbs in nf_ct_frag6_gather()

2016-04-18 Thread Joe Stringer
On 18 April 2016 at 11:35, Pablo Neira Ayuso  wrote:
> On Thu, Apr 14, 2016 at 05:35:39PM -0700, Joe Stringer wrote:
>> On 14 April 2016 at 03:35, Pablo Neira Ayuso  wrote:
>> > On Thu, Apr 14, 2016 at 10:40:15AM +0200, Florian Westphal wrote:
>> >> David Laight  wrote:
>> >> > From: Joe Stringer
>> >> > > Sent: 13 April 2016 19:10
>> >> > > This is the IPv6 equivalent of commit 8282f27449bf ("inet: frag: 
>> >> > > Always
>> >> > > orphan skbs inside ip_defrag()").
>> >> > >
>> >> > > Prior to commit 029f7f3b8701 ("netfilter: ipv6: nf_defrag: avoid/free
>> >> > > clone operations"), ipv6 fragments sent to nf_ct_frag6_gather() would 
>> >> > > be
>> >> > > cloned (implicitly orphaning) prior to queueing for reassembly. As 
>> >> > > such,
>> >> > > when the IPv6 message is eventually reassembled, the skb->sk for all
>> >> > > fragments would be NULL. After that commit was introduced, rather than
>> >> > > cloning, the original skbs were queued directly without orphaning. The
>> >> > > end result is that all frags except for the first and last may have a
>> >> > > socket attached.
>> >> >
>> >> > I'd have thought that the queued fragments would still want to be
>> >> > resource-counted against the socket (I think that is what skb->sk is 
>> >> > for).
>> >>
>> >> No, ipv4/ipv6 reasm has its own accouting.
>> >>
>> >> > Although I can't imagine why IPv6 reassembly is happening on skb
>> >> > associated with a socket.
>> >>
>> >> Right, thats a much more interesting question -- both ipv4 and
>> >> ipv6 orphan skbs before NF_HOOK prerouting trip.
>> >>
>> >> (That being said, I don't mind the patch, I'm just be curious how this
>> >>  can happen).
>> >
>> > If this change is specific to get this working in ovs and its
>> > conntrack support, then I don't think this belong to core
>> > infrastructure. This should be fixed in ovs instead.
>>
>> I admit I've only been able to reproduce it with OVS. My main reason
>> for proposing the fix this way was just because this is what the IPv4
>> code does, so I figured IPv6 should be consistent with that.
>
> You mean that this is what you did in 029f7f3b8701 to fix this, right?
>
> But we shouldn't add code to the core that is OVS specific for no
> reason. We don't need this orphan from ipv4 and ipv6 as Florian
> indicated.

That makes complete sense to me. I was wondering whether the original
IPv4 fix was the correct one from the nf core perspective - but it
seems like perhaps it is, given that ip_defrag() has a lot more users
from a lot more different paths which are likely relying on the skb to
be orphaned. In comparison, in the IPv6 path, nf_ct_frag6_gather() is
only called from OVS and the netfilter hooks; the hooks already do the
orphaning, so it's more consistent for OVS to do it as well.

> Is there any chance you can fix this from OVS and its conntrack glue
> code? Thanks.

Sure, I'll resend the patch making the fix in OVS.


Re: [PATCH] macsec: fix crypto Kconfig dependency

2016-04-18 Thread Stephen Rothwell
Hi Dave,

On Mon, 18 Apr 2016 12:43:16 -0400 (EDT) David Miller  
wrote:
>
> From: Herbert Xu 
> Date: Mon, 18 Apr 2016 18:43:36 +0800
> 
> > Right, the problem is that nothing within crypto ever selects
> > CRYPTO since it's also used as a way of hiding the crypto menu
> > options.  
> 
> As far as I understand it, this won't help.  Because selects do not
> trigger other selects and dependencies.

My understanding is that selects trigger other selects, but do not
honour dependencies.  So, in general, you should not select a symbol
that has dependencies (unless you also have the same dependencies or
select those dependencies).

-- 
Cheers,
Stephen Rothwell


Re: [PATCH net-next 2/8] perf, bpf: allow bpf programs attach to tracepoints

2016-04-18 Thread Alexei Starovoitov

On 4/18/16 1:29 PM, Steven Rostedt wrote:

On Mon, 4 Apr 2016 21:52:48 -0700
Alexei Starovoitov  wrote:


introduce BPF_PROG_TYPE_TRACEPOINT program type and allow it to be
attached to tracepoints.
The tracepoint will copy the arguments in the per-cpu buffer and pass
it to the bpf program as its first argument.
The layout of the fields can be discovered by doing
'cat /sys/kernel/debug/tracing/events/sched/sched_switch/format'
prior to the compilation of the program with exception that first 8 bytes
are reserved and not accessible to the program. This area is used to store
the pointer to 'struct pt_regs' which some of the bpf helpers will use:
+-+
| 8 bytes | hidden 'struct pt_regs *' (inaccessible to bpf program)
+-+
| N bytes | static tracepoint fields defined in tracepoint/format (bpf readonly)
+-+
| dynamic | __dynamic_array bytes of tracepoint (inaccessible to bpf yet)
+-+

Not that all of the fields are already dumped to user space via perf ring buffer
and some application access it directly without consulting tracepoint/format.
Same rule applies here: static tracepoint fields should only be accessed
in a format defined in tracepoint/format. The order of fields and
field sizes are not an ABI.

Signed-off-by: Alexei Starovoitov 
---

...

-   entry = perf_trace_buf_prepare(__entry_size,\
-   event_call->event.type, &__regs, ); \
+   event_type = prog ? TRACE_EVENT_TYPE_MAX : event_call->event.type; \


Can you move this into perf_trace_entry_prepare?


that's the old version.
The last one are commits 1e1dcd93b46 and 98b5c2c65c295 in net-next.


+   if (prog) { \
+   *(struct pt_regs **)entry = __regs; \
+   if (!trace_call_bpf(prog, entry) || hlist_empty(head)) { \
+   perf_swevent_put_recursion_context(rctx);   \
+   return; \
+   }   \
+   memset(>ent, 0, sizeof(entry->ent));   \
+   }   \


And perhaps this into perf_trace_buf_submit()?

Tracepoints are a major cause of bloat, and the reasons for these
prepare and submit functions is to move code out of the macros. Every
tracepoint in the kernel (1000 and counting) will include this code.
I've already had complaints that each tracepoint can add up to 5k to
the core.


I was worried about this too, but single 'if' and two calls
(as in commit 98b5c2c65c295) is a better way, since it's faster, cleaner
and doesn't need to refactor the whole perf_trace_buf_submit() to pass
extra event_call argument to it.
perf_trace_buf_submit() is already ugly with 8 arguments!
Passing more args or creating a struct to pass args only going to
hurt performance without much reduction in .text size.
tinyfication folks will disable tracepoints anyway.
Note that the most common case is bpf returning 0 and not even
calling perf_trace_buf_submit() which is already slow due
to so many args passed on stack.
This stuff is called million times a second, so every instruction
counts.



Re: [PATCH net-next] macvlan: fix failure during registration

2016-04-18 Thread Eric W. Biederman
Francesco Ruggeri  writes:

> On Mon, Apr 18, 2016 at 11:48 AM, Eric W. Biederman
>  wrote:
>>
>> These interactions all seem a little bit funny.  At a quick skim it
>> would make more sense to increment the port count in macvlan_init,
>> and completely remove the need to mess with port counts anywhere except
>> macvlan_init and macvlan_uninit.
>
> Thanks Eric, let me try that.
>
>>
>> If for some reason that can't be done the code can easily look at
>> dev->reg_state.  If dev->reg_state == NETREG_UNITIALIZED it should
>> be exactly the same as your new flag being set.
>>
>
> I am not sure that in macvlan_uninit one can tell whether it is being
> invoked in the context of a failed register_netdevice (if that is what
> you meant).
>
> In case of register_netdevice failing in
> call_netdevice_notifiers(NETDEV_REGISTER) the call sequence is:
>
> macvlan_common_newlink
>   register_netdevice
> call_netdevice_notifiers(NETDEV_REGISTER, dev) (<== fail here)
>   rollback_registered(dev);
> rollback_registered_many
>   dev->reg_state = NETREG_UNREGISTERING
>   dev->netdev_ops->ndo_uninit(dev)
>   dev->reg_state = NETREG_UNREGISTERED;
>

The code I have looks a little different.  But that is a good point.

But please see if you can get macvlan_init to do the necessary work.
That should simplify everything, and make clever games unnecessary.

Eric





Re: [PATCH net-next v5] rtnetlink: add new RTM_GETSTATS message to dump link stats

2016-04-18 Thread Eric Dumazet
On Mon, 2016-04-18 at 14:10 -0700, Roopa Prabhu wrote:

> + if (filter_mask & IFLA_STATS_FILTER_BIT(IFLA_STATS_LINK_64)) {
> + struct rtnl_link_stats64 *sp;
> +
> + attr = nla_reserve(skb, IFLA_STATS_LINK_64,
> +sizeof(struct rtnl_link_stats64));
> + if (!attr)
> + goto nla_put_failure;
> +
> + sp = nla_data(attr);

Are you sure we have a guarantee that sp is aligned to u64 fields ?

x86 does not care, but some arches would have a potential misalign
access here.


> + dev_get_stats(dev, sp);
> + }




Re: [PATCH net-next 0/8] allow bpf attach to tracepoints

2016-04-18 Thread Alexei Starovoitov

On 4/18/16 1:47 PM, Steven Rostedt wrote:

On Mon, 18 Apr 2016 12:51:43 -0700
Alexei Starovoitov  wrote:



yeah, it could be added to ftrace as well, but it won't be as effective
as perf_trace, since the cost of trace_event_buffer_reserve() in
trace_event_raw_event_() handler is significantly higher than
perf_trace_buf_alloc() in perf_trace_().


Right, because ftrace optimizes the case where all traces make it into
the ring buffer (zero copy). Of course, if a filter is a attached, it
would be trivial to add a case to copy first into a per cpu context
buffer, and only copy into the ring buffer if the filter succeeds. Hmm,
that may actually be something I want to do regardless with the current
filter system.


Then from the program point of view it wouldn't care how that memory
is allocated, so the user tools will just use perf_trace_() style.
The only use case I see for bpf with ftrace's tracepoint handler
is to work as an actual filter, but we already have filters there...


But wasn't it shown that eBPF could speed up the current filters? If we
hook into eBPF then we could replace what we have with a faster filter.


yes. Long ago I had a patch to accelerate filter_match_preds(),
but then abandoned it, since, as you said, ftrace is primarily targeting
streaming these events to user space and faster filtering probably
won't be much noticeable to the end user.
So far all the tools around bpf have been capitalizing on
the ability to aggregate data in the kernel.


At the same time there is whole ftrace as function tracer. That is
very lucrative field for bpf to plug into ;)


Which could get this as a side-effect of this change.


not really as side-effect. That would be the new thing to design.
I only have few rough ideas on how to approach that.



Re: [PATCH net-next] tcp-tso: do not split TSO packets at retransmit time

2016-04-18 Thread Yuchung Cheng
On Mon, Apr 18, 2016 at 1:56 PM, Eric Dumazet  wrote:
> Linux TCP stack painfully segments all TSO/GSO packets before retransmits.
>
> This was fine back in the days when TSO/GSO were emerging, with their
> bugs, but we believe the dark age is over.
>
> Keeping big packets in write queues, but also in stack traversal
> has a lot of benefits.
>  - Less memory overhead, because write queues have less skbs
>  - Less cpu overhead at ACK processing.
>  - Better SACK processing, as lot of studies mentioned how
>awful linux was at this ;)
>  - Less cpu overhead to send the rtx packets
>(IP stack traversal, netfilter traversal, qdisc, drivers...)
>  - Better latencies in presence of losses.
>  - Smaller spikes in fq like packet schedulers, as retransmits
>are not constrained by TCP Small Queues.
>
> 1 % packet losses are common today, and at 100Gbit speeds, this
> translates to ~80,000 losses per second. If we are unlucky and
> first MSS of a 45-MSS TSO is lost, we are cooking 44 MSS segments
> at rtx instead of a single 44-MSS TSO packet.
>
> Signed-off-by: Eric Dumazet 
Acked-by: Yuchung Cheng 
> ---
>  include/net/tcp.h |  4 ++--
>  net/ipv4/tcp_input.c  |  2 +-
>  net/ipv4/tcp_output.c | 64 
> +++
>  net/ipv4/tcp_timer.c  |  4 ++--
>  4 files changed, 34 insertions(+), 40 deletions(-)
>
> diff --git a/include/net/tcp.h b/include/net/tcp.h
> index fd40f8c64d5f..0dc272dcd772 100644
> --- a/include/net/tcp.h
> +++ b/include/net/tcp.h
> @@ -538,8 +538,8 @@ __u32 cookie_v6_init_sequence(const struct sk_buff *skb, 
> __u16 *mss);
>  void __tcp_push_pending_frames(struct sock *sk, unsigned int cur_mss,
>int nonagle);
>  bool tcp_may_send_now(struct sock *sk);
> -int __tcp_retransmit_skb(struct sock *, struct sk_buff *);
> -int tcp_retransmit_skb(struct sock *, struct sk_buff *);
> +int __tcp_retransmit_skb(struct sock *sk, struct sk_buff *skb, int segs);
> +int tcp_retransmit_skb(struct sock *sk, struct sk_buff *skb, int segs);
>  void tcp_retransmit_timer(struct sock *sk);
>  void tcp_xmit_retransmit_queue(struct sock *);
>  void tcp_simple_retransmit(struct sock *);
> diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
> index 90e0d9256b74..729e489b5608 100644
> --- a/net/ipv4/tcp_input.c
> +++ b/net/ipv4/tcp_input.c
> @@ -5543,7 +5543,7 @@ static bool tcp_rcv_fastopen_synack(struct sock *sk, 
> struct sk_buff *synack,
> if (data) { /* Retransmit unacked data in SYN */
> tcp_for_write_queue_from(data, sk) {
> if (data == tcp_send_head(sk) ||
> -   __tcp_retransmit_skb(sk, data))
> +   __tcp_retransmit_skb(sk, data, 1))
> break;
> }
> tcp_rearm_rto(sk);
> diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
> index 6451b83d81e9..4876b256a70a 100644
> --- a/net/ipv4/tcp_output.c
> +++ b/net/ipv4/tcp_output.c
> @@ -2266,7 +2266,7 @@ void tcp_send_loss_probe(struct sock *sk)
> if (WARN_ON(!skb || !tcp_skb_pcount(skb)))
> goto rearm_timer;
>
> -   if (__tcp_retransmit_skb(sk, skb))
> +   if (__tcp_retransmit_skb(sk, skb, 1))
> goto rearm_timer;
>
> /* Record snd_nxt for loss detection. */
> @@ -2551,17 +2551,17 @@ static void tcp_retrans_try_collapse(struct sock *sk, 
> struct sk_buff *to,
>   * state updates are done by the caller.  Returns non-zero if an
>   * error occurred which prevented the send.
>   */
> -int __tcp_retransmit_skb(struct sock *sk, struct sk_buff *skb)
> +int __tcp_retransmit_skb(struct sock *sk, struct sk_buff *skb, int segs)
>  {
> -   struct tcp_sock *tp = tcp_sk(sk);
> struct inet_connection_sock *icsk = inet_csk(sk);
> +   struct tcp_sock *tp = tcp_sk(sk);
> unsigned int cur_mss;
> -   int err;
> +   int diff, len, err;
> +
>
> -   /* Inconslusive MTU probe */
> -   if (icsk->icsk_mtup.probe_size) {
> +   /* Inconclusive MTU probe */
> +   if (icsk->icsk_mtup.probe_size)
> icsk->icsk_mtup.probe_size = 0;
> -   }
>
> /* Do not sent more than we queued. 1/4 is reserved for possible
>  * copying overhead: fragmentation, tunneling, mangling etc.
> @@ -2594,30 +2594,27 @@ int __tcp_retransmit_skb(struct sock *sk, struct 
> sk_buff *skb)
> TCP_SKB_CB(skb)->seq != tp->snd_una)
> return -EAGAIN;
>
> -   if (skb->len > cur_mss) {
> -   if (tcp_fragment(sk, skb, cur_mss, cur_mss, GFP_ATOMIC))
> +   len = cur_mss * segs;
> +   if (skb->len > len) {
> +   if (tcp_fragment(sk, skb, len, cur_mss, GFP_ATOMIC))
> return -ENOMEM; /* We'll try again later. */
> } else {
> -   int oldpcount = tcp_skb_pcount(skb);
> +   if 

Re: [PATCH net-next v4] rtnetlink: add new RTM_GETSTATS message to dump link stats

2016-04-18 Thread roopa
On 4/18/16, 9:28 AM, David Miller wrote:
> From: Roopa Prabhu 
> Date: Sun, 17 Apr 2016 22:35:35 -0700
>
>> This patch adds a new RTM_GETSTATS message to query link stats via netlink
>> from the kernel. RTM_NEWLINK also dumps stats today, but RTM_NEWLINK
>> returns a lot more than just stats and is expensive in some cases when
>> frequent polling for stats from userspace is a common operation.
> The following build failure needs to be resolved:
>
> security/selinux/nlmsgtab.c: In function ‘selinux_nlmsg_lookup’:
> include/linux/compiler.h:506:38: error: call to ‘__compiletime_assert_158’ 
> declared with attribute error: BUILD_BUG_ON failed: RTM_MAX != (RTM_NEWNSID + 
> 3)
sorry, did not know about this check. sent v5 just now..


[PATCH net-next v5] rtnetlink: add new RTM_GETSTATS message to dump link stats

2016-04-18 Thread Roopa Prabhu
From: Roopa Prabhu 

This patch adds a new RTM_GETSTATS message to query link stats via netlink
from the kernel. RTM_NEWLINK also dumps stats today, but RTM_NEWLINK
returns a lot more than just stats and is expensive in some cases when
frequent polling for stats from userspace is a common operation.

RTM_GETSTATS is an attempt to provide a light weight netlink message
to explicity query only link stats from the kernel on an interface.
The idea is to also keep it extensible so that new kinds of stats can be
added to it in the future.

This patch adds the following attribute for NETDEV stats:
struct nla_policy ifla_stats_policy[IFLA_STATS_MAX + 1] = {
[IFLA_STATS_LINK_64]  = { .len = sizeof(struct rtnl_link_stats64) },
};

Like any other rtnetlink message, RTM_GETSTATS can be used to get stats of
a single interface or all interfaces with NLM_F_DUMP.

Future possible new types of stat attributes:
link af stats:
- IFLA_STATS_LINK_IPV6  (nested. for ipv6 stats)
- IFLA_STATS_LINK_MPLS  (nested. for mpls/mdev stats)
extended stats:
- IFLA_STATS_LINK_EXTENDED (nested. extended software netdev stats like 
bridge,
  vlan, vxlan etc)
- IFLA_STATS_LINK_HW_EXTENDED (nested. extended hardware stats which are
  available via ethtool today)

This patch also declares a filter mask for all stat attributes.
User has to provide a mask of stats attributes to query. filter mask
can be specified in the new hdr 'struct if_stats_msg' for stats messages.
Other important field in the header is the ifindex.

This api can also include attributes for global stats (eg tcp) in the future.
When global stats are included in a stats msg, the ifindex in the header
must be zero. A single stats message cannot contain both global and
netdev specific stats. To easily distinguish them, netdev specific stat
attributes name are prefixed with IFLA_STATS_LINK_

Without any attributes in the filter_mask, no stats will be returned.

This patch has been tested with mofified iproute2 ifstat.

Suggested-by: Jamal Hadi Salim 
Signed-off-by: Roopa Prabhu 
---
RFC to v1 (apologies for the delay in sending this version out. busy days):
- Addressed feedback from Dave
- removed rtnl_link_stats
- Added hdr struct if_stats_msg to carry ifindex and
  filter mask
- new macro IFLA_STATS_FILTER_BIT(ATTR) for filter mask
- split the ipv6 patch into a separate patch, need some more eyes on it
- prefix attributes with IFLA_STATS instead of IFLA_LINK_STATS for
  shorter attribute names

v2:
- move IFLA_STATS_INET6 declaration to the inet6 patch
- get rid of RTM_DELSTATS
- mark ipv6 patch RFC. It can be used as an example for
  other AF stats like stats

v3:
- add required padding to the if_stats_msg structure(suggested by jamal)
- rename netdev stat attributes with IFLA_STATS_LINK prefix
  so that they are easily distinguishable with global
  stats in the future (after global stats discussion with thomas)
- get rid of unnecessary copy when getting stats with dev_get_stats
  (suggested by dave)

v4:
- dropped calcit and af stats from this patch. Will add it
  back when it becomes necessary and with the first af stats
  patch
- add check for null filter in dump and return -EINVAL:
  this follows rtnl_fdb_dump in returning an error.
  But since netlink_dump does not propagate the error
  to the user, the user will not see an error and
  but will also not see any data. This is consistent with
  other kinds of dumps.

v5:
- fix selinux nlmsgtab to account for new RTM_*STATS messages

 include/uapi/linux/if_link.h   |  23 +++
 include/uapi/linux/rtnetlink.h |   5 ++
 net/core/rtnetlink.c   | 150 +
 security/selinux/nlmsgtab.c|   4 +-
 4 files changed, 181 insertions(+), 1 deletion(-)

diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
index bb3a90b..0762f35 100644
--- a/include/uapi/linux/if_link.h
+++ b/include/uapi/linux/if_link.h
@@ -781,4 +781,27 @@ enum {
 
 #define IFLA_HSR_MAX (__IFLA_HSR_MAX - 1)
 
+/* STATS section */
+
+struct if_stats_msg {
+   __u8  family;
+   __u8  pad1;
+   __u16 pad2;
+   __u32 ifindex;
+   __u32 filter_mask;
+};
+
+/* A stats attribute can be netdev specific or a global stat.
+ * For netdev stats, lets use the prefix IFLA_STATS_LINK_*
+ */
+enum {
+   IFLA_STATS_UNSPEC,
+   IFLA_STATS_LINK_64,
+   __IFLA_STATS_MAX,
+};
+
+#define IFLA_STATS_MAX (__IFLA_STATS_MAX - 1)
+
+#define IFLA_STATS_FILTER_BIT(ATTR)(1 << (ATTR))
+
 #endif /* _UAPI_LINUX_IF_LINK_H */
diff --git a/include/uapi/linux/rtnetlink.h b/include/uapi/linux/rtnetlink.h
index ca764b5..cc885c4 100644
--- 

[PATCH net-next] tcp-tso: do not split TSO packets at retransmit time

2016-04-18 Thread Eric Dumazet
Linux TCP stack painfully segments all TSO/GSO packets before retransmits.

This was fine back in the days when TSO/GSO were emerging, with their
bugs, but we believe the dark age is over.

Keeping big packets in write queues, but also in stack traversal
has a lot of benefits.
 - Less memory overhead, because write queues have less skbs
 - Less cpu overhead at ACK processing.
 - Better SACK processing, as lot of studies mentioned how
   awful linux was at this ;)
 - Less cpu overhead to send the rtx packets
   (IP stack traversal, netfilter traversal, qdisc, drivers...)
 - Better latencies in presence of losses.
 - Smaller spikes in fq like packet schedulers, as retransmits
   are not constrained by TCP Small Queues.

1 % packet losses are common today, and at 100Gbit speeds, this
translates to ~80,000 losses per second. If we are unlucky and
first MSS of a 45-MSS TSO is lost, we are cooking 44 MSS segments
at rtx instead of a single 44-MSS TSO packet.

Signed-off-by: Eric Dumazet 
---
 include/net/tcp.h |  4 ++--
 net/ipv4/tcp_input.c  |  2 +-
 net/ipv4/tcp_output.c | 64 +++
 net/ipv4/tcp_timer.c  |  4 ++--
 4 files changed, 34 insertions(+), 40 deletions(-)

diff --git a/include/net/tcp.h b/include/net/tcp.h
index fd40f8c64d5f..0dc272dcd772 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -538,8 +538,8 @@ __u32 cookie_v6_init_sequence(const struct sk_buff *skb, 
__u16 *mss);
 void __tcp_push_pending_frames(struct sock *sk, unsigned int cur_mss,
   int nonagle);
 bool tcp_may_send_now(struct sock *sk);
-int __tcp_retransmit_skb(struct sock *, struct sk_buff *);
-int tcp_retransmit_skb(struct sock *, struct sk_buff *);
+int __tcp_retransmit_skb(struct sock *sk, struct sk_buff *skb, int segs);
+int tcp_retransmit_skb(struct sock *sk, struct sk_buff *skb, int segs);
 void tcp_retransmit_timer(struct sock *sk);
 void tcp_xmit_retransmit_queue(struct sock *);
 void tcp_simple_retransmit(struct sock *);
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 90e0d9256b74..729e489b5608 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -5543,7 +5543,7 @@ static bool tcp_rcv_fastopen_synack(struct sock *sk, 
struct sk_buff *synack,
if (data) { /* Retransmit unacked data in SYN */
tcp_for_write_queue_from(data, sk) {
if (data == tcp_send_head(sk) ||
-   __tcp_retransmit_skb(sk, data))
+   __tcp_retransmit_skb(sk, data, 1))
break;
}
tcp_rearm_rto(sk);
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 6451b83d81e9..4876b256a70a 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -2266,7 +2266,7 @@ void tcp_send_loss_probe(struct sock *sk)
if (WARN_ON(!skb || !tcp_skb_pcount(skb)))
goto rearm_timer;
 
-   if (__tcp_retransmit_skb(sk, skb))
+   if (__tcp_retransmit_skb(sk, skb, 1))
goto rearm_timer;
 
/* Record snd_nxt for loss detection. */
@@ -2551,17 +2551,17 @@ static void tcp_retrans_try_collapse(struct sock *sk, 
struct sk_buff *to,
  * state updates are done by the caller.  Returns non-zero if an
  * error occurred which prevented the send.
  */
-int __tcp_retransmit_skb(struct sock *sk, struct sk_buff *skb)
+int __tcp_retransmit_skb(struct sock *sk, struct sk_buff *skb, int segs)
 {
-   struct tcp_sock *tp = tcp_sk(sk);
struct inet_connection_sock *icsk = inet_csk(sk);
+   struct tcp_sock *tp = tcp_sk(sk);
unsigned int cur_mss;
-   int err;
+   int diff, len, err;
+
 
-   /* Inconslusive MTU probe */
-   if (icsk->icsk_mtup.probe_size) {
+   /* Inconclusive MTU probe */
+   if (icsk->icsk_mtup.probe_size)
icsk->icsk_mtup.probe_size = 0;
-   }
 
/* Do not sent more than we queued. 1/4 is reserved for possible
 * copying overhead: fragmentation, tunneling, mangling etc.
@@ -2594,30 +2594,27 @@ int __tcp_retransmit_skb(struct sock *sk, struct 
sk_buff *skb)
TCP_SKB_CB(skb)->seq != tp->snd_una)
return -EAGAIN;
 
-   if (skb->len > cur_mss) {
-   if (tcp_fragment(sk, skb, cur_mss, cur_mss, GFP_ATOMIC))
+   len = cur_mss * segs;
+   if (skb->len > len) {
+   if (tcp_fragment(sk, skb, len, cur_mss, GFP_ATOMIC))
return -ENOMEM; /* We'll try again later. */
} else {
-   int oldpcount = tcp_skb_pcount(skb);
+   if (skb_unclone(skb, GFP_ATOMIC))
+   return -ENOMEM;
 
-   if (unlikely(oldpcount > 1)) {
-   if (skb_unclone(skb, GFP_ATOMIC))
-   return -ENOMEM;
-   tcp_init_tso_segs(skb, cur_mss);
-   tcp_adjust_pcount(sk, skb, oldpcount 

Re: [PATCH net-next 0/8] allow bpf attach to tracepoints

2016-04-18 Thread Steven Rostedt
On Mon, 18 Apr 2016 12:51:43 -0700
Alexei Starovoitov  wrote:


> yeah, it could be added to ftrace as well, but it won't be as effective
> as perf_trace, since the cost of trace_event_buffer_reserve() in
> trace_event_raw_event_() handler is significantly higher than 
> perf_trace_buf_alloc() in perf_trace_().

Right, because ftrace optimizes the case where all traces make it into
the ring buffer (zero copy). Of course, if a filter is a attached, it
would be trivial to add a case to copy first into a per cpu context
buffer, and only copy into the ring buffer if the filter succeeds. Hmm,
that may actually be something I want to do regardless with the current
filter system.

> Then from the program point of view it wouldn't care how that memory
> is allocated, so the user tools will just use perf_trace_() style.
> The only use case I see for bpf with ftrace's tracepoint handler
> is to work as an actual filter, but we already have filters there...

But wasn't it shown that eBPF could speed up the current filters? If we
hook into eBPF then we could replace what we have with a faster filter.

> so not clear to me of the actual value of adding bpf to ftrace's
> tracepoint handler.

Well, you wouldn't because you don't use ftrace ;-) But I'm sure others
might.

> At the same time there is whole ftrace as function tracer. That is
> very lucrative field for bpf to plug into ;)

Which could get this as a side-effect of this change.

-- Steve


Re: [PATCH net-next 2/8] perf, bpf: allow bpf programs attach to tracepoints

2016-04-18 Thread Steven Rostedt
On Mon, 4 Apr 2016 21:52:48 -0700
Alexei Starovoitov  wrote:

> introduce BPF_PROG_TYPE_TRACEPOINT program type and allow it to be
> attached to tracepoints.
> The tracepoint will copy the arguments in the per-cpu buffer and pass
> it to the bpf program as its first argument.
> The layout of the fields can be discovered by doing
> 'cat /sys/kernel/debug/tracing/events/sched/sched_switch/format'
> prior to the compilation of the program with exception that first 8 bytes
> are reserved and not accessible to the program. This area is used to store
> the pointer to 'struct pt_regs' which some of the bpf helpers will use:
> +-+
> | 8 bytes | hidden 'struct pt_regs *' (inaccessible to bpf program)
> +-+
> | N bytes | static tracepoint fields defined in tracepoint/format (bpf 
> readonly)
> +-+
> | dynamic | __dynamic_array bytes of tracepoint (inaccessible to bpf yet)
> +-+
> 
> Not that all of the fields are already dumped to user space via perf ring 
> buffer
> and some application access it directly without consulting tracepoint/format.
> Same rule applies here: static tracepoint fields should only be accessed
> in a format defined in tracepoint/format. The order of fields and
> field sizes are not an ABI.
> 
> Signed-off-by: Alexei Starovoitov 
> ---
>  include/trace/perf.h| 18 ++
>  include/uapi/linux/bpf.h|  1 +
>  kernel/events/core.c| 13 +
>  kernel/trace/trace_event_perf.c |  3 +++
>  4 files changed, 27 insertions(+), 8 deletions(-)
> 
> diff --git a/include/trace/perf.h b/include/trace/perf.h
> index 26486fcd74ce..55feb69c873f 100644
> --- a/include/trace/perf.h
> +++ b/include/trace/perf.h
> @@ -37,18 +37,19 @@ perf_trace_##call(void *__data, proto)
> \
>   struct trace_event_call *event_call = __data;   \
>   struct trace_event_data_offsets_##call __maybe_unused __data_offsets;\
>   struct trace_event_raw_##call *entry;   \
> + struct bpf_prog *prog = event_call->prog;   \
>   struct pt_regs *__regs; \
>   u64 __addr = 0, __count = 1;\
>   struct task_struct *__task = NULL;  \
>   struct hlist_head *head;\
>   int __entry_size;   \
>   int __data_size;\
> - int rctx;   \
> + int rctx, event_type;   \
>   \
>   __data_size = trace_event_get_offsets_##call(&__data_offsets, args); \
>   \
>   head = this_cpu_ptr(event_call->perf_events);   \
> - if (__builtin_constant_p(!__task) && !__task && \
> + if (!prog && __builtin_constant_p(!__task) && !__task &&\
>   hlist_empty(head))  \
>   return; \
>   \
> @@ -56,8 +57,9 @@ perf_trace_##call(void *__data, proto)  
> \
>sizeof(u64));  \
>   __entry_size -= sizeof(u32);\
>   \
> - entry = perf_trace_buf_prepare(__entry_size,\
> - event_call->event.type, &__regs, );\
> + event_type = prog ? TRACE_EVENT_TYPE_MAX : event_call->event.type; \

Can you move this into perf_trace_entry_prepare?

> + entry = perf_trace_buf_prepare(__entry_size, event_type,\
> +&__regs, ); \
>   if (!entry) \
>   return; \
>   \
> @@ -67,6 +69,14 @@ perf_trace_##call(void *__data, proto) 
> \
>   \
>   { assign; } \
>   \
> + if (prog) { \
> + *(struct pt_regs **)entry = __regs; \
> + if (!trace_call_bpf(prog, entry) || hlist_empty(head)) { \
> + perf_swevent_put_recursion_context(rctx);   \
> +   

Greetings!!!

2016-04-18 Thread andreas123
Hi, how are you? My name is J Eric Denials, External Financial Auditor at 
Lloyds Banking Group plc., London. It is a pleasure to contact you at this time 
through this medium. I have a cool and legitimate deal to do with you as you're 
a foreigner, it will be mutually beneficial to both. If you’re interested, 
urgently, get back to me for more explanation about the deal.
Regards
Eric


Re: [PATCH] netfilter: ctnetlink: add more #ifdef around unused code

2016-04-18 Thread Pablo Neira Ayuso
On Mon, Apr 18, 2016 at 10:04:43PM +0200, Arnd Bergmann wrote:
> On Monday 18 April 2016 20:43:36 Pablo Neira Ayuso wrote:
> > On Mon, Apr 18, 2016 at 08:33:15PM +0200, Arnd Bergmann wrote:
> > > On Monday 18 April 2016 20:16:59 Pablo Neira Ayuso wrote:
> > > > On Sat, Apr 16, 2016 at 10:17:43PM +0200, Arnd Bergmann wrote:
> > > > > A recent patch removed many 'inline' annotations for static
> > > > > functions in this file, which has caused warnings for functions
> > > > > that are not used in a given configuration, in particular when
> > > > > CONFIG_NF_CONNTRACK_EVENTS is disabled:
> > > > > 
> > > > > nf_conntrack_netlink.c:572:15: 'ctnetlink_timestamp_size' defined but 
> > > > > not used
> > > > > nf_conntrack_netlink.c:546:15: 'ctnetlink_acct_size' defined but not 
> > > > > used
> > > > > nf_conntrack_netlink.c:339:12: 'ctnetlink_label_size' defined but not 
> > > > > used
> > > > 
> > > > Arnd, thanks for the fix.
> > > > 
> > > > I'm planning to push this though:
> > > > 
> > > > http://patchwork.ozlabs.org/patch/610820/
> > > > 
> > > > This is restoring the inlines for the size calculation functions, but
> > > > I think that's ok. They are rather small and they're called from the
> > > > event notification path (ie. packet path), so the compiler just place
> > > > them out of the way when not needed and we calm down the gcc warning.
> > > 
> > > Looks good. I'll put this in my randconfig builder to replace my own
> > > patch and will let you know if you missed something.
> > 
> > Thanks, will wait for your ack then.
> 
> 100 clean randconfig builds later, looks good to me.
> 
> Acked-by: Arnd Bergmann 

Thanks! I have applied this to nf-next.


[PATCH net-next] net: dsa: kill circular reference with slave priv

2016-04-18 Thread Vivien Didelot
The dsa_slave_priv structure does not need a pointer to its net_device.
Kill it.

Signed-off-by: Vivien Didelot 
---
 net/dsa/dsa_priv.h | 5 -
 net/dsa/slave.c| 9 -
 2 files changed, 4 insertions(+), 10 deletions(-)

diff --git a/net/dsa/dsa_priv.h b/net/dsa/dsa_priv.h
index 1d1a546..dfa3377 100644
--- a/net/dsa/dsa_priv.h
+++ b/net/dsa/dsa_priv.h
@@ -22,11 +22,6 @@ struct dsa_device_ops {
 };
 
 struct dsa_slave_priv {
-   /*
-* The linux network interface corresponding to this
-* switch port.
-*/
-   struct net_device   *dev;
struct sk_buff *(*xmit)(struct sk_buff *skb,
struct net_device *dev);
 
diff --git a/net/dsa/slave.c b/net/dsa/slave.c
index 2dae0d0..3b6750f 100644
--- a/net/dsa/slave.c
+++ b/net/dsa/slave.c
@@ -673,10 +673,10 @@ static void dsa_slave_get_ethtool_stats(struct net_device 
*dev,
struct dsa_slave_priv *p = netdev_priv(dev);
struct dsa_switch *ds = p->parent;
 
-   data[0] = p->dev->stats.tx_packets;
-   data[1] = p->dev->stats.tx_bytes;
-   data[2] = p->dev->stats.rx_packets;
-   data[3] = p->dev->stats.rx_bytes;
+   data[0] = dev->stats.tx_packets;
+   data[1] = dev->stats.tx_bytes;
+   data[2] = dev->stats.rx_packets;
+   data[3] = dev->stats.rx_bytes;
if (ds->drv->get_ethtool_stats != NULL)
ds->drv->get_ethtool_stats(ds, p->port, data + 4);
 }
@@ -1063,7 +1063,6 @@ int dsa_slave_create(struct dsa_switch *ds, struct device 
*parent,
slave_dev->vlan_features = master->vlan_features;
 
p = netdev_priv(slave_dev);
-   p->dev = slave_dev;
p->parent = ds;
p->port = port;
 
-- 
2.8.0



Re: [PATCH net-next] macvlan: fix failure during registration

2016-04-18 Thread Francesco Ruggeri
On Mon, Apr 18, 2016 at 11:48 AM, Eric W. Biederman
 wrote:
>
> These interactions all seem a little bit funny.  At a quick skim it
> would make more sense to increment the port count in macvlan_init,
> and completely remove the need to mess with port counts anywhere except
> macvlan_init and macvlan_uninit.

Thanks Eric, let me try that.

>
> If for some reason that can't be done the code can easily look at
> dev->reg_state.  If dev->reg_state == NETREG_UNITIALIZED it should
> be exactly the same as your new flag being set.
>

I am not sure that in macvlan_uninit one can tell whether it is being
invoked in the context of a failed register_netdevice (if that is what
you meant).

In case of register_netdevice failing in
call_netdevice_notifiers(NETDEV_REGISTER) the call sequence is:

macvlan_common_newlink
  register_netdevice
call_netdevice_notifiers(NETDEV_REGISTER, dev) (<== fail here)
  rollback_registered(dev);
rollback_registered_many
  dev->reg_state = NETREG_UNREGISTERING
  dev->netdev_ops->ndo_uninit(dev)
  dev->reg_state = NETREG_UNREGISTERED;


Francesco


Re: [PATCH] netfilter: ctnetlink: add more #ifdef around unused code

2016-04-18 Thread Arnd Bergmann
On Monday 18 April 2016 20:43:36 Pablo Neira Ayuso wrote:
> On Mon, Apr 18, 2016 at 08:33:15PM +0200, Arnd Bergmann wrote:
> > On Monday 18 April 2016 20:16:59 Pablo Neira Ayuso wrote:
> > > On Sat, Apr 16, 2016 at 10:17:43PM +0200, Arnd Bergmann wrote:
> > > > A recent patch removed many 'inline' annotations for static
> > > > functions in this file, which has caused warnings for functions
> > > > that are not used in a given configuration, in particular when
> > > > CONFIG_NF_CONNTRACK_EVENTS is disabled:
> > > > 
> > > > nf_conntrack_netlink.c:572:15: 'ctnetlink_timestamp_size' defined but 
> > > > not used
> > > > nf_conntrack_netlink.c:546:15: 'ctnetlink_acct_size' defined but not 
> > > > used
> > > > nf_conntrack_netlink.c:339:12: 'ctnetlink_label_size' defined but not 
> > > > used
> > > 
> > > Arnd, thanks for the fix.
> > > 
> > > I'm planning to push this though:
> > > 
> > > http://patchwork.ozlabs.org/patch/610820/
> > > 
> > > This is restoring the inlines for the size calculation functions, but
> > > I think that's ok. They are rather small and they're called from the
> > > event notification path (ie. packet path), so the compiler just place
> > > them out of the way when not needed and we calm down the gcc warning.
> > 
> > Looks good. I'll put this in my randconfig builder to replace my own
> > patch and will let you know if you missed something.
> 
> Thanks, will wait for your ack then.

100 clean randconfig builds later, looks good to me.

Acked-by: Arnd Bergmann 


Re: [PATCH net-next 0/8] allow bpf attach to tracepoints

2016-04-18 Thread Alexei Starovoitov

On 4/18/16 9:13 AM, Steven Rostedt wrote:

On Mon, 4 Apr 2016 21:52:46 -0700
Alexei Starovoitov  wrote:


Hi Steven, Peter,

last time we discussed bpf+tracepoints it was a year ago [1] and the reason
we didn't proceed with that approach was that bpf would make arguments
arg1, arg2 to trace_xx(arg1, arg2) call to be exposed to bpf program
and that was considered unnecessary extension of abi. Back then I wanted
to avoid the cost of buffer alloc and field assign part in all
of the tracepoints, but looks like when optimized the cost is acceptable.
So this new apporach doesn't expose any new abi to bpf program.
The program is looking at tracepoint fields after they were copied
by perf_trace_xx() and described in /sys/kernel/debug/tracing/events/xxx/format


Does this mean that ftrace could use this ability as well? As all the
current filtering of ftrace was done after it was copied to the buffer,
and that was what you wanted to avoid.


yeah, it could be added to ftrace as well, but it won't be as effective
as perf_trace, since the cost of trace_event_buffer_reserve() in
trace_event_raw_event_() handler is significantly higher than 
perf_trace_buf_alloc() in perf_trace_().

Then from the program point of view it wouldn't care how that memory
is allocated, so the user tools will just use perf_trace_() style.
The only use case I see for bpf with ftrace's tracepoint handler
is to work as an actual filter, but we already have filters there...
so not clear to me of the actual value of adding bpf to ftrace's
tracepoint handler.
At the same time there is whole ftrace as function tracer. That is
very lucrative field for bpf to plug into ;)

As far as 2nd part of your question about copying. Yeah, it adds to
the cost, so kprobe sometimes is faster than perf_trace tracepoint
that is copying a lot of args which are not going to be used.



RE: [Intel-wired-lan] [PATCH net-next V2 2/2] intel: ixgbevf: Support Windows hosts (Hyper-V)

2016-04-18 Thread KY Srinivasan


> -Original Message-
> From: Joe Perches [mailto:j...@perches.com]
> Sent: Monday, April 18, 2016 10:00 AM
> To: KY Srinivasan ; Alexander Duyck
> 
> Cc: David Miller ; Netdev
> ; linux-ker...@vger.kernel.org;
> de...@linuxdriverproject.org; o...@aepfle.de; Robo Bot
> ; Jason Wang ;
> e...@mellanox.com; ja...@mellanox.com; yevge...@mellanox.com; John
> Ronciak ; intel-wired-lan  l...@lists.osuosl.org>
> Subject: Re: [Intel-wired-lan] [PATCH net-next V2 2/2] intel: ixgbevf: Support
> Windows hosts (Hyper-V)
> 
> On Mon, 2016-04-18 at 16:52 +, KY Srinivasan wrote:
> []
> > > > +bool ixgbevf_on_hyperv(struct ixgbe_hw *hw)
> > > > +{
> > > > +   if (hw->mbx.ops.check_for_msg == NULL)
> > > > +   return true;
> > > > +   else
> > > > +   return false;
> > > > +}
> 
> trivia:
> 
> bool func(...)
> {
>   if ()
>   return true;
>   else
>   return false;
> }
> 
> can generally be written as:
> 
> bool func(...)
> {
>   return ;
> }

Thanks Joe; will update.

K. Y


[PATCH net-next 6/7] vxlan: break dependency with netdev drivers

2016-04-18 Thread Hannes Frederic Sowa
Currently all drivers depend and autoload the vxlan module because how
vxlan_get_rx_port is linked into them. Remove this dependency:

By using a new event type in the netdevice notifier call chain we proxy
the request from the drivers to flush and resetup the vxlan ports not
directly via function call but by the already existing netdevice
notifier call chain.

I added a separate new event type, NETDEV_OFFLOAD_PUSH_VXLAN, to do so.
We don't need to save those ids, as the event type field is an unsigned
long and using specialized event types for this purpose seemed to be a
more elegant way. This also comes in beneficial if in future we want to
add offloading knobs for vxlan.

Cc: Jesse Gross 
Signed-off-by: Hannes Frederic Sowa 
---
 drivers/net/vxlan.c   | 14 +-
 include/linux/netdevice.h |  1 +
 include/net/vxlan.h   |  6 ++
 3 files changed, 12 insertions(+), 9 deletions(-)

diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index c2e22c2532a1a8..6fb93b57a72489 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -2527,7 +2527,7 @@ static struct device_type vxlan_type = {
  * supply the listening VXLAN udp ports. Callers are expected
  * to implement the ndo_add_vxlan_port.
  */
-void vxlan_get_rx_port(struct net_device *dev)
+static void vxlan_push_rx_ports(struct net_device *dev)
 {
struct vxlan_sock *vs;
struct net *net = dev_net(dev);
@@ -2536,6 +2536,9 @@ void vxlan_get_rx_port(struct net_device *dev)
__be16 port;
unsigned int i;
 
+   if (!dev->netdev_ops->ndo_add_vxlan_port)
+   return;
+
spin_lock(>sock_lock);
for (i = 0; i < PORT_HASH_SIZE; ++i) {
hlist_for_each_entry_rcu(vs, >sock_list[i], hlist) {
@@ -2547,7 +2550,6 @@ void vxlan_get_rx_port(struct net_device *dev)
}
spin_unlock(>sock_lock);
 }
-EXPORT_SYMBOL_GPL(vxlan_get_rx_port);
 
 /* Initialize the device structure. */
 static void vxlan_setup(struct net_device *dev)
@@ -3283,20 +3285,22 @@ static void vxlan_handle_lowerdev_unregister(struct 
vxlan_net *vn,
unregister_netdevice_many(_kill);
 }
 
-static int vxlan_lowerdev_event(struct notifier_block *unused,
-   unsigned long event, void *ptr)
+static int vxlan_netdevice_event(struct notifier_block *unused,
+unsigned long event, void *ptr)
 {
struct net_device *dev = netdev_notifier_info_to_dev(ptr);
struct vxlan_net *vn = net_generic(dev_net(dev), vxlan_net_id);
 
if (event == NETDEV_UNREGISTER)
vxlan_handle_lowerdev_unregister(vn, dev);
+   else if (event == NETDEV_OFFLOAD_PUSH_VXLAN)
+   vxlan_push_rx_ports(dev);
 
return NOTIFY_DONE;
 }
 
 static struct notifier_block vxlan_notifier_block __read_mostly = {
-   .notifier_call = vxlan_lowerdev_event,
+   .notifier_call = vxlan_netdevice_event,
 };
 
 static __net_init int vxlan_init_net(struct net *net)
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index a3bb534576a383..d4c8cd424f8dbc 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -2244,6 +2244,7 @@ struct netdev_lag_lower_state_info {
 #define NETDEV_BONDING_INFO0x0019
 #define NETDEV_PRECHANGEUPPER  0x001A
 #define NETDEV_CHANGELOWERSTATE0x001B
+#define NETDEV_OFFLOAD_PUSH_VXLAN  0x001C
 
 int register_netdevice_notifier(struct notifier_block *nb);
 int unregister_netdevice_notifier(struct notifier_block *nb);
diff --git a/include/net/vxlan.h b/include/net/vxlan.h
index d442eb3129cde4..673e9f9e6da768 100644
--- a/include/net/vxlan.h
+++ b/include/net/vxlan.h
@@ -390,13 +390,11 @@ static inline __be32 vxlan_compute_rco(unsigned int 
start, unsigned int offset)
return vni_field;
 }
 
-#if IS_ENABLED(CONFIG_VXLAN)
-void vxlan_get_rx_port(struct net_device *netdev);
-#else
 static inline void vxlan_get_rx_port(struct net_device *netdev)
 {
+   ASSERT_RTNL();
+   call_netdevice_notifiers(NETDEV_OFFLOAD_PUSH_VXLAN, netdev);
 }
-#endif
 
 static inline unsigned short vxlan_get_sk_family(struct vxlan_sock *vs)
 {
-- 
2.5.5



[PATCH net-next 5/7] qlcnic: protect qlicnic_attach_func with rtnl_lock

2016-04-18 Thread Hannes Frederic Sowa
qlcnic_attach_func requires rtnl_lock to be held.

Cc: dept-gelinuxnic...@qlogic.com
Signed-off-by: Hannes Frederic Sowa 
---
 drivers/net/ethernet/qlogic/qlcnic/qlcnic_main.c | 10 --
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/qlogic/qlcnic/qlcnic_main.c 
b/drivers/net/ethernet/qlogic/qlcnic/qlcnic_main.c
index 1205f6f9c94173..1c29105b6c364f 100644
--- a/drivers/net/ethernet/qlogic/qlcnic/qlcnic_main.c
+++ b/drivers/net/ethernet/qlogic/qlcnic/qlcnic_main.c
@@ -3952,8 +3952,14 @@ static pci_ers_result_t 
qlcnic_82xx_io_error_detected(struct pci_dev *pdev,
 
 static pci_ers_result_t qlcnic_82xx_io_slot_reset(struct pci_dev *pdev)
 {
-   return qlcnic_attach_func(pdev) ? PCI_ERS_RESULT_DISCONNECT :
-   PCI_ERS_RESULT_RECOVERED;
+   pci_ers_result_t res;
+
+   rtnl_lock();
+   res = qlcnic_attach_func(pdev) ? PCI_ERS_RESULT_DISCONNECT :
+PCI_ERS_RESULT_RECOVERED;
+   rtnl_unlock();
+
+   return res;
 }
 
 static void qlcnic_82xx_io_resume(struct pci_dev *pdev)
-- 
2.5.5



[PATCH net-next 7/7] geneve: break dependency with netdev drivers

2016-04-18 Thread Hannes Frederic Sowa
Equivalent to "vxlan: break dependency with netdev drivers", don't
autoload geneve module in case the driver is loaded. Instead make the
coupling weaker by using netdevice notifiers as proxy.

Cc: Jesse Gross 
Signed-off-by: Hannes Frederic Sowa 
---
 drivers/net/geneve.c  | 31 ---
 include/linux/netdevice.h |  1 +
 include/net/geneve.h  |  6 ++
 3 files changed, 31 insertions(+), 7 deletions(-)

diff --git a/drivers/net/geneve.c b/drivers/net/geneve.c
index efbc7ceedc3a13..59314cf47584e4 100644
--- a/drivers/net/geneve.c
+++ b/drivers/net/geneve.c
@@ -1172,7 +1172,7 @@ static struct device_type geneve_type = {
  * supply the listening GENEVE udp ports. Callers are expected
  * to implement the ndo_add_geneve_port.
  */
-void geneve_get_rx_port(struct net_device *dev)
+static void geneve_push_rx_ports(struct net_device *dev)
 {
struct net *net = dev_net(dev);
struct geneve_net *gn = net_generic(net, geneve_net_id);
@@ -1181,6 +1181,9 @@ void geneve_get_rx_port(struct net_device *dev)
struct sock *sk;
__be16 port;
 
+   if (!dev->netdev_ops->ndo_add_geneve_port)
+   return;
+
rcu_read_lock();
list_for_each_entry_rcu(gs, >sock_list, list) {
sk = gs->sock->sk;
@@ -1190,7 +1193,6 @@ void geneve_get_rx_port(struct net_device *dev)
}
rcu_read_unlock();
 }
-EXPORT_SYMBOL_GPL(geneve_get_rx_port);
 
 /* Initialize the device structure. */
 static void geneve_setup(struct net_device *dev)
@@ -1538,6 +1540,21 @@ struct net_device *geneve_dev_create_fb(struct net *net, 
const char *name,
 }
 EXPORT_SYMBOL_GPL(geneve_dev_create_fb);
 
+static int geneve_netdevice_event(struct notifier_block *unused,
+ unsigned long event, void *ptr)
+{
+   struct net_device *dev = netdev_notifier_info_to_dev(ptr);
+
+   if (event == NETDEV_OFFLOAD_PUSH_GENEVE)
+   geneve_push_rx_ports(dev);
+
+   return NOTIFY_DONE;
+}
+
+static struct notifier_block geneve_notifier_block __read_mostly = {
+   .notifier_call = geneve_netdevice_event,
+};
+
 static __net_init int geneve_init_net(struct net *net)
 {
struct geneve_net *gn = net_generic(net, geneve_net_id);
@@ -1590,11 +1607,18 @@ static int __init geneve_init_module(void)
if (rc)
goto out1;
 
-   rc = rtnl_link_register(_link_ops);
+   rc = register_netdevice_notifier(_notifier_block);
if (rc)
goto out2;
 
+   rc = rtnl_link_register(_link_ops);
+   if (rc)
+   goto out3;
+
return 0;
+
+out3:
+   unregister_netdevice_notifier(_notifier_block);
 out2:
unregister_pernet_subsys(_net_ops);
 out1:
@@ -1605,6 +1629,7 @@ late_initcall(geneve_init_module);
 static void __exit geneve_cleanup_module(void)
 {
rtnl_link_unregister(_link_ops);
+   unregister_netdevice_notifier(_notifier_block);
unregister_pernet_subsys(_net_ops);
 }
 module_exit(geneve_cleanup_module);
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index d4c8cd424f8dbc..1f6d5db471a24f 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -2245,6 +2245,7 @@ struct netdev_lag_lower_state_info {
 #define NETDEV_PRECHANGEUPPER  0x001A
 #define NETDEV_CHANGELOWERSTATE0x001B
 #define NETDEV_OFFLOAD_PUSH_VXLAN  0x001C
+#define NETDEV_OFFLOAD_PUSH_GENEVE 0x001D
 
 int register_netdevice_notifier(struct notifier_block *nb);
 int unregister_netdevice_notifier(struct notifier_block *nb);
diff --git a/include/net/geneve.h b/include/net/geneve.h
index e6c23dc765f7ec..cb544a530146c2 100644
--- a/include/net/geneve.h
+++ b/include/net/geneve.h
@@ -62,13 +62,11 @@ struct genevehdr {
struct geneve_opt options[];
 };
 
-#if IS_ENABLED(CONFIG_GENEVE)
-void geneve_get_rx_port(struct net_device *netdev);
-#else
 static inline void geneve_get_rx_port(struct net_device *netdev)
 {
+   ASSERT_RTNL();
+   call_netdevice_notifiers(NETDEV_OFFLOAD_PUSH_GENEVE, netdev);
 }
-#endif
 
 #ifdef CONFIG_INET
 struct net_device *geneve_dev_create_fb(struct net *net, const char *name,
-- 
2.5.5



[PATCH net-next 4/7] ixgbe: protect vxlan_get_rx_port in ixgbe_service_task with rtnl_lock

2016-04-18 Thread Hannes Frederic Sowa
vxlan_get_rx_port requires rtnl_lock to be held.

Cc: Jeff Kirsher 
Cc: Jesse Brandeburg 
Cc: Shannon Nelson 
Cc: Carolyn Wyborny 
Cc: Don Skidmore 
Cc: Bruce Allan 
Cc: John Ronciak 
Cc: Mitch Williams 
Signed-off-by: Hannes Frederic Sowa 
---
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c 
b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index 2976df77bf14f5..b2f2cf40f06a87 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -7192,10 +7192,12 @@ static void ixgbe_service_task(struct work_struct *work)
return;
}
 #ifdef CONFIG_IXGBE_VXLAN
+   rtnl_lock();
if (adapter->flags2 & IXGBE_FLAG2_VXLAN_REREG_NEEDED) {
adapter->flags2 &= ~IXGBE_FLAG2_VXLAN_REREG_NEEDED;
vxlan_get_rx_port(adapter->netdev);
}
+   rtnl_unlock();
 #endif /* CONFIG_IXGBE_VXLAN */
ixgbe_reset_subtask(adapter);
ixgbe_phy_interrupt_subtask(adapter);
-- 
2.5.5



[PATCH net-next 3/7] mlx4: protect mlx4_en_start_port in mlx4_en_restart with rtnl_lock

2016-04-18 Thread Hannes Frederic Sowa
mlx4_en_start_port requires rtnl_lock to be held.

Cc: Eugenia Emantayev 
Cc: Yishai Hadas 
Signed-off-by: Hannes Frederic Sowa 
---
 drivers/net/ethernet/mellanox/mlx4/en_netdev.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c 
b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
index b4b258c8ca47d4..8bd143dda95d11 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
@@ -1856,6 +1856,7 @@ static void mlx4_en_restart(struct work_struct *work)
 
en_dbg(DRV, priv, "Watchdog task called for port %d\n", priv->port);
 
+   rtnl_lock();
mutex_lock(>state_lock);
if (priv->port_up) {
mlx4_en_stop_port(dev, 1);
@@ -1863,6 +1864,7 @@ static void mlx4_en_restart(struct work_struct *work)
en_err(priv, "Failed restarting port %d\n", priv->port);
}
mutex_unlock(>state_lock);
+   rtnl_unlock();
 }
 
 static void mlx4_en_clear_stats(struct net_device *dev)
-- 
2.5.5



[PATCH net-next 1/7] benet: be_resume needs to protect be_open with rtnl_lock

2016-04-18 Thread Hannes Frederic Sowa
be_open calls down to functions which expects rtnl lock to be held.

Cc: Sathya Perla 
Cc: Ajit Khaparde 
Cc: Padmanabh Ratnakar 
Cc: Sriharsha Basavapatna 
Cc: Somnath Kotur 
Signed-off-by: Hannes Frederic Sowa 
---
 drivers/net/ethernet/emulex/benet/be_main.c | 10 ++
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/emulex/benet/be_main.c 
b/drivers/net/ethernet/emulex/benet/be_main.c
index 536686476369bf..ed98ef1ecac38d 100644
--- a/drivers/net/ethernet/emulex/benet/be_main.c
+++ b/drivers/net/ethernet/emulex/benet/be_main.c
@@ -4890,11 +4890,13 @@ static int be_resume(struct be_adapter *adapter)
if (status)
return status;
 
-   if (netif_running(netdev)) {
+   rtnl_lock();
+   if (netif_running(netdev))
status = be_open(netdev);
-   if (status)
-   return status;
-   }
+   rtnl_unlock();
+
+   if (status)
+   return status;
 
netif_device_attach(netdev);
 
-- 
2.5.5



[PATCH net-next 2/7] fm10k: protect fm10k_open in fm10k_io_resume with rtnl_lock

2016-04-18 Thread Hannes Frederic Sowa
fm10k_open requires rtnl_lock to be held.

Cc: Jeff Kirsher 
Cc: Jesse Brandeburg 
Cc: Shannon Nelson 
Cc: Carolyn Wyborny 
Cc: Don Skidmore 
Cc: Bruce Allan 
Cc: John Ronciak 
Cc: Mitch Williams 
Signed-off-by: Hannes Frederic Sowa 
---
 drivers/net/ethernet/intel/fm10k/fm10k_pci.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_pci.c 
b/drivers/net/ethernet/intel/fm10k/fm10k_pci.c
index f0992950e228eb..04304d9a633935 100644
--- a/drivers/net/ethernet/intel/fm10k/fm10k_pci.c
+++ b/drivers/net/ethernet/intel/fm10k/fm10k_pci.c
@@ -2362,8 +2362,10 @@ static void fm10k_io_resume(struct pci_dev *pdev)
/* reset clock */
fm10k_ts_reset(interface);
 
+   rtnl_lock();
if (netif_running(netdev))
err = fm10k_open(netdev);
+   rtnl_unlock();
 
/* final check of hardware state before registering the interface */
err = err ? : fm10k_hw_ready(interface);
-- 
2.5.5



[PATCH net-next 0/7] net: network drivers should not depend on geneve/vxlan

2016-04-18 Thread Hannes Frederic Sowa
This patchset removes the dependency of network drivers on vxlan or
geneve, so those don't get autoloaded when the nic driver is loaded.

Also audited the code such that vxlan_get_rx_port and geneve_get_rx_port
are not called without rtnl lock.

Hannes Frederic Sowa (7):
  benet: be_resume needs to protect be_open with rtnl_lock
  fm10k: protect fm10k_open in fm10k_io_resume with rtnl_lock
  mlx4: protect mlx4_en_start_port in mlx4_en_restart with rtnl_lock
  ixgbe: protect vxlan_get_rx_port in ixgbe_service_task with rtnl_lock
  qlcnic: protect qlicnic_attach_func with rtnl_lock
  vxlan: break dependency with netdev drivers
  geneve: break dependency with netdev drivers

 drivers/net/ethernet/emulex/benet/be_main.c  | 10 +---
 drivers/net/ethernet/intel/fm10k/fm10k_pci.c |  2 ++
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c|  2 ++
 drivers/net/ethernet/mellanox/mlx4/en_netdev.c   |  2 ++
 drivers/net/ethernet/qlogic/qlcnic/qlcnic_main.c | 10 ++--
 drivers/net/geneve.c | 31 +---
 drivers/net/vxlan.c  | 14 +++
 include/linux/netdevice.h|  2 ++
 include/net/geneve.h |  6 ++---
 include/net/vxlan.h  |  6 ++---
 10 files changed, 63 insertions(+), 22 deletions(-)

-- 
2.5.5



[PATCH net-next v2 1/2] bpf, trace: add BPF_F_CURRENT_CPU flag for bpf_perf_event_output

2016-04-18 Thread Daniel Borkmann
Add a BPF_F_CURRENT_CPU flag to optimize the use-case where user space has
per-CPU ring buffers and the eBPF program pushes the data into the current
CPU's ring buffer which saves us an extra helper function call in eBPF.
Also, make sure to properly reserve the remaining flags which are not used.

Signed-off-by: Daniel Borkmann 
Signed-off-by: Alexei Starovoitov 
---
 include/uapi/linux/bpf.h | 4 
 kernel/trace/bpf_trace.c | 7 ++-
 2 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 70eda5a..b7b0fb1 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -347,6 +347,10 @@ enum bpf_func_id {
 #define BPF_F_ZERO_CSUM_TX (1ULL << 1)
 #define BPF_F_DONT_FRAGMENT(1ULL << 2)
 
+/* BPF_FUNC_perf_event_output flags. */
+#define BPF_F_INDEX_MASK   0xULL
+#define BPF_F_CURRENT_CPU  BPF_F_INDEX_MASK
+
 /* user accessible mirror of in-kernel sk_buff.
  * new fields can only be added to the end of this structure
  */
diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index 6855878..6bfe55c 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -225,11 +225,12 @@ static const struct bpf_func_proto 
bpf_perf_event_read_proto = {
.arg2_type  = ARG_ANYTHING,
 };
 
-static u64 bpf_perf_event_output(u64 r1, u64 r2, u64 index, u64 r4, u64 size)
+static u64 bpf_perf_event_output(u64 r1, u64 r2, u64 flags, u64 r4, u64 size)
 {
struct pt_regs *regs = (struct pt_regs *) (long) r1;
struct bpf_map *map = (struct bpf_map *) (long) r2;
struct bpf_array *array = container_of(map, struct bpf_array, map);
+   u64 index = flags & BPF_F_INDEX_MASK;
void *data = (void *) (long) r4;
struct perf_sample_data sample_data;
struct perf_event *event;
@@ -239,6 +240,10 @@ static u64 bpf_perf_event_output(u64 r1, u64 r2, u64 
index, u64 r4, u64 size)
.data = data,
};
 
+   if (unlikely(flags & ~(BPF_F_INDEX_MASK)))
+   return -EINVAL;
+   if (index == BPF_F_CURRENT_CPU)
+   index = raw_smp_processor_id();
if (unlikely(index >= array->map.max_entries))
return -E2BIG;
 
-- 
1.9.3



[PATCH net-next v2 0/2] BPF updates

2016-04-18 Thread Daniel Borkmann
This minor set adds a new helper bpf_event_output() for eBPF cls/act
program types which allows to pass events to user space applications.
For details, please see individual patches.

Thanks!

v1 -> v2:
  - Address kbuild bot found compile issue in patch 2
  - Rest as is

Daniel Borkmann (2):
  bpf, trace: add BPF_F_CURRENT_CPU flag for bpf_perf_event_output
  bpf: add event output helper for notifications/sampling/logging

 include/linux/bpf.h  |  2 ++
 include/uapi/linux/bpf.h |  4 
 kernel/bpf/core.c|  7 +++
 kernel/trace/bpf_trace.c | 34 +-
 net/core/filter.c|  2 ++
 5 files changed, 48 insertions(+), 1 deletion(-)

-- 
1.9.3



[PATCH net-next v2 2/2] bpf: add event output helper for notifications/sampling/logging

2016-04-18 Thread Daniel Borkmann
This patch adds a new helper for cls/act programs that can push events
to user space applications. For networking, this can be f.e. for sampling,
debugging, logging purposes or pushing of arbitrary wake-up events. The
idea is similar to a43eec304259 ("bpf: introduce bpf_perf_event_output()
helper") and 39111695b1b8 ("samples: bpf: add bpf_perf_event_output example").

The eBPF program utilizes a perf event array map that user space populates
with fds from perf_event_open(), the eBPF program calls into the helper
f.e. as skb_event_output(skb, _map, BPF_F_CURRENT_CPU, raw, sizeof(raw))
so that the raw data is pushed into the fd f.e. at the map index of the
current CPU.

User space can poll/mmap/etc on this and has a data channel for receiving
events that can be post-processed. The nice thing is that since the eBPF
program and user space application making use of it are tightly coupled,
they can define their own arbitrary raw data format and what/when they
want to push.

While f.e. packet headers could be one part of the meta data that is being
pushed, this is not a substitute for things like packet sockets as whole
packet is not being pushed and push is only done in a single direction.
Intention is more of a generically usable, efficient event pipe to applications.
Workflow is that tc can pin the map and applications can attach themselves
e.g. after cls/act setup to one or multiple map slots, demuxing is done by
the eBPF program.

Adding this facility is with minimal effort, it reuses the helper
introduced in a43eec304259 ("bpf: introduce bpf_perf_event_output() helper")
and we get its functionality for free by overloading its BPF_FUNC_ identifier
for cls/act programs, ctx is currently unused, but will be made use of in
future. Example will be added to iproute2's BPF example files.

Signed-off-by: Daniel Borkmann 
Signed-off-by: Alexei Starovoitov 
---
 include/linux/bpf.h  |  2 ++
 kernel/bpf/core.c|  7 +++
 kernel/trace/bpf_trace.c | 27 +++
 net/core/filter.c|  2 ++
 4 files changed, 38 insertions(+)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 5fb3c61..f63afdc 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -169,7 +169,9 @@ u64 bpf_tail_call(u64 ctx, u64 r2, u64 index, u64 r4, u64 
r5);
 u64 bpf_get_stackid(u64 r1, u64 r2, u64 r3, u64 r4, u64 r5);
 void bpf_fd_array_map_clear(struct bpf_map *map);
 bool bpf_prog_array_compatible(struct bpf_array *array, const struct bpf_prog 
*fp);
+
 const struct bpf_func_proto *bpf_get_trace_printk_proto(void);
+const struct bpf_func_proto *bpf_get_event_output_proto(void);
 
 #ifdef CONFIG_BPF_SYSCALL
 DECLARE_PER_CPU(int, bpf_prog_active);
diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
index be0abf6..e4248fe 100644
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -764,14 +764,21 @@ const struct bpf_func_proto bpf_map_delete_elem_proto 
__weak;
 const struct bpf_func_proto bpf_get_prandom_u32_proto __weak;
 const struct bpf_func_proto bpf_get_smp_processor_id_proto __weak;
 const struct bpf_func_proto bpf_ktime_get_ns_proto __weak;
+
 const struct bpf_func_proto bpf_get_current_pid_tgid_proto __weak;
 const struct bpf_func_proto bpf_get_current_uid_gid_proto __weak;
 const struct bpf_func_proto bpf_get_current_comm_proto __weak;
+
 const struct bpf_func_proto * __weak bpf_get_trace_printk_proto(void)
 {
return NULL;
 }
 
+const struct bpf_func_proto * __weak bpf_get_event_output_proto(void)
+{
+   return NULL;
+}
+
 /* Always built-in helper functions. */
 const struct bpf_func_proto bpf_tail_call_proto = {
.func   = NULL,
diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index 6bfe55c..75cb154 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -277,6 +277,33 @@ static const struct bpf_func_proto 
bpf_perf_event_output_proto = {
.arg5_type  = ARG_CONST_STACK_SIZE,
 };
 
+static DEFINE_PER_CPU(struct pt_regs, bpf_pt_regs);
+
+static u64 bpf_event_output(u64 r1, u64 r2, u64 flags, u64 r4, u64 size)
+{
+   struct pt_regs *regs = this_cpu_ptr(_pt_regs);
+
+   perf_fetch_caller_regs(regs);
+
+   return bpf_perf_event_output((long)regs, r2, flags, r4, size);
+}
+
+static const struct bpf_func_proto bpf_event_output_proto = {
+   .func   = bpf_event_output,
+   .gpl_only   = true,
+   .ret_type   = RET_INTEGER,
+   .arg1_type  = ARG_PTR_TO_CTX,
+   .arg2_type  = ARG_CONST_MAP_PTR,
+   .arg3_type  = ARG_ANYTHING,
+   .arg4_type  = ARG_PTR_TO_STACK,
+   .arg5_type  = ARG_CONST_STACK_SIZE,
+};
+
+const struct bpf_func_proto *bpf_get_event_output_proto(void)
+{
+   return _event_output_proto;
+}
+
 static const struct bpf_func_proto *tracing_func_proto(enum bpf_func_id 
func_id)
 {
switch (func_id) {
diff --git a/net/core/filter.c b/net/core/filter.c
index 5d2ac2b..218e5de 100644
--- 

Re: [PATCH net-next] macvlan: fix failure during registration

2016-04-18 Thread Eric W. Biederman

Francesco Ruggeri  writes:

> Resending, did not include netdev the first time ...
>
> If a macvlan/macvtap creation fails in register_netdevice in
> call_netdevice_notifiers(NETDEV_REGISTER) then while cleaning things up in
> rollback_registered_many it invokes macvlan_uninit. This results in
> port->count being decremented twice (in macvlan_uninit and in
> macvlan_common_newlink).
> A similar problem may exist in the ipvlan driver.
> This patch adds priv_flags to struct macvlan_dev and a flag that
> macvlan_uninit can use to determine if it is invoked in the context of a
> failed newlink.
> In macvtap_device_event(NETDEV_UNREGISTER) it also avoids cleaning up
> /dev/tapNN in case creation of the char device had previously failed.
> Tested in 3.18.

These interactions all seem a little bit funny.  At a quick skim it
would make more sense to increment the port count in macvlan_init,
and completely remove the need to mess with port counts anywhere except
macvlan_init and macvlan_uninit.

If for some reason that can't be done the code can easily look at
dev->reg_state.  If dev->reg_state == NETREG_UNITIALIZED it should
be exactly the same as your new flag being set.

> diff --git a/include/linux/if_macvlan.h b/include/linux/if_macvlan.h
> index a4ccc31..7cf82d8 100644
> --- a/include/linux/if_macvlan.h
> +++ b/include/linux/if_macvlan.h
> @@ -48,6 +48,7 @@ struct macvlan_dev {
>   netdev_features_t   set_features;
>   enum macvlan_mode   mode;
>   u16 flags;
> + u16 priv_flags;
>   /* This array tracks active taps. */
>   struct macvtap_queue__rcu *taps[MAX_MACVTAP_QUEUES];
>   /* This list tracks all taps (both enabled and disabled) */
> @@ -63,6 +64,8 @@ struct macvlan_dev {
>   unsigned intmacaddr_count;
>  };
>  
> +#define MACVLAN_PRIV_FLAG_REGISTERING1
> +
>  static inline void macvlan_count_rx(const struct macvlan_dev *vlan,
>   unsigned int len, bool success,
>   bool multicast)


Re: [PATCH net-next] enic: set netdev->vlan_features

2016-04-18 Thread David Miller
From: Govindarajulu Varadarajan 
Date: Sat, 16 Apr 2016 00:40:43 +0530

> From: Govindarajulu Varadarajan <_gov...@gmx.com>
> 
> Driver sets vlan_feature to netdev->features as hardware supports all of
> them on vlan interface.
> 
> Signed-off-by: Govindarajulu Varadarajan <_gov...@gmx.com>

Applied.


Re: [PATCH 0/3] fec: ethtool: move to new api {get|set}_link_ksettings

2016-04-18 Thread David Miller
From: Philippe Reynes 
Date: Fri, 15 Apr 2016 00:34:58 +0200

> Ethtool has a new api {get|set}_link_ksettings that deprecate
> the old api {get|set}_settings. We update the fec driver to use
> this new ethtool api.
> 
> For this first version, I've converted old u32 value in phy structure
> to link_modes structure. Another way would be to replace u32 in
> phy structure to use DECLARE_LINK_MODE_MASK for advertising, 

Series applied, thanks.


Re: [PATCH] netfilter: ctnetlink: add more #ifdef around unused code

2016-04-18 Thread Pablo Neira Ayuso
On Mon, Apr 18, 2016 at 08:33:15PM +0200, Arnd Bergmann wrote:
> On Monday 18 April 2016 20:16:59 Pablo Neira Ayuso wrote:
> > On Sat, Apr 16, 2016 at 10:17:43PM +0200, Arnd Bergmann wrote:
> > > A recent patch removed many 'inline' annotations for static
> > > functions in this file, which has caused warnings for functions
> > > that are not used in a given configuration, in particular when
> > > CONFIG_NF_CONNTRACK_EVENTS is disabled:
> > > 
> > > nf_conntrack_netlink.c:572:15: 'ctnetlink_timestamp_size' defined but not 
> > > used
> > > nf_conntrack_netlink.c:546:15: 'ctnetlink_acct_size' defined but not used
> > > nf_conntrack_netlink.c:339:12: 'ctnetlink_label_size' defined but not used
> > 
> > Arnd, thanks for the fix.
> > 
> > I'm planning to push this though:
> > 
> > http://patchwork.ozlabs.org/patch/610820/
> > 
> > This is restoring the inlines for the size calculation functions, but
> > I think that's ok. They are rather small and they're called from the
> > event notification path (ie. packet path), so the compiler just place
> > them out of the way when not needed and we calm down the gcc warning.
> 
> Looks good. I'll put this in my randconfig builder to replace my own
> patch and will let you know if you missed something.

Thanks, will wait for your ack then.


Re: [PATCH net-next] tun: don't require serialization lock on tx

2016-04-18 Thread David Miller
From: Paolo Abeni 
Date: Thu, 14 Apr 2016 18:39:39 +0200

> The current tun_net_xmit() implementation don't need any external
> lock since it relies on rcu protection for the tun data structure
> and on socket queue lock for skb queuing.
> 
> This patch set the NETIF_F_LLTX feature bit in the tun device, so
> that on xmit, in absence of qdisc, no serialization lock is acquired
> by the caller.
> 
> The user space can remove the default tun qdisc with:
> 
> tc qdisc replace dev  root noqueue
> 
> Signed-off-by: Paolo Abeni 
> Acked-by: Hannes Frederic Sowa 
> Acked-by: Eric Dumazet 

Applied, thanks.


Re: [PATCH nf] netfilter: ipv6: Orphan skbs in nf_ct_frag6_gather()

2016-04-18 Thread Pablo Neira Ayuso
On Thu, Apr 14, 2016 at 05:35:39PM -0700, Joe Stringer wrote:
> On 14 April 2016 at 03:35, Pablo Neira Ayuso  wrote:
> > On Thu, Apr 14, 2016 at 10:40:15AM +0200, Florian Westphal wrote:
> >> David Laight  wrote:
> >> > From: Joe Stringer
> >> > > Sent: 13 April 2016 19:10
> >> > > This is the IPv6 equivalent of commit 8282f27449bf ("inet: frag: Always
> >> > > orphan skbs inside ip_defrag()").
> >> > >
> >> > > Prior to commit 029f7f3b8701 ("netfilter: ipv6: nf_defrag: avoid/free
> >> > > clone operations"), ipv6 fragments sent to nf_ct_frag6_gather() would 
> >> > > be
> >> > > cloned (implicitly orphaning) prior to queueing for reassembly. As 
> >> > > such,
> >> > > when the IPv6 message is eventually reassembled, the skb->sk for all
> >> > > fragments would be NULL. After that commit was introduced, rather than
> >> > > cloning, the original skbs were queued directly without orphaning. The
> >> > > end result is that all frags except for the first and last may have a
> >> > > socket attached.
> >> >
> >> > I'd have thought that the queued fragments would still want to be
> >> > resource-counted against the socket (I think that is what skb->sk is 
> >> > for).
> >>
> >> No, ipv4/ipv6 reasm has its own accouting.
> >>
> >> > Although I can't imagine why IPv6 reassembly is happening on skb
> >> > associated with a socket.
> >>
> >> Right, thats a much more interesting question -- both ipv4 and
> >> ipv6 orphan skbs before NF_HOOK prerouting trip.
> >>
> >> (That being said, I don't mind the patch, I'm just be curious how this
> >>  can happen).
> >
> > If this change is specific to get this working in ovs and its
> > conntrack support, then I don't think this belong to core
> > infrastructure. This should be fixed in ovs instead.
> 
> I admit I've only been able to reproduce it with OVS. My main reason
> for proposing the fix this way was just because this is what the IPv4
> code does, so I figured IPv6 should be consistent with that.

You mean that this is what you did in 029f7f3b8701 to fix this, right?

But we shouldn't add code to the core that is OVS specific for no
reason. We don't need this orphan from ipv4 and ipv6 as Florian
indicated.

Is there any chance you can fix this from OVS and its conntrack glue
code? Thanks.


Re: [PATCH] netfilter: ctnetlink: add more #ifdef around unused code

2016-04-18 Thread Arnd Bergmann
On Monday 18 April 2016 20:16:59 Pablo Neira Ayuso wrote:
> On Sat, Apr 16, 2016 at 10:17:43PM +0200, Arnd Bergmann wrote:
> > A recent patch removed many 'inline' annotations for static
> > functions in this file, which has caused warnings for functions
> > that are not used in a given configuration, in particular when
> > CONFIG_NF_CONNTRACK_EVENTS is disabled:
> > 
> > nf_conntrack_netlink.c:572:15: 'ctnetlink_timestamp_size' defined but not 
> > used
> > nf_conntrack_netlink.c:546:15: 'ctnetlink_acct_size' defined but not used
> > nf_conntrack_netlink.c:339:12: 'ctnetlink_label_size' defined but not used
> 
> Arnd, thanks for the fix.
> 
> I'm planning to push this though:
> 
> http://patchwork.ozlabs.org/patch/610820/
> 
> This is restoring the inlines for the size calculation functions, but
> I think that's ok. They are rather small and they're called from the
> event notification path (ie. packet path), so the compiler just place
> them out of the way when not needed and we calm down the gcc warning.

Looks good. I'll put this in my randconfig builder to replace my own
patch and will let you know if you missed something.

Arnd


  1   2   3   >