rfc: Are any of the seq_pad() uses really necessary?

2016-09-22 Thread Joe Perches
$ git grep -w seq_pad net
net/ipv4/fib_trie.c:seq_pad(seq, '\n');
net/ipv4/ping.c:seq_pad(seq, '\n');
net/ipv4/tcp_ipv4.c:seq_pad(seq, '\n');
net/ipv4/udp.c: seq_pad(seq, '\n');
net/phonet/socket.c:seq_pad(seq, '\n');
net/phonet/socket.c:seq_pad(seq, '\n');
net/sctp/objcnt.c:  seq_pad(seq, '\n');

What these uses do is add trailing blanks up to a particular
preset field width and then append a newline.
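To make the behaviour concrete, here is a user-space sketch of the seq_setwidth()/seq_pad() pairing. It assumes the semantics described above: record a target width before printing a record, then blank-fill up to that width with "%*s" and append the terminator. `struct seqbuf` and the `sb_*` functions are hypothetical stand-ins for struct seq_file and its helpers, not kernel API.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for struct seq_file: a fixed buffer, the number of
 * bytes written so far, and the offset that padding should reach. */
struct seqbuf {
	char buf[256];
	size_t count;
	size_t pad_until;
};

/* Append formatted text, like seq_printf(). */
static void sb_printf(struct seqbuf *m, const char *fmt, ...)
{
	size_t room = sizeof(m->buf) - m->count;
	va_list ap;
	int n;

	va_start(ap, fmt);
	n = vsnprintf(m->buf + m->count, room, fmt, ap);
	va_end(ap);
	assert(n >= 0 && (size_t)n < room); /* sketch: no overflow handling */
	m->count += (size_t)n;
}

/* Like seq_setwidth(): the record should end at count + width. */
static void sb_setwidth(struct seqbuf *m, size_t width)
{
	m->pad_until = m->count + width;
}

/* Like seq_pad(): blank-fill up to pad_until via "%*s", then append c. */
static void sb_pad(struct seqbuf *m, char c)
{
	if (m->pad_until > m->count)
		sb_printf(m, "%*s", (int)(m->pad_until - m->count), "");
	if (c)
		sb_printf(m, "%c", c);
}
```

With a width of 10, printing "abc" and then calling sb_pad(m, '\n') yields "abc", seven blanks and a newline (11 bytes); a record already wider than the width gets no blanks at all.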

None of these trailing pad bytes seem useful to me.

Are there really tools that expect specific line widths
when reading from things like /proc//net/?

For instance:

$ cat /proc//net/udp
  sl  local_address rem_address   st tx_queue rx_queue tr tm->when retrnsmt   
uid  timeout inode ref pointer drops 
  484: :14E9 : 07 : 00:    
1110 16961 2  0 
  486: :14EB : 07 : 00:    
1020 2022599 2  0   
  788: :A619 : 07 : 00:   
10000 4390482 2  0   
 3081: :8F0E : 07 : 00:    
1110 16963 2  0 
 3376: 357F:0035 : 07 : 00:    
1020 2022601 2  0   
 3391: :0044 : 07 : 00: 
 00 4546167 2  0   

These seq_pad uses were modified by:

From 652586df95e5d76b37d07a11839126dcfede1621 Mon Sep 17 00:00:00 2001
From: Tetsuo Handa 
Date: Thu, 14 Nov 2013 14:31:57 -0800
Subject: [PATCH] seq_file: remove "%n" usage from seq_file users

All seq_printf() users are using "%n" for calculating padding size,
convert them to use seq_setwidth() / seq_pad() pair.

Signed-off-by: Tetsuo Handa 
Signed-off-by: Kees Cook 
Cc: Joe Perches 
Cc: David Miller 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds 

If these are really necessary, then maybe the seq_pad function
could be optimized using a memset instead of
seq_printf(, "%*s", len, "");
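A minimal sketch of the memset() idea, written against a hypothetical user-space buffer rather than the real struct seq_file (the field names are assumptions). It fills the blanks directly instead of going through the printf formatting path; whether that is measurably cheaper in the kernel is not shown here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical user-space stand-in for struct seq_file. */
struct seqbuf {
	char buf[256];
	size_t count;      /* bytes already emitted */
	size_t pad_until;  /* offset recorded by a seq_setwidth()-style call */
};

/*
 * seq_pad() variant that fills the blanks with memset() instead of
 * formatting "%*s". Returns 0 on success or -1 if the buffer is too small;
 * the real seq_file would instead flag an overflow so that the read is
 * retried with a larger buffer.
 */
static int sb_pad_memset(struct seqbuf *m, char c)
{
	size_t need = 0;

	assert(m->count < sizeof(m->buf));
	if (m->pad_until > m->count)
		need = m->pad_until - m->count;

	/* room for the blanks, the optional terminator and a trailing NUL */
	if (need + (c ? 1 : 0) + 1 > sizeof(m->buf) - m->count)
		return -1;

	memset(m->buf + m->count, ' ', need);
	m->count += need;
	if (c)
		m->buf[m->count++] = c;
	m->buf[m->count] = '\0';
	return 0;
}
```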



[PATCH net-next 3/3] bpf: add helper to invalidate hash

2016-09-22 Thread Daniel Borkmann
Add a small helper that complements 36bbef52c7eb ("bpf: direct packet
write and access for helpers for clsact progs") for invalidating the
current skb->hash after headers have been mangled via direct packet write.
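The "lazy recalc" idea can be modelled in user space: invalidating only clears a cached value, and the hash is recomputed the next time something asks for it. This is a sketch under assumptions: `struct pkt`, `pkt_get_hash()` and `pkt_clear_hash()` are hypothetical analogues of the skb fields, skb_get_hash() and skb_clear_hash(), and a toy FNV-1a hash stands in for flow dissection.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mock of the skb fields involved; not the kernel struct. */
struct pkt {
	uint8_t data[64];
	size_t len;
	uint32_t hash;
	bool hash_valid;      /* stands in for the l4_hash / sw_hash bits */
	unsigned int recalcs; /* how often the hash was recomputed */
};

/* Toy FNV-1a over the packet bytes, standing in for flow dissection. */
static uint32_t toy_flow_hash(const struct pkt *p)
{
	uint32_t h = 2166136261u;

	assert(p->len <= sizeof(p->data));
	for (size_t i = 0; i < p->len; i++) {
		h ^= p->data[i];
		h *= 16777619u;
	}
	return h;
}

/* Analogue of skb_get_hash(): recompute only if the cache is invalid. */
static uint32_t pkt_get_hash(struct pkt *p)
{
	if (!p->hash_valid) {
		p->hash = toy_flow_hash(p);
		p->hash_valid = true;
		p->recalcs++;
	}
	return p->hash;
}

/* Analogue of skb_clear_hash(), which the new helper calls. */
static void pkt_clear_hash(struct pkt *p)
{
	p->hash = 0;
	p->hash_valid = false;
}
```

Invalidation is cheap (clear a flag), and the cost of re-hashing is only paid if a later consumer actually needs the hash; without the clear, a consumer keeps seeing the stale value computed before the headers were rewritten.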

Signed-off-by: Daniel Borkmann 
Acked-by: Alexei Starovoitov 
---
 include/uapi/linux/bpf.h |  7 +++
 net/core/filter.c| 18 ++
 2 files changed, 25 insertions(+)

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index e07432b..f09c70b 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -419,6 +419,13 @@ enum bpf_func_id {
 */
BPF_FUNC_csum_update,
 
+   /**
+* bpf_set_hash_invalid(skb)
+* Invalidate current skb->hash.
+* @skb: pointer to skb
+*/
+   BPF_FUNC_set_hash_invalid,
+
__BPF_FUNC_MAX_ID,
 };
 
diff --git a/net/core/filter.c b/net/core/filter.c
index acf84fb..00351cd 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -1777,6 +1777,22 @@ static const struct bpf_func_proto bpf_get_hash_recalc_proto = {
.arg1_type  = ARG_PTR_TO_CTX,
 };
 
+BPF_CALL_1(bpf_set_hash_invalid, struct sk_buff *, skb)
+{
+   /* After all direct packet write, this can be used once for
+* triggering a lazy recalc on next skb_get_hash() invocation.
+*/
+   skb_clear_hash(skb);
+   return 0;
+}
+
+static const struct bpf_func_proto bpf_set_hash_invalid_proto = {
+   .func   = bpf_set_hash_invalid,
+   .gpl_only   = false,
+   .ret_type   = RET_INTEGER,
+   .arg1_type  = ARG_PTR_TO_CTX,
+};
+
 BPF_CALL_3(bpf_skb_vlan_push, struct sk_buff *, skb, __be16, vlan_proto,
   u16, vlan_tci)
 {
@@ -2534,6 +2550,8 @@ tc_cls_act_func_proto(enum bpf_func_id func_id)
return &bpf_get_route_realm_proto;
case BPF_FUNC_get_hash_recalc:
return &bpf_get_hash_recalc_proto;
+   case BPF_FUNC_set_hash_invalid:
+   return &bpf_set_hash_invalid_proto;
case BPF_FUNC_perf_event_output:
return &bpf_skb_event_output_proto;
case BPF_FUNC_get_smp_processor_id:
-- 
1.9.3



[PATCH net-next 2/3] bpf: use bpf_get_smp_processor_id_proto instead of raw one

2016-09-22 Thread Daniel Borkmann
Same motivation as in commit 80b48c445797 ("bpf: don't use raw processor
id in generic helper"), but this time for XDP typed programs. Thus, allow
for preemption checks when we have DEBUG_PREEMPT enabled, and otherwise
use the raw variant.

Signed-off-by: Daniel Borkmann 
Acked-by: Alexei Starovoitov 
---
 net/core/filter.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/net/core/filter.c b/net/core/filter.c
index e5d9977..acf84fb 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -2551,6 +2551,8 @@ xdp_func_proto(enum bpf_func_id func_id)
switch (func_id) {
case BPF_FUNC_perf_event_output:
return &bpf_xdp_event_output_proto;
+   case BPF_FUNC_get_smp_processor_id:
+   return &bpf_get_smp_processor_id_proto;
default:
return sk_filter_func_proto(func_id);
}
-- 
1.9.3



Re: [PATCH net-next 0/4] net: dsa: add port fast ageing

2016-09-22 Thread Andrew Lunn
On Thu, Sep 22, 2016 at 04:49:20PM -0400, Vivien Didelot wrote:
> Today the DSA drivers are in charge of flushing the MAC addresses
> associated to a port when its STP state changes from Learning or
> Forwarding, to Disabled or Blocking or Listening.
> 
> This makes the drivers more complex and hides this generic switch logic.
> 
> This patchset introduces a new optional port_fast_age operation to
> dsa_switch_ops, to move this logic to the DSA layer and keep drivers
> simple. b53 and mv88e6xxx are updated accordingly.

Reviewed-by: Andrew Lunn 

Andrew
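A minimal sketch of the logic the cover letter describes moving into the DSA layer: one generic predicate decides whether an STP state change should flush the port's learned addresses, and the driver only supplies an optional flush hook (playing the role of the proposed port_fast_age operation). The enum mirrors the BR_STATE_* ordering from <linux/if_bridge.h> but is redefined here; `struct switch_ops`, `stp_needs_fast_age()` and `core_set_stp_state()` are hypothetical names, not the actual dsa_switch_ops API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Mirrors the BR_STATE_* values; redefined so the sketch is standalone. */
enum stp_state {
	STP_DISABLED = 0,
	STP_LISTENING = 1,
	STP_LEARNING = 2,
	STP_FORWARDING = 3,
	STP_BLOCKING = 4,
};

/* Flush when the port could learn (Learning or Forwarding) and is moving
 * to a state where it no longer does (Disabled, Blocking or Listening). */
static bool stp_needs_fast_age(enum stp_state from, enum stp_state to)
{
	bool was_learning = from == STP_LEARNING || from == STP_FORWARDING;
	bool stops_learning = to == STP_DISABLED || to == STP_BLOCKING ||
			      to == STP_LISTENING;

	assert(from <= STP_BLOCKING && to <= STP_BLOCKING);
	return was_learning && stops_learning;
}

/* Hypothetical, pared-down ops table: drivers provide only the flush. */
struct switch_ops {
	void (*port_fast_age)(int port); /* optional; may be NULL */
};

static void core_set_stp_state(const struct switch_ops *ops, int port,
			       enum stp_state from, enum stp_state to)
{
	/* ... program the new state into the hardware here ... */
	if (ops->port_fast_age && stp_needs_fast_age(from, to))
		ops->port_fast_age(port);
}
```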


Re: [PATCH v2] bpf: Set register type according to is_valid_access()

2016-09-22 Thread Alexei Starovoitov
On Thu, Sep 22, 2016 at 09:56:47PM +0200, Mickaël Salaün wrote:
> This fix a pointer leak when an unprivileged eBPF program read a pointer
> value from the context. Even if is_valid_access() returns a pointer
> type, the eBPF verifier replace it with UNKNOWN_VALUE. The register
> value containing an address is then allowed to leak. Moreover, this
> prevented unprivileged eBPF programs to use functions with (legitimate)
> pointer arguments.
> 
> This bug is not an issue for now because the only unprivileged eBPF
> program allowed is of type BPF_PROG_TYPE_SOCKET_FILTER and all the types
> from its context are UNKNOWN_VALUE. However, this fix is important for
> future unprivileged eBPF program types which could use pointers in their
> context.
> 
> Signed-off-by: Mickaël Salaün 
> Fixes: 969bf05eb3ce ("bpf: direct packet access")

Please drop 'fixes' tag and rewrite commit log.
It's not a fix.
Right now only two reg types can be seen: PTR_TO_PACKET and PTR_TO_PACKET_END.
Both are only in clsact and xdp programs which are root only.
So nothing is leaking at present.
Best case this patch is a pre-patch for some future work.



[PATCH net-next] drivers: net: xgene: Fix MSS programming

2016-09-22 Thread Iyappan Subramanian
The current driver programs a static MSS value into the hardware register
used by the TSO offload engine to segment the TCP payload, regardless of
the MSS value provided by the network stack.

This patch fixes this by programming the hardware registers with the
MSS value provided by the stack.

Since the hardware has only 4 MSS registers, this patch reference-counts
the MSS values in use.
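A single-threaded user-space sketch of the slot-sharing scheme, following the two-pass search in xgene_enet_setup_mss() in the patch: first reuse a register that already holds the requested MSS, otherwise reprogram a register whose reference count has dropped to zero. Locking is omitted, and `struct mss_table`, `mss_get()` and `mss_put()` are hypothetical names.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define NUM_MSS_REG 4	/* the hardware limit stated above */

struct mss_table {
	uint32_t mss[NUM_MSS_REG];
	unsigned int refcnt[NUM_MSS_REG];
	unsigned int hw_writes;	/* how often a register was reprogrammed */
};

/* Returns the slot holding `mss`, or -EBUSY if every slot is in use with
 * a different value. The driver does this under pdata->mss_lock. */
static int mss_get(struct mss_table *t, uint32_t mss)
{
	int i;

	/* Reuse a register already programmed with this MSS. */
	for (i = 0; i < NUM_MSS_REG; i++) {
		if (t->mss[i] == mss) {
			t->refcnt[i]++;
			return i;
		}
	}
	/* Otherwise claim an idle register and reprogram it. */
	for (i = 0; i < NUM_MSS_REG; i++) {
		if (!t->refcnt[i]) {
			t->mss[i] = mss;	/* stands in for set_mss() */
			t->hw_writes++;
			t->refcnt[i] = 1;
			return i;
		}
	}
	return -EBUSY;
}

/* Called on TX completion of a TSO descriptor that used slot `idx`. */
static void mss_put(struct mss_table *t, int idx)
{
	assert(idx >= 0 && idx < NUM_MSS_REG && t->refcnt[idx] > 0);
	t->refcnt[idx]--;
}
```

For example, two packets with MSS 1448 share slot 0 and cost a single register write; a fifth distinct MSS gets -EBUSY until a completion releases one of the four slots.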

Signed-off-by: Iyappan Subramanian 
Signed-off-by: Toan Le 
---
 drivers/net/ethernet/apm/xgene/xgene_enet_hw.h|  7 ++
 drivers/net/ethernet/apm/xgene/xgene_enet_main.c  | 90 ++-
 drivers/net/ethernet/apm/xgene/xgene_enet_main.h  |  8 +-
 drivers/net/ethernet/apm/xgene/xgene_enet_xgmac.c | 18 -
 4 files changed, 100 insertions(+), 23 deletions(-)

diff --git a/drivers/net/ethernet/apm/xgene/xgene_enet_hw.h b/drivers/net/ethernet/apm/xgene/xgene_enet_hw.h
index 8a8d055..8456337 100644
--- a/drivers/net/ethernet/apm/xgene/xgene_enet_hw.h
+++ b/drivers/net/ethernet/apm/xgene/xgene_enet_hw.h
@@ -237,6 +237,8 @@ enum xgene_enet_rm {
 #define TCPHDR_LEN 6
 #define IPHDR_POS  6
 #define IPHDR_LEN  6
+#define MSS_POS 20
+#define MSS_LEN 2
 #define EC_POS 22  /* Enable checksum */
 #define EC_LEN 1
 #define ET_POS 23  /* Enable TSO */
@@ -253,6 +255,11 @@ enum xgene_enet_rm {
 
 #define LAST_BUFFER (0x7800ULL << BUFDATALEN_POS)
 
+#define TSO_MSS0_POS   0
+#define TSO_MSS0_LEN   14
+#define TSO_MSS1_POS   16
+#define TSO_MSS1_LEN   14
+
 struct xgene_enet_raw_desc {
__le64 m0;
__le64 m1;
diff --git a/drivers/net/ethernet/apm/xgene/xgene_enet_main.c b/drivers/net/ethernet/apm/xgene/xgene_enet_main.c
index 522ba92..429f18f 100644
--- a/drivers/net/ethernet/apm/xgene/xgene_enet_main.c
+++ b/drivers/net/ethernet/apm/xgene/xgene_enet_main.c
@@ -137,6 +137,7 @@ static irqreturn_t xgene_enet_rx_irq(const int irq, void *data)
 static int xgene_enet_tx_completion(struct xgene_enet_desc_ring *cp_ring,
struct xgene_enet_raw_desc *raw_desc)
 {
+   struct xgene_enet_pdata *pdata = netdev_priv(cp_ring->ndev);
struct sk_buff *skb;
struct device *dev;
skb_frag_t *frag;
@@ -144,6 +145,7 @@ static int xgene_enet_tx_completion(struct xgene_enet_desc_ring *cp_ring,
u16 skb_index;
u8 status;
int i, ret = 0;
+   u8 mss_index;
 
skb_index = GET_VAL(USERINFO, le64_to_cpu(raw_desc->m0));
skb = cp_ring->cp_skb[skb_index];
@@ -160,6 +162,13 @@ static int xgene_enet_tx_completion(struct xgene_enet_desc_ring *cp_ring,
   DMA_TO_DEVICE);
}
 
+   if (GET_BIT(ET, le64_to_cpu(raw_desc->m3))) {
+   mss_index = GET_VAL(MSS, le64_to_cpu(raw_desc->m3));
+   spin_lock(&pdata->mss_lock);
+   pdata->mss_refcnt[mss_index]--;
+   spin_unlock(&pdata->mss_lock);
+   }
+
/* Checking for error */
status = GET_VAL(LERR, le64_to_cpu(raw_desc->m0));
if (unlikely(status > 2)) {
@@ -178,15 +187,53 @@ static int xgene_enet_tx_completion(struct xgene_enet_desc_ring *cp_ring,
return ret;
 }
 
-static u64 xgene_enet_work_msg(struct sk_buff *skb)
+static int xgene_enet_setup_mss(struct net_device *ndev, u32 mss)
+{
+   struct xgene_enet_pdata *pdata = netdev_priv(ndev);
+   bool mss_index_found = false;
+   int mss_index;
+   int i;
+
+   spin_lock(&pdata->mss_lock);
+
+   /* Reuse the slot if MSS matches */
+   for (i = 0; !mss_index_found && i < NUM_MSS_REG; i++) {
+   if (pdata->mss[i] == mss) {
+   pdata->mss_refcnt[i]++;
+   mss_index = i;
+   mss_index_found = true;
+   }
+   }
+
+   /* Overwrite the slot with ref_count = 0 */
+   for (i = 0; !mss_index_found && i < NUM_MSS_REG; i++) {
+   if (!pdata->mss_refcnt[i]) {
+   pdata->mss_refcnt[i]++;
+   pdata->mac_ops->set_mss(pdata, mss, i);
+   pdata->mss[i] = mss;
+   mss_index = i;
+   mss_index_found = true;
+   }
+   }
+
+   spin_unlock(&pdata->mss_lock);
+
+   /* No slots with ref_count = 0 available, return busy */
+   if (!mss_index_found)
+   return -EBUSY;
+
+   return mss_index;
+}
+
+static int xgene_enet_work_msg(struct sk_buff *skb, u64 *hopinfo)
 {
struct net_device *ndev = skb->dev;
struct iphdr *iph;
u8 l3hlen = 0, l4hlen = 0;
u8 ethhdr, proto = 0, csum_enable = 0;
-   u64 hopinfo = 0;
u32 hdr_len, mss = 0;
u32 i, len, 

Re: [Patch net] sch_qfq: keep backlog updated with qlen

2016-09-22 Thread Jamal Hadi Salim

On 16-09-18 07:22 PM, Cong Wang wrote:

Reported-by: Stas Nichiporovich 
Fixes: 2f5fb43f ("net_sched: update hierarchical backlog too")
Cc: Jamal Hadi Salim 
Signed-off-by: Cong Wang 


Acked-by: Jamal Hadi Salim 


cheers,
jamal


Re: [PATCH net-next 4/4] net/sched: act_mirred: Implement ingress actions

2016-09-22 Thread Jamal Hadi Salim

On 16-09-22 09:21 AM, Shmulik Ladkani wrote:

From: Shmulik Ladkani 

Up until now, 'action mirred' supported only egress actions (either
TCA_EGRESS_REDIR or TCA_EGRESS_MIRROR).

This patch implements the corresponding ingress actions
TCA_INGRESS_REDIR and TCA_INGRESS_MIRROR.

This allows attaching filters whose target is to hand matching skbs into
the rx processing of a specified device.



Thank you for doing this. There was something that made me remove
initial support for this feature; I am blanking out right now but
will find my notes and give more details. It may have been about
preventing loops. If that was the concern, then:
I am just wondering: is there a use case for a packet that is redirected
from egress of ethx to ingress of ethy that then needs to be classified
again at ethy's ingress? Otherwise you could just set the "don't classify"
flag, i.e. SET_TC_NCLS().

cheers,
jamal


Re: [PATCH iproute2] ip: Use specific slave id

2016-09-22 Thread Stephen Hemminger
On Tue, 20 Sep 2016 18:02:12 +0800
Hangbin Liu  wrote:

> The original bond/bridge/vrf and slaves use same id, which make people
> confused. Use bond/bridge/vrf_slave as id name will make code more clear.
> 
> Acked-by: Phil Sutter 
> Signed-off-by: Hangbin Liu 
> ---

Applied


Re: [PATCH] net: VRF: Fix receiving multicast traffic

2016-09-22 Thread Mark Tomlinson

On 09/23/2016 03:14 AM, David Ahern wrote:
>
> l3mdev devices do not support IPv4 multicast so checking mcast against that 
> device should not be working at all. For that reason I was fine with the 
> change in the previous patch. ie., you want the real ingress device there not 
> the vrf device.
>
> What test are you running that says your previous patch broke something?
Although we do not expect any multicast routing (IGMP snooping or PIM)
to work in an l3mdev, we still want multicast packets to be delivered
for protocols such as RIP. This was working before my previous patch,
but these multicast packets are now dropped. This patch fixes that
again, hopefully while keeping the benefits of my first patch.


Re: [PATCH net-next] tcp: add tcp_add_backlog()

2016-09-22 Thread Eric Dumazet
On Thu, 2016-09-22 at 19:34 -0300, Marcelo Ricardo Leitner wrote:
> On Sat, Aug 27, 2016 at 07:37:54AM -0700, Eric Dumazet wrote:
> > +bool tcp_add_backlog(struct sock *sk, struct sk_buff *skb)
> > +{
> > +   u32 limit = sk->sk_rcvbuf + sk->sk_sndbuf;
>  ^^^
> ...
> > +   if (!skb->data_len)
> > +   skb->truesize = SKB_TRUESIZE(skb_end_offset(skb));
> > +
> > +   if (unlikely(sk_add_backlog(sk, skb, limit))) {
> ...
> > -   } else if (unlikely(sk_add_backlog(sk, skb,
> > -  sk->sk_rcvbuf + sk->sk_sndbuf))) {
>^ [1]
> > -   bh_unlock_sock(sk);
> > -   __NET_INC_STATS(net, LINUX_MIB_TCPBACKLOGDROP);
> > +   } else if (tcp_add_backlog(sk, skb)) {
> 
> Hi Eric, after this patch, do you think we still need to add sk_sndbuf
> as a stretching factor to the backlog here?
> 
> It was added by [1] and it was justified that the (s)ack packets were
> just too big for the rx buf size. Maybe this new patch alone is enough
> already, as such packets will have a very small truesize then.
> 
>   Marcelo
> 
> [1] da882c1f2eca ("tcp: sk_add_backlog() is too agressive for TCP")
> 

Hi Marcelo

Yes, it is still needed: some drivers provide linear skbs, so the
skb->truesize of ACK packets will likely stay the same (skb->head points
to a full-size frame allocated by the driver).
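A toy model of why the sk_sndbuf stretch still matters when ACKs arrive as linear skbs with a large truesize. Only the comparison mirrors sk_add_backlog() (queued backlog bytes plus receive-queue memory checked against the limit before the new skb is charged); `struct sock_model` and the functions are hypothetical, and the sizes used below are illustrative, not measured.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the two queues charged against the limit. */
struct sock_model {
	uint32_t backlog_len;  /* sum of truesize of skbs in the backlog */
	uint32_t rmem_alloc;   /* memory already charged to the receive queue */
};

/* Same shape as sk_add_backlog(): refuse if the queues already exceed the
 * limit, otherwise charge the skb's truesize to the backlog. */
static bool model_add_backlog(struct sock_model *sk, uint32_t truesize,
			      uint32_t limit)
{
	if (sk->backlog_len + sk->rmem_alloc > limit)
		return false;   /* the caller would count TCPBACKLOGDROP */
	sk->backlog_len += truesize;
	return true;
}

/* How many back-to-back packets of one truesize are queued before the
 * first drop, with or without the sk_sndbuf stretch in the limit. */
static unsigned int packets_before_drop(uint32_t rcvbuf, uint32_t sndbuf,
					uint32_t truesize, bool stretch)
{
	struct sock_model sk = { 0, 0 };
	uint32_t limit = stretch ? rcvbuf + sndbuf : rcvbuf;
	unsigned int n = 0;

	assert(truesize > 0);   /* otherwise the loop never terminates */
	while (model_add_backlog(&sk, truesize, limit))
		n++;
	return n;
}
```

With rcvbuf = sndbuf = 16384 and a truesize of 2304 (a linear skb whose head is a full-size frame), 8 ACKs fit without the stretch and 15 with it; trimming truesize only helps when the head buffer is actually small (768 gives 22 without the stretch).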






Re: [PATCH net-next] Documentation: devicetree: revise ethernet device-tree binding about TRGMII

2016-09-22 Thread Sean Wang
Date: Thu, 22 Sep 2016 19:48:47 +0300, Sergei Shtylyov wrote:
>On 09/22/2016 07:16 PM, sean.w...@mediatek.com wrote:
>
>> From: Sean Wang 
>>
>> fix typo in mediatek-net.txt and add phy-mode "trgmii" to ethernet.txt
>
>These changes are unrelated to each other, so there should be 2 separate 
>patches. And have the patches I reviewed been merged already, why are you 
>sending an incremental patch?
>

Okay, I will split them into separate patches.

I saw they had already been applied, so I created an incremental
patch against the updated codebase.

>> Cc: devicet...@vger.kernel.org
>> Reported-by: Sergei Shtylyov 
>> Signed-off-by: Sean Wang 
>[...]
>
>MBR, Sergei
>
>


[PATCH 1/7] hv_netvsc: use consume_skb

2016-09-22 Thread sthemmin
From: Stephen Hemminger 

Packets that are transmitted in the normal path should use consume_skb
instead of kfree_skb. This allows for better tracing of packet drops.

Signed-off-by: Stephen Hemminger 
---
 drivers/net/hyperv/netvsc.c |4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/hyperv/netvsc.c b/drivers/net/hyperv/netvsc.c
index ff05b9b..720b5fa 100644
--- a/drivers/net/hyperv/netvsc.c
+++ b/drivers/net/hyperv/netvsc.c
@@ -635,7 +635,7 @@ static void netvsc_send_tx_complete(struct netvsc_device *net_device,
q_idx = nvsc_packet->q_idx;
channel = incoming_channel;
 
-   dev_kfree_skb_any(skb);
+   dev_consume_skb_any(skb);
}
 
num_outstanding_sends =
@@ -944,7 +944,7 @@ int netvsc_send(struct hv_device *device,
}
 
if (msdp->skb)
-   dev_kfree_skb_any(msdp->skb);
+   dev_consume_skb_any(msdp->skb);
 
if (xmit_more && !packet->cp_partial) {
msdp->skb = skb;
-- 
1.7.4.1



[PATCH 3/7] hv_netvsc: simplify callback event code

2016-09-22 Thread sthemmin
From: Stephen Hemminger 

The callback handler for netlink events can be simplified:
 * Consolidate check for netlink callback events about this driver itself.
 * Ignore non-Ethernet devices.
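The consolidated checks amount to one predicate evaluated before any VF handling. A minimal sketch of that order of checks, assuming a hypothetical `struct dev_model` in which one flags word stands in for both priv_flags and flags; the F_* bit values are made up for the sketch and are not the kernel's IFF_* values (only TYPE_ETHER matches ARPHRD_ETHER, which is 1).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag bits; the real IFF_* values differ. */
#define F_VLAN     (1u << 0)  /* stands in for priv_flags & IFF_802_1Q_VLAN */
#define F_BONDING  (1u << 1)  /* stands in for priv_flags & IFF_BONDING */
#define F_MASTER   (1u << 2)  /* stands in for flags & IFF_MASTER */

#define TYPE_ETHER 1          /* ARPHRD_ETHER */

struct dev_model {
	bool is_netvsc;           /* netdev_ops is the driver's own ops */
	unsigned short type;
	unsigned int flags;
};

/* True if the notifier should ignore the event (NOTIFY_DONE). */
static bool ignore_event(const struct dev_model *d)
{
	assert(d != 0);
	if (d->is_netvsc)                 /* our own events */
		return true;
	if (d->type != TYPE_ETHER)        /* non-Ethernet devices */
		return true;
	if (d->flags & F_VLAN)            /* VLAN device shares the VF's MAC */
		return true;
	if ((d->flags & F_BONDING) && (d->flags & F_MASTER))
		return true;              /* bonding master shares the MAC */
	return false;
}
```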

Signed-off-by: Stephen Hemminger 
---
 drivers/net/hyperv/netvsc_drv.c |   28 ++--
 1 files changed, 10 insertions(+), 18 deletions(-)

diff --git a/drivers/net/hyperv/netvsc_drv.c b/drivers/net/hyperv/netvsc_drv.c
index e74dbcc..849b566 100644
--- a/drivers/net/hyperv/netvsc_drv.c
+++ b/drivers/net/hyperv/netvsc_drv.c
@@ -1238,10 +1238,6 @@ static int netvsc_register_vf(struct net_device *vf_netdev)
struct net_device *ndev;
struct net_device_context *net_device_ctx;
struct netvsc_device *netvsc_dev;
-   const struct ethtool_ops *eth_ops = vf_netdev->ethtool_ops;
-
-   if (eth_ops == NULL || eth_ops == _ops)
-   return NOTIFY_DONE;
 
/*
 * We will use the MAC address to locate the synthetic interface to
@@ -1286,12 +1282,8 @@ static int netvsc_vf_up(struct net_device *vf_netdev)
 {
struct net_device *ndev;
struct netvsc_device *netvsc_dev;
-   const struct ethtool_ops *eth_ops = vf_netdev->ethtool_ops;
struct net_device_context *net_device_ctx;
 
-   if (eth_ops == _ops)
-   return NOTIFY_DONE;
-
ndev = get_netvsc_net_device(vf_netdev->dev_addr);
if (!ndev)
return NOTIFY_DONE;
@@ -1329,10 +1321,6 @@ static int netvsc_vf_down(struct net_device *vf_netdev)
struct net_device *ndev;
struct netvsc_device *netvsc_dev;
struct net_device_context *net_device_ctx;
-   const struct ethtool_ops *eth_ops = vf_netdev->ethtool_ops;
-
-   if (eth_ops == _ops)
-   return NOTIFY_DONE;
 
ndev = get_netvsc_net_device(vf_netdev->dev_addr);
if (!ndev)
@@ -1361,12 +1349,8 @@ static int netvsc_unregister_vf(struct net_device *vf_netdev)
 {
struct net_device *ndev;
struct netvsc_device *netvsc_dev;
-   const struct ethtool_ops *eth_ops = vf_netdev->ethtool_ops;
struct net_device_context *net_device_ctx;
 
-   if (eth_ops == _ops)
-   return NOTIFY_DONE;
-
ndev = get_netvsc_net_device(vf_netdev->dev_addr);
if (!ndev)
return NOTIFY_DONE;
@@ -1542,13 +1526,21 @@ static int netvsc_netdev_event(struct notifier_block *this,
 {
struct net_device *event_dev = netdev_notifier_info_to_dev(ptr);
 
+   /* Skip our own events */
+   if (event_dev->netdev_ops == _ops)
+   return NOTIFY_DONE;
+
+   /* Avoid non-Ethernet type devices */
+   if (event_dev->type != ARPHRD_ETHER)
+   return NOTIFY_DONE;
+
/* Avoid Vlan dev with same MAC registering as VF */
if (event_dev->priv_flags & IFF_802_1Q_VLAN)
return NOTIFY_DONE;
 
/* Avoid Bonding master dev with same MAC registering as VF */
-   if (event_dev->priv_flags & IFF_BONDING &&
-   event_dev->flags & IFF_MASTER)
+   if ((event_dev->priv_flags & IFF_BONDING) &&
+   (event_dev->flags & IFF_MASTER))
return NOTIFY_DONE;
 
switch (event) {
-- 
1.7.4.1



[PATCH 7/7] hv_netvsc: count multicast packets received

2016-09-22 Thread sthemmin
From: Stephen Hemminger 

Keeping track of the number of received multicast packets is useful
when debugging multicast and SR-IOV issues.
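A user-space sketch of the accounting the patch adds: classify each received frame by packet type in a per-CPU counter block, then fold the blocks together, reporting broadcast frames under the single multicast counter as netvsc_get_stats64() does below. The enum reuses the PACKET_HOST/BROADCAST/MULTICAST values from <linux/if_packet.h> but is redefined here; `struct rx_stats` and both functions are hypothetical, and the u64_stats_sync retry loop is omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same values as PACKET_HOST/BROADCAST/MULTICAST; redefined for the sketch. */
enum pkt_type { PKT_HOST = 0, PKT_BROADCAST = 1, PKT_MULTICAST = 2 };

/* Hypothetical per-CPU counter block (u64_stats_sync omitted). */
struct rx_stats {
	uint64_t packets, bytes, broadcast, multicast;
};

static void rx_account(struct rx_stats *s, enum pkt_type type, uint32_t len)
{
	s->packets++;
	s->bytes += len;
	if (type == PKT_BROADCAST)
		s->broadcast++;
	else if (type == PKT_MULTICAST)
		s->multicast++;
}

/* Fold the per-CPU blocks; broadcast counts toward "multicast". */
static uint64_t total_multicast(const struct rx_stats *percpu, size_t ncpu)
{
	uint64_t sum = 0;

	assert(percpu != NULL || ncpu == 0);
	for (size_t i = 0; i < ncpu; i++)
		sum += percpu[i].multicast + percpu[i].broadcast;
	return sum;
}
```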

Signed-off-by: Stephen Hemminger 
---
 drivers/net/hyperv/hyperv_net.h |2 ++
 drivers/net/hyperv/netvsc_drv.c |9 -
 2 files changed, 10 insertions(+), 1 deletions(-)

diff --git a/drivers/net/hyperv/hyperv_net.h b/drivers/net/hyperv/hyperv_net.h
index 1d49740..7130bf9 100644
--- a/drivers/net/hyperv/hyperv_net.h
+++ b/drivers/net/hyperv/hyperv_net.h
@@ -649,6 +649,8 @@ struct multi_recv_comp {
 struct netvsc_stats {
u64 packets;
u64 bytes;
+   u64 broadcast;
+   u64 multicast;
struct u64_stats_sync syncp;
 };
 
diff --git a/drivers/net/hyperv/netvsc_drv.c b/drivers/net/hyperv/netvsc_drv.c
index 9375d82..52eeb2f 100644
--- a/drivers/net/hyperv/netvsc_drv.c
+++ b/drivers/net/hyperv/netvsc_drv.c
@@ -705,6 +705,11 @@ int netvsc_recv_callback(struct hv_device *device_obj,
u64_stats_update_begin(&rx_stats->syncp);
rx_stats->packets++;
rx_stats->bytes += packet->total_data_buflen;
+
+   if (skb->pkt_type == PACKET_BROADCAST)
+   ++rx_stats->broadcast;
+   else if (skb->pkt_type == PACKET_MULTICAST)
+   ++rx_stats->multicast;
u64_stats_update_end(&rx_stats->syncp);
 
/*
@@ -947,7 +952,7 @@ static struct rtnl_link_stats64 *netvsc_get_stats64(struct net_device *net,
cpu);
struct netvsc_stats *rx_stats = per_cpu_ptr(ndev_ctx->rx_stats,
cpu);
-   u64 tx_packets, tx_bytes, rx_packets, rx_bytes;
+   u64 tx_packets, tx_bytes, rx_packets, rx_bytes, rx_multicast;
unsigned int start;
 
do {
@@ -960,12 +965,14 @@ static struct rtnl_link_stats64 *netvsc_get_stats64(struct net_device *net,
start = u64_stats_fetch_begin_irq(&rx_stats->syncp);
rx_packets = rx_stats->packets;
rx_bytes = rx_stats->bytes;
+   rx_multicast = rx_stats->multicast + rx_stats->broadcast;
} while (u64_stats_fetch_retry_irq(&rx_stats->syncp, start));
 
t->tx_bytes += tx_bytes;
t->tx_packets   += tx_packets;
t->rx_bytes += rx_bytes;
t->rx_packets   += rx_packets;
+   t->multicast+= rx_multicast;
}
 
t->tx_dropped   = net->stats.tx_dropped;
-- 
1.7.4.1



Re: [PATCH iproute2] misc/ss: tcp cwnd should be unsigned

2016-09-22 Thread Stephen Hemminger
On Thu, 22 Sep 2016 16:40:28 +0800
Hangbin Liu  wrote:

> tcp->snd_cwd is a u32, but ss treats it like a signed int. This may
> results in negative bandwidth calculations.
> 
> Signed-off-by: Hangbin Liu 

Sure applied.


Re: [PATCH net] act_ife: Add support for machines with hard_header_len != mac_len

2016-09-22 Thread Jamal Hadi Salim

On 16-09-21 08:54 AM, Yotam Gigi wrote:

Without that fix, the following could occur:
 - On encode ingress, the total amount of skb_pushes (in lines 751 and
   753) was more than specified in cow.
 - On machines with hard_header_len > mac_len, the packet format was not


Just curious: What hardware would this be?



Fixes: ef6980b6becb ("net sched: introduce IFE action")
Signed-off-by: Yotam Gigi 
---
 net/sched/act_ife.c | 34 +-
 1 file changed, 25 insertions(+), 9 deletions(-)

diff --git a/net/sched/act_ife.c b/net/sched/act_ife.c
index e87cd81..27b19ca 100644
--- a/net/sched/act_ife.c
+++ b/net/sched/act_ife.c
@@ -708,11 +708,13 @@ static int tcf_ife_encode(struct sk_buff *skb, const struct tc_action *a,
   where ORIGDATA = original ethernet header ...
 */
u16 metalen = ife_get_sz(skb, ife);
-   int hdrm = metalen + skb->dev->hard_header_len + IFE_METAHDRLEN;
-   unsigned int skboff = skb->dev->hard_header_len;
u32 at = G_TC_AT(skb->tc_verd);
-   int new_len = skb->len + hdrm;
bool exceed_mtu = false;
+   unsigned int skboff;
+   int total_push;
+   int reserve;
+   int new_len;
+   int hdrm;
int err;

if (at & AT_EGRESS) {
@@ -724,6 +726,22 @@ static int tcf_ife_encode(struct sk_buff *skb, const struct tc_action *a,
bstats_update(&ife->tcf_bstats, skb);
tcf_lastuse_update(&ife->tcf_tm);

+   if (at & AT_EGRESS) {
+   /* on egress, reserve space for hard_header_len instead of
+* mac_len
+*/
+   skb_reset_mac_len(skb);


The skb_reset_mac_len() above is unneeded.


+   hdrm = metalen + skb->mac_len + IFE_METAHDRLEN;


Can you move this line outside of the if? It appears on the else
so factoring it out is useful.


+   total_push = hdrm;
+   reserve = metalen + skb->dev->hard_header_len + IFE_METAHDRLEN;
+   } else {
+   /* on ingress, push mac_len as it already get parsed from tc */
+   hdrm = metalen + skb->mac_len + IFE_METAHDRLEN;
+   total_push = hdrm + skb->mac_len;
+   reserve = total_push;
+   }
+   new_len =  skb->len + hdrm;
+
if (!metalen) { /* no metadata to send */
/* abuse overlimits to count when we allow packet
 * with no metadata
@@ -742,19 +760,17 @@ static int tcf_ife_encode(struct sk_buff *skb, const struct tc_action *a,

iethh = eth_hdr(skb);

-   err = skb_cow_head(skb, hdrm);
+   err = skb_cow_head(skb, reserve);
if (unlikely(err)) {
ife->tcf_qstats.drops++;
spin_unlock(&ife->tcf_lock);
return TC_ACT_SHOT;
}

-   if (!(at & AT_EGRESS))
-   skb_push(skb, skb->dev->hard_header_len);
-
-   __skb_push(skb, hdrm);
+   __skb_push(skb, total_push);
memcpy(skb->data, iethh, skb->mac_len);
skb_reset_mac_header(skb);
+   skboff += skb->mac_len;


Above looks dangerous. Did the compiler not warn?
Maybe init skboff to skb->mac_len at the top.

Otherwise the ingress bits look good. Thanks!

Please fix above and resend with:
Signed-off-by: Jamal Hadi Salim 

cheers,
jamal


[PATCH net-next 1/3] bpf: use skb_to_full_sk helper in bpf_skb_under_cgroup

2016-09-22 Thread Daniel Borkmann
We need to use skb_to_full_sk() helper introduced in commit bd5eb35f16a9
("xfrm: take care of request sockets") as otherwise we miss tcp synack
messages, since ownership is on request socket and therefore it would
miss the sk_fullsock() check. Use skb_to_full_sk() as also done similarly
in the bpf_get_cgroup_classid() helper via 2309236c13fe ("cls_cgroup:
get sk_classid only from full sockets") fix to not let this fall through.

Fixes: 4a482f34afcc ("cgroup: bpf: Add bpf_skb_in_cgroup_proto")
Signed-off-by: Daniel Borkmann 
Acked-by: Alexei Starovoitov 
---
 net/core/filter.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/core/filter.c b/net/core/filter.c
index 0920c2a..e5d9977 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -2408,7 +2408,7 @@ BPF_CALL_3(bpf_skb_under_cgroup, struct sk_buff *, skb, struct bpf_map *, map,
struct cgroup *cgrp;
struct sock *sk;
 
-   sk = skb->sk;
+   sk = skb_to_full_sk(skb);
if (!sk || !sk_fullsock(sk))
return -ENOENT;
if (unlikely(idx >= array->map.max_entries))
-- 
1.9.3



Re: [Patch net] sch_sfb: keep backlog updated with qlen

2016-09-22 Thread Jamal Hadi Salim

On 16-09-18 07:22 PM, Cong Wang wrote:

Fixes: 2f5fb43f ("net_sched: update hierarchical backlog too")
Cc: Jamal Hadi Salim 
Signed-off-by: Cong Wang 


Acked-by: Jamal Hadi Salim 

cheers,
jamal


[PATCH net-next 0/3] Few minor BPF helper improvements

2016-09-22 Thread Daniel Borkmann
Just a few minor improvements around BPF helpers. The first one is a
fix, but given this late stage and that it's not a critical one,
I think net-next is just fine. For details please see the
individual patches.

Thanks!

Daniel Borkmann (3):
  bpf: use skb_to_full_sk helper in bpf_skb_under_cgroup
  bpf: use bpf_get_smp_processor_id_proto instead of raw one
  bpf: add helper to invalidate hash

 include/uapi/linux/bpf.h |  7 +++
 net/core/filter.c| 22 +-
 2 files changed, 28 insertions(+), 1 deletion(-)

-- 
1.9.3



[PATCH 2/7] hv_netvsc: dev hold/put reference to VF

2016-09-22 Thread sthemmin
From: Stephen Hemminger 

The netvsc driver holds a pointer to the virtual function network device if
managing SR-IOV association. In order to ensure that the VF network device
does not disappear, it should be using dev_hold/dev_put to get a reference
count.

Signed-off-by: Stephen Hemminger 
---
 drivers/net/hyperv/netvsc_drv.c |3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/drivers/net/hyperv/netvsc_drv.c b/drivers/net/hyperv/netvsc_drv.c
index 2360e70..e74dbcc 100644
--- a/drivers/net/hyperv/netvsc_drv.c
+++ b/drivers/net/hyperv/netvsc_drv.c
@@ -1262,6 +1262,8 @@ static int netvsc_register_vf(struct net_device *vf_netdev)
 * Take a reference on the module.
 */
try_module_get(THIS_MODULE);
+
+   dev_hold(vf_netdev);
net_device_ctx->vf_netdev = vf_netdev;
return NOTIFY_OK;
 }
@@ -1376,6 +1378,7 @@ static int netvsc_unregister_vf(struct net_device *vf_netdev)
netdev_info(ndev, "VF unregistering: %s\n", vf_netdev->name);
netvsc_inject_disable(net_device_ctx);
net_device_ctx->vf_netdev = NULL;
+   dev_put(vf_netdev);
module_put(THIS_MODULE);
return NOTIFY_OK;
 }
-- 
1.7.4.1



[PATCH 4/7] hv_netvsc: improve VF device matching

2016-09-22 Thread sthemmin
From: Stephen Hemminger 

The code to associate netvsc and VF devices can be made less error-prone
by using better matching algorithms.

On registration, use the permanent address which avoids any possible
issues caused by device MAC address being changed. For all other callbacks,
search by the netdevice pointer value to ensure getting the correct
network device.

Signed-off-by: Stephen Hemminger 
---
 drivers/net/hyperv/netvsc_drv.c |   60 +-
 1 files changed, 39 insertions(+), 21 deletions(-)

diff --git a/drivers/net/hyperv/netvsc_drv.c b/drivers/net/hyperv/netvsc_drv.c
index 849b566..8768219 100644
--- a/drivers/net/hyperv/netvsc_drv.c
+++ b/drivers/net/hyperv/netvsc_drv.c
@@ -1215,22 +1215,44 @@ static void netvsc_free_netdev(struct net_device *netdev)
free_netdev(netdev);
 }
 
-static struct net_device *get_netvsc_net_device(char *mac)
+static struct net_device *get_netvsc_bymac(const u8 *mac)
 {
-   struct net_device *dev, *found = NULL;
+   struct net_device *dev;
 
ASSERT_RTNL();
 
for_each_netdev(_net, dev) {
-   if (memcmp(dev->dev_addr, mac, ETH_ALEN) == 0) {
-   if (dev->netdev_ops != _ops)
-   continue;
-   found = dev;
-   break;
-   }
+   if (dev->netdev_ops != _ops)
+   continue;   /* not a netvsc device */
+
+   if (ether_addr_equal(mac, dev->perm_addr))
+   return dev;
+   }
+
+   return NULL;
+}
+
+static struct net_device *get_netvsc_byref(const struct net_device *vf_netdev)
+{
+   struct net_device *dev;
+
+   ASSERT_RTNL();
+
+   for_each_netdev(_net, dev) {
+   struct net_device_context *net_device_ctx;
+
+   if (dev->netdev_ops != _ops)
+   continue;   /* not a netvsc device */
+
+   net_device_ctx = netdev_priv(dev);
+   if (net_device_ctx->nvdev == NULL)
+   continue;   /* device is removed */
+
+   if (net_device_ctx->vf_netdev == vf_netdev)
+   return dev; /* a match */
}
 
-   return found;
+   return NULL;
 }
 
 static int netvsc_register_vf(struct net_device *vf_netdev)
@@ -1239,12 +1261,15 @@ static int netvsc_register_vf(struct net_device *vf_netdev)
struct net_device_context *net_device_ctx;
struct netvsc_device *netvsc_dev;
 
+   if (vf_netdev->addr_len != ETH_ALEN)
+   return NOTIFY_DONE;
+
/*
 * We will use the MAC address to locate the synthetic interface to
 * associate with the VF interface. If we don't find a matching
 * synthetic interface, move on.
 */
-   ndev = get_netvsc_net_device(vf_netdev->dev_addr);
+   ndev = get_netvsc_bymac(vf_netdev->perm_addr);
if (!ndev)
return NOTIFY_DONE;
 
@@ -1284,16 +1309,13 @@ static int netvsc_vf_up(struct net_device *vf_netdev)
struct netvsc_device *netvsc_dev;
struct net_device_context *net_device_ctx;
 
-   ndev = get_netvsc_net_device(vf_netdev->dev_addr);
+   ndev = get_netvsc_byref(vf_netdev);
if (!ndev)
return NOTIFY_DONE;
 
net_device_ctx = netdev_priv(ndev);
netvsc_dev = net_device_ctx->nvdev;
 
-   if (!netvsc_dev || !net_device_ctx->vf_netdev)
-   return NOTIFY_DONE;
-
netdev_info(ndev, "VF up: %s\n", vf_netdev->name);
netvsc_inject_enable(net_device_ctx);
 
@@ -1322,16 +1344,13 @@ static int netvsc_vf_down(struct net_device *vf_netdev)
struct netvsc_device *netvsc_dev;
struct net_device_context *net_device_ctx;
 
-   ndev = get_netvsc_net_device(vf_netdev->dev_addr);
+   ndev = get_netvsc_byref(vf_netdev);
if (!ndev)
return NOTIFY_DONE;
 
net_device_ctx = netdev_priv(ndev);
netvsc_dev = net_device_ctx->nvdev;
 
-   if (!netvsc_dev || !net_device_ctx->vf_netdev)
-   return NOTIFY_DONE;
-
netdev_info(ndev, "VF down: %s\n", vf_netdev->name);
netvsc_inject_disable(net_device_ctx);
netvsc_switch_datapath(ndev, false);
@@ -1351,14 +1370,13 @@ static int netvsc_unregister_vf(struct net_device *vf_netdev)
struct netvsc_device *netvsc_dev;
struct net_device_context *net_device_ctx;
 
-   ndev = get_netvsc_net_device(vf_netdev->dev_addr);
+   ndev = get_netvsc_byref(vf_netdev);
if (!ndev)
return NOTIFY_DONE;
 
net_device_ctx = netdev_priv(ndev);
netvsc_dev = net_device_ctx->nvdev;
-   if (!netvsc_dev || !net_device_ctx->vf_netdev)
-   return NOTIFY_DONE;
+
netdev_info(ndev, "VF unregistering: %s\n", vf_netdev->name);
netvsc_inject_disable(net_device_ctx);

[PATCH 5/7] hv_netvsc: use RCU to protect vf_netdev

2016-09-22 Thread sthemmin
From: Stephen Hemminger 

The vf_netdev pointer in the netvsc device context can simply be protected
by RCU because network device destruction is already RCU synchronized.

Signed-off-by: Stephen Hemminger 
---
 drivers/net/hyperv/hyperv_net.h |2 +-
 drivers/net/hyperv/netvsc_drv.c |   29 +++--
 2 files changed, 16 insertions(+), 15 deletions(-)

diff --git a/drivers/net/hyperv/hyperv_net.h b/drivers/net/hyperv/hyperv_net.h
index 284b97b..6b79487 100644
--- a/drivers/net/hyperv/hyperv_net.h
+++ b/drivers/net/hyperv/hyperv_net.h
@@ -695,7 +695,7 @@ struct net_device_context {
bool start_remove;
 
/* State to manage the associated VF interface. */
-   struct net_device *vf_netdev;
+   struct net_device __rcu *vf_netdev;
bool vf_inject;
atomic_t vf_use_cnt;
/* 1: allocated, serial number is valid. 0: not allocated */
diff --git a/drivers/net/hyperv/netvsc_drv.c b/drivers/net/hyperv/netvsc_drv.c
index 8768219..dde17c0 100644
--- a/drivers/net/hyperv/netvsc_drv.c
+++ b/drivers/net/hyperv/netvsc_drv.c
@@ -667,8 +667,8 @@ int netvsc_recv_callback(struct hv_device *device_obj,
 {
struct net_device *net = hv_get_drvdata(device_obj);
struct net_device_context *net_device_ctx = netdev_priv(net);
+   struct net_device *vf_netdev;
struct sk_buff *skb;
-   struct sk_buff *vf_skb;
struct netvsc_stats *rx_stats;
u32 bytes_recvd = packet->total_data_buflen;
int ret = 0;
@@ -676,9 +676,12 @@ int netvsc_recv_callback(struct hv_device *device_obj,
if (!net || net->reg_state != NETREG_REGISTERED)
return NVSP_STAT_FAIL;
 
-   if (READ_ONCE(net_device_ctx->vf_inject)) {
+   vf_netdev = rcu_dereference(net_device_ctx->vf_netdev);
+   if (vf_netdev) {
+   struct sk_buff *vf_skb;
+
atomic_inc(&net_device_ctx->vf_use_cnt);
-   if (!READ_ONCE(net_device_ctx->vf_inject)) {
+   if (!net_device_ctx->vf_inject) {
/*
 * We raced; just move on.
 */
@@ -694,13 +697,12 @@ int netvsc_recv_callback(struct hv_device *device_obj,
 * the host). Deliver these via the VF interface
 * in the guest.
 */
-   vf_skb = netvsc_alloc_recv_skb(net_device_ctx->vf_netdev,
+   vf_skb = netvsc_alloc_recv_skb(vf_netdev,
   packet, csum_info, *data,
   vlan_tci);
if (vf_skb != NULL) {
-   ++net_device_ctx->vf_netdev->stats.rx_packets;
-   net_device_ctx->vf_netdev->stats.rx_bytes +=
-   bytes_recvd;
+   ++vf_netdev->stats.rx_packets;
+   vf_netdev->stats.rx_bytes += bytes_recvd;
netif_receive_skb(vf_skb);
} else {
++net->stats.rx_dropped;
@@ -1232,7 +1234,7 @@ static struct net_device *get_netvsc_bymac(const u8 *mac)
return NULL;
 }
 
-static struct net_device *get_netvsc_byref(const struct net_device *vf_netdev)
+static struct net_device *get_netvsc_byref(struct net_device *vf_netdev)
 {
struct net_device *dev;
 
@@ -1248,7 +1250,7 @@ static struct net_device *get_netvsc_byref(const struct net_device *vf_netdev)
if (net_device_ctx->nvdev == NULL)
continue;   /* device is removed */
 
-   if (net_device_ctx->vf_netdev == vf_netdev)
+   if (rtnl_dereference(net_device_ctx->vf_netdev) == vf_netdev)
return dev; /* a match */
}
 
@@ -1275,7 +1277,7 @@ static int netvsc_register_vf(struct net_device *vf_netdev)
 
net_device_ctx = netdev_priv(ndev);
netvsc_dev = net_device_ctx->nvdev;
-   if (!netvsc_dev || net_device_ctx->vf_netdev)
+   if (!netvsc_dev || rtnl_dereference(net_device_ctx->vf_netdev))
return NOTIFY_DONE;
 
netdev_info(ndev, "VF registering: %s\n", vf_netdev->name);
@@ -1285,7 +1287,7 @@ static int netvsc_register_vf(struct net_device *vf_netdev)
try_module_get(THIS_MODULE);
 
dev_hold(vf_netdev);
-   net_device_ctx->vf_netdev = vf_netdev;
+   rcu_assign_pointer(net_device_ctx->vf_netdev, vf_netdev);
return NOTIFY_OK;
 }
 
@@ -1379,7 +1381,8 @@ static int netvsc_unregister_vf(struct net_device *vf_netdev)
 
netdev_info(ndev, "VF unregistering: %s\n", vf_netdev->name);
netvsc_inject_disable(net_device_ctx);
-   net_device_ctx->vf_netdev = NULL;
+
+   RCU_INIT_POINTER(net_device_ctx->vf_netdev, NULL);
dev_put(vf_netdev);
module_put(THIS_MODULE);
return NOTIFY_OK;
@@ -1433,8 +1436,6 @@ static int netvsc_probe(struct hv_device *dev,
  

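For readers following patch 5/7 above: the conversion relies on the usual RCU publish/subscribe pairing (rcu_assign_pointer() on the writer side, rcu_dereference() on the reader side). The sketch below is a userspace model only, assuming C11 <stdatomic.h>; all names (fake_netdev, fake_ctx, publish_vf, clear_vf, read_vf) are hypothetical. It models pointer publication and nothing else: the acquire load is a stronger stand-in for rcu_dereference(), and the grace period that makes freeing safe (which, per the commit message, network device destruction already provides) is not modeled at all.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct net_device / struct net_device_context. */
struct fake_netdev {
	const char *name;
};

struct fake_ctx {
	_Atomic(struct fake_netdev *) vf_netdev;	/* like "struct net_device __rcu *" */
};

/* Writer side, analogous to rcu_assign_pointer(): the release store makes
 * every earlier initialisation of *vf visible before the pointer itself. */
static void publish_vf(struct fake_ctx *ctx, struct fake_netdev *vf)
{
	assert(ctx != NULL);
	atomic_store_explicit(&ctx->vf_netdev, vf, memory_order_release);
}

/* Writer side, analogous to RCU_INIT_POINTER(ptr, NULL): storing NULL
 * publishes no new data, so no ordering is required. */
static void clear_vf(struct fake_ctx *ctx)
{
	atomic_store_explicit(&ctx->vf_netdev, NULL, memory_order_relaxed);
}

/* Reader side, analogous to rcu_dereference(): load the pointer once and
 * use only that local copy afterwards, as netvsc_recv_callback() now does. */
static struct fake_netdev *read_vf(struct fake_ctx *ctx)
{
	return atomic_load_explicit(&ctx->vf_netdev, memory_order_acquire);
}
```

Taking one local snapshot on the reader side is what lets the patch drop the repeated net_device_ctx->vf_netdev dereferences in the receive path.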
[PATCH 6/7] hv_netvsc: remove VF in flight counters

2016-09-22 Thread sthemmin
From: Stephen Hemminger 

Since the VF reference is now protected by RCU, the VF usage counter is no
longer needed, and the device flags can be used to decide whether to inject.

Signed-off-by: Stephen Hemminger 
---
 drivers/net/hyperv/hyperv_net.h |3 +-
 drivers/net/hyperv/netvsc_drv.c |   81 ++-
 2 files changed, 21 insertions(+), 63 deletions(-)

diff --git a/drivers/net/hyperv/hyperv_net.h b/drivers/net/hyperv/hyperv_net.h
index 6b79487..1d49740 100644
--- a/drivers/net/hyperv/hyperv_net.h
+++ b/drivers/net/hyperv/hyperv_net.h
@@ -696,8 +696,7 @@ struct net_device_context {
 
/* State to manage the associated VF interface. */
struct net_device __rcu *vf_netdev;
-   bool vf_inject;
-   atomic_t vf_use_cnt;
+
/* 1: allocated, serial number is valid. 0: not allocated */
u32 vf_alloc;
/* Serial number of the VF to team with */
diff --git a/drivers/net/hyperv/netvsc_drv.c b/drivers/net/hyperv/netvsc_drv.c
index dde17c0..9375d82 100644
--- a/drivers/net/hyperv/netvsc_drv.c
+++ b/drivers/net/hyperv/netvsc_drv.c
@@ -670,50 +670,20 @@ int netvsc_recv_callback(struct hv_device *device_obj,
struct net_device *vf_netdev;
struct sk_buff *skb;
struct netvsc_stats *rx_stats;
-   u32 bytes_recvd = packet->total_data_buflen;
-   int ret = 0;
 
-   if (!net || net->reg_state != NETREG_REGISTERED)
+   if (net->reg_state != NETREG_REGISTERED)
return NVSP_STAT_FAIL;
 
+   /*
+* If necessary, inject this packet into the VF interface.
+* On Hyper-V, multicast and brodcast packets are only delivered
+* to the synthetic interface (after subjecting these to
+* policy filters on the host). Deliver these via the VF
+* interface in the guest.
+*/
vf_netdev = rcu_dereference(net_device_ctx->vf_netdev);
-   if (vf_netdev) {
-   struct sk_buff *vf_skb;
-
-   atomic_inc(&net_device_ctx->vf_use_cnt);
-   if (!net_device_ctx->vf_inject) {
-   /*
-* We raced; just move on.
-*/
-   atomic_dec(&net_device_ctx->vf_use_cnt);
-   goto vf_injection_done;
-   }
-
-   /*
-* Inject this packet into the VF inerface.
-* On Hyper-V, multicast and brodcast packets
-* are only delivered on the synthetic interface
-* (after subjecting these to policy filters on
-* the host). Deliver these via the VF interface
-* in the guest.
-*/
-   vf_skb = netvsc_alloc_recv_skb(vf_netdev,
-  packet, csum_info, *data,
-  vlan_tci);
-   if (vf_skb != NULL) {
-   ++vf_netdev->stats.rx_packets;
-   vf_netdev->stats.rx_bytes += bytes_recvd;
-   netif_receive_skb(vf_skb);
-   } else {
-   ++net->stats.rx_dropped;
-   ret = NVSP_STAT_FAIL;
-   }
-   atomic_dec(&net_device_ctx->vf_use_cnt);
-   return ret;
-   }
-
-vf_injection_done:
-   rx_stats = this_cpu_ptr(net_device_ctx->rx_stats);
+   if (vf_netdev && (vf_netdev->flags & IFF_UP))
+   net = vf_netdev;
 
/* Allocate a skb - TODO direct I/O to pages? */
skb = netvsc_alloc_recv_skb(net, packet, csum_info, *data, vlan_tci);
@@ -721,9 +691,17 @@ vf_injection_done:
++net->stats.rx_dropped;
return NVSP_STAT_FAIL;
}
-   skb_record_rx_queue(skb, channel->
-   offermsg.offer.sub_channel_index);
 
+   if (net != vf_netdev)
+   skb_record_rx_queue(skb,
+   channel->offermsg.offer.sub_channel_index);
+
+   /*
+* Even if injecting the packet, record the statistics
+* on the synthetic device because modifying the VF device
+* statistics will not work correctly.
+*/
+   rx_stats = this_cpu_ptr(net_device_ctx->rx_stats);
u64_stats_update_begin(&rx_stats->syncp);
rx_stats->packets++;
rx_stats->bytes += packet->total_data_buflen;
@@ -1291,20 +1269,6 @@ static int netvsc_register_vf(struct net_device *vf_netdev)
return NOTIFY_OK;
 }
 
-static void netvsc_inject_enable(struct net_device_context *net_device_ctx)
-{
-   net_device_ctx->vf_inject = true;
-}
-
-static void netvsc_inject_disable(struct net_device_context *net_device_ctx)
-{
-   net_device_ctx->vf_inject = false;
-
-   /* Wait for currently active users to drain out. */
-   while (atomic_read(&net_device_ctx->vf_use_cnt) != 0)
-   udelay(50);
-}
-
 static int 

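Patch 6/7 reduces the receive path to a single decision: deliver on the VF if one is registered and up, otherwise on the synthetic device, and record the rx queue only when delivering on the synthetic device. A minimal userspace model of that decision, assuming hypothetical types (fake_netdev, rx_decision) and a stand-in FAKE_IFF_UP flag rather than the kernel's IFF_UP. Per the patch, statistics are always recorded on the synthetic device; the model leaves that part out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define FAKE_IFF_UP 0x1	/* hypothetical stand-in for IFF_UP */

struct fake_netdev {
	const char *name;
	unsigned int flags;
};

struct rx_decision {
	struct fake_netdev *deliver_to;	/* device the skb is allocated for */
	bool record_rx_queue;		/* skb_record_rx_queue() only if not the VF */
};

/* Follows the choice made in netvsc_recv_callback() after patch 6/7:
 * use the VF when it exists and is up, else the synthetic device. */
static struct rx_decision choose_rx_dev(struct fake_netdev *synthetic,
					struct fake_netdev *vf)
{
	struct rx_decision d = { synthetic, true };

	assert(synthetic != NULL);
	if (vf && (vf->flags & FAKE_IFF_UP))
		d.deliver_to = vf;
	/* "if (net != vf_netdev) skb_record_rx_queue(...)" */
	d.record_rx_queue = (d.deliver_to != vf);
	return d;
}
```

Checking the VF's own IFF_UP flag is what replaces the old vf_inject boolean and the in-flight counter.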
[PATCH net-next 0/7] hv_netvsc changes

2016-09-22 Thread sthemmin
From: Stephen Hemminger 

These are mostly about improving the handling of interaction between
the virtual network device (netvsc) and the SR-IOV VF network device.

Stephen Hemminger (7):
  hv_netvsc: use consume_skb
  hv_netvsc: dev hold/put reference to VF
  hv_netvsc: simplify callback event code
  hv_netvsc: improve VF device matching
  hv_netvsc: use RCU to protect vf_netdev
  hv_netvsc: remove VF in flight counters
  hv_netvsc: count multicast packets received

 drivers/net/hyperv/hyperv_net.h |7 +-
 drivers/net/hyperv/netvsc.c |4 +-
 drivers/net/hyperv/netvsc_drv.c |  188 +-
 3 files changed, 90 insertions(+), 109 deletions(-)

-- 
1.7.4.1



Re: [PATCH net 1/2] act_ife: Fix external mac header on encode

2016-09-22 Thread Jamal Hadi Salim

On 16-09-22 08:55 AM, Yotam Gigi wrote:

On the ife encode side, the external mac header is copied from the original
packet and may be overridden if the user requests. Previously, the mac header
copy was done from a memory region that might no longer be accessible, as
skb_cow_head might free it and copy the packet. This led to random values
in the external mac header when the values were not set by the user.

This fix takes the internal mac header from the packet, after the call to
skb_cow_head.



Since this depends on the previous patch, can you double check for me? I 
will test later, but here's a very simple test case:


sudo $TC qdisc del dev $ETH root handle 1: prio
sudo $TC qdisc add dev $ETH root handle 1: prio

#set mark of decimal 17 and allow sending out
sudo $TC filter add dev $ETH parent 1: protocol ip prio 10 \
u32 match ip protocol 1 0xff flowid 1:2 \
action skbedit mark 17 \
action ife encode \
type 0xDEAD \
allow mark \
dst 02:15:15:15:15:15

I am not going to comment on your other patch, but I suggest you
test with this (encoding at least two TLVs):

sudo $TC qdisc del dev $ETH root handle 1: prio
sudo $TC qdisc add dev $ETH root handle 1: prio
#Override mark and send prio of 0x33 (unfortunately
#skbedit is not very consistent: 33 means 0x33)
sudo $TC filter add dev $ETH parent 1: protocol ip prio 10 \
u32 match ip protocol 1 0xff flowid 1:2 \
action skbedit prio 33 \
action ife encode \
type 0xDEAD \
use mark 12 \
allow prio \
dst 02:15:15:15:15:15


cheers,
jamal
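The ordering problem described in the quoted commit message can be illustrated with a small userspace model. struct pkt and pkt_cow_head() are hypothetical stand-ins for struct sk_buff and skb_cow_head(); this helper always moves the data, which is exactly the case that produced random MAC bytes. The only point is the ordering: a header pointer taken before the call may be dangling afterwards, so the sketch takes it after.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define ETH_ALEN 6

/* Hypothetical, much-simplified packet buffer; not struct sk_buff. */
struct pkt {
	unsigned char *head;	/* start of the allocation */
	unsigned char *data;	/* start of the packet (inner mac header) */
	size_t len;
};

/* Stand-in for skb_cow_head(): moves the packet into a new buffer with
 * 'headroom' spare bytes in front and frees the old buffer. */
static int pkt_cow_head(struct pkt *p, size_t headroom)
{
	unsigned char *n = malloc(headroom + p->len);

	if (!n)
		return -1;
	memcpy(n + headroom, p->data, p->len);
	free(p->head);		/* any old pointer into the packet is now stale */
	p->head = n;
	p->data = n + headroom;
	return 0;
}

/* Fixed ordering: fetch the inner mac header only after pkt_cow_head(),
 * then copy its source address (h_source follows h_dest). */
static int copy_src_mac_after_cow(struct pkt *p, size_t headroom,
				  unsigned char src_out[ETH_ALEN])
{
	const unsigned char *inner_eth;

	assert(p->len >= 2 * ETH_ALEN);
	if (pkt_cow_head(p, headroom))
		return -1;
	inner_eth = p->data;	/* fresh pointer into valid memory */
	memcpy(src_out, inner_eth + ETH_ALEN, ETH_ALEN);
	return 0;
}
```

Swapping the two steps (saving p->data first, then calling pkt_cow_head()) would read freed memory, which matches the symptom in the report.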


Re: [Patch net-next] net_sched: check NULL on error path in route4_change()

2016-09-22 Thread Jamal Hadi Salim

On 16-09-18 06:52 PM, Cong Wang wrote:

On error path in route4_change(), 'f' could be NULL,
so we should check NULL before calling tcf_exts_destroy().

Fixes: b9a24bb76bf6 ("net_sched: properly handle failure case of 
tcf_exts_init()")
Reported-by: kbuild test robot 
Cc: Jamal Hadi Salim 
Signed-off-by: Cong Wang 


Acked-by: Jamal Hadi Salim 

cheers,
jamal


Re: [PATCH net-next v2 0/3] add support for RGMII on GMAC0 through TRGMII hardware module

2016-09-22 Thread David Miller
From: Sergei Shtylyov 
Date: Thu, 22 Sep 2016 20:08:47 +0300

> Despite my comments? Sigh...

Sorry, I thought he had addressed your feedback in v2.

I'll wait longer next time.


[PATCH net] tcp: fix a compile error in DBGUNDO()

2016-09-22 Thread Eric Dumazet
From: Eric Dumazet 

If DBGUNDO() is enabled (FASTRETRANS_DEBUG > 1), a compile
error occurs, since inet6_sk(sk)->daddr became sk->sk_v6_daddr.

Fixes: efe4208f47f9 ("ipv6: make lookups simpler and faster")
Signed-off-by: Eric Dumazet 
---
 net/ipv4/tcp_input.c |3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 08323bd95f2a..a756b8749a26 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -2329,10 +2329,9 @@ static void DBGUNDO(struct sock *sk, const char *msg)
}
 #if IS_ENABLED(CONFIG_IPV6)
else if (sk->sk_family == AF_INET6) {
-   struct ipv6_pinfo *np = inet6_sk(sk);
pr_debug("Undo %s %pI6/%u c%u l%u ss%u/%u p%u\n",
 msg,
-    &np->daddr, ntohs(inet->inet_dport),
+    &sk->sk_v6_daddr, ntohs(inet->inet_dport),
 tp->snd_cwnd, tcp_left_out(tp),
 tp->snd_ssthresh, tp->prior_ssthresh,
 tp->packets_out);




Re: [PATCH net-next] tcp: add tcp_add_backlog()

2016-09-22 Thread Marcelo Ricardo Leitner
On Sat, Aug 27, 2016 at 07:37:54AM -0700, Eric Dumazet wrote:
> +bool tcp_add_backlog(struct sock *sk, struct sk_buff *skb)
> +{
> + u32 limit = sk->sk_rcvbuf + sk->sk_sndbuf;
 ^^^
...
> + if (!skb->data_len)
> + skb->truesize = SKB_TRUESIZE(skb_end_offset(skb));
> +
> + if (unlikely(sk_add_backlog(sk, skb, limit))) {
...
> - } else if (unlikely(sk_add_backlog(sk, skb,
> -sk->sk_rcvbuf + sk->sk_sndbuf))) {
 ^ [1]
> - bh_unlock_sock(sk);
> - __NET_INC_STATS(net, LINUX_MIB_TCPBACKLOGDROP);
> + } else if (tcp_add_backlog(sk, skb)) {

Hi Eric, after this patch, do you think we still need to add sk_sndbuf
as a stretching factor to the backlog here?

It was added by [1], with the justification that (s)ack packets were
just too big for the rx buf size. Maybe this new patch alone is already
enough, as such packets will now have a very small truesize.

  Marcelo

[1] da882c1f2eca ("tcp: sk_add_backlog() is too agressive for TCP")
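To make the question concrete, here is a deliberately simplified model of the backlog admission check, loosely following sk_add_backlog(): the queue counts as full when backlog length plus receive memory exceeds limit = sk_rcvbuf + sk_sndbuf, and an accepted packet then adds its truesize to the backlog. struct fake_sock and backlog_add() are hypothetical, and the byte counts in the example are arbitrary. The model shows why a smaller truesize per ACK lets more packets in under the same limit; it does not answer whether the sk_sndbuf term is still needed.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical simplified socket accounting (not struct sock). */
struct fake_sock {
	unsigned int rcvbuf;
	unsigned int sndbuf;
	unsigned int rmem_alloc;	/* bytes charged to the receive queue */
	unsigned int backlog_len;	/* sum of truesize queued in the backlog */
};

/* Returns false when the packet would be dropped
 * (the LINUX_MIB_TCPBACKLOGDROP case). */
static bool backlog_add(struct fake_sock *sk, unsigned int truesize)
{
	unsigned int limit = sk->rcvbuf + sk->sndbuf;	/* as in tcp_add_backlog() */

	assert(sk != 0);
	/* Fullness is checked before the new packet is charged. */
	if (sk->backlog_len + sk->rmem_alloc > limit)
		return false;
	sk->backlog_len += truesize;
	return true;
}
```

With rcvbuf = 1000 and sndbuf = 0, packets charged 600 bytes each are accepted twice before drops start, while packets charged 200 bytes each are accepted six times.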



Re: [PATCH] net: VRF: Fix receiving multicast traffic

2016-09-22 Thread David Ahern
On 9/22/16 4:10 PM, Mark Tomlinson wrote:
> 
> On 09/23/2016 03:14 AM, David Ahern wrote:
>>
>> l3mdev devices do not support IPv4 multicast so checking mcast against that 
>> device should not be working at all. For that reason I was fine with the 
>> change in the previous patch. ie., you want the real ingress device there 
>> not the vrf device.
>>
>> What test are you running that says your previous patch broke something?
> Although we do not expect any multicast routing to work in an l3mdev, 
> (IGMP snooping or PIM), we still want to have multicast packets 
> delivered for protocols such as RIP. This was working before my previous 
> patch, but these multicast packets are now dropped. This current patch 
> fixes that again, hopefully still with the benefits of my first patch.
> 

can you discern which check is making that happen?

It does not make sense to look at the in_device of a vrf device for mcast 
addresses. For IPv6, link-local and mcast are specifically blocked. IPv4 should do 
the same. So, how is RIP getting the packet at all?


Re: [PATCH iproute2] ss: Support displaying and filtering on socket marks.

2016-09-22 Thread Stephen Hemminger
On Thu, 22 Sep 2016 01:02:50 +0900
Lorenzo Colitti  wrote:

> This allows the user to dump sockets with a given mark (via
> "fwmark = 0x1234/0x1234" or "fwmark = 12345", etc.), and to
> display the socket marks of dumped sockets.
> 
> The relevant kernel commits are: d545caca827b ("net: inet: diag:
> expose the socket mark to privileged processes.") and
> a52e95abf772 ("net: diag: allow socket bytecode filters to
> match socket marks")
> 
> Signed-off-by: Lorenzo Colitti 

Applied to net-next.
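The mark/mask syntax in the ss patch can be read as "(socket mark & mask) == value", with a bare value meaning a full 32-bit mask; as far as I know this is also how the kernel's inet_diag mark condition compares. The sketch below parses only the value part (e.g. "0x1234/0x1234"), not ss's full filter grammar, and struct mark_cond, parse_mark_cond() and mark_matches() are hypothetical names. Note that strtoul() with base 0 treats a leading 0 as octal.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

struct mark_cond {
	uint32_t mark;
	uint32_t mask;
};

/* Parse "VALUE" or "VALUE/MASK" (decimal, 0x hex, or leading-0 octal).
 * A missing mask means "compare all 32 bits". Returns 0 on success. */
static int parse_mark_cond(const char *s, struct mark_cond *c)
{
	char *end;
	unsigned long v;

	assert(s != NULL && c != NULL);
	errno = 0;
	v = strtoul(s, &end, 0);
	if (errno || end == s || v > UINT32_MAX)
		return -1;
	c->mark = (uint32_t)v;
	c->mask = UINT32_MAX;
	if (*end == '/') {
		const char *m = end + 1;

		v = strtoul(m, &end, 0);
		if (errno || end == m || v > UINT32_MAX)
			return -1;
		c->mask = (uint32_t)v;
	}
	return *end == '\0' ? 0 : -1;
}

static bool mark_matches(const struct mark_cond *c, uint32_t sk_mark)
{
	return (sk_mark & c->mask) == c->mark;
}
```

With "0x1234/0x1234", a socket marked 0xffff matches (only the masked bits are compared), while one marked 0x1230 does not.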



Re: [PATCH iproute2 2/2] ip rule: add selector support

2016-09-22 Thread Hangbin Liu
2016-09-22 16:45 GMT+08:00 Phil Sutter :
> On Thu, Sep 22, 2016 at 02:28:49PM +0800, Hangbin Liu wrote:
> [...]
>> diff --git a/man/man8/ip-rule.8 b/man/man8/ip-rule.8
>> index 3508d80..ec0e31d 100644
>> --- a/man/man8/ip-rule.8
>> +++ b/man/man8/ip-rule.8
>> @@ -15,7 +15,8 @@ ip-rule \- routing policy database management
>>
>>  .ti -8
>>  .B  ip rule
>> -.RB "[ " list " ]"
>> +.RB "[ " list
>> +.I "[ " SELECTOR " ]]"
>
> This makes the brackets italic, too. Better to use this instead:
>
> | .RI "[ " SELECTOR " ]]"

Thanks Phil, I'm not familiar with man doc syntax :)

Cheers
Hangbin


Re: [PATCH] net: VRF: Fix receiving multicast traffic

2016-09-22 Thread Mark Tomlinson

On 09/23/2016 10:41 AM, David Ahern wrote:
> On 9/22/16 4:10 PM, Mark Tomlinson wrote:
>> On 09/23/2016 03:14 AM, David Ahern wrote:
>>> l3mdev devices do not support IPv4 multicast so checking mcast against that 
>>> device should not be working at all. For that reason I was fine with the 
>>> change in the previous patch. ie., you want the real ingress device there 
>>> not the vrf device.
>>>
>>> What test are you running that says your previous patch broke something?
>> Although we do not expect any multicast routing to work in an l3mdev,
>> (IGMP snooping or PIM), we still want to have multicast packets
>> delivered for protocols such as RIP. This was working before my previous
>> patch, but these multicast packets are now dropped. This current patch
>> fixes that again, hopefully still with the benefits of my first patch.
>>
> can you discern which check is making that happen?
>
> It does not make sense to look at the in_device of a vrf device for mcast 
> addresses. For IPv6 linklocal and mcast is specifically blocked. IPv4 should 
> do the same. So, how is RIP getting the packet at all?
This might be due to some other changes we've made for VRF and multicast 
but haven't sent upstream. In particular, a change to do_ip_setsockopt() 
and its handling of IP_MULTICAST_IF as well as IP_ADD/DROP_MEMBERSHIP. I 
am guessing that without these changes, we wouldn't be able to receive 
multicast packets in RIP. With our changes, the in_dev->mc_list does 
contain the RIP MC address (224.0.0.9) in the master interface, and so 
the function ip_check_mc_rcu() returns success with the master only.

Our RIP daemon is VRF-aware, so it uses setsockopt(SO_BINDTODEVICE,
"vrf-master") when running in a VRF. Without following it all the way 
down, I believe that it is this that allows the multicast lookup at the 
top of ip_check_mc_rcu() to succeed on the vrf-master, but not the 
ingress interface. That is, in_dev->mc_list does contain 224.0.0.9 only 
on the vrf-master. Provided the lookup in ip_check_mc_rcu() succeeds (im 
!= NULL), this function can return success.

Are you interested in the other patches at the moment?


Re: [PATCH net-next 0/4] net: dsa: add port fast ageing

2016-09-22 Thread Florian Fainelli


On 09/22/2016 01:49 PM, Vivien Didelot wrote:
> Today the DSA drivers are in charge of flushing the MAC addresses
> associated with a port when its STP state changes from Learning or
> Forwarding, to Disabled or Blocking or Listening.
> 
> This makes the drivers more complex and hides this generic switch logic.
> 
> This patchset introduces a new optional port_fast_age operation to
> dsa_switch_ops, to move this logic to the DSA layer and keep drivers
> simple. b53 and mv88e6xxx are updated accordingly.

This looks good, just one minor thing: both b53 and mv88e6xxx can
actually return an error when fast ageing a port; should we account for
that? Not that we would be doing anything about it, though...

Reviewed-by: Florian Fainelli 

> 
> Vivien Didelot (4):
>   net: dsa: add port STP state helper
>   net: dsa: add port fast ageing
>   net: dsa: b53: implement DSA port fast ageing
>   net: dsa: mv88e6xxx: implement DSA port fast ageing
> 
>  drivers/net/dsa/b53/b53_common.c | 31 ++-
>  drivers/net/dsa/mv88e6xxx/chip.c | 45 
> 
>  include/net/dsa.h|  2 ++
>  net/dsa/slave.c  | 35 ---
>  4 files changed, 64 insertions(+), 49 deletions(-)
> 
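The rule the cover letter describes (flush learned addresses when a port goes from Learning or Forwarding to Disabled, Blocking or Listening) fits in a small predicate. This is a sketch, not the DSA code: the enum mirrors the BR_STATE_* values from <linux/if_bridge.h>, and needs_fast_age() is a hypothetical name. A caller would invoke the optional port_fast_age operation only when it returns true.

```c
#include <assert.h>
#include <stdbool.h>

/* Mirrors the BR_STATE_* values from <linux/if_bridge.h>. */
enum stp_state {
	STP_DISABLED = 0,
	STP_LISTENING = 1,
	STP_LEARNING = 2,
	STP_FORWARDING = 3,
	STP_BLOCKING = 4,
};

/* True when the port leaves a state in which it learns addresses for one
 * in which it does not, i.e. when its learned MAC entries should go. */
static bool needs_fast_age(enum stp_state from, enum stp_state to)
{
	bool was_learning, stops_learning;

	assert(from <= STP_BLOCKING && to <= STP_BLOCKING);
	was_learning = from == STP_LEARNING || from == STP_FORWARDING;
	stops_learning = to == STP_DISABLED || to == STP_BLOCKING ||
			 to == STP_LISTENING;
	return was_learning && stops_learning;
}
```

Keeping this test in one place is the point of the series: drivers then only implement the flush itself.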


[net-next v2 03/10] i40e: return correct opcode to VF

2016-09-22 Thread Jeff Kirsher
From: Mitch Williams 

This conditional is backward, so the driver responds back to the VF with
the wrong opcode. Do the old switcheroo to fix this.

Change-ID: I384035b0fef8a3881c176de4b4672009b3400b25
Signed-off-by: Mitch Williams 
Tested-by: Andrew Bowers 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c b/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
index da34235..611fc87 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
@@ -2217,8 +2217,8 @@ static int i40e_vc_iwarp_qvmap_msg(struct i40e_vf *vf, u8 *msg, u16 msglen,
 error_param:
/* send the response to the VF */
return i40e_vc_send_resp_to_vf(vf,
-  config ? I40E_VIRTCHNL_OP_RELEASE_IWARP_IRQ_MAP :
-  I40E_VIRTCHNL_OP_CONFIG_IWARP_IRQ_MAP,
+  config ? I40E_VIRTCHNL_OP_CONFIG_IWARP_IRQ_MAP :
+  I40E_VIRTCHNL_OP_RELEASE_IWARP_IRQ_MAP,
   aq_ret);
 }
 
-- 
2.7.4



[net-next v2 07/10] i40evf: Fix link state event handling

2016-09-22 Thread Jeff Kirsher
From: Sridhar Samudrala 

Currently, disabling the link state from the PF via
ip link set enp5s0f0 vf 0 state disable
doesn't turn off the carrier on the VF.

This patch updates the carrier and starts/stops the Tx queues based on the
link state notification from the PF.

  PF: enp5s0f0, VF: enp5s2
  #modprobe i40e
  #echo 2 > /sys/class/net/enp5s0f0/device/sriov_numvfs
  #ip link set enp5s2 up
  #ip -d link show enp5s2
  175: enp5s2:  mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
  link/ether ea:4d:60:bc:6f:85 brd ff:ff:ff:ff:ff:ff promiscuity 0 addrgenmode eui64
  #ip link set enp5s0f0 vf 0 state disable
  #ip -d link show enp5s0f0
  171: enp5s0f0:  mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
  link/ether 68:05:ca:2e:72:68 brd ff:ff:ff:ff:ff:ff promiscuity 0 addrgenmode eui64 numtxqueues 72 numrxqueues 72 portid 6805ca2e7268
  vf 0 MAC 00:00:00:00:00:00, spoof checking on, link-state disable, trust off
  vf 1 MAC 00:00:00:00:00:00, spoof checking on, link-state auto, trust off
  #ip -d link show enp5s2
  175: enp5s2:  mtu 1500 qdisc mq state DOWN mode DEFAULT group default qlen 1000
   link/ether ea:4d:60:bc:6f:85 brd ff:ff:ff:ff:ff:ff promiscuity 0 addrgenmode eui64 numtxqueues 16 numrxqueues 16

Signed-off-by: Sridhar Samudrala 
Tested-by: Andrew Bowers 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/i40evf/i40evf_main.c |  4 
 drivers/net/ethernet/intel/i40evf/i40evf_virtchnl.c | 10 +++---
 2 files changed, 11 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40evf/i40evf_main.c b/drivers/net/ethernet/intel/i40evf/i40evf_main.c
index f751f7b..e0a8cd8 100644
--- a/drivers/net/ethernet/intel/i40evf/i40evf_main.c
+++ b/drivers/net/ethernet/intel/i40evf/i40evf_main.c
@@ -1037,6 +1037,7 @@ void i40evf_down(struct i40evf_adapter *adapter)
 
netif_carrier_off(netdev);
netif_tx_disable(netdev);
+   adapter->link_up = false;
i40evf_napi_disable_all(adapter);
i40evf_irq_disable(adapter);
 
@@ -1731,6 +1732,7 @@ static void i40evf_reset_task(struct work_struct *work)
set_bit(__I40E_DOWN, &adapter->vsi.state);
netif_carrier_off(netdev);
netif_tx_disable(netdev);
+   adapter->link_up = false;
i40evf_napi_disable_all(adapter);
i40evf_irq_disable(adapter);
i40evf_free_traffic_irqs(adapter);
@@ -1769,6 +1771,7 @@ continue_reset:
if (netif_running(adapter->netdev)) {
netif_carrier_off(netdev);
netif_tx_stop_all_queues(netdev);
+   adapter->link_up = false;
i40evf_napi_disable_all(adapter);
}
i40evf_irq_disable(adapter);
@@ -2457,6 +2460,7 @@ static void i40evf_init_task(struct work_struct *work)
goto err_sw_init;
 
netif_carrier_off(netdev);
+   adapter->link_up = false;
 
if (!adapter->netdev_registered) {
err = register_netdev(netdev);
diff --git a/drivers/net/ethernet/intel/i40evf/i40evf_virtchnl.c b/drivers/net/ethernet/intel/i40evf/i40evf_virtchnl.c
index cc6cb30..ddf478d 100644
--- a/drivers/net/ethernet/intel/i40evf/i40evf_virtchnl.c
+++ b/drivers/net/ethernet/intel/i40evf/i40evf_virtchnl.c
@@ -898,8 +898,14 @@ void i40evf_virtchnl_completion(struct i40evf_adapter *adapter,
vpe->event_data.link_event.link_status) {
adapter->link_up =
vpe->event_data.link_event.link_status;
+   if (adapter->link_up) {
+   netif_tx_start_all_queues(netdev);
+   netif_carrier_on(netdev);
+   } else {
+   netif_tx_stop_all_queues(netdev);
+   netif_carrier_off(netdev);
+   }
i40evf_print_link_message(adapter);
-   netif_tx_stop_all_queues(netdev);
}
break;
case I40E_VIRTCHNL_EVENT_RESET_IMPENDING:
@@ -974,8 +980,6 @@ void i40evf_virtchnl_completion(struct i40evf_adapter *adapter,
case I40E_VIRTCHNL_OP_ENABLE_QUEUES:
/* enable transmits */
i40evf_irq_enable(adapter, true);
-   netif_tx_start_all_queues(adapter->netdev);
-   netif_carrier_on(adapter->netdev);
break;
case I40E_VIRTCHNL_OP_DISABLE_QUEUES:
i40evf_free_all_tx_resources(adapter);
-- 
2.7.4
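The link-state handling in this patch can be modelled as a tiny state machine. struct fake_vf and both functions are hypothetical; the booleans stand in for adapter->link_up, netif_carrier_on()/off() and netif_tx_start/stop_all_queues(). Clearing link_up on the down/reset paths presumably matters because the event handler only acts when the reported status differs from link_up.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model of the relevant i40evf state. */
struct fake_vf {
	bool link_up;		/* adapter->link_up */
	bool carrier_on;	/* netif_carrier_on()/netif_carrier_off() */
	bool tx_running;	/* netif_tx_start/stop_all_queues() */
};

/* PF link-status event: act only on a change, and keep carrier and
 * Tx queues in step with the reported status. */
static void handle_link_event(struct fake_vf *vf, bool link_status)
{
	assert(vf != 0);
	if (vf->link_up == link_status)
		return;
	vf->link_up = link_status;
	vf->tx_running = link_status;
	vf->carrier_on = link_status;
}

/* Down/reset paths: carrier off, Tx stopped, and link_up cleared so the
 * next "link up" event is seen as a change. */
static void vf_down(struct fake_vf *vf)
{
	vf->carrier_on = false;
	vf->tx_running = false;
	vf->link_up = false;
}
```

Without the link_up reset in vf_down(), a "link up" event arriving after a reset would match the stale link_up value and leave the carrier off.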



[net-next v2 04/10] i40e: Fix to check for NULL

2016-09-22 Thread Jeff Kirsher
From: Carolyn Wyborny 

This patch fixes an issue in the virt channel code where the return value
of i40e_find_vsi_from_id was not checked for NULL when applicable.
Without this patch, there is a risk of a panic, and static analysis
tools complain. This patch fixes the problem by adding the check
and an additional input check for similar reasons.

Change-ID: I7e9be88eb7a3addb50eadc451c8336d9e06f5394
Signed-off-by: Carolyn Wyborny 
Tested-by: Andrew Bowers 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c | 11 ++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c b/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
index 611fc87..2ab5355 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
@@ -502,8 +502,16 @@ static int i40e_config_vsi_tx_queue(struct i40e_vf *vf, u16 vsi_id,
u32 qtx_ctl;
int ret = 0;
 
+   if (!i40e_vc_isvalid_vsi_id(vf, info->vsi_id)) {
+   ret = -ENOENT;
+   goto error_context;
+   }
pf_queue_id = i40e_vc_get_pf_queue_id(vf, vsi_id, vsi_queue_id);
vsi = i40e_find_vsi_from_id(pf, vsi_id);
+   if (!vsi) {
+   ret = -ENOENT;
+   goto error_context;
+   }
 
/* clear the context structure first */
memset(&tx_ctx, 0, sizeof(struct i40e_hmc_obj_txq));
@@ -1476,7 +1484,8 @@ static int i40e_vc_config_promiscuous_mode_msg(struct i40e_vf *vf,
 
vsi = i40e_find_vsi_from_id(pf, info->vsi_id);
if (!test_bit(I40E_VF_STAT_ACTIVE, &vf->vf_states) ||
-   !i40e_vc_isvalid_vsi_id(vf, info->vsi_id)) {
+   !i40e_vc_isvalid_vsi_id(vf, info->vsi_id) ||
+   !vsi) {
aq_ret = I40E_ERR_PARAM;
goto error_param;
}
-- 
2.7.4
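The pattern this patch adds (look up, check for NULL, jump to the error label before dereferencing) is sketched below in userspace. struct fake_vsi, find_vsi() and config_vsi() are hypothetical and are not the i40e structures.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical VSI table entry; not struct i40e_vsi. */
struct fake_vsi {
	unsigned short id;
	int configured;
};

/* Returns NULL when no entry has this id; callers must handle that. */
static struct fake_vsi *find_vsi(struct fake_vsi *tbl, size_t n,
				 unsigned short id)
{
	for (size_t i = 0; i < n; i++)
		if (tbl[i].id == id)
			return &tbl[i];
	return NULL;
}

/* Look up first, and bail out before any dereference on failure. */
static int config_vsi(struct fake_vsi *tbl, size_t n, unsigned short id)
{
	struct fake_vsi *vsi;
	int ret = 0;

	assert(tbl != NULL || n == 0);
	vsi = find_vsi(tbl, n, id);
	if (!vsi) {
		ret = -ENOENT;
		goto error_context;
	}
	vsi->configured = 1;
error_context:
	return ret;
}
```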



[net-next v2 08/10] i40evf: remove unnecessary error checking against i40evf_up_complete

2016-09-22 Thread Jeff Kirsher
From: Bimmy Pujari 

Function i40evf_up_complete() always returns success. Change it to
return void and remove the code that checks the return status and prints
an error message.

Change-ID: I8c400f174786b9c855f679e470f35af292fb50ad
Signed-off-by: Bimmy Pujari 
Tested-by: Andrew Bowers 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/i40evf/i40evf_main.c | 11 +++
 1 file changed, 3 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40evf/i40evf_main.c b/drivers/net/ethernet/intel/i40evf/i40evf_main.c
index e0a8cd8..9906775 100644
--- a/drivers/net/ethernet/intel/i40evf/i40evf_main.c
+++ b/drivers/net/ethernet/intel/i40evf/i40evf_main.c
@@ -1007,7 +1007,7 @@ static void i40evf_configure(struct i40evf_adapter *adapter)
  * i40evf_up_complete - Finish the last steps of bringing up a connection
  * @adapter: board private structure
  **/
-static int i40evf_up_complete(struct i40evf_adapter *adapter)
+static void i40evf_up_complete(struct i40evf_adapter *adapter)
 {
adapter->state = __I40EVF_RUNNING;
clear_bit(__I40E_DOWN, &adapter->vsi.state);
@@ -1016,7 +1016,6 @@ static int i40evf_up_complete(struct i40evf_adapter *adapter)
 
adapter->aq_required |= I40EVF_FLAG_AQ_ENABLE_QUEUES;
mod_timer_pending(&adapter->watchdog_timer, jiffies + 1);
-   return 0;
 }
 
 /**
@@ -1827,9 +1826,7 @@ continue_reset:
 
i40evf_configure(adapter);
 
-   err = i40evf_up_complete(adapter);
-   if (err)
-   goto reset_err;
+   i40evf_up_complete(adapter);
 
i40evf_irq_enable(adapter, true);
} else {
@@ -2059,9 +2056,7 @@ static int i40evf_open(struct net_device *netdev)
i40evf_add_filter(adapter, adapter->hw.mac.addr);
i40evf_configure(adapter);
 
-   err = i40evf_up_complete(adapter);
-   if (err)
-   goto err_req_irq;
+   i40evf_up_complete(adapter);
 
i40evf_irq_enable(adapter, true);
 
-- 
2.7.4



[net-next v2 01/10] i40e: fix setting user defined RSS hash key

2016-09-22 Thread Jeff Kirsher
From: Alan Brady 

Previously, when using ethtool to change the RSS hash key, ethtool would
report that the old key was still being used, and no error was
reported. It was unclear whether the key was being reported incorrectly or
set incorrectly. Debugging revealed that 'i40e_set_rxfh()' returned
zero immediately instead of setting the key, because no user defined
indirection table is supplied when changing the hash key.

With this fix, if an indirection table is not supplied, a default one
is created and the hash key is correctly set.

Change-ID: Iddb621897ecf208650272b7ee46702cad7b69a71
Signed-off-by: Alan Brady 
Tested-by: Andrew Bowers 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/i40e/i40e.h |  2 ++
 drivers/net/ethernet/intel/i40e/i40e_ethtool.c | 12 +++-
 drivers/net/ethernet/intel/i40e/i40e_main.c|  6 ++
 3 files changed, 11 insertions(+), 9 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e.h b/drivers/net/ethernet/intel/i40e/i40e.h
index 19103a6..30aaee4 100644
--- a/drivers/net/ethernet/intel/i40e/i40e.h
+++ b/drivers/net/ethernet/intel/i40e/i40e.h
@@ -701,6 +701,8 @@ void i40e_do_reset_safe(struct i40e_pf *pf, u32 reset_flags);
 void i40e_do_reset(struct i40e_pf *pf, u32 reset_flags);
 int i40e_config_rss(struct i40e_vsi *vsi, u8 *seed, u8 *lut, u16 lut_size);
 int i40e_get_rss(struct i40e_vsi *vsi, u8 *seed, u8 *lut, u16 lut_size);
+void i40e_fill_rss_lut(struct i40e_pf *pf, u8 *lut,
+  u16 rss_table_size, u16 rss_size);
 struct i40e_vsi *i40e_find_vsi_from_id(struct i40e_pf *pf, u16 id);
 void i40e_update_stats(struct i40e_vsi *vsi);
 void i40e_update_eth_stats(struct i40e_vsi *vsi);
diff --git a/drivers/net/ethernet/intel/i40e/i40e_ethtool.c b/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
index 1835186..af28a8c 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
@@ -2922,15 +2922,13 @@ static int i40e_set_rxfh(struct net_device *netdev, const u32 *indir,
 {
struct i40e_netdev_priv *np = netdev_priv(netdev);
struct i40e_vsi *vsi = np->vsi;
+   struct i40e_pf *pf = vsi->back;
u8 *seed = NULL;
u16 i;
 
if (hfunc != ETH_RSS_HASH_NO_CHANGE && hfunc != ETH_RSS_HASH_TOP)
return -EOPNOTSUPP;
 
-   if (!indir)
-   return 0;
-
if (key) {
if (!vsi->rss_hkey_user) {
vsi->rss_hkey_user = kzalloc(I40E_HKEY_ARRAY_SIZE,
@@ -2948,8 +2946,12 @@ static int i40e_set_rxfh(struct net_device *netdev, const u32 *indir,
}
 
/* Each 32 bits pointed by 'indir' is stored with a lut entry */
-   for (i = 0; i < I40E_HLUT_ARRAY_SIZE; i++)
-   vsi->rss_lut_user[i] = (u8)(indir[i]);
+   if (indir)
+   for (i = 0; i < I40E_HLUT_ARRAY_SIZE; i++)
+   vsi->rss_lut_user[i] = (u8)(indir[i]);
+   else
+   i40e_fill_rss_lut(pf, vsi->rss_lut_user, I40E_HLUT_ARRAY_SIZE,
+ vsi->rss_size);
 
return i40e_config_rss(vsi, seed, vsi->rss_lut_user,
   I40E_HLUT_ARRAY_SIZE);
diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index 61b0fc4..69b9e30 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -57,8 +57,6 @@ static int i40e_setup_pf_switch(struct i40e_pf *pf, bool reinit);
 static int i40e_setup_misc_vector(struct i40e_pf *pf);
 static void i40e_determine_queue_usage(struct i40e_pf *pf);
 static int i40e_setup_pf_filter_control(struct i40e_pf *pf);
-static void i40e_fill_rss_lut(struct i40e_pf *pf, u8 *lut,
- u16 rss_table_size, u16 rss_size);
 static void i40e_fdir_sb_setup(struct i40e_pf *pf);
 static int i40e_veb_get_bw_info(struct i40e_veb *veb);
 
@@ -8244,8 +8242,8 @@ int i40e_get_rss(struct i40e_vsi *vsi, u8 *seed, u8 *lut, u16 lut_size)
  * @rss_table_size: Lookup table size
  * @rss_size: Range of queue number for hashing
  */
-static void i40e_fill_rss_lut(struct i40e_pf *pf, u8 *lut,
- u16 rss_table_size, u16 rss_size)
+void i40e_fill_rss_lut(struct i40e_pf *pf, u8 *lut,
+  u16 rss_table_size, u16 rss_size)
 {
u16 i;
 
-- 
2.7.4



[net-next v2 00/10][pull request] 40GbE Intel Wired LAN Driver Updates 2016-09-22

2016-09-22 Thread Jeff Kirsher
This series contains updates to i40e and i40evf only.

Sridhar fixes link state event handling by updating the carrier and
starting/stopping the Tx queues based on the link state notification
from the PF.

Brady fixes an issue where a user-defined RSS hash key was not being
set when no user-defined indirection table was supplied along with it.
Now, if no indirection table is supplied, a default one is created and
the hash key is set correctly.  He also fixes an issue where, with NPAR
enabled, we were still using pf->mac_seid to perform the dump port
query.  Instead, the driver now goes through the VSI to determine the
correct ID to use in either case.
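A minimal userspace sketch of the behaviour the RSS fix aims for: when the caller supplies no indirection table, fall back to a default one instead of returning early, so that a new hash key is still programmed. The round-robin fill (`lut[i] = i % rss_size`) is an assumption about what i40e_fill_rss_lut() does; `set_rss_lut()`, `fill_default_lut()` and `HLUT_ARRAY_SIZE` are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the driver's LUT size. */
#define HLUT_ARRAY_SIZE 512

/* Assumed default: spread LUT entries round-robin over rss_size queues. */
static void fill_default_lut(uint8_t *lut, size_t table_size, uint16_t rss_size)
{
	size_t i;

	assert(rss_size > 0);
	for (i = 0; i < table_size; i++)
		lut[i] = (uint8_t)(i % rss_size);
}

/*
 * Hypothetical helper mirroring the fixed flow: a NULL indirection
 * table no longer short-circuits, it selects the default LUT.
 */
static void set_rss_lut(uint8_t *lut, const uint32_t *indir, uint16_t rss_size)
{
	size_t i;

	if (indir) {
		/* Each 32-bit entry of 'indir' is truncated into one LUT byte. */
		for (i = 0; i < HLUT_ARRAY_SIZE; i++)
			lut[i] = (uint8_t)indir[i];
	} else {
		fill_default_lut(lut, HLUT_ARRAY_SIZE, rss_size);
	}
}
```

With `indir == NULL` and four queues, the table cycles 0, 1, 2, 3, 0, ... so a key-only change still leaves a usable LUT behind.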

Mitch provides one fix where a conditional return code was reversed, so
he does a "switcheroo" to fix the issue.

Carolyn has two fixes.  The first fixes an issue in the virtchnl code
where a return value was not checked for NULL when applicable.  The
second fixes an issue where we were byte swapping the port parameter,
then byte swapping it again inside the called function.

Colin Ian King fixes a potential NULL pointer dereference.

Bimmy changes i40evf_up_complete() to return void, since it always
returns success anyway, which allows cleaning up the code that checked
its return value.

Alex fixes an issue where the driver incorrectly assumed that it would
never pull more than 1 descriptor's worth of data from each fragment.
To correct this, the check now walks all the way to the end of the
fragments: since it is possible to span 2 descriptors in the preceding
block, even the last 6 descriptors must be guaranteed to have enough
data to fill a full frame.

v2: dropped patches 1-3, 10 and 12 from the original series since Or
Gerlitz pointed out several areas of improvement in the implementation
of the VF Port representor netdev.  Sridhar is re-working the series
for later submission.

The following are changes since commit cdd0766d7da19085e88df86d1e5e21d9fe3d374f:
  Merge branch 'ftgmac100-ast2500-support'
and are available in the git repository at:
  git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue 40GbE

Alan Brady (2):
  i40e: fix setting user defined RSS hash key
  i40e: fix "dump port" command when NPAR enabled

Alexander Duyck (1):
  i40e: Limit TX descriptor count in cases where frag size is greater than 16K

Bimmy Pujari (1):
  i40evf: remove unnecessary error checking against i40evf_up_complete

Carolyn Wyborny (2):
  i40e: Fix to check for NULL
  i40e: Fix for extra byte swap in tunnel setup

Colin Ian King (1):
  i40e: avoid potential null pointer dereference when assigning len

Lihong Yang (1):
  i40evf: remove unnecessary error checking against i40e_shutdown_adminq

Mitch Williams (1):
  i40e: return correct opcode to VF

Sridhar Samudrala (1):
  i40evf: Fix link state event handling

 drivers/net/ethernet/intel/i40e/i40e.h  |  2 ++
 drivers/net/ethernet/intel/i40e/i40e_debugfs.c  |  7 ++-
 drivers/net/ethernet/intel/i40e/i40e_ethtool.c  | 12 +++-
 drivers/net/ethernet/intel/i40e/i40e_main.c | 12 +---
 drivers/net/ethernet/intel/i40e/i40e_txrx.c |  7 ++-
 drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c  | 15 ---
 drivers/net/ethernet/intel/i40evf/i40e_common.c |  3 ++-
 drivers/net/ethernet/intel/i40evf/i40e_txrx.c   |  7 ++-
 drivers/net/ethernet/intel/i40evf/i40evf_main.c | 18 --
 drivers/net/ethernet/intel/i40evf/i40evf_virtchnl.c | 10 +++---
 10 files changed, 53 insertions(+), 40 deletions(-)

-- 
2.7.4



[net-next v2 06/10] i40e: avoid potential null pointer dereference when assigning len

2016-09-22 Thread Jeff Kirsher
From: Colin Ian King 

There is a sanity check for desc being NULL in the first line of
function i40evf_debug_aq().  However, before that check, aq_desc is
cast from desc and then dereferenced in the initialization of len, so
this is a potential NULL pointer dereference.  Fix this by moving the
initialization of len into the code block where len is used, at which
point we know it is OK to dereference aq_desc.
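The ordering rule behind this fix, reduced to a hypothetical userspace example: validate the pointer before any dereference, and compute `len` only inside the block that needs it. `struct aq_desc` and `debug_buf_len()` are illustrative stand-ins rather than the driver's definitions, and the little-endian conversion of `datalen` is deliberately omitted.

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative stand-in for struct i40e_aq_desc (only the field used). */
struct aq_desc {
	uint16_t datalen;
};

/*
 * Returns how many buffer bytes would be dumped, or 0 when there is
 * nothing to dump. The descriptor is never read before the NULL check.
 */
static uint16_t debug_buf_len(const void *desc, const void *buffer,
			      uint16_t buf_len)
{
	const struct aq_desc *aq_desc = desc;	/* a cast alone is harmless */

	if (!aq_desc)	/* sanity check first ... */
		return 0;

	if (buffer && aq_desc->datalen != 0) {
		/* ... and only then read datalen, clamped to the real buffer. */
		uint16_t len = aq_desc->datalen;

		if (buf_len < len)
			len = buf_len;
		return len;
	}
	return 0;
}
```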

Signed-off-by: Colin Ian King 
Tested-by: Andrew Bowers 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/i40evf/i40e_common.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/i40evf/i40e_common.c 
b/drivers/net/ethernet/intel/i40evf/i40e_common.c
index 4db0c03..7953c13 100644
--- a/drivers/net/ethernet/intel/i40evf/i40e_common.c
+++ b/drivers/net/ethernet/intel/i40evf/i40e_common.c
@@ -302,7 +302,6 @@ void i40evf_debug_aq(struct i40e_hw *hw, enum 
i40e_debug_mask mask, void *desc,
   void *buffer, u16 buf_len)
 {
struct i40e_aq_desc *aq_desc = (struct i40e_aq_desc *)desc;
-   u16 len = le16_to_cpu(aq_desc->datalen);
u8 *buf = (u8 *)buffer;
u16 i = 0;
 
@@ -326,6 +325,8 @@ void i40evf_debug_aq(struct i40e_hw *hw, enum 
i40e_debug_mask mask, void *desc,
   le32_to_cpu(aq_desc->params.external.addr_low));
 
if ((buffer != NULL) && (aq_desc->datalen != 0)) {
+   u16 len = le16_to_cpu(aq_desc->datalen);
+
i40e_debug(hw, mask, "AQ CMD Buffer:\n");
if (buf_len < len)
len = buf_len;
-- 
2.7.4



[net-next v2 09/10] i40e: Limit TX descriptor count in cases where frag size is greater than 16K

2016-09-22 Thread Jeff Kirsher
From: Alexander Duyck 

The i40e driver was incorrectly assuming that we would never pull more
than 1 descriptor's worth of data from each fragment.  In fact, 2
descriptors' worth of data may be pulled when a frame is larger than
one of the pieces generated when aligning the payload to either 4K or
pieces smaller than 16K.

To adjust for this, we need to test all the way to the end of the
fragments.  Because it is possible to span 2 descriptors in the block
before us, we must guarantee that even the last 6 descriptors have
enough data to fill a full frame.
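A simplified model of the rule described above, assuming each TSO segment may draw on at most `window` consecutive fragments: every run of `window` fragments, including the final one, must carry at least `gso_size` bytes, otherwise the skb has to be linearized. This is a sketch with hypothetical names; the driver's real check also accounts for header descriptors and keeps a running sum instead of recomputing each window.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Returns true when some run of 'window' consecutive fragments holds
 * fewer than gso_size bytes. Every window is checked, up to and
 * including the one that ends at the last fragment.
 */
static bool needs_linearize(const unsigned int *frag_size, size_t nr_frags,
			    unsigned int gso_size, size_t window)
{
	size_t start, i;

	/* Assumed to fit within the descriptor limit: nothing to check. */
	if (nr_frags < window)
		return false;

	for (start = 0; start + window <= nr_frags; start++) {
		unsigned long sum = 0;

		for (i = start; i < start + window; i++)
			sum += frag_size[i];
		if (sum < gso_size)
			return true;
	}
	return false;
}
```

For twelve fragments of sizes {1000 x 6, 10 x 6} with a window of 6 and gso_size 1000, every window passes except the very last one, which is roughly the case that skipping the last 6 fragments used to miss.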

Change-ID: Ic2ecb4d6b745f447d334e66c14002152f50e2f99
Signed-off-by: Alexander Duyck 
Tested-by: Andrew Bowers 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/i40e/i40e_txrx.c   | 7 ++-
 drivers/net/ethernet/intel/i40evf/i40e_txrx.c | 7 ++-
 2 files changed, 4 insertions(+), 10 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.c 
b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
index f8d6623..bf7bb7c 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_txrx.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
@@ -2621,9 +2621,7 @@ bool __i40e_chk_linearize(struct sk_buff *skb)
return false;
 
/* We need to walk through the list and validate that each group
-* of 6 fragments totals at least gso_size.  However we don't need
-* to perform such validation on the last 6 since the last 6 cannot
-* inherit any data from a descriptor after them.
+* of 6 fragments totals at least gso_size.
 */
nr_frags -= I40E_MAX_BUFFER_TXD - 2;
frag = _shinfo(skb)->frags[0];
@@ -2654,8 +2652,7 @@ bool __i40e_chk_linearize(struct sk_buff *skb)
if (sum < 0)
return true;
 
-   /* use pre-decrement to avoid processing last fragment */
-   if (!--nr_frags)
+   if (!nr_frags--)
break;
 
sum -= skb_frag_size(stale++);
diff --git a/drivers/net/ethernet/intel/i40evf/i40e_txrx.c 
b/drivers/net/ethernet/intel/i40evf/i40e_txrx.c
index 0130458..e3427eb 100644
--- a/drivers/net/ethernet/intel/i40evf/i40e_txrx.c
+++ b/drivers/net/ethernet/intel/i40evf/i40e_txrx.c
@@ -1832,9 +1832,7 @@ bool __i40evf_chk_linearize(struct sk_buff *skb)
return false;
 
/* We need to walk through the list and validate that each group
-* of 6 fragments totals at least gso_size.  However we don't need
-* to perform such validation on the last 6 since the last 6 cannot
-* inherit any data from a descriptor after them.
+* of 6 fragments totals at least gso_size.
 */
nr_frags -= I40E_MAX_BUFFER_TXD - 2;
frag = _shinfo(skb)->frags[0];
@@ -1865,8 +1863,7 @@ bool __i40evf_chk_linearize(struct sk_buff *skb)
if (sum < 0)
return true;
 
-   /* use pre-decrement to avoid processing last fragment */
-   if (!--nr_frags)
+   if (!nr_frags--)
break;
 
sum -= skb_frag_size(stale++);
-- 
2.7.4
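The one-character change in both loops (`!--nr_frags` becoming `!nr_frags--`) is what extends the walk by one step. A small standalone illustration with hypothetical function names, counting how many times the check runs before `break` for the same starting count:

```c
#include <assert.h>

/* Old form: decrement first, so the loop stops one check earlier. */
static int checks_pre_decrement(int nr_frags)
{
	int checks = 0;

	assert(nr_frags >= 1);	/* with 0, the pre-decrement never hits 0 */
	for (;;) {
		checks++;	/* stands in for the "sum < 0" test */
		if (!--nr_frags)
			break;
	}
	return checks;
}

/* New form: test the old value, then decrement; one extra check. */
static int checks_post_decrement(int nr_frags)
{
	int checks = 0;

	assert(nr_frags >= 1);
	for (;;) {
		checks++;
		if (!nr_frags--)
			break;
	}
	return checks;
}
```

For any starting count n of at least 1, the old form runs n checks and the new form n + 1, consistent with the last group of fragments now being validated as well.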



[net-next v2 02/10] i40e: fix "dump port" command when NPAR enabled

2016-09-22 Thread Jeff Kirsher
From: Alan Brady 

When using debugfs to issue the "dump port" command with NPAR enabled,
the firmware reports an invalid argument error.

The issue occurs because pf->mac_seid was used to perform the query.
This is fine when NPAR is disabled, because the switch ID equals
pf->mac_seid; however, this is not the case when NPAR is enabled.  This
fix instead goes through the VSI to determine the correct ID to use in
either case.
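A sketch of the selection logic, assuming the switch ID sits in the low bits of a 16-bit VSI field and is extracted with a mask. The structure, the field layout, and the 0x0FFF mask value are assumptions for illustration, not the driver's definitions (the patch itself uses `vsi->info.switch_id & I40E_AQ_VSI_SW_ID_MASK`).

```c
#include <stdint.h>

/* Assumed mask for the switch-ID bits; the driver defines its own. */
#define VSI_SW_ID_MASK 0x0FFFu

/* Hypothetical subset of the VSI properties. */
struct vsi_info {
	uint16_t switch_id;	/* switch ID, upper bits assumed to be flags */
};

/*
 * Derive the ID for the port ETS query from the LAN VSI rather than
 * from the MAC SEID, which only matches when NPAR is disabled.
 */
static uint32_t port_query_id(const struct vsi_info *lan_vsi)
{
	return lan_vsi->switch_id & VSI_SW_ID_MASK;
}
```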

Change-ID: I0cd67913a7f2c4a2962e06d39e32e7447cc55b6a
Signed-off-by: Alan Brady 
Tested-by: Andrew Bowers 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/i40e/i40e_debugfs.c | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_debugfs.c 
b/drivers/net/ethernet/intel/i40e/i40e_debugfs.c
index 05cf9a7..8555f04 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_debugfs.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_debugfs.c
@@ -1054,6 +1054,7 @@ static ssize_t i40e_dbg_command_write(struct file *filp,
struct i40e_dcbx_config *r_cfg =
>hw.remote_dcbx_config;
int i, ret;
+   u32 switch_id;
 
bw_data = kzalloc(sizeof(
struct i40e_aqc_query_port_ets_config_resp),
@@ -1063,8 +1064,12 @@ static ssize_t i40e_dbg_command_write(struct file *filp,
goto command_write_done;
}
 
+   vsi = pf->vsi[pf->lan_vsi];
+   switch_id =
+   vsi->info.switch_id & I40E_AQ_VSI_SW_ID_MASK;
+
ret = i40e_aq_query_port_ets_config(>hw,
-   pf->mac_seid,
+   switch_id,
bw_data, NULL);
if (ret) {
dev_info(>pdev->dev,
-- 
2.7.4



[net-next v2 05/10] i40e: Fix for extra byte swap in tunnel setup

2016-09-22 Thread Jeff Kirsher
From: Carolyn Wyborny 

This patch fixes an issue where we were byte swapping the port
parameter, then byte swapping it again inside the called function.
That is unnecessary, so remove the conversion from the call site.
Without this patch, the UDP-based tunnel configuration would
not be correct.
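Why the extra conversion breaks the port value, shown with an explicit swap so the result does not depend on host endianness. Here `callee_encode()` is a hypothetical stand-in for a function that expects a host-order port and converts it itself; passing it an already swapped value simply undoes that conversion.

```c
#include <stdint.h>

/* Endianness-independent 16-bit byte swap. */
static uint16_t swap16(uint16_t v)
{
	return (uint16_t)((v >> 8) | (v << 8));
}

/* Hypothetical callee: takes a host-order port and converts it itself. */
static uint16_t callee_encode(uint16_t port)
{
	return swap16(port);
}
```

With port 4789 (0x12B5), `callee_encode(4789)` yields 0xB512, but `callee_encode(swap16(4789))` yields 0x12B5 again: the two swaps cancel, which is why dropping the caller-side conversion restores the intended encoding.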

Change-ID: I788d83c5bd5732170f1a81dbfa0b1ac3ca8ea5b7
Signed-off-by: Carolyn Wyborny 
Tested-by: Andrew Bowers 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/i40e/i40e_main.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c 
b/drivers/net/ethernet/intel/i40e/i40e_main.c
index 69b9e30..53cde5b 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -7154,9 +7154,9 @@ static void i40e_sync_udp_filters_subtask(struct i40e_pf 
*pf)
pf->pending_udp_bitmap &= ~BIT_ULL(i);
port = pf->udp_ports[i].index;
if (port)
-   ret = i40e_aq_add_udp_tunnel(hw, ntohs(port),
-pf->udp_ports[i].type,
-NULL, NULL);
+   ret = i40e_aq_add_udp_tunnel(hw, port,
+   pf->udp_ports[i].type,
+   NULL, NULL);
else
ret = i40e_aq_del_udp_tunnel(hw, i, NULL);
 
-- 
2.7.4



[net-next v2 10/10] i40evf: remove unnecessary error checking against i40e_shutdown_adminq

2016-09-22 Thread Jeff Kirsher
From: Lihong Yang 

The i40e_shutdown_adminq function never returns failure, so there is no
need to check for a non-zero return value.  Clean up the unnecessary
error check and the warning attached to it.

Change-ID: Ibb616f09cfb93bd1a872ebf3241a15fb8354b31b
Signed-off-by: Lihong Yang 
Tested-by: Andrew Bowers 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/i40evf/i40evf_main.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40evf/i40evf_main.c 
b/drivers/net/ethernet/intel/i40evf/i40evf_main.c
index 9906775..99833f3 100644
--- a/drivers/net/ethernet/intel/i40evf/i40evf_main.c
+++ b/drivers/net/ethernet/intel/i40evf/i40evf_main.c
@@ -1785,8 +1785,7 @@ continue_reset:
i40evf_free_all_tx_resources(adapter);
 
/* kill and reinit the admin queue */
-   if (i40evf_shutdown_adminq(hw))
-   dev_warn(>pdev->dev, "Failed to shut down adminq\n");
+   i40evf_shutdown_adminq(hw);
adapter->current_op = I40E_VIRTCHNL_OP_UNKNOWN;
err = i40evf_init_adminq(hw);
if (err)
-- 
2.7.4



[PATCHv2 iproute2 2/2] ip rule: add selector support

2016-09-22 Thread Hangbin Liu
Signed-off-by: Hangbin Liu 
---
 ip/iprule.c| 180 +++--
 man/man8/ip-rule.8 |   6 +-
 2 files changed, 180 insertions(+), 6 deletions(-)

diff --git a/ip/iprule.c b/ip/iprule.c
index e18505f..42fb6af 100644
--- a/ip/iprule.c
+++ b/ip/iprule.c
@@ -20,6 +20,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 
@@ -41,7 +42,7 @@ static void usage(void)
 {
fprintf(stderr, "Usage: ip rule { add | del } SELECTOR ACTION\n");
fprintf(stderr, "   ip rule { flush | save | restore }\n");
-   fprintf(stderr, "   ip rule [ list ]\n");
+   fprintf(stderr, "   ip rule [ list [ SELECTOR ]]\n");
fprintf(stderr, "SELECTOR := [ not ] [ from PREFIX ] [ to PREFIX ] [ 
tos TOS ] [ fwmark FWMARK[/MASK] ]\n");
fprintf(stderr, "[ iif STRING ] [ oif STRING ] [ pref 
NUMBER ] [ l3mdev ]\n");
fprintf(stderr, "ACTION := [ table TABLE_ID ]\n");
@@ -55,6 +56,105 @@ static void usage(void)
exit(-1);
 }
 
+static struct
+{
+   int not;
+   int l3mdev;
+   int iifmask, oifmask;
+   unsigned int tb;
+   unsigned int tos, tosmask;
+   unsigned int pref, prefmask;
+   unsigned int fwmark, fwmask;
+   char iif[IFNAMSIZ];
+   char oif[IFNAMSIZ];
+   inet_prefix src;
+   inet_prefix dst;
+} filter;
+
+static bool filter_nlmsg(struct nlmsghdr *n, struct rtattr **tb, int host_len)
+{
+   struct rtmsg *r = NLMSG_DATA(n);
+   inet_prefix src = { .family = r->rtm_family };
+   inet_prefix dst = { .family = r->rtm_family };
+   __u32 table;
+
+   if (preferred_family != AF_UNSPEC && r->rtm_family != preferred_family)
+   return false;
+
+   if (filter.prefmask &&
+   filter.pref ^ (tb[FRA_PRIORITY] ? rta_getattr_u32(tb[FRA_PRIORITY]) 
: 0))
+   return false;
+   if (filter.not && !(r->rtm_flags & FIB_RULE_INVERT))
+   return false;
+
+   if (filter.src.family) {
+   if (tb[FRA_SRC]) {
+   memcpy(, RTA_DATA(tb[FRA_SRC]),
+  (r->rtm_src_len + 7) / 8);
+   }
+   if (filter.src.family != r->rtm_family ||
+   filter.src.bitlen > r->rtm_src_len ||
+   inet_addr_match(, , filter.src.bitlen))
+   return false;
+   }
+
+   if (filter.dst.family) {
+   if (tb[FRA_DST]) {
+   memcpy(, RTA_DATA(tb[FRA_DST]),
+  (r->rtm_dst_len + 7) / 8);
+   }
+   if (filter.dst.family != r->rtm_family ||
+   filter.dst.bitlen > r->rtm_dst_len ||
+   inet_addr_match(, , filter.dst.bitlen))
+   return false;
+   }
+
+   if (filter.tosmask && filter.tos ^ r->rtm_tos)
+   return false;
+
+   if (filter.fwmark) {
+   __u32 mark = 0;
+   if (tb[FRA_FWMARK])
+   mark = rta_getattr_u32(tb[FRA_FWMARK]);
+   if (filter.fwmark ^ mark)
+   return false;
+   }
+   if (filter.fwmask) {
+   __u32 mask = 0;
+   if (tb[FRA_FWMASK])
+   mask = rta_getattr_u32(tb[FRA_FWMASK]);
+   if (filter.fwmask ^ mask)
+   return false;
+   }
+
+   if (filter.iifmask) {
+   if (tb[FRA_IFNAME]) {
+   if (strcmp(filter.iif, rta_getattr_str(tb[FRA_IFNAME])) 
!= 0)
+   return false;
+   } else {
+   return false;
+   }
+   }
+
+   if (filter.oifmask) {
+   if (tb[FRA_OIFNAME]) {
+   if (strcmp(filter.oif, 
rta_getattr_str(tb[FRA_OIFNAME])) != 0)
+   return false;
+   } else {
+   return false;
+   }
+   }
+
+   if (filter.l3mdev && !(tb[FRA_L3MDEV] && 
rta_getattr_u8(tb[FRA_L3MDEV])))
+   return false;
+
+   table = rtm_get_table(r, tb);
+   if (filter.tb > 0 && filter.tb ^ table)
+   return false;
+
+   return true;
+}
+
 int print_rule(const struct sockaddr_nl *who, struct nlmsghdr *n, void *arg)
 {
FILE *fp = (FILE *)arg;
@@ -77,6 +177,9 @@ int print_rule(const struct sockaddr_nl *who, struct 
nlmsghdr *n, void *arg)
 
host_len = af_bit_len(r->rtm_family);
 
+   if(!filter_nlmsg(n, tb, host_len))
+   return 0;
+
if (n->nlmsg_type == RTM_DELRULE)
fprintf(fp, "Deleted ");
 
@@ -287,9 +390,9 @@ static int iprule_list_flush_or_save(int argc, char **argv, 
int action)
if (af == AF_UNSPEC)
af = AF_INET;
 
-   if (argc > 0) {
-   fprintf(stderr,
-   "\"ip rule list/flush/save\" does not take 

[PATCHv2 iproute2 1/2] ip rule: merge ip rule flush and list, save together

2016-09-22 Thread Hangbin Liu
iprule_flush() and iprule_list_or_save() both call
rtnl_wilddump_request() and rtnl_dump_filter(), so merge them into one
function, as other files already do.
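The merge follows a common pattern: one function issues the dump request, and an action enum only selects which per-message callback runs. A stripped-down sketch of that dispatch; the callback signature, the counting callbacks and `run_dump()` are hypothetical simplifications, since iproute2's real callbacks take netlink arguments and are driven by rtnl_dump_filter().

```c
#include <stddef.h>

enum list_action {
	IPRULE_LIST,
	IPRULE_FLUSH,
	IPRULE_SAVE,
};

/* Hypothetical simplified callback type; returns <0 to stop the dump. */
typedef int (*rule_cb_t)(int rule_id, void *arg);

/* Each hypothetical callback just counts the rules it was handed. */
static int print_cb(int rule_id, void *arg)
{
	(void)rule_id;
	((int *)arg)[IPRULE_LIST]++;
	return 0;
}

static int flush_cb(int rule_id, void *arg)
{
	(void)rule_id;
	((int *)arg)[IPRULE_FLUSH]++;
	return 0;
}

static int save_cb(int rule_id, void *arg)
{
	(void)rule_id;
	((int *)arg)[IPRULE_SAVE]++;
	return 0;
}

/* Only the callback differs between list, flush and save. */
static rule_cb_t pick_callback(enum list_action action)
{
	switch (action) {
	case IPRULE_SAVE:
		return save_cb;
	case IPRULE_FLUSH:
		return flush_cb;
	default:
		return print_cb;
	}
}

/* Stand-in for the dump loop: feed every dumped rule to one callback. */
static int run_dump(enum list_action action, const int *ids, size_t n,
		    void *arg)
{
	rule_cb_t cb = pick_callback(action);
	size_t i;

	for (i = 0; i < n; i++)
		if (cb(ids[i], arg) < 0)
			return -1;
	return 0;
}
```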

Signed-off-by: Hangbin Liu 
---
 ip/iprule.c | 121 +++-
 1 file changed, 54 insertions(+), 67 deletions(-)

diff --git a/ip/iprule.c b/ip/iprule.c
index 70562c5..e18505f 100644
--- a/ip/iprule.c
+++ b/ip/iprule.c
@@ -27,6 +27,12 @@
 #include "utils.h"
 #include "ip_common.h"
 
+enum list_action {
+   IPRULE_LIST,
+   IPRULE_FLUSH,
+   IPRULE_SAVE,
+};
+
 extern struct rtnl_handle rth;
 
 static void usage(void) __attribute__((noreturn));
@@ -243,24 +249,61 @@ static int save_rule(const struct sockaddr_nl *who,
return ret == n->nlmsg_len ? 0 : ret;
 }
 
-static int iprule_list_or_save(int argc, char **argv, int save)
+static int flush_rule(const struct sockaddr_nl *who, struct nlmsghdr *n,
+ void *arg)
+{
+   struct rtnl_handle rth2;
+   struct rtmsg *r = NLMSG_DATA(n);
+   int len = n->nlmsg_len;
+   struct rtattr *tb[FRA_MAX+1];
+
+   len -= NLMSG_LENGTH(sizeof(*r));
+   if (len < 0)
+   return -1;
+
+   parse_rtattr(tb, FRA_MAX, RTM_RTA(r), len);
+
+   if (tb[FRA_PRIORITY]) {
+   n->nlmsg_type = RTM_DELRULE;
+   n->nlmsg_flags = NLM_F_REQUEST;
+
+   if (rtnl_open(, 0) < 0)
+   return -1;
+
+   if (rtnl_talk(, n, NULL, 0) < 0)
+   return -2;
+
+   rtnl_close();
+   }
+
+   return 0;
+}
+
+static int iprule_list_flush_or_save(int argc, char **argv, int action)
 {
-   rtnl_filter_t filter = print_rule;
+   rtnl_filter_t filter_fn;
int af = preferred_family;
 
if (af == AF_UNSPEC)
af = AF_INET;
 
if (argc > 0) {
-   fprintf(stderr, "\"ip rule %s\" does not take any arguments.\n",
-   save ? "save" : "show");
+   fprintf(stderr,
+   "\"ip rule list/flush/save\" does not take any 
arguments\n");
return -1;
}
 
-   if (save) {
+   switch (action) {
+   case IPRULE_SAVE:
if (save_rule_prep())
return -1;
-   filter = save_rule;
+   filter_fn = save_rule;
+   break;
+   case IPRULE_FLUSH:
+   filter_fn = flush_rule;
+   break;
+   default:
+   filter_fn = print_rule;
}
 
if (rtnl_wilddump_request(, af, RTM_GETRULE) < 0) {
@@ -268,7 +311,7 @@ static int iprule_list_or_save(int argc, char **argv, int 
save)
return 1;
}
 
-   if (rtnl_dump_filter(, filter, stdout) < 0) {
+   if (rtnl_dump_filter(, filter_fn, stdout) < 0) {
fprintf(stderr, "Dump terminated\n");
return 1;
}
@@ -511,72 +554,16 @@ static int iprule_modify(int cmd, int argc, char **argv)
return 0;
 }
 
-
-static int flush_rule(const struct sockaddr_nl *who, struct nlmsghdr *n,
- void *arg)
-{
-   struct rtnl_handle rth2;
-   struct rtmsg *r = NLMSG_DATA(n);
-   int len = n->nlmsg_len;
-   struct rtattr *tb[FRA_MAX+1];
-
-   len -= NLMSG_LENGTH(sizeof(*r));
-   if (len < 0)
-   return -1;
-
-   parse_rtattr(tb, FRA_MAX, RTM_RTA(r), len);
-
-   if (tb[FRA_PRIORITY]) {
-   n->nlmsg_type = RTM_DELRULE;
-   n->nlmsg_flags = NLM_F_REQUEST;
-
-   if (rtnl_open(, 0) < 0)
-   return -1;
-
-   if (rtnl_talk(, n, NULL, 0) < 0)
-   return -2;
-
-   rtnl_close();
-   }
-
-   return 0;
-}
-
-static int iprule_flush(int argc, char **argv)
-{
-   int af = preferred_family;
-
-   if (af == AF_UNSPEC)
-   af = AF_INET;
-
-   if (argc > 0) {
-   fprintf(stderr, "\"ip rule flush\" does not allow arguments\n");
-   return -1;
-   }
-
-   if (rtnl_wilddump_request(, af, RTM_GETRULE) < 0) {
-   perror("Cannot send dump request");
-   return 1;
-   }
-
-   if (rtnl_dump_filter(, flush_rule, NULL) < 0) {
-   fprintf(stderr, "Flush terminated\n");
-   return 1;
-   }
-
-   return 0;
-}
-
 int do_iprule(int argc, char **argv)
 {
if (argc < 1) {
-   return iprule_list_or_save(0, NULL, 0);
+   return iprule_list_flush_or_save(0, NULL, IPRULE_LIST);
} else if (matches(argv[0], "list") == 0 ||
   matches(argv[0], "lst") == 0 ||
   matches(argv[0], "show") == 0) {
-   return iprule_list_or_save(argc-1, argv+1, 0);
+   return iprule_list_flush_or_save(argc-1, argv+1, IPRULE_LIST);
} else if (matches(argv[0], 

[PATCHv2 iproute2 0/2] ip rule: merge iprule_flush and add selector support

2016-09-22 Thread Hangbin Liu
When merging iprule_flush() and iprule_list_or_save(), rename the
rtnl_filter_t variable 'filter' to 'filter_fn', because the next patch
uses a global variable named 'filter' to filter netlink messages.
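The selector filtering relies on one idea: a filter field constrains the listing only when the user set it, and a rule must match every constrained field. A reduced, hypothetical C model of that pattern; `struct rule_view`, `struct rule_filter` and `rule_matches()` are illustrative and not iproute2 code.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical, already-parsed view of one rule from a netlink dump. */
struct rule_view {
	unsigned int pref;
	unsigned int table;
	const char *iif;	/* NULL when the rule has no iif */
};

/* Selector state: each *_set flag says whether the user gave that key. */
struct rule_filter {
	bool pref_set;
	unsigned int pref;
	unsigned int table;	/* 0 means "any table" */
	bool iif_set;
	char iif[16];		/* IFNAMSIZ-sized buffer */
};

static bool rule_matches(const struct rule_filter *f, const struct rule_view *r)
{
	if (f->pref_set && f->pref != r->pref)
		return false;
	if (f->table && f->table != r->table)
		return false;
	/* An iif selector rejects rules that have no iif at all. */
	if (f->iif_set && (!r->iif || strcmp(f->iif, r->iif) != 0))
		return false;
	return true;
}
```

An all-zero filter matches everything, which corresponds to plain `ip rule list` with no selector.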

Hangbin Liu (2):
  ip rule: merge ip rule flush and list, save together
  ip rule: add selector support

 ip/iprule.c| 295 +
 man/man8/ip-rule.8 |   6 +-
 2 files changed, 231 insertions(+), 70 deletions(-)

-- 
2.5.5



Re: [PATCH net-next 4/4] net/sched: act_mirred: Implement ingress actions

2016-09-22 Thread Shmulik Ladkani
Hi,

On Thu, 22 Sep 2016 19:40:15 -0400 Jamal Hadi Salim  wrote:
> On 16-09-22 09:21 AM, Shmulik Ladkani wrote:
> > From: Shmulik Ladkani 
> >
> > Up until now, 'action mirred' supported only egress actions (either
> > TCA_EGRESS_REDIR or TCA_EGRESS_MIRROR).
> >
> > This patch implements the corresponding ingress actions
> > TCA_INGRESS_REDIR and TCA_INGRESS_MIRROR.
> >
> > This allows attaching filters whose target is to hand matching skbs into
> > the rx processing of a specified device.
> 
> Thank you for doing this. There was something that made me remove
> initial support for this feature - I am blanking out right now but
> will find my notes and give more details.

Thanks Jamal, appreciate any details.

Was wondering why it's missing, googled a bit with no meaningful
results, so speculated the following:

Some time long ago, initial 'mirred' purpose was to facilitate ifb.
Therefore 'egress redirect' was implemented. Jamal probably left the
'ingress' support for a later time :)

One interesting use case for 'ingress redirect' is creating an "rx
bouncing" construct (like macvlan/macvtap/ipvlan), but applied according
to custom logic.

> It may be around preventing loops maybe.

Could be, but personally, I treat these constructs as (powerful)
building blocks, and "with great power comes great responsibility".

Even today, one may create loops using the existing 'egress redirect',
e.g. this ridiculously erroneous construct:

 # ip l add v0 type veth peer name v0p 
 # tc filter add dev v0p parent : basic \
action mirred egress redirect dev v0

Regards,
Shmulik


Re: [PATCH net-next v2 1/3] net: ethernet: mediatek: add extension of phy-mode for TRGMII

2016-09-22 Thread Sean Wang
Date: Thu, 22 Sep 2016 14:30:53 +0300, Sergei Shtylyov 
 wrote:
>>Hello.
>
>On 9/22/2016 5:33 AM, sean.w...@mediatek.com wrote:
>
>> From: Sean Wang 
>>
>> adds PHY-mode "trgmii" as an extension for the operation mode of the 
>> PHY interface for PHY_INTERFACE_MODE_TRGMII.

.. deleted

>>  switch (of_get_phy_mode(np)) {
>> +case PHY_INTERFACE_MODE_TRGMII:
>> +mac->trgmii = true;
>>  case PHY_INTERFACE_MODE_RGMII_TXID:
>>  case PHY_INTERFACE_MODE_RGMII_RXID:
>>  case PHY_INTERFACE_MODE_RGMII_ID:
>> diff --git a/drivers/net/ethernet/mediatek/mtk_eth_soc.h 
>> b/drivers/net/ethernet/mediatek/mtk_eth_soc.h
>> index 7c5e534..e3b9525 100644
>> --- a/drivers/net/ethernet/mediatek/mtk_eth_soc.h
>> +++ b/drivers/net/ethernet/mediatek/mtk_eth_soc.h
>> @@ -529,6 +529,8 @@ struct mtk_eth {
>>   * @hw: Backpointer to our main datastruture
>>   * @hw_stats:   Packet statistics counter
>>   * @phy_dev:The attached PHY if available
>> + * @trgmii  Indicate if the MAC uses TRGMII connected to internal
>> +switch
>>   */
>>  struct mtk_mac {
>>  int id;
>> @@ -539,6 +541,7 @@ struct mtk_mac {
>>  struct phy_device   *phy_dev;
>>  __be32  hwlro_ip[MTK_MAX_LRO_IP_CNT];
>>  int hwlro_ip_cnt;
>> +booltrgmii;
>
> I don't see where this is used.

I set trgmii as below
switch (of_get_phy_mode(np)) {
case PHY_INTERFACE_MODE_TRGMII:
mac->trgmii = true;
case PHY_INTERFACE_MODE_RGMII_TXID:
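The snippet relies on an intentional fall-through: the TRGMII case records the flag and then shares the RGMII handling below it. A standalone sketch of that shape, with a hypothetical `enum phy_mode` and `struct mac_cfg`; marking the fall-through with a comment keeps the intent visible to reviewers, and GCC's -Wimplicit-fallthrough accepts comments of this form.

```c
#include <stdbool.h>

/* Hypothetical subset of PHY interface modes. */
enum phy_mode { MODE_RGMII, MODE_RGMII_ID, MODE_TRGMII, MODE_SGMII };

struct mac_cfg {
	bool trgmii;		/* MAC uses TRGMII towards an internal switch */
	bool rgmii_like;	/* shared RGMII-family setup applies */
};

static int setup_mode(struct mac_cfg *cfg, enum phy_mode mode)
{
	switch (mode) {
	case MODE_TRGMII:
		cfg->trgmii = true;
		/* fall through - TRGMII reuses the RGMII setup */
	case MODE_RGMII_ID:
	case MODE_RGMII:
		cfg->rgmii_like = true;
		break;
	default:
		return -1;	/* unsupported in this sketch */
	}
	return 0;
}
```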

>[...]
>> diff --git a/include/linux/phy.h b/include/linux/phy.h index 
>> 2d24b28..e25f183 100644
>> --- a/include/linux/phy.h
>> +++ b/include/linux/phy.h
>> @@ -80,6 +80,7 @@ typedef enum {
>>  PHY_INTERFACE_MODE_XGMII,
>>  PHY_INTERFACE_MODE_MOCA,
>>  PHY_INTERFACE_MODE_QSGMII,
>> +PHY_INTERFACE_MODE_TRGMII,
>>  PHY_INTERFACE_MODE_MAX,
>>  } phy_interface_t;
>>
>> @@ -123,6 +124,8 @@ static inline const char *phy_modes(phy_interface_t 
>> interface)
>>  return "moca";
>>  case PHY_INTERFACE_MODE_QSGMII:
>>  return "qsgmii";
>> +case PHY_INTERFACE_MODE_TRGMII:
>> +return "trgmii";
>>  default:
>>  return "unknown";
>>  }
>
>I think this should be done in a separate phylib patch.

This patch has already been applied, so I am a little confused about
how to do this.  Next time I will place modifications to the generic
layer in a separate patch.

>
>MBR, Sergei
>


RE: [PATCH net] act_ife: Add support for machines with hard_header_len != mac_len

2016-09-22 Thread Yotam Gigi
>-Original Message-
>From: Jamal Hadi Salim [mailto:j...@mojatatu.com]
>Sent: Friday, September 23, 2016 1:40 AM
>To: Yotam Gigi ; da...@davemloft.net;
>netdev@vger.kernel.org; Roman Mashak 
>Subject: Re: [PATCH net] act_ife: Add support for machines with hard_header_len
>!= mac_len
>
>On 16-09-21 08:54 AM, Yotam Gigi wrote:
>> Without that fix, the following could occur:
>>  - On encode ingress, the total amount of skb_pushes (in lines 751 and
>>753) was more than specified in cow.
>>  - On machines with hard_header_len > mac_len, the packet format was not
>
>Just curious: What hardware would this be?

On mlxsw, a small Tx header must be added in order to send a packet.
To tell the kernel to reserve that space in the allocated skb, we set
the hard_header_len field to include the Tx header in addition to the
MAC header.
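To make the difference between the pushed bytes and the reserved headroom concrete, here is the arithmetic from the patch under review as a standalone function. The value of `IFE_METAHDRLEN` (2) and the sample lengths in the usage note are assumptions chosen for illustration; a 30-byte hard_header_len simply models a device that adds a Tx header on top of a 14-byte Ethernet header.

```c
#include <stdbool.h>

#define IFE_METAHDRLEN 2	/* assumed size of the IFE metadata header */

struct ife_room {
	int hdrm;		/* bytes of new header content */
	int total_push;		/* bytes passed to __skb_push() */
	int reserve;		/* headroom requested from skb_cow_head() */
};

/* Mirrors the egress/ingress split in the posted patch. */
static struct ife_room ife_compute_room(bool egress, int metalen,
					int mac_len, int hard_header_len)
{
	struct ife_room r;

	/* hdrm is the same on both paths, so compute it once. */
	r.hdrm = metalen + mac_len + IFE_METAHDRLEN;
	if (egress) {
		/* egress: push only the new header content */
		r.total_push = r.hdrm;
		/* but reserve room for the device's full hard header */
		r.reserve = metalen + hard_header_len + IFE_METAHDRLEN;
	} else {
		/* ingress: the MAC header was already pulled, push it back */
		r.total_push = r.hdrm + mac_len;
		r.reserve = r.total_push;
	}
	return r;
}
```

With metalen 8, mac_len 14 and hard_header_len 30, egress pushes 24 bytes but reserves 40, while ingress pushes and reserves 38.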

>
>
>> Fixes: ef6980b6becb ("net sched: introduce IFE action")
>> Signed-off-by: Yotam Gigi 
>> ---
>>  net/sched/act_ife.c | 34 +-
>>  1 file changed, 25 insertions(+), 9 deletions(-)
>>
>> diff --git a/net/sched/act_ife.c b/net/sched/act_ife.c
>> index e87cd81..27b19ca 100644
>> --- a/net/sched/act_ife.c
>> +++ b/net/sched/act_ife.c
>> @@ -708,11 +708,13 @@ static int tcf_ife_encode(struct sk_buff *skb, const
>struct tc_action *a,
>> where ORIGDATA = original ethernet header ...
>>   */
>>  u16 metalen = ife_get_sz(skb, ife);
>> -int hdrm = metalen + skb->dev->hard_header_len + IFE_METAHDRLEN;
>> -unsigned int skboff = skb->dev->hard_header_len;
>>  u32 at = G_TC_AT(skb->tc_verd);
>> -int new_len = skb->len + hdrm;
>>  bool exceed_mtu = false;
>> +unsigned int skboff;
>> +int total_push;
>> +int reserve;
>> +int new_len;
>> +int hdrm;
>>  int err;
>>
>>  if (at & AT_EGRESS) {
>> @@ -724,6 +726,22 @@ static int tcf_ife_encode(struct sk_buff *skb, const 
>> struct
>tc_action *a,
>>  bstats_update(>tcf_bstats, skb);
>>  tcf_lastuse_update(>tcf_tm);
>>
>> +if (at & AT_EGRESS) {
>> +/* on egress, reserve space for hard_header_len instead of
>> + * mac_len
>> + */
>> +skb_reset_mac_len(skb);
>
>The skb_reset_mac_len() above is unneeded.
>
>> +hdrm = metalen + skb->mac_len + IFE_METAHDRLEN;
>
>Can you move this line outside of the if? It appears on the else
>so factoring it out is useful.
>
>> +total_push = hdrm;
>> +reserve = metalen + skb->dev->hard_header_len +
>IFE_METAHDRLEN;
>> +} else {
>> +/* on ingress, push mac_len as it already get parsed from tc */
>> +hdrm = metalen + skb->mac_len + IFE_METAHDRLEN;
>> +total_push = hdrm + skb->mac_len;
>> +reserve = total_push;
>> +}
>> +new_len =  skb->len + hdrm;
>> +
>>  if (!metalen) { /* no metadata to send */
>>  /* abuse overlimits to count when we allow packet
>>   * with no metadata
>> @@ -742,19 +760,17 @@ static int tcf_ife_encode(struct sk_buff *skb, const
>struct tc_action *a,
>>
>>  iethh = eth_hdr(skb);
>>
>> -err = skb_cow_head(skb, hdrm);
>> +err = skb_cow_head(skb, reserve);
>>  if (unlikely(err)) {
>>  ife->tcf_qstats.drops++;
>>  spin_unlock(>tcf_lock);
>>  return TC_ACT_SHOT;
>>  }
>>
>> -if (!(at & AT_EGRESS))
>> -skb_push(skb, skb->dev->hard_header_len);
>> -
>> -__skb_push(skb, hdrm);
>> +__skb_push(skb, total_push);
>>  memcpy(skb->data, iethh, skb->mac_len);
>>  skb_reset_mac_header(skb);
>> +skboff += skb->mac_len;
>
>Above looks dangerous. Did the compiler not warn?
>Maybe init skboff to skb->mac_len at the top.

That does look weird. I will fix it and repost in the next couple of days.

>
>Otherwise the ingress bits look good. Thanks!
>
>Please fix above and resend with:
>Signed-off-by: Jamal Hadi Salim 
>
>cheers,
>jamal


[PATCH v2 net-next 0/2] bnx2x: page allocation failure

2016-09-22 Thread Jason Baron
Hi,

While configuring ~500 multicast addrs, we ran into high order
page allocation failures. They don't need to be high order, and
thus I'm proposing to split them into at most PAGE_SIZE allocations.

Below is a sample failure.

Thanks,

-Jason

[1201902.617882] bnx2x: [bnx2x_set_mc_list:12374(eth0)]Failed to create 
multicast MACs list: -12
[1207325.695021] kworker/1:0: page allocation failure: order:2, mode:0xc020
[1207325.702059] CPU: 1 PID: 15805 Comm: kworker/1:0 Tainted: GW
[1207325.712940] Hardware name: SYNNEX CORPORATION 1x8-X4i SSD 10GE/S5512LE, 
BIOS V8.810 05/16/2013
[1207325.722284] Workqueue: events bnx2x_sp_rtnl_task [bnx2x]
[1207325.728206]   88012d873a78 8267f7c7 
c020
[1207325.736754]   88012d873b08 8212f8e0 
fffc0003
[1207325.745301]  88041ffecd80 88040030 0002 
c0206800da13
[1207325.753846] Call Trace:
[1207325.756789]  [] dump_stack+0x4d/0x63
[1207325.762426]  [] warn_alloc_failed+0xe0/0x130
[1207325.768756]  [] ? wakeup_kswapd+0x48/0x140
[1207325.774914]  [] __alloc_pages_nodemask+0x2bc/0x970
[1207325.781761]  [] alloc_pages_current+0x91/0x100
[1207325.788260]  [] alloc_kmem_pages+0xe/0x10
[1207325.794329]  [] kmalloc_order+0x18/0x50
[1207325.800227]  [] kmalloc_order_trace+0x26/0xb0
[1207325.806642]  [] ? _xfer_secondary_pool+0xa8/0x1a0
[1207325.813404]  [] __kmalloc+0x19a/0x1b0
[1207325.819142]  [] bnx2x_set_rx_mode_inner+0x3d5/0x590 
[bnx2x]
[1207325.827000]  [] bnx2x_sp_rtnl_task+0x28d/0x760 [bnx2x]
[1207325.834197]  [] process_one_work+0x134/0x3c0
[1207325.840522]  [] worker_thread+0x121/0x460
[1207325.846585]  [] ? process_one_work+0x3c0/0x3c0
[1207325.853089]  [] kthread+0xc9/0xe0
[1207325.858459]  [] ? notify_die+0x10/0x40
[1207325.864263]  [] ? kthread_create_on_node+0x180/0x180
[1207325.871288]  [] ret_from_fork+0x42/0x70
[1207325.877183]  [] ? kthread_create_on_node+0x180/0x180

v2:
 -make use of list_next_entry()
 -only use PAGE_SIZE allocations

Jason Baron (2):
  bnx2x: allocate mac filtering 'mcast_list' in PAGE_SIZE increments
  bnx2x: allocate mac filtering pending list in PAGE_SIZE increments

 drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c |  79 +--
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_sp.c   | 123 ---
 2 files changed, 137 insertions(+), 65 deletions(-)

-- 
2.6.1



[PATCH v2 net-next 1/2] bnx2x: allocate mac filtering 'mcast_list' in PAGE_SIZE increments

2016-09-22 Thread Jason Baron
From: Jason Baron 

Currently, we can have high order page allocations that specify
GFP_ATOMIC when configuring multicast MAC address filters.

For example, we have seen order 2 page allocation failures with
~500 multicast addresses configured.

Convert the allocation for 'mcast_list' to be done in PAGE_SIZE
increments.
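A userspace sketch of the same idea: instead of one allocation sized for every address, keep a list of page-sized groups, each holding as many elements as fit after a small group header. `PAGE_SZ`, `struct elem`, `struct elem_group` and the helper names are hypothetical; malloc() stands in for __get_free_page(), and the per-element list linkage of the real driver is omitted.

```c
#include <stdlib.h>

#define PAGE_SZ 4096	/* assumed page size */

struct elem {
	const unsigned char *mac;
};

/* One page-sized allocation: a small header followed by elements. */
struct elem_group {
	struct elem_group *next_group;
	struct elem elems[];	/* flexible array member */
};

#define ELEMS_PER_PG \
	((PAGE_SZ - sizeof(struct elem_group)) / sizeof(struct elem))

static void free_groups(struct elem_group *g)
{
	while (g) {
		struct elem_group *next = g->next_group;

		free(g);
		g = next;
	}
}

/*
 * Store 'count' MAC pointers in page-sized groups. On success returns 0
 * and the group list (newest first); on failure frees everything.
 */
static int alloc_groups(const unsigned char *const *macs, size_t count,
			struct elem_group **out, size_t *ngroups)
{
	struct elem_group *head = NULL;
	size_t offset = 0, i;

	*ngroups = 0;
	for (i = 0; i < count; i++) {
		if (!offset) {
			struct elem_group *g = malloc(PAGE_SZ);

			if (!g) {
				free_groups(head);
				*out = NULL;
				*ngroups = 0;
				return -1;
			}
			g->next_group = head;
			head = g;
			(*ngroups)++;
		}
		head->elems[offset].mac = macs[i];
		if (++offset == ELEMS_PER_PG)
			offset = 0;	/* current page full: start a new one */
	}
	*out = head;
	return 0;
}
```

Each allocation is exactly one page, so the number of groups grows as ceil(count / ELEMS_PER_PG) instead of the request growing into a higher-order allocation.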

Signed-off-by: Jason Baron 
Cc: Yuval Mintz 
Cc: Ariel Elior 
---
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c | 79 +++-
 1 file changed, 51 insertions(+), 28 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c 
b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
index dab61a81a3ba..20fe6a8c35c1 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
@@ -12563,43 +12563,64 @@ static int bnx2x_close(struct net_device *dev)
return 0;
 }
 
-static int bnx2x_init_mcast_macs_list(struct bnx2x *bp,
- struct bnx2x_mcast_ramrod_params *p)
+struct bnx2x_mcast_list_elem_group
 {
-   int mc_count = netdev_mc_count(bp->dev);
-   struct bnx2x_mcast_list_elem *mc_mac =
-   kcalloc(mc_count, sizeof(*mc_mac), GFP_ATOMIC);
-   struct netdev_hw_addr *ha;
+   struct list_head mcast_group_link;
+   struct bnx2x_mcast_list_elem mcast_elems[];
+};
 
-   if (!mc_mac) {
-   BNX2X_ERR("Failed to allocate mc MAC list\n");
-   return -ENOMEM;
+#define MCAST_ELEMS_PER_PG \
+   ((PAGE_SIZE - sizeof(struct bnx2x_mcast_list_elem_group)) / \
+   sizeof(struct bnx2x_mcast_list_elem))
+
+static void bnx2x_free_mcast_macs_list(struct list_head *mcast_group_list)
+{
+   struct bnx2x_mcast_list_elem_group *current_mcast_group;
+
+   while (!list_empty(mcast_group_list)) {
+   current_mcast_group = list_first_entry(mcast_group_list,
+ struct bnx2x_mcast_list_elem_group,
+ mcast_group_link);
+   list_del(&current_mcast_group->mcast_group_link);
+   free_page((unsigned long)current_mcast_group);
}
+}
 
-   INIT_LIST_HEAD(&p->mcast_list);
+static int bnx2x_init_mcast_macs_list(struct bnx2x *bp,
+ struct bnx2x_mcast_ramrod_params *p,
+ struct list_head *mcast_group_list)
+{
+   struct bnx2x_mcast_list_elem *mc_mac;
+   struct netdev_hw_addr *ha;
+   struct bnx2x_mcast_list_elem_group *current_mcast_group = NULL;
+   int mc_count = netdev_mc_count(bp->dev);
+   int offset = 0;
 
+   INIT_LIST_HEAD(&p->mcast_list);
netdev_for_each_mc_addr(ha, bp->dev) {
+   if (!offset) {
+   current_mcast_group =
+   (struct bnx2x_mcast_list_elem_group *)
+   __get_free_page(GFP_ATOMIC);
+   if (!current_mcast_group) {
+   bnx2x_free_mcast_macs_list(mcast_group_list);
+   BNX2X_ERR("Failed to allocate mc MAC list\n");
+   return -ENOMEM;
+   }
+   list_add(&current_mcast_group->mcast_group_link,
+mcast_group_list);
+   }
+   mc_mac = &current_mcast_group->mcast_elems[offset];
mc_mac->mac = bnx2x_mc_addr(ha);
list_add_tail(&mc_mac->link, &p->mcast_list);
-   mc_mac++;
+   offset++;
+   if (offset == MCAST_ELEMS_PER_PG)
+   offset = 0;
}
-
p->mcast_list_len = mc_count;
-
return 0;
 }
 
-static void bnx2x_free_mcast_macs_list(
-   struct bnx2x_mcast_ramrod_params *p)
-{
-   struct bnx2x_mcast_list_elem *mc_mac =
-   list_first_entry(&p->mcast_list, struct bnx2x_mcast_list_elem,
-link);
-
-   WARN_ON(!mc_mac);
-   kfree(mc_mac);
-}
-
 /**
  * bnx2x_set_uc_list - configure a new unicast MACs list.
  *
@@ -12647,6 +12668,7 @@ static int bnx2x_set_uc_list(struct bnx2x *bp)
 
 static int bnx2x_set_mc_list_e1x(struct bnx2x *bp)
 {
+   LIST_HEAD(mcast_group_list);
struct net_device *dev = bp->dev;
struct bnx2x_mcast_ramrod_params rparam = {NULL};
int rc = 0;
@@ -12662,7 +12684,7 @@ static int bnx2x_set_mc_list_e1x(struct bnx2x *bp)
 
/* then, configure a new MACs list */
if (netdev_mc_count(dev)) {
-   rc = bnx2x_init_mcast_macs_list(bp, &rparam);
+   rc = bnx2x_init_mcast_macs_list(bp, &rparam, &mcast_group_list);
if (rc)
return rc;
 
@@ -12673,7 +12695,7 @@ static int bnx2x_set_mc_list_e1x(struct bnx2x *bp)
BNX2X_ERR("Failed to set a new multicast configuration: %d\n",
  

[PATCH v2 net-next 2/2] bnx2x: allocate mac filtering pending list in PAGE_SIZE increments

2016-09-22 Thread Jason Baron
From: Jason Baron 

Currently, we can have high order page allocations that specify
GFP_ATOMIC when configuring multicast MAC address filters.

For example, we have seen order 2 page allocation failures with
~500 multicast addresses configured.

Convert the allocation for the pending list to be done in PAGE_SIZE
increments.
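For the pending-command list, the patch sizes each page slot by a union of the bin and MAC element types, and a SET command still reserves room for every bin, as the removed kzalloc() sizing did. Below is a rough userspace model of that arithmetic. The 4 KiB page, the element layouts and the value 256 for BNX2X_MCAST_BINS_NUM are assumptions, and pages_needed() is a hypothetical helper, not code from the patch.

```c
#include <stddef.h>

#define PAGE_SZ 4096UL          /* assumed page size */
#define MCAST_BINS_NUM 256      /* assumed value of BNX2X_MCAST_BINS_NUM */

struct list_head {
	struct list_head *next, *prev;
};

/* Assumed layouts of the two pending-list element types. */
struct mcast_bin_elem {
	struct list_head link;
	int bin;
	int type;
};

struct mcast_mac_elem {
	struct list_head link;
	unsigned char mac[6];
	unsigned char pad[2];
};

/* Either kind of element fits in one slot. */
union mcast_elem {
	struct mcast_bin_elem bin_elem;
	struct mcast_mac_elem mac_elem;
};

struct mcast_elem_group {
	struct list_head mcast_group_link;
	union mcast_elem mcast_elems[];
};

#define MCAST_MAC_ELEMS_PER_PG \
	((PAGE_SZ - sizeof(struct mcast_elem_group)) / sizeof(union mcast_elem))

/* Pages a pending command would need. A SET cannot tell yet how many
 * bins it will touch, so it reserves at least one slot per bin. */
static size_t pages_needed(int is_set, size_t macs_list_len)
{
	size_t total_elems = macs_list_len;

	if (is_set && total_elems < MCAST_BINS_NUM)
		total_elems = MCAST_BINS_NUM;
	return (total_elems + MCAST_MAC_ELEMS_PER_PG - 1) / MCAST_MAC_ELEMS_PER_PG;
}
```

With LP64 layouts both element types are 24 bytes, so a page holds 170 slots: a SET with only 10 MACs still takes 2 pages, and an ADD of 500 MACs takes 3.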

Signed-off-by: Jason Baron 
Cc: Yuval Mintz 
Cc: Ariel Elior 
---
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_sp.c | 123 +
 1 file changed, 86 insertions(+), 37 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_sp.c b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_sp.c
index d468380c2a23..4947a9cbf0c1 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_sp.c
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_sp.c
@@ -2606,8 +2606,23 @@ struct bnx2x_mcast_bin_elem {
int type; /* BNX2X_MCAST_CMD_SET_{ADD, DEL} */
 };
 
+union bnx2x_mcast_elem {
+   struct bnx2x_mcast_bin_elem bin_elem;
+   struct bnx2x_mcast_mac_elem mac_elem;
+};
+
+struct bnx2x_mcast_elem_group {
+   struct list_head mcast_group_link;
+   union bnx2x_mcast_elem mcast_elems[];
+};
+
+#define MCAST_MAC_ELEMS_PER_PG \
+   ((PAGE_SIZE - sizeof(struct bnx2x_mcast_elem_group)) / \
+   sizeof(union bnx2x_mcast_elem))
+
 struct bnx2x_pending_mcast_cmd {
struct list_head link;
+   struct list_head group_head;
int type; /* BNX2X_MCAST_CMD_X */
union {
struct list_head macs_head;
@@ -2638,16 +2653,29 @@ static int bnx2x_mcast_wait(struct bnx2x *bp,
return 0;
 }
 
+static void bnx2x_free_groups(struct list_head *mcast_group_list)
+{
+   struct bnx2x_mcast_elem_group *current_mcast_group;
+
+   while (!list_empty(mcast_group_list)) {
+   current_mcast_group = list_first_entry(mcast_group_list,
+ struct bnx2x_mcast_elem_group,
+ mcast_group_link);
+   list_del(&current_mcast_group->mcast_group_link);
+   free_page((unsigned long)current_mcast_group);
+   }
+}
+
 static int bnx2x_mcast_enqueue_cmd(struct bnx2x *bp,
   struct bnx2x_mcast_obj *o,
   struct bnx2x_mcast_ramrod_params *p,
   enum bnx2x_mcast_cmd cmd)
 {
-   int total_sz;
struct bnx2x_pending_mcast_cmd *new_cmd;
-   struct bnx2x_mcast_mac_elem *cur_mac = NULL;
struct bnx2x_mcast_list_elem *pos;
-   int macs_list_len = 0, macs_list_len_size;
+   struct bnx2x_mcast_elem_group *elem_group;
+   struct bnx2x_mcast_mac_elem *mac_elem;
+   int total_elems = 0, macs_list_len = 0, offset = 0;
 
/* When adding MACs we'll need to store their values */
if (cmd == BNX2X_MCAST_CMD_ADD || cmd == BNX2X_MCAST_CMD_SET)
@@ -2657,50 +2685,61 @@ static int bnx2x_mcast_enqueue_cmd(struct bnx2x *bp,
if (!p->mcast_list_len)
return 0;
 
-   /* For a set command, we need to allocate sufficient memory for all
-* the bins, since we can't analyze at this point how much memory would
-* be required.
-*/
-   macs_list_len_size = macs_list_len *
-sizeof(struct bnx2x_mcast_mac_elem);
-   if (cmd == BNX2X_MCAST_CMD_SET) {
-   int bin_size = BNX2X_MCAST_BINS_NUM *
-  sizeof(struct bnx2x_mcast_bin_elem);
-
-   if (bin_size > macs_list_len_size)
-   macs_list_len_size = bin_size;
-   }
-   total_sz = sizeof(*new_cmd) + macs_list_len_size;
-
/* Add mcast is called under spin_lock, thus calling with GFP_ATOMIC */
-   new_cmd = kzalloc(total_sz, GFP_ATOMIC);
-
+   new_cmd = kzalloc(sizeof(*new_cmd), GFP_ATOMIC);
if (!new_cmd)
return -ENOMEM;
 
-   DP(BNX2X_MSG_SP, "About to enqueue a new %d command. macs_list_len=%d\n",
-  cmd, macs_list_len);
-
INIT_LIST_HEAD(&new_cmd->data.macs_head);
-
+   INIT_LIST_HEAD(&new_cmd->group_head);
new_cmd->type = cmd;
new_cmd->done = false;
 
+   DP(BNX2X_MSG_SP, "About to enqueue a new %d command. macs_list_len=%d\n",
+  cmd, macs_list_len);
+
switch (cmd) {
case BNX2X_MCAST_CMD_ADD:
case BNX2X_MCAST_CMD_SET:
-   cur_mac = (struct bnx2x_mcast_mac_elem *)
- ((u8 *)new_cmd + sizeof(*new_cmd));
-
-   /* Push the MACs of the current command into the pending command
-* MACs list: FIFO
+   /* For a set command, we need to allocate sufficient memory for
+* all the bins, since we can't analyze at this point how much
+* memory would be required.
 */
+   total_elems = macs_list_len;
+   if (cmd == 

Re: [PATCH] L2TP:Adjust intf MTU, add underlay L3, overlay L2

2016-09-22 Thread R. Parameswaran


On Thu, 22 Sep 2016, Derek Fawcus wrote:

> On Wed, Sep 21, 2016 at 02:11:04pm -0700, R. Parameswaran wrote:
> > 
> [snip]
> 
> > @@ -206,6 +209,46 @@ static void l2tp_eth_show(struct seq_file *m, void
> > *arg)
> >  }
> >  #endif
> [snip]
> 
> > +
> >  static int l2tp_eth_create(struct net *net, u32 tunnel_id, u32 session_id,
> > u32 peer_session_id, struct l2tp_session_cfg *cfg)
> >  {
> > struct net_device *dev;
> > @@ -255,11 +298,8 @@ static int l2tp_eth_create(struct net *net, u32
> > tunnel_id, u32 session_id, u32 p
> > }
> > 
> 
> Your diff has whitespace errors,  probably where your MUA has decided to do
> 'intelligent' line wrapping.
> You should (re)send from a proper MUA which does not suffer from this issue.
> 
> DF
> 

I reposted the patch with this fixed. After rebasing it onto the dmiller
'net' tree, I verified that 'git am -c' applies the reposted patch
successfully (once the email header is removed). Thanks for identifying
this.

regards,

Ramkumar


Re: [PATCH net-next] net/vxlan: Avoid unaligned access in vxlan_build_skb()

2016-09-22 Thread Sowmini Varadhan
On (09/22/16 01:52), David Miller wrote:
> Alternatively we can do Alexander Duyck's trick, by pushing
> the headers into the frag list, forcing a pull and realignment
> by the next protocol layer.

What is the "Alexander Duyck trick" (hints about module or commit id,
where this can be found, please)?

Is this basically about, e.g., putting the vxlanhdr in its own
skb_frag_t, or something else?

--Sowmini


[PATCH iproute2 1/2] ip rule: merge ip rule flush and list, save together

2016-09-22 Thread Hangbin Liu
iprule_flush() and iprule_list_or_save() both call
rtnl_wilddump_request() and rtnl_dump_filter(), so merge them into one
function, as other files already do.
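The merge boils down to one dump loop whose per-message callback is chosen by the action, the way the patch picks save_rule, flush_rule or print_rule for rtnl_dump_filter(). Here is a self-contained model of that dispatch. A hypothetical integer priority stands in for each dumped netlink message, and rule_filter_t, the *_cb callbacks, pick_filter() and dump_rules() are illustrative names, not iproute2 code.

```c
#include <stdio.h>

enum list_action {
	IPRULE_LIST,
	IPRULE_FLUSH,
	IPRULE_SAVE,
};

/* Hypothetical stand-in for rtnl_filter_t: called once per dumped rule.
 * prio == 0 models a message without an FRA_PRIORITY attribute. */
typedef int (*rule_filter_t)(unsigned int prio, void *arg);

static int print_cb(unsigned int prio, void *arg)
{
	fprintf((FILE *)arg, "%u:\n", prio);
	return 0;
}

static int flush_cb(unsigned int prio, void *arg)
{
	int *deleted = arg;

	if (prio)            /* like flush_rule(): only rules with a priority */
		(*deleted)++;    /* the real code sends RTM_DELRULE here */
	return 0;
}

static int save_cb(unsigned int prio, void *arg)
{
	int *saved = arg;

	(void)prio;
	(*saved)++;          /* the real code writes the raw message out */
	return 0;
}

static rule_filter_t pick_filter(enum list_action action)
{
	switch (action) {
	case IPRULE_SAVE:
		return save_cb;
	case IPRULE_FLUSH:
		return flush_cb;
	default:
		return print_cb;
	}
}

/* One walk over the dump serves list, flush and save alike. */
static int dump_rules(const unsigned int *prios, int n,
		      enum list_action action, void *arg)
{
	rule_filter_t fn = pick_filter(action);

	for (int i = 0; i < n; i++)
		if (fn(prios[i], arg) < 0)
			return -1;
	return 0;
}
```

Keeping the action as an enum rather than a boolean 'save' flag leaves room for the third mode without another parameter.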

Signed-off-by: Hangbin Liu 
---
 ip/iprule.c | 121 +++-
 1 file changed, 54 insertions(+), 67 deletions(-)

diff --git a/ip/iprule.c b/ip/iprule.c
index 70562c5..e18505f 100644
--- a/ip/iprule.c
+++ b/ip/iprule.c
@@ -27,6 +27,12 @@
 #include "utils.h"
 #include "ip_common.h"
 
+enum list_action {
+   IPRULE_LIST,
+   IPRULE_FLUSH,
+   IPRULE_SAVE,
+};
+
 extern struct rtnl_handle rth;
 
 static void usage(void) __attribute__((noreturn));
@@ -243,24 +249,61 @@ static int save_rule(const struct sockaddr_nl *who,
return ret == n->nlmsg_len ? 0 : ret;
 }
 
-static int iprule_list_or_save(int argc, char **argv, int save)
+static int flush_rule(const struct sockaddr_nl *who, struct nlmsghdr *n,
+ void *arg)
+{
+   struct rtnl_handle rth2;
+   struct rtmsg *r = NLMSG_DATA(n);
+   int len = n->nlmsg_len;
+   struct rtattr *tb[FRA_MAX+1];
+
+   len -= NLMSG_LENGTH(sizeof(*r));
+   if (len < 0)
+   return -1;
+
+   parse_rtattr(tb, FRA_MAX, RTM_RTA(r), len);
+
+   if (tb[FRA_PRIORITY]) {
+   n->nlmsg_type = RTM_DELRULE;
+   n->nlmsg_flags = NLM_F_REQUEST;
+
+   if (rtnl_open(&rth2, 0) < 0)
+   return -1;
+
+   if (rtnl_talk(&rth2, n, NULL, 0) < 0)
+   return -2;
+
+   rtnl_close(&rth2);
+   }
+
+   return 0;
+}
+
+static int iprule_list_flush_or_save(int argc, char **argv, int action)
 {
-   rtnl_filter_t filter = print_rule;
+   rtnl_filter_t filter_fn;
int af = preferred_family;
 
if (af == AF_UNSPEC)
af = AF_INET;
 
if (argc > 0) {
-   fprintf(stderr, "\"ip rule %s\" does not take any arguments.\n",
-   save ? "save" : "show");
+   fprintf(stderr,
+   "\"ip rule list/flush/save\" does not take any arguments\n");
return -1;
}
 
-   if (save) {
+   switch (action) {
+   case IPRULE_SAVE:
if (save_rule_prep())
return -1;
-   filter = save_rule;
+   filter_fn = save_rule;
+   break;
+   case IPRULE_FLUSH:
+   filter_fn = flush_rule;
+   break;
+   default:
+   filter_fn = print_rule;
}
 
if (rtnl_wilddump_request(&rth, af, RTM_GETRULE) < 0) {
@@ -268,7 +311,7 @@ static int iprule_list_or_save(int argc, char **argv, int save)
return 1;
}
 
-   if (rtnl_dump_filter(&rth, filter, stdout) < 0) {
+   if (rtnl_dump_filter(&rth, filter_fn, stdout) < 0) {
fprintf(stderr, "Dump terminated\n");
return 1;
}
@@ -511,72 +554,16 @@ static int iprule_modify(int cmd, int argc, char **argv)
return 0;
 }
 
-
-static int flush_rule(const struct sockaddr_nl *who, struct nlmsghdr *n,
- void *arg)
-{
-   struct rtnl_handle rth2;
-   struct rtmsg *r = NLMSG_DATA(n);
-   int len = n->nlmsg_len;
-   struct rtattr *tb[FRA_MAX+1];
-
-   len -= NLMSG_LENGTH(sizeof(*r));
-   if (len < 0)
-   return -1;
-
-   parse_rtattr(tb, FRA_MAX, RTM_RTA(r), len);
-
-   if (tb[FRA_PRIORITY]) {
-   n->nlmsg_type = RTM_DELRULE;
-   n->nlmsg_flags = NLM_F_REQUEST;
-
-   if (rtnl_open(&rth2, 0) < 0)
-   return -1;
-
-   if (rtnl_talk(&rth2, n, NULL, 0) < 0)
-   return -2;
-
-   rtnl_close(&rth2);
-   }
-
-   return 0;
-}
-
-static int iprule_flush(int argc, char **argv)
-{
-   int af = preferred_family;
-
-   if (af == AF_UNSPEC)
-   af = AF_INET;
-
-   if (argc > 0) {
-   fprintf(stderr, "\"ip rule flush\" does not allow arguments\n");
-   return -1;
-   }
-
-   if (rtnl_wilddump_request(&rth, af, RTM_GETRULE) < 0) {
-   perror("Cannot send dump request");
-   return 1;
-   }
-
-   if (rtnl_dump_filter(&rth, flush_rule, NULL) < 0) {
-   fprintf(stderr, "Flush terminated\n");
-   return 1;
-   }
-
-   return 0;
-}
-
 int do_iprule(int argc, char **argv)
 {
if (argc < 1) {
-   return iprule_list_or_save(0, NULL, 0);
+   return iprule_list_flush_or_save(0, NULL, IPRULE_LIST);
} else if (matches(argv[0], "list") == 0 ||
   matches(argv[0], "lst") == 0 ||
   matches(argv[0], "show") == 0) {
-   return iprule_list_or_save(argc-1, argv+1, 0);
+   return iprule_list_flush_or_save(argc-1, argv+1, IPRULE_LIST);
} else if (matches(argv[0], 

[PATCH iproute2 2/2] ip rule: add selector support

2016-09-22 Thread Hangbin Liu
Signed-off-by: Hangbin Liu 
---
 ip/iprule.c| 180 +++--
 man/man8/ip-rule.8 |   6 +-
 2 files changed, 180 insertions(+), 6 deletions(-)

diff --git a/ip/iprule.c b/ip/iprule.c
index e18505f..42fb6af 100644
--- a/ip/iprule.c
+++ b/ip/iprule.c
@@ -20,6 +20,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 
@@ -41,7 +42,7 @@ static void usage(void)
 {
fprintf(stderr, "Usage: ip rule { add | del } SELECTOR ACTION\n");
fprintf(stderr, "   ip rule { flush | save | restore }\n");
-   fprintf(stderr, "   ip rule [ list ]\n");
+   fprintf(stderr, "   ip rule [ list [ SELECTOR ]]\n");
fprintf(stderr, "SELECTOR := [ not ] [ from PREFIX ] [ to PREFIX ] [ tos TOS ] [ fwmark FWMARK[/MASK] ]\n");
fprintf(stderr, "[ iif STRING ] [ oif STRING ] [ pref NUMBER ] [ l3mdev ]\n");
fprintf(stderr, "ACTION := [ table TABLE_ID ]\n");
@@ -55,6 +56,105 @@ static void usage(void)
exit(-1);
 }
 
+static struct
+{
+   int not;
+   int l3mdev;
+   int iifmask, oifmask;
+   unsigned int tb;
+   unsigned int tos, tosmask;
+   unsigned int pref, prefmask;
+   unsigned int fwmark, fwmask;
+   char iif[IFNAMSIZ];
+   char oif[IFNAMSIZ];
+   inet_prefix src;
+   inet_prefix dst;
+} filter;
+
+static bool filter_nlmsg(struct nlmsghdr *n, struct rtattr **tb, int host_len)
+{
+   struct rtmsg *r = NLMSG_DATA(n);
+   inet_prefix src = { .family = r->rtm_family };
+   inet_prefix dst = { .family = r->rtm_family };
+   __u32 table;
+
+   if (preferred_family != AF_UNSPEC && r->rtm_family != preferred_family)
+   return false;
+
+   if (filter.prefmask &&
+   filter.pref ^ (tb[FRA_PRIORITY] ? rta_getattr_u32(tb[FRA_PRIORITY]) : 0))
+   return false;
+   if (filter.not && !(r->rtm_flags & FIB_RULE_INVERT))
+   return false;
+
+   if (filter.src.family) {
+   if (tb[FRA_SRC]) {
+   memcpy(, RTA_DATA(tb[FRA_SRC]),
+  (r->rtm_src_len + 7) / 8);
+   }
+   if (filter.src.family != r->rtm_family ||
+   filter.src.bitlen > r->rtm_src_len ||
+   inet_addr_match(, , filter.src.bitlen))
+   return false;
+   }
+
+   if (filter.dst.family) {
+   if (tb[FRA_DST]) {
+   memcpy(, RTA_DATA(tb[FRA_DST]),
+  (r->rtm_dst_len + 7) / 8);
+   }
+   if (filter.dst.family != r->rtm_family ||
+   filter.dst.bitlen > r->rtm_dst_len ||
+   inet_addr_match(, , filter.dst.bitlen))
+   return false;
+   }
+
+   if (filter.tosmask && filter.tos ^ r->rtm_tos)
+   return false;
+
+   if (filter.fwmark) {
+   __u32 mark = 0;
+   if (tb[FRA_FWMARK])
+   mark = rta_getattr_u32(tb[FRA_FWMARK]);
+   if (filter.fwmark ^ mark)
+   return false;
+   }
+   if (filter.fwmask) {
+   __u32 mask = 0;
+   if (tb[FRA_FWMASK])
+   mask = rta_getattr_u32(tb[FRA_FWMASK]);
+   if (filter.fwmask ^ mask)
+   return false;
+   }
+
+   if (filter.iifmask) {
+   if (tb[FRA_IFNAME]) {
+   if (strcmp(filter.iif, rta_getattr_str(tb[FRA_IFNAME])) != 0)
+   return false;
+   } else {
+   return false;
+   }
+   }
+
+   if (filter.oifmask) {
+   if (tb[FRA_OIFNAME]) {
+   if (strcmp(filter.oif, rta_getattr_str(tb[FRA_OIFNAME])) != 0)
+   return false;
+   } else {
+   return false;
+   }
+   }
+
+   if (filter.l3mdev && !(tb[FRA_L3MDEV] && rta_getattr_u8(tb[FRA_L3MDEV])))
+   return false;
+
+   table = rtm_get_table(r, tb);
+   if (filter.tb > 0 && filter.tb ^ table)
+   return false;
+
+   return true;
+}
+
 int print_rule(const struct sockaddr_nl *who, struct nlmsghdr *n, void *arg)
 {
FILE *fp = (FILE *)arg;
@@ -77,6 +177,9 @@ int print_rule(const struct sockaddr_nl *who, struct nlmsghdr *n, void *arg)
 
host_len = af_bit_len(r->rtm_family);
 
+   if(!filter_nlmsg(n, tb, host_len))
+   return 0;
+
if (n->nlmsg_type == RTM_DELRULE)
fprintf(fp, "Deleted ");
 
@@ -287,9 +390,9 @@ static int iprule_list_flush_or_save(int argc, char **argv, int action)
if (af == AF_UNSPEC)
af = AF_INET;
 
-   if (argc > 0) {
-   fprintf(stderr,
-   "\"ip rule list/flush/save\" does not take 

[PATCH iproute2 0/2] ip rule: merge iprule_flush and add selector support

2016-09-22 Thread Hangbin Liu
When merging iprule_flush() and iprule_list_or_save(), rename the
rtnl_filter_t variable 'filter' to 'filter_fn', because the next patch
uses a global variable named 'filter' to filter nlmsgs.
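The selector patch's filter_nlmsg() checks each dumped rule against that global 'filter' struct. A field is only compared when its mask or flag is set, a non-zero 'a ^ b' means mismatch, and a "from"/"to" selector matches only if its prefix is no longer than the rule's own and agrees on those bits. What follows is a simplified IPv4-only model of that logic. struct rule, prefix_equal() and rule_matches() are hypothetical, not the iproute2 types.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, IPv4-only view of one dumped rule. */
struct rule {
	uint32_t pref;
	uint8_t tos;
	uint32_t src;       /* "from" address, host byte order */
	int src_len;        /* "from" prefix length, 0 if none */
};

/* Global selector, as filled in from "ip rule list SELECTOR". */
static struct {
	uint32_t pref, prefmask;
	uint8_t tos, tosmask;
	uint32_t src;
	int src_bitlen;     /* 0: no "from" given */
} filter;

/* True if the leading 'bits' bits of a and b are equal. */
static bool prefix_equal(uint32_t a, uint32_t b, int bits)
{
	uint32_t mask = bits ? ~0u << (32 - bits) : 0;

	return ((a ^ b) & mask) == 0;
}

static bool rule_matches(const struct rule *r)
{
	/* Only fields the user asked for are checked; any differing bit rejects. */
	if (filter.prefmask && (filter.pref ^ r->pref))
		return false;
	if (filter.tosmask && (filter.tos ^ r->tos))
		return false;
	if (filter.src_bitlen) {
		/* The selector prefix must not be longer than the rule's own. */
		if (filter.src_bitlen > r->src_len ||
		    !prefix_equal(filter.src, r->src, filter.src_bitlen))
			return false;
	}
	return true;
}
```

prefix_equal() returns true on a match. iproute2's inet_addr_match() instead returns 0 on a match, memcmp()-style, which is why the patch chains it with '||' as a rejection test.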

Hangbin Liu (2):
  ip rule: merge ip rule flush and list, save together
  ip rule: add selector support

 ip/iprule.c| 295 +
 man/man8/ip-rule.8 |   6 +-
 2 files changed, 231 insertions(+), 70 deletions(-)

-- 
2.5.5



RE: [RFC v2 06/12] qedr: Add support for QP verbs

2016-09-22 Thread Amrani, Ram
> 
> Do you have a git tree?
> 
We don't have a publicly accessible git tree.



Re: [PATCH RFC 1/3] xdp: Infrastructure to generalize XDP

2016-09-22 Thread Eric Dumazet
On Thu, 2016-09-22 at 15:14 +0200, Jesper Dangaard Brouer wrote:
> On Wed, 21 Sep 2016 21:56:58 +0200
> Jesper Dangaard Brouer  wrote:
> 
> > > > I'm not opposed to running non-BPF code at XDP. I'm against adding
> > > > a linked list of hook consumers.  
> > 
> > I also worry about the performance impact of a linked list.  We should
> > simply benchmark it instead of discussing it! ;-)
> 
> (Note: there are some stability issues with this RFC patchset when
> removing the XDP program, which I had to work around/patch.)
> 
> 
> I've started benchmarking this and I only see an added cost of 2.89ns from
> these patches; at these crazy speeds, that corresponds to -485Kpps.
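A fixed per-packet cost d turns a rate R into 1/(1/R + d), so the loss is R^2 d / (1 + R d). It grows with the square of the rate, which is why a few nanoseconds show up as hundreds of Kpps at these speeds. The thread does not state the baseline rate, so the sketch below solves for the rate at which 2.89ns costs 485Kpps, using bisection to avoid needing libm. All function names are hypothetical.

```c
/* Rate after adding added_ns of work to every packet. */
static double pps_with_overhead(double base_pps, double added_ns)
{
	double per_pkt_ns = 1e9 / base_pps;   /* time budget per packet */

	return 1e9 / (per_pkt_ns + added_ns);
}

/* Packets per second lost; equals R*R*d / (1 + R*d) with d in seconds. */
static double pps_drop(double base_pps, double added_ns)
{
	return base_pps - pps_with_overhead(base_pps, added_ns);
}

/* Bisection on the baseline rate; pps_drop() rises monotonically with it. */
static double implied_base_pps(double added_ns, double drop_pps)
{
	double lo = 1e3, hi = 1e9;

	for (int i = 0; i < 200; i++) {
		double mid = (lo + hi) / 2;

		if (pps_drop(mid, added_ns) < drop_pps)
			lo = mid;
		else
			hi = mid;
	}
	return (lo + hi) / 2;
}
```

implied_base_pps(2.89, 485e3) lands at roughly 13.2Mpps, so the quoted numbers are consistent with a baseline in that range.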

I claim the methodology is too biased.

At full speed, all the extra code is hot in caches, and your core has
full access to the memory bus anyway. Even the branch predictor has
fresh information.

Now, in a mixed workload, where all cores compete to access L2/L3 and
RAM, things might be very different.

Testing icache/dcache pressure is not a matter of measuring how many
Kpps you add or remove on a hot path.

A latency test, when other cpus are busy reading/writing all over
memory, and your caches are cold, would be useful.





Re: [PATCH net-next] net/vxlan: Avoid unaligned access in vxlan_build_skb()

2016-09-22 Thread David Miller
From: Jiri Benc 
Date: Tue, 20 Sep 2016 19:09:29 +0200

> But the point stands, we have much greater problems here than VXLAN.
> And I don't think that wrapping all IP address accesses into
> get/put_unaligned all around the stack is the solution.

Right, and I don't like marking things as packed either.

We need something that really solves the problem.  We can't
change the existing protocols, but we can perhaps change the
geometry of the SKB when we deal with such protocols.

For example, we can memmove() to align the headers at skb->data and
then for the skb->data portion past the headers we insert a frag
pointing to it at the front of the frag list.

So we "memmove" down, creating a gap, and then past the gap
is the post-header area which gets inserted into the head of
the SKB's fraglist.

That will align all of the subsequent headers and avoid the
unaligned accesses after the vxlan header.
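The root of the problem is plain arithmetic on header lengths. Assume an IPv4 outer header without options, no VLAN tags, and 2 bytes of headroom in front of the outer Ethernet header (what NET_IP_ALIGN gives on many architectures). Then the outer IP header lands on a 4-byte boundary but the inner one does not, and because the two sit 50 bytes apart, no headroom value aligns both. The constants and helpers below are illustrative, not kernel code.

```c
#include <stdbool.h>
#include <stddef.h>

/* Header lengths in bytes: IPv4 without options, no VLAN tags. */
enum {
	HDR_ETH = 14,
	HDR_IPV4 = 20,
	HDR_UDP = 8,
	HDR_VXLAN = 8,
};

/* Offset of the outer IPv4 header, given headroom before the outer MAC header. */
static size_t outer_ip_off(size_t headroom)
{
	return headroom + HDR_ETH;
}

/* Offset of the inner IPv4 header: past outer IP/UDP/VXLAN and the inner MAC header. */
static size_t inner_ip_off(size_t headroom)
{
	return outer_ip_off(headroom) + HDR_IPV4 + HDR_UDP + HDR_VXLAN + HDR_ETH;
}

static bool aligned4(size_t off)
{
	return off % 4 == 0;
}
```

With 2 bytes of headroom the outer IP header sits at offset 16 and the inner one at 66. Only moving bytes around (the memmove/frag idea) can fix the inner headers without breaking the outer ones.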

Alternatively we can do Alexander Duyck's trick, by pushing
the headers into the frag list, forcing a pull and realignment
by the next protocol layer.

This is so much better than the little hacks sprinkled all
over the problem and tackles the fundamental issue.

Thanks.


Re: [PATCH v2 0/2] make POSIX timers optional

2016-09-22 Thread David Miller
From: Nicolas Pitre 
Date: Tue, 20 Sep 2016 15:56:38 -0400

> Many embedded systems don't need the full POSIX timer support.
> Configuring them out provides a nice kernel image size reduction.
> 
> When POSIX timers are configured out, the PTP clock subsystem should be
> left out as well. However a bunch of ethernet drivers currently *select*
> it in their Kconfig entries. Therefore some more tweaks were needed to
> break that hard dependency for those drivers to still be configured in
> if desired.
> 
> It was agreed that the best path upstream for those patches is via
> John Stultz's timer tree.

Acked-by: David S. Miller 


Re: [PATCH net-next] MAINTAINERS: Update b44 maintainer.

2016-09-22 Thread David Miller
From: Michael Chan 
Date: Tue, 20 Sep 2016 23:33:15 -0400

> Taking over as maintainer since Gary Zambrano is no longer working
> for Broadcom.
> 
> Signed-off-by: Michael Chan 

Applied, thanks.


Re: [PATCH net-next] tcp: implement TSQ for retransmits

2016-09-22 Thread David Miller
From: Eric Dumazet 
Date: Tue, 20 Sep 2016 22:45:58 -0700

> From: Eric Dumazet 
> 
> We saw sch_fq drops caused by the per flow limit of 100 packets and TCP
> when dealing with large cwnd and bursts of retransmits.
> 
> Even after increasing the limit to 1000, and even after commit
> 10d3be569243 ("tcp-tso: do not split TSO packets at retransmit time"),
> we can still have these drops.
> 
> Under certain conditions, TCP can spend a considerable amount of
> time queuing thousands of skbs in a single tcp_xmit_retransmit_queue()
> invocation, incurring latency spikes and stalls of other softirq
> handlers.
> 
> This patch implements TSQ for retransmits, limiting the number of packets
> and giving more chance for scheduling packets in both ways.
> 
> Signed-off-by: Eric Dumazet 
> Signed-off-by: Yuchung Cheng 
> Signed-off-by: Neal Cardwell 

Applied.


Re: pull request (net): ipsec 2016-09-21

2016-09-22 Thread David Miller
From: Steffen Klassert 
Date: Wed, 21 Sep 2016 13:05:42 +0200

> 1) Propagate errors on security context allocation.
>    From Mathias Krause.
> 
> 2) Fix inbound policy checks for inter address family tunnels.
>    From Thomas Zeitlhofer.
> 
> 3) Fix an old memory leak on aead algorithm usage.
>    From Ilan Tayari.
> 
> 4) A recent patch fixed a possible NULL pointer dereference
>    but broke the vti6 input path.
>    Fix from Nicolas Dichtel.
> 
> Please pull or let me know if there are problems.

Pulled, thanks a lot Steffen.


Re: [PATCH net-next 0/9] rxrpc: Preparation for slow-start algorithm

2016-09-22 Thread David Howells
I'm going to post a V2 for this.  I've used a couple of 64-bit division
operators rather than calling the appropriate function (which is fine on
x86_64) and managed to transpose the last two patches (causing an undefined
symbol in one of them).

David


[PATCH net-next 0/9] rxrpc: Preparation for slow-start algorithm [ver #2]

2016-09-22 Thread David Howells

Here are some patches that prepare for improvements in ACK generation and
for the implementation of the slow-start part of the protocol:

 (1) Stop storing the protocol header in the Tx socket buffers, but rather
 generate it on the fly.  This potentially saves a little space and
 makes it easier to alter the header just before transmission (the
 flags may get altered and the serial number has to be changed).

 (2) Mask off the Tx buffer annotations and add a flag to record which ones
 have already been resent.

 (3) Track RTT on a per-peer basis for use in future changes.  Tracepoints
 are added to log this.

 (4) Send PING ACKs in response to incoming calls to elicit a PING-RESPONSE
 ACK from which RTT data can be calculated.  The response also carries
 other useful information.

 (5) Expedite PING-RESPONSE ACK generation from sendmsg.  If we're actively
 using sendmsg, this allows us, under some circumstances, to avoid
 having to rely on the background work item to run to generate this
 ACK.

 This requires ktime_sub_ms() to be added.

 (6) Set the REQUEST-ACK flag on some DATA packets to elicit ACK-REQUESTED
 ACKs from which RTT data can be calculated.

 (7) Limit the use of pings and ACK requests for RTT determination.
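ktime_sub_ms() is simple arithmetic on a nanosecond timestamp, presumably a thin wrapper around ktime_sub_ns(). The model below uses a plain int64_t for ktime_t. recently_pinged() is a hypothetical example of the kind of rate-limit check item (7) describes, not the rxrpc code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified model: a ktime_t is a signed count of nanoseconds. */
typedef int64_t ktime_t;

#define NSEC_PER_MSEC 1000000LL

static ktime_t ktime_sub_ns(ktime_t kt, uint64_t nsec)
{
	return kt - (ktime_t)nsec;
}

static ktime_t ktime_sub_ms(ktime_t kt, uint64_t msec)
{
	return ktime_sub_ns(kt, msec * NSEC_PER_MSEC);
}

/* Hypothetical rate limit: was the last ping sent within the past interval_ms? */
static bool recently_pinged(ktime_t now, ktime_t last_ping, uint64_t interval_ms)
{
	return last_ping > ktime_sub_ms(now, interval_ms);
}
```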

Changes:

 (V2) Don't use the C division operator for 64-bit division.  One instance
  should use do_div() and the other should be using nsecs_to_jiffies().

  The last two patches got transposed, leading to an undefined symbol
  in one of them.

  Reported-by: kbuild test robot <l...@intel.com>
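For context on the V2 change: on 32-bit targets, dividing a 64-bit integer with '/' makes gcc emit a call to a libgcc helper (such as __udivdi3) that the kernel does not link against. do_div(n, base) avoids that: it replaces the 64-bit n with the quotient and returns the 32-bit remainder. The function below only models that contract in userspace (the real do_div() is a macro that takes n itself, not a pointer), and do_div_model() is a hypothetical name.

```c
#include <stdint.h>

/* Userspace model of the do_div(n, base) contract: n becomes the
 * quotient and the remainder is returned. */
static uint32_t do_div_model(uint64_t *n, uint32_t base)
{
	uint32_t rem = (uint32_t)(*n % base);

	*n /= base;
	return rem;
}
```

For example, splitting 1234567891 ns by 1000000 leaves 1234 (ms) in n and returns 567891.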

The patches can be found here also:


http://git.kernel.org/cgit/linux/kernel/git/dhowells/linux-fs.git/log/?h=rxrpc-rewrite

Tagged thusly:

git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git
rxrpc-rewrite-20160922-v2

David
---
David Howells (9):
  rxrpc: Don't store the rxrpc header in the Tx queue sk_buffs
  rxrpc: Add re-sent Tx annotation
  rxrpc: Add per-peer RTT tracker
  rxrpc: Send pings to get RTT data
  rxrpc: Expedite ping response transmission
  rxrpc: Add ktime_sub_ms()
  rxrpc: Obtain RTT data by requesting ACKs on DATA packets
  rxrpc: Reduce the number of ACK-Requests sent
  rxrpc: Reduce the number of PING ACKs sent


 include/linux/ktime.h|5 ++
 include/trace/events/rxrpc.h |   61 ++
 net/rxrpc/ar-internal.h  |   47 -
 net/rxrpc/call_event.c   |   56 ++--
 net/rxrpc/conn_object.c  |1 
 net/rxrpc/input.c|  100 ++--
 net/rxrpc/misc.c |   25 ++---
 net/rxrpc/output.c   |  117 --
 net/rxrpc/peer_event.c   |   41 +++
 net/rxrpc/peer_object.c  |1 
 net/rxrpc/rxkad.c|8 +--
 net/rxrpc/sendmsg.c  |   56 
 net/rxrpc/sysctl.c   |2 -
 13 files changed, 390 insertions(+), 130 deletions(-)



Re: [PATCHv3 net-next 0/2] Preparation for mv88e6390

2016-09-22 Thread David Miller
From: Andrew Lunn 
Date: Wed, 21 Sep 2016 01:40:30 +0200

> These two patches are a couple of preparation steps for supporting the
> MV88E6390 family of chips. This is a new generation from Marvell,
> and will need more feature flags than are currently available in an
> unsigned long. Expand to an unsigned long long. The MV88E6390 also
> places its port registers somewhere else, so add a wrapper around port
> register access.
> 
> v2:
>  Rework wrappers to use mv88e6xxx_{read|write}
>  Simplify some (err < 0) to (err)
>  Add Reviewed-by tag.
> 
> v3:
>  reg = reg & foo -> reg &= foo
>  Fix over zealous s/ret/err

Series applied, thanks.


Re: pull-request: can 2016-09-21

2016-09-22 Thread David Miller
From: Marc Kleine-Budde 
Date: Wed, 21 Sep 2016 10:43:54 +0200

> this is another pull request of one patch for the upcoming linux-4.8 release.
> 
> Marek Vasut fixes the CAN-FD bit rate switch in the ifi driver by configuring
> the transmitter delay.

Pulled, thanks.


Re: [patch net-next 6/6] doc: update switchdev L3 section

2016-09-22 Thread Ido Schimmel
On Wed, Sep 21, 2016 at 01:53:14PM +0200, Jiri Pirko wrote:
> From: Jiri Pirko 
> 
> This is to reflect the change of FIB offload infrastructure from
> switchdev objects to FIB notifier.
> 
> Signed-off-by: Jiri Pirko 
> ---
>  Documentation/networking/switchdev.txt | 27 ++-
>  1 file changed, 14 insertions(+), 13 deletions(-)
> 
> diff --git a/Documentation/networking/switchdev.txt b/Documentation/networking/switchdev.txt
> index 44235e8..c956ab8 100644
> --- a/Documentation/networking/switchdev.txt
> +++ b/Documentation/networking/switchdev.txt
> @@ -314,30 +314,29 @@ the kernel, with the device doing the FIB lookup and forwarding.  The device
>  does a longest prefix match (LPM) on FIB entries matching route prefix and
>  forwards the packet to the matching FIB entry's nexthop(s) egress ports.
>  
> -To program the device, the driver implements support for
> -SWITCHDEV_OBJ_IPV[4|6]_FIB object using switchdev_port_obj_xxx ops.
> -switchdev_port_obj_add is used for both adding a new FIB entry to the device,
> -or modifying an existing entry on the device.
> +To program the device, the driver has to register a FIB notifier handler
> +using register_fib_notifier. There are following events available:

"The following events are available:" maybe?

> +FIB_EVENT_ENTRY_ADD: used for both adding a new FIB entry to the device,
> + or modifying an existing entry on the device.
> +FIB_EVENT_ENTRY_DEL: used for removing a FIB entry
> +FIB_EVENT_RULE_ADD, FIB_EVENT_RULE_DEL: used to propagate FIB rule changes
>  
> -XXX: Currently, only SWITCHDEV_OBJ_ID_IPV4_FIB objects are supported.
> +FIB_EVENT_ENTRY_ADD and FIB_EVENT_ENTRY_DEL events pass:
>  
> -SWITCHDEV_OBJ_ID_IPV4_FIB object passes:
> -
> - struct switchdev_obj_ipv4_fib { /* IPV4_FIB */
> + struct fib_entry_notifier_info {
> + struct fib_notifier_info info; /* must be first */
>   u32 dst;
>   int dst_len;
>   struct fib_info *fi;
>   u8 tos;
>   u8 type;
> - u32 nlflags;
>   u32 tb_id;
> - } ipv4_fib;
> + u32 nlflags;
> + };
>  
>  to add/modify/delete IPv4 dst/dest_len prefix on table tb_id.  The *fi
>  structure holds details on the route and route's nexthops.  *dev is one of the
> -port netdevs mentioned in the routes next hop list.  If the output port netdevs
> -referenced in the route's nexthop list don't all have the same switch ID, the
> -driver is not called to add/modify/delete the FIB entry.
> +port netdevs mentioned in the routes next hop list.

s/routes/route's/ ?

Reviewed-by: Ido Schimmel 

Thanks!

>  
>  Routes offloaded to the device are labeled with "offload" in the ip route
>  listing:
> @@ -355,6 +354,8 @@ listing:
>   12.0.0.4 via 11.0.0.9 dev sw1p2  proto zebra  metric 20 offload
>   192.168.0.0/24 dev eth0  proto kernel  scope link  src 192.168.0.15
>  
> +The "offload" flag is set in case at least one device offloads the FIB entry.
> +
>  XXX: add/mod/del IPv6 FIB API
>  
>  Nexthop Resolution
> -- 
> 2.5.5
> 


Re: [PATCH ipsec-next] xfrm: state lookup can be lockless

2016-09-22 Thread Steffen Klassert
On Tue, Sep 20, 2016 at 03:45:26PM +0200, Florian Westphal wrote:
> This is called from the packet input path, we get lock contention
> if many cpus handle ipsec in parallel.
> 
> After recent rcu conversion it is safe to call __xfrm_state_lookup
> without the spinlock.
> 
> Signed-off-by: Florian Westphal 

Applied to ipsec-next, thanks a lot!


[GIT] Networking

2016-09-22 Thread David Miller

Mostly small bits scattered all over the place, which is usually
how things go this late in the -rc series.

1) Proper driver init device resets in bnx2, from Baoquan He.

2) Fix accounting overflow in __tcp_retransmit_skb(), sk_forward_alloc,
   and ip_idents_reserve, from Eric Dumazet.

3) Fix crash in bna driver ethtool stats handling, from Ivan Vecera.

4) Missing check of skb_linearize() return value in mac80211, from
   Johannes Berg.

5) Endianness fix in nf_table_trace dumps, from Liping Zhang.

6) SSN comparison fix in SCTP, from Marcelo Ricardo Leitner.

7) Update DSA and b44 MAINTAINERS entries.

8) Make input path of vti6 driver work again, from Nicolas Dichtel.

9) Off-by-one in mlx4, from Sebastian Ott.

10) Fix fallback route lookup handling in ipv6, from Vincent Bernat.

11) Fix stack corruption on probe in qed driver, from Yuval Mintz.

12) PHY init fixes in r8152 from Hayes Wang.

13) Missing SKB free in irda_accept error path, from Phil Turnbull.

Please pull, thanks a lot!

The following changes since commit da499f8f5385c181e29978fdaab15a58de185302:

  Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net (2016-09-12 07:56:06 -0700)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git 

for you to fetch changes up to 7e32b44361abc77fbc01f2b97b045c405b2583e5:

  tcp: properly account Fast Open SYN-ACK retrans (2016-09-22 03:33:01 -0400)


Andrew Lunn (1):
  MAINTAINERS: Add an entry for the core network DSA code

Baoquan He (1):
  bnx2: Reset device during driver initialization

Beni Lev (1):
  iwlwifi: mvm: update TX queue before making a copy of the skb

Christophe Jaillet (1):
  drivers: net: phy: xgene: Fix 'remove' function

David S. Miller (11):
  Merge git://git.kernel.org/.../pablo/nf
  Merge tag 'mac80211-for-davem-2016-09-13' of git://git.kernel.org/.../jberg/mac80211
  Merge tag 'batadv-net-for-davem-20160914' of git://git.open-mesh.org/linux-merge
  Merge branch 'qeth-fixes'
  Merge tag 'mac80211-for-davem-2016-09-16' of git://git.kernel.org/.../jberg/mac80211
  Merge branch 'mlx5-fixes'
  Merge tag 'linux-can-fixes-for-4.8-20160919' of git://git.kernel.org/.../mkl/linux-can
  Merge branch 'r8152-phy-fixes'
  Merge tag 'wireless-drivers-for-davem-2016-09-20' of git://git.kernel.org/.../kvalo/wireless-drivers
  Merge tag 'linux-can-fixes-for-4.8-20160921' of git://git.kernel.org/.../mkl/linux-can
  Merge branch 'master' of git://git.kernel.org/.../klassert/ipsec

Eric Dumazet (3):
  tcp: fix overflow in __tcp_retransmit_skb()
  net: avoid sk_forward_alloc overflows
  net: get rid of an signed integer overflow in ip_idents_reserve()

Fabio Estevam (1):
  can: flexcan: fix resume function

Felix Fietkau (2):
  mac80211: fix tim recalculation after PS response
  mac80211: fix sequence number assignment for PS response frames

Filipe Manco (1):
  xen-netback: fix error handling on netback_probe()

Gao Feng (1):
  netfilter: synproxy: Check oom when adding synproxy and seqadj ct extensions

Giuseppe CAVALLARO (1):
  stmmac: fix PWRDWN into the PMT register for global unicast.

Hans Wippel (1):
  qeth: restore device features after recovery

Hariprasad Shenai (1):
  cxgb4/cxgb4vf: Allocate more queues for 25G and 100G adapter

Ilan Tayari (1):
  xfrm: Fix memory leak of aead algorithm name

Ivan Mikhaylov (2):
  net/ibm/emac: add set mac addr callback
  net/ibm/emac: add mutex to 'set multicast list'

Ivan Vecera (2):
  bna: add missing per queue ethtool stat
  bna: fix crash in bnad_get_strings()

Johannes Berg (3):
  nl80211: validate number of probe response CSA counters
  mac80211: check skb_linearize() return value
  mac80211: reject TSPEC TIDs (TSIDs) for aggregation

Kalle Valo (1):
  Merge tag 'iwlwifi-for-kalle-2016-09-15' of git://git.kernel.org/.../iwlwifi/iwlwifi-fixes

Kamal Heib (1):
  net/mlx4_core: Fix to clean devlink resources

Linus Lüssing (1):
  batman-adv: fix elp packet data reservation

Liping Zhang (2):
  netfilter: nf_tables_trace: fix endiness when dump chain policy
  netfilter: nft_chain_route: re-route before skb is queued to userspace

Marcelo Ricardo Leitner (1):
  sctp: fix SSN comparision

Marek Vasut (1):
  net: can: ifi: Configure transmitter delay

Mark Tomlinson (1):
  net: VRF: Pass original iif to ip_route_input()

Mathias Krause (1):
  xfrm_user: propagate sec ctx allocation errors

Michael Chan (1):
  MAINTAINERS: Update b44 maintainer.

Nicolas Dichtel (1):
  vti6: fix input path

Nikolay Aleksandrov (1):
  ipmr, ip6mr: return lastuse relative to now

Or Gerlitz (2):
  net/mlx5: E-Switch, Fix error flow in the SRIOV e-switch init code
  net/mlx5: E-Switch, Handle mode change failures

Pablo Neira Ayuso (1):
  

[PATCH nf v4] netfilter: seqadj: Fix the wrong ack adjust for the RST packet without ack

2016-09-22 Thread fgao
From: Gao Feng 

A TCP RST packet may legitimately be sent without the ACK flag set, in
which case the ack number field is zero. But the current seqadj code would
adjust this "0" ack into an invalid ack number. seqadj needs to check the
ACK flag before adjusting the ack number of such RST packets.

The following is my test case.

The client is 10.26.98.245; add one iptables rule on it:
iptables -I INPUT -p tcp --sport 12345 -m connbytes --connbytes 2: \
  --connbytes-dir reply --connbytes-mode packets -j REJECT --reject-with \
  tcp-reset
This iptables rule makes the client send a TCP RST without the ACK flag.

The server is 10.172.135.55. Enable SYNPROXY with sequence adjustment using
the following iptables rules:
iptables -t raw -A PREROUTING -i eth0 -p tcp -d 10.172.135.55 --dport 12345 \
  -m tcp --syn -j CT --notrack

iptables -A INPUT -i eth0 -p tcp -d 10.172.135.55 --dport 12345 -m conntrack \
  --ctstate INVALID,UNTRACKED -j SYNPROXY --sack-perm --timestamp --wscale 7 \
  --mss 1460
iptables -A OUTPUT -o eth0 -p tcp -s 10.172.135.55 --sport 12345 -m conntrack \
  --ctstate INVALID,UNTRACKED -m tcp --tcp-flags SYN,RST,ACK SYN,ACK -j ACCEPT

The following are my test results.

1. Packet trace on the client
root@routers:/tmp# tcpdump -i eth0 tcp port 12345 -n
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 65535 bytes
IP 10.26.98.245.45154 > 10.172.135.55.12345: Flags [S], seq 3695959829,
win 29200, options [mss 1460,sackOK,TS val 452367884 ecr 0,nop,wscale 7],
length 0
IP 10.172.135.55.12345 > 10.26.98.245.45154: Flags [S.], seq 546723266,
ack 3695959830, win 0, options [mss 1460,sackOK,TS val 15643479 ecr 452367884,
nop,wscale 7], length 0
IP 10.26.98.245.45154 > 10.172.135.55.12345: Flags [.], ack 1, win 229,
options [nop,nop,TS val 452367885 ecr 15643479], length 0
IP 10.172.135.55.12345 > 10.26.98.245.45154: Flags [.], ack 1, win 226,
options [nop,nop,TS val 15643479 ecr 452367885], length 0
IP 10.26.98.245.45154 > 10.172.135.55.12345: Flags [R], seq 3695959830,
win 0, length 0

2. seqadj log on the server
[62873.867319] Adjusting sequence number from 602341895->546723267,
ack from 3695959830->3695959830
[62873.867644] Adjusting sequence number from 602341895->546723267,
ack from 3695959830->3695959830
[62873.869040] Adjusting sequence number from 3695959830->3695959830,
ack from 0->55618628

To summarize: the seqadj code adjusts the zero ack number when it receives
a TCP RST packet without the ACK flag.
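The intended behaviour can be sketched in userspace C. This is a simplified model, not the kernel code: the struct and function names (`seqadj_dir`, `tcp_fields`, `seq_after()`, `seq_adjust()`) are hypothetical, values are kept in host byte order, and the checksum update and SACK adjustment done by nf_ct_seq_adjust() are omitted. The sequence number is always shifted, while the ack number is only touched when the ACK flag is set.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-direction state, mirroring the fields nf_ct_seq_adjust()
 * reads (correction_pos, offset_before, offset_after). */
struct seqadj_dir {
	uint32_t correction_pos;
	int32_t offset_before;
	int32_t offset_after;
};

/* Hypothetical host-order view of the relevant TCP header fields. */
struct tcp_fields {
	uint32_t seq;
	uint32_t ack_seq;
	bool ack;		/* ACK flag */
};

/* Wraparound-safe "a is after b", the same idea as the kernel's after(). */
static bool seq_after(uint32_t a, uint32_t b)
{
	return (int32_t)(b - a) < 0;
}

static void seq_adjust(struct tcp_fields *th,
		       const struct seqadj_dir *this_way,
		       const struct seqadj_dir *other_way)
{
	int32_t seqoff, ackoff;

	seqoff = seq_after(th->seq, this_way->correction_pos) ?
		 this_way->offset_after : this_way->offset_before;
	th->seq += (uint32_t)seqoff;

	/* A RST without ACK carries no meaningful ack number (typically 0),
	 * so leave it untouched instead of shifting it. */
	if (!th->ack)
		return;

	ackoff = seq_after(th->ack_seq - (uint32_t)other_way->offset_before,
			   other_way->correction_pos) ?
		 other_way->offset_after : other_way->offset_before;
	th->ack_seq -= (uint32_t)ackoff;
}
```

In this model, a RST like the one in the trace above keeps its zero ack number, while packets with ACK set still get both fields adjusted.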

Signed-off-by: Gao Feng 
---
 v4: Don't invoke nf_ct_sack_adjust when no ack flag
 v3: Add the reproduce steps and packet trace
 v2: Regenerate because the first patch is removed
 v1: Initial patch

 net/netfilter/nf_conntrack_seqadj.c | 37 -
 1 file changed, 20 insertions(+), 17 deletions(-)

diff --git a/net/netfilter/nf_conntrack_seqadj.c b/net/netfilter/nf_conntrack_seqadj.c
index dff0f0c..80ab429 100644
--- a/net/netfilter/nf_conntrack_seqadj.c
+++ b/net/netfilter/nf_conntrack_seqadj.c
@@ -169,7 +169,7 @@ int nf_ct_seq_adjust(struct sk_buff *skb,
s32 seqoff, ackoff;
struct nf_conn_seqadj *seqadj = nfct_seqadj(ct);
struct nf_ct_seqadj *this_way, *other_way;
-   int res;
+   int res = 1;
 
this_way  = &seqadj->seq[dir];
other_way = &seqadj->seq[!dir];
@@ -184,27 +184,30 @@ int nf_ct_seq_adjust(struct sk_buff *skb,
else
seqoff = this_way->offset_before;
 
-   if (after(ntohl(tcph->ack_seq) - other_way->offset_before,
- other_way->correction_pos))
-   ackoff = other_way->offset_after;
-   else
-   ackoff = other_way->offset_before;
-
newseq = htonl(ntohl(tcph->seq) + seqoff);
-   newack = htonl(ntohl(tcph->ack_seq) - ackoff);
-
inet_proto_csum_replace4(&tcph->check, skb, tcph->seq, newseq, false);
-   inet_proto_csum_replace4(&tcph->check, skb, tcph->ack_seq, newack,
-false);
+   pr_debug("Adjusting sequence number from %u->%u\n",
+ntohl(tcph->seq), ntohl(newseq));
+   tcph->seq = newseq;
 
-   pr_debug("Adjusting sequence number from %u->%u, ack from %u->%u\n",
-ntohl(tcph->seq), ntohl(newseq), ntohl(tcph->ack_seq),
-ntohl(newack));
+   if (likely(tcph->ack)) {
+   if (after(ntohl(tcph->ack_seq) - other_way->offset_before,
+ other_way->correction_pos))
+   ackoff = other_way->offset_after;
+   else
+   ackoff = other_way->offset_before;
 
-   tcph->seq = newseq;
-   tcph->ack_seq = newack;
+   newack = htonl(ntohl(tcph->ack_seq) - ackoff);
+   inet_proto_csum_replace4(&tcph->check, skb, tcph->ack_seq,
+newack, false);
+   pr_debug("Adjusting ack number from %u->%u, ack from %u->%u\n",
+ntohl(tcph->seq), ntohl(newseq), ntohl(tcph->ack_seq),

[PATCH nf v5] netfilter: seqadj: Fix the wrong ack adjust for the RST packet without ack

2016-09-22 Thread fgao
From: Gao Feng 

A TCP RST packet may legitimately be sent without the ACK flag set, in
which case the ack number field is zero. But the current seqadj code would
adjust this "0" ack into an invalid ack number. seqadj needs to check the
ACK flag before adjusting the ack number of such RST packets.

The following is my test case.

The client is 10.26.98.245; add one iptables rule on it:
iptables -I INPUT -p tcp --sport 12345 -m connbytes --connbytes 2: \
  --connbytes-dir reply --connbytes-mode packets -j REJECT --reject-with \
  tcp-reset
This iptables rule makes the client send a TCP RST without the ACK flag.

The server is 10.172.135.55. Enable SYNPROXY with sequence adjustment using
the following iptables rules:
iptables -t raw -A PREROUTING -i eth0 -p tcp -d 10.172.135.55 --dport 12345 \
  -m tcp --syn -j CT --notrack

iptables -A INPUT -i eth0 -p tcp -d 10.172.135.55 --dport 12345 -m conntrack \
  --ctstate INVALID,UNTRACKED -j SYNPROXY --sack-perm --timestamp --wscale 7 \
  --mss 1460
iptables -A OUTPUT -o eth0 -p tcp -s 10.172.135.55 --sport 12345 -m conntrack \
  --ctstate INVALID,UNTRACKED -m tcp --tcp-flags SYN,RST,ACK SYN,ACK -j ACCEPT

The following are my test results.

1. Packet trace on the client
root@routers:/tmp# tcpdump -i eth0 tcp port 12345 -n
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 65535 bytes
IP 10.26.98.245.45154 > 10.172.135.55.12345: Flags [S], seq 3695959829,
win 29200, options [mss 1460,sackOK,TS val 452367884 ecr 0,nop,wscale 7],
length 0
IP 10.172.135.55.12345 > 10.26.98.245.45154: Flags [S.], seq 546723266,
ack 3695959830, win 0, options [mss 1460,sackOK,TS val 15643479 ecr 452367884,
nop,wscale 7], length 0
IP 10.26.98.245.45154 > 10.172.135.55.12345: Flags [.], ack 1, win 229,
options [nop,nop,TS val 452367885 ecr 15643479], length 0
IP 10.172.135.55.12345 > 10.26.98.245.45154: Flags [.], ack 1, win 226,
options [nop,nop,TS val 15643479 ecr 452367885], length 0
IP 10.26.98.245.45154 > 10.172.135.55.12345: Flags [R], seq 3695959830,
win 0, length 0

2. seqadj log on the server
[62873.867319] Adjusting sequence number from 602341895->546723267,
ack from 3695959830->3695959830
[62873.867644] Adjusting sequence number from 602341895->546723267,
ack from 3695959830->3695959830
[62873.869040] Adjusting sequence number from 3695959830->3695959830,
ack from 0->55618628

To summarize: the seqadj code adjusts the zero ack number when it receives
a TCP RST packet without the ACK flag.

Signed-off-by: Gao Feng 
---
 v5: Use goto to decrease the patch size
 v4: Don't invoke nf_ct_sack_adjust when no ack flag
 v3: Add the reproduce steps and packet trace
 v2: Regenerate because the first patch is removed
 v1: Initial patch

 net/netfilter/nf_conntrack_seqadj.c | 20 
 1 file changed, 12 insertions(+), 8 deletions(-)

diff --git a/net/netfilter/nf_conntrack_seqadj.c b/net/netfilter/nf_conntrack_seqadj.c
index dff0f0c..08d0640 100644
--- a/net/netfilter/nf_conntrack_seqadj.c
+++ b/net/netfilter/nf_conntrack_seqadj.c
@@ -169,7 +169,7 @@ int nf_ct_seq_adjust(struct sk_buff *skb,
s32 seqoff, ackoff;
struct nf_conn_seqadj *seqadj = nfct_seqadj(ct);
struct nf_ct_seqadj *this_way, *other_way;
-   int res;
+   int res = 1;
 
this_way  = &seqadj->seq[dir];
other_way = &seqadj->seq[!dir];
@@ -184,27 +184,31 @@ int nf_ct_seq_adjust(struct sk_buff *skb,
else
seqoff = this_way->offset_before;
 
+   newseq = htonl(ntohl(tcph->seq) + seqoff);
+   inet_proto_csum_replace4(&tcph->check, skb, tcph->seq, newseq, false);
+   pr_debug("Adjusting sequence number from %u->%u\n",
+ntohl(tcph->seq), ntohl(newseq));
+   tcph->seq = newseq;
+
+   if (unlikely(!tcph->ack))
+   goto out;
+
if (after(ntohl(tcph->ack_seq) - other_way->offset_before,
  other_way->correction_pos))
ackoff = other_way->offset_after;
else
ackoff = other_way->offset_before;
 
-   newseq = htonl(ntohl(tcph->seq) + seqoff);
newack = htonl(ntohl(tcph->ack_seq) - ackoff);
-
-   inet_proto_csum_replace4(&tcph->check, skb, tcph->seq, newseq, false);
inet_proto_csum_replace4(&tcph->check, skb, tcph->ack_seq, newack,
 false);
-
-   pr_debug("Adjusting sequence number from %u->%u, ack from %u->%u\n",
+   pr_debug("Adjusting ack number from %u->%u, ack from %u->%u\n",
 ntohl(tcph->seq), ntohl(newseq), ntohl(tcph->ack_seq),
 ntohl(newack));
-
-   tcph->seq = newseq;
tcph->ack_seq = newack;
 
res = nf_ct_sack_adjust(skb, protoff, tcph, ct, ctinfo);
+out:
spin_unlock_bh(&ct->lock);
 
return res;
-- 
1.9.1



Re: [PATCH net-next V2 0/8] mlx5e XDP support

2016-09-22 Thread David Miller
From: Tariq Toukan 
Date: Wed, 21 Sep 2016 12:19:41 +0300

> This series adds XDP support in mlx5e driver.
> This includes the use cases: XDP_DROP, XDP_PASS, and XDP_TX.
> 
> Single stream performance tests show 16.5 Mpps for XDP_DROP,
> and 12.4 Mpps for XDP_TX, with nice scalability for multiple streams/rings.
> 
> This XDP_DROP rate is lower than the 32 Mpps we got with the previous
> implementation, which used Striding RQ.
> 
> We moved to non-Striding RQ, as some XDP_TX requirements (like headroom,
> packet-per-page) cannot be satisfied with the current Striding RQ HW,
> and we decided to fully support both DROP/TX.
> 
> A few directions are being considered to enable the faster XDP_DROP rate,
> e.g. letting users enable Striding RQ so they can choose optimized XDP_DROP
> at the price of partial XDP_TX functionality, or some HW changes.
> 
> Series generated against net-next commit:
> cf714ac147e0 'ipvlan: Fix dependency issue'

Series applied, thanks.


Re: [PATCH net-next v2 0/6] ftgmac100 support for ast2500

2016-09-22 Thread David Miller
From: Joel Stanley 
Date: Thu, 22 Sep 2016 08:34:57 +0930

> Hello Dave,
> 
> This series adds support to the ftgmac100 driver for the Aspeed ast2400 and
> ast2500 SoCs. In particular, they ensure the driver works correctly on the
> ast2500 where the MAC block has seen some changes in register layout.
> 
> They have been tested on ast2400 and ast2500 systems with the NCSI stack and
> with a directly attached PHY.
> 
> V2 reworks the two patches relating to PHYSTS_CHG into the one patch that
> disables the interrupt instead of playing with interrupt sensitivity. I kept
> patch 4 'net/faraday: Clear stale interrupts' which was first introduced to
> clear the stale PHYSTS_CHG interrupt, as it helps keep us safe from unhygienic
> (vendor) bootloaders.

Series applied, thanks.


Re: [PATCH net-next] net: ethernet: mediatek: fix missing changes merged for conflicts overlapping commits

2016-09-22 Thread David Miller
From: 
Date: Tue, 20 Sep 2016 23:53:24 +0800

> From: Sean Wang 
> 
> Add back the changes from the following commits:
> 1)
> Commit d3bd1ce4db8e843dce421e2f8f123e5251a9c7d3
> ("remove redundant free_irq for devm_request_ir allocated irq")
> 2)
> Commit 7c6b0d76fa02213393815e3b6d5e4a415bf3f0e2
> ("fix logic unbalance between probe and remove")
> 
> which were lost while resolving conflicts between overlapping commits in
> Commit b20b378d49926b82c0a131492fa8842156e0e8a9
> ("Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net")
> 
> Signed-off-by: Sean Wang 

Applied, thanks for fixing this up for me.


Re: [PATCH 1/2] net: ethernet: hisilicon: hns: use phydev from struct net_device

2016-09-22 Thread David Miller
From: Philippe Reynes 
Date: Tue, 20 Sep 2016 22:30:11 +0200

> The private structure contains a pointer to phydev, but the structure
> net_device already contains such a pointer. So we can remove the phydev
> pointer from the private structure, and update the driver to use the
> one contained in struct net_device.
> 
> Signed-off-by: Philippe Reynes 

Applied.


Re: [PATCH 2/2] net: ethernet: hisilicon: hns: use new api ethtool_{get|set}_link_ksettings

2016-09-22 Thread David Miller
From: Philippe Reynes 
Date: Tue, 20 Sep 2016 22:30:12 +0200

> The ethtool API {get|set}_settings is deprecated.
> We move this driver to the new API {get|set}_link_ksettings.
> 
> Signed-off-by: Philippe Reynes 

Applied.


Re: [patch net-next 5/6] switchdev: remove FIB offload infrastructure

2016-09-22 Thread Ido Schimmel
On Wed, Sep 21, 2016 at 01:53:13PM +0200, Jiri Pirko wrote:
> From: Jiri Pirko 
> 
> Since this is now taken care of by the FIB notifier, remove the code along
> with all unused dependencies.
> 
> Signed-off-by: Jiri Pirko 

[...]

> -static struct net_device *switchdev_get_dev_by_nhs(struct fib_info *fi)
> -{
> - struct switchdev_attr attr = {
> - .id = SWITCHDEV_ATTR_ID_PORT_PARENT_ID,
> - };
> - struct switchdev_attr prev_attr;
> - struct net_device *dev = NULL;
> - int nhsel;
> -
> - ASSERT_RTNL();
> -
> - /* For this route, all nexthop devs must be on the same switch. */
> -
> - for (nhsel = 0; nhsel < fi->fib_nhs; nhsel++) {
> - const struct fib_nh *nh = &fi->fib_nh[nhsel];
> -
> - if (!nh->nh_dev)
> - return NULL;
> -
> - dev = switchdev_get_lowest_dev(nh->nh_dev);
> - if (!dev)
> - return NULL;
> -
> - attr.orig_dev = dev;
> - if (switchdev_port_attr_get(dev, &attr))
> - return NULL;
> -
> - if (nhsel > 0 &&
> - !netdev_phys_item_id_same(&prev_attr.u.ppid, &attr.u.ppid))
> - return NULL;
> -
> - prev_attr = attr;
> - }
> -
> - return dev;
> -}

[...]

> -int switchdev_fib_ipv4_add(u32 dst, int dst_len, struct fib_info *fi,
> -u8 tos, u8 type, u32 nlflags, u32 tb_id)
> -{
> - struct switchdev_obj_ipv4_fib ipv4_fib = {
> - .obj.id = SWITCHDEV_OBJ_ID_IPV4_FIB,
> - .dst = dst,
> - .dst_len = dst_len,
> - .fi = fi,
> - .tos = tos,
> - .type = type,
> - .nlflags = nlflags,
> - .tb_id = tb_id,
> - };
> - struct net_device *dev;
> - int err = 0;
> -
> - /* Don't offload route if using custom ip rules or if
> -  * IPv4 FIB offloading has been disabled completely.
> -  */
> -
> -#ifdef CONFIG_IP_MULTIPLE_TABLES
> - if (fi->fib_net->ipv4.fib_has_custom_rules)
> - return 0;
> -#endif
> -
> - if (fi->fib_net->ipv4.fib_offload_disabled)
> - return 0;
> -
> - dev = switchdev_get_dev_by_nhs(fi);

Since this is now removed I believe we should perform this check inside
the drivers. For mlxsw we can simply iterate over the nexthops and make
sure each has a RIF.

> - if (!dev)
> - return 0;
> -
> - ipv4_fib.obj.orig_dev = dev;
> - err = switchdev_port_obj_add(dev, &ipv4_fib.obj);
> - if (!err)
> - fib_info_offload_inc(fi);
> -
> - return err == -EOPNOTSUPP ? 0 : err;
> -}
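The driver-side check suggested above could look roughly like the following C sketch. Everything here is hypothetical (`struct nexthop`, `struct rif_table`, `has_rif()`, `route_offloadable()`); a real mlxsw implementation would consult its own RIF bookkeeping for each nexthop device rather than a flat array.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical simplified nexthop: the egress netdev's ifindex (0 = none). */
struct nexthop {
	int ifindex;
};

/* Hypothetical list of netdevs for which the driver has created a RIF. */
struct rif_table {
	const int *ifindexes;
	size_t n;
};

static bool has_rif(const struct rif_table *rifs, int ifindex)
{
	for (size_t i = 0; i < rifs->n; i++)
		if (rifs->ifindexes[i] == ifindex)
			return true;
	return false;
}

/* Offload the route only if every nexthop egresses through a RIF;
 * otherwise leave it to the kernel's slow path. */
static bool route_offloadable(const struct nexthop *nhs, size_t n_nhs,
			      const struct rif_table *rifs)
{
	for (size_t i = 0; i < n_nhs; i++)
		if (nhs[i].ifindex == 0 || !has_rif(rifs, nhs[i].ifindex))
			return false;
	return true;
}
```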


Re: [PATCH net] tcp: fix under-accounting retransmit SNMP counters

2016-09-22 Thread David Miller
From: Yuchung Cheng 
Date: Wed, 21 Sep 2016 16:16:14 -0700

> This patch fixes under-accounting of these SNMP retransmit stats
> when retransmitting TSO packets:
> LINUX_MIB_TCPFORWARDRETRANS
> LINUX_MIB_TCPFASTRETRANS
> LINUX_MIB_TCPSLOWSTARTRETRANS
> 
> Fixes: 10d3be569243 ("tcp-tso: do not split TSO packets at retransmit time")
> Signed-off-by: Yuchung Cheng 

Applied.


Re: [PATCH net] tcp: properly account Fast Open SYN-ACK retrans

2016-09-22 Thread David Miller
From: Yuchung Cheng 
Date: Wed, 21 Sep 2016 16:16:15 -0700

> Since the TFO socket is accepted right off SYN-data, the socket
> owner can call getsockopt(TCP_INFO) to collect ongoing SYN-ACK
> retransmission or timeout stats (i.e., tcpi_total_retrans,
> tcpi_retransmits). Currently those stats are only updated
> once the handshake completes. This patch fixes that.
> 
> Signed-off-by: Yuchung Cheng 
> Signed-off-by: Eric Dumazet 
> Signed-off-by: Neal Cardwell 
> Signed-off-by: Soheil Hassas Yeganeh 

Applied.
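The counters mentioned above are read from userspace with `getsockopt(TCP_INFO)`. A minimal sketch for Linux with glibc; the helper name `get_retrans_stats()` is hypothetical, while `struct tcp_info` and its `tcpi_retransmits` and `tcpi_total_retrans` fields come from `<netinet/tcp.h>`. With this fix, a TFO server that has just accepted a socket should see these values move while its SYN-ACK is still being retransmitted.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE		/* exposes struct tcp_info and TCP_INFO in glibc */
#endif
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <string.h>
#include <sys/socket.h>

/* Read the retransmission counters of a TCP socket.
 * Returns 0 on success, -1 on failure (errno set by getsockopt()). */
static int get_retrans_stats(int fd, unsigned int *retransmits,
			     unsigned int *total_retrans)
{
	struct tcp_info info;
	socklen_t len = sizeof(info);

	memset(&info, 0, sizeof(info));
	if (getsockopt(fd, IPPROTO_TCP, TCP_INFO, &info, &len) < 0)
		return -1;

	*retransmits = info.tcpi_retransmits;		/* current retransmission/timeout count */
	*total_retrans = info.tcpi_total_retrans;	/* total retransmissions */
	return 0;
}
```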


Re: [PATCH nf v3] netfilter: seqadj: Fix the wrong ack adjust for the RST packet without ack

2016-09-22 Thread Pablo Neira Ayuso
On top of Eric's comments.

On Thu, Sep 22, 2016 at 10:22:45AM +0800, f...@ikuai8.com wrote:
> diff --git a/net/netfilter/nf_conntrack_seqadj.c b/net/netfilter/nf_conntrack_seqadj.c
> index dff0f0c..3bd9c7e 100644
> --- a/net/netfilter/nf_conntrack_seqadj.c
> +++ b/net/netfilter/nf_conntrack_seqadj.c
> @@ -179,30 +179,34 @@ int nf_ct_seq_adjust(struct sk_buff *skb,
>  
>   tcph = (void *)skb->data + protoff;
>   spin_lock_bh(&ct->lock);
> +
>   if (after(ntohl(tcph->seq), this_way->correction_pos))
>   seqoff = this_way->offset_after;
>   else
>   seqoff = this_way->offset_before;
>  
> - if (after(ntohl(tcph->ack_seq) - other_way->offset_before,
> -   other_way->correction_pos))
> - ackoff = other_way->offset_after;
> - else
> - ackoff = other_way->offset_before;
> -
>   newseq = htonl(ntohl(tcph->seq) + seqoff);
> - newack = htonl(ntohl(tcph->ack_seq) - ackoff);
> -
>   inet_proto_csum_replace4(&tcph->check, skb, tcph->seq, newseq, false);
> - inet_proto_csum_replace4(&tcph->check, skb, tcph->ack_seq, newack,
> -  false);
> -
> - pr_debug("Adjusting sequence number from %u->%u, ack from %u->%u\n",
> -  ntohl(tcph->seq), ntohl(newseq), ntohl(tcph->ack_seq),
> -  ntohl(newack));
>  
> + pr_debug("Adjusting sequence number from %u->%u\n",
> +  ntohl(tcph->seq), ntohl(newseq));
>   tcph->seq = newseq;
> - tcph->ack_seq = newack;
> +
> + if (likely(tcph->ack)) {

I'd suggest:

if (!tcph->ack)
goto out;

since gcc already treats the goto branch as unlikely, and then place an
"out" label...

> + if (after(ntohl(tcph->ack_seq) - other_way->offset_before,
> +   other_way->correction_pos))
> + ackoff = other_way->offset_after;
> + else
> + ackoff = other_way->offset_before;
> +
> + newack = htonl(ntohl(tcph->ack_seq) - ackoff);
> + inet_proto_csum_replace4(&tcph->check, skb, tcph->ack_seq,
> +  newack, false);
> +
> + pr_debug("Adjusting ack number from %u->%u\n",
> +  ntohl(tcph->ack_seq), ntohl(newack));
> + tcph->ack_seq = newack;
> + }
>  
>   res = nf_ct_sack_adjust(skb, protoff, tcph, ct, ctinfo);

out:<- here

>   spin_unlock_bh(&ct->lock);

This will give you a smaller fix.


Re: [PATCH] ptp_clock: future-proofing drivers against PTP subsystem becoming optional

2016-09-22 Thread David Miller
From: Nicolas Pitre 
Date: Tue, 20 Sep 2016 19:25:58 -0400 (EDT)

> 
> Drivers must be ready to accept NULL from ptp_clock_register() if the
> PTP clock subsystem is configured out.
> 
> This patch documents that and ensures that all drivers cope well
> with a NULL return.
> 
> Signed-off-by: Nicolas Pitre 
> Reviewed-by: Eugenia Emantayev 
> 
> ---
> 
> Let's have the basics merged now and work out the actual Kconfig issue 
> separately. Richard, if you agree with this patch, I think this could go 
> via the netdev tree.

Applied to net-next, thanks.


Re: [patch net-next 3/6] mlxsw: spectrum_router: Use FIB notifications instead of switchdev calls

2016-09-22 Thread Ido Schimmel
On Wed, Sep 21, 2016 at 01:53:11PM +0200, Jiri Pirko wrote:
> From: Jiri Pirko 
> 
> Until now, in order to offload a FIB entry to HW we used a switchdev op.
> However, that has limits, mainly when we need to make the HW aware of
> all route prefixes configured in the kernel. The HW needs to know those in
> order to properly trap the appropriate packets and pass them to the kernel to
> do the forwarding. The abort mechanism is now handled within the mlxsw driver.

FWIW, I think it's smart to move abort into the driver instead of
flushing all the routes from the namespace as before.

> 
> Signed-off-by: Jiri Pirko 

[...]

> +static void mlxsw_sp_router_fib4_abort(struct mlxsw_sp *mlxsw_sp)
> +{
> + char ralue_pl[MLXSW_REG_RALUE_LEN];
> + struct mlxsw_resources *resources;
> + struct mlxsw_sp_fib_entry *fib_entry;
> + struct mlxsw_sp_fib_entry *tmp;
> + struct mlxsw_sp_vr *vr;
> + int i;
> + int err;
> +
> + resources = mlxsw_core_resources_get(mlxsw_sp->core);
> + for (i = 0; i < resources->max_virtual_routers; i++) {
> + vr = &mlxsw_sp->router.vrs[i];
> + if (!vr->used)
> + continue;
> +
> + list_for_each_entry_safe(fib_entry, tmp,
> +  &vr->fib->entry_list, list) {
> + fib_info_offload_dec(fib_entry->fi);
> + mlxsw_sp_fib_entry_del(mlxsw_sp, fib_entry);
> + mlxsw_sp_fib_entry_remove(fib_entry->vr->fib,
> +   fib_entry);
> + mlxsw_sp_fib_entry_put_all(mlxsw_sp, fib_entry);

If we now do the routing in slow path, then maybe it makes sense to also
flush all the neighbour entries and prevent new neighbours from being
programmed into the device?

> + }
> + }
> + mlxsw_sp->router.aborted = true;
> +
> + mlxsw_reg_ralue_pack4(ralue_pl, MLXSW_SP_L3_PROTO_IPV4,
> +   MLXSW_REG_RALUE_OP_WRITE_WRITE, 0, 0, 0);

I'm not sure about that, but the loop above removed all the tables from
the device and now you are using table 0 again. Will this work w/o
binding some tree to it (0?)?

> + mlxsw_reg_ralue_act_ip2me_pack(ralue_pl);
> + err = mlxsw_reg_write(mlxsw_sp->core, MLXSW_REG(ralue), ralue_pl);
> + if (err)
> + dev_warn(mlxsw_sp->bus_info->dev, "Failed to set abort trap.\n");
> +}

Thanks


Re: [PATCH next v3 0/2] Rename WORD_TRUNC/ROUND macros and use them

2016-09-22 Thread David Miller
From: Marcelo Ricardo Leitner 
Date: Wed, 21 Sep 2016 08:45:54 -0300

> This patchset aims to rename these macros to a non-confusing name, as
> reported by David Laight and David Miller, and to update the remaining
> places to make use of them (only one spot was left).
> 
> v3:
> - Name it SCTP_PAD4 instead of SCTP_ALIGN4, as suggested by David Laight
> v2:
> - fixed 2nd patch summary
> 
> Details are in the individual patch changelogs.

Looks good, applied, thanks!
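For reference, a sketch of what the renamed macros compute, assuming they keep the semantics of the old WORD_ROUND/WORD_TRUNC: round a length up, respectively down, to a multiple of 4 bytes, since SCTP chunks and parameters are padded to 4-byte boundaries. The `SCTP_TRUNC4` spelling and the `sctp_pad_bytes()` helper are assumptions for illustration.

```c
#include <stddef.h>

/* Round a length up to the next multiple of 4 (the old WORD_ROUND). */
#define SCTP_PAD4(s)	(((s) + 3) & ~3)
/* Round a length down to a multiple of 4 (the old WORD_TRUNC). */
#define SCTP_TRUNC4(s)	((s) & ~3)

/* Hypothetical helper: padding bytes needed after a chunk of 'len' bytes. */
static size_t sctp_pad_bytes(size_t len)
{
	return SCTP_PAD4(len) - len;
}
```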


Re: [PATCH -next] net: dsa: qca8k: fix non static symbol warning

2016-09-22 Thread David Miller
From: Wei Yongjun 
Date: Wed, 21 Sep 2016 15:04:43 +

> From: Wei Yongjun 
> 
> Fixes the following sparse warning:
> 
> drivers/net/dsa/qca8k.c:259:22: warning:
>  symbol 'qca8k_regmap_config' was not declared. Should it be static?
> 
> Signed-off-by: Wei Yongjun 

Applied.


Re: [PATCH -next] cxgb4: Convert to use simple_open()

2016-09-22 Thread David Miller
From: Wei Yongjun 
Date: Wed, 21 Sep 2016 15:09:16 +

> From: Wei Yongjun 
> 
> Remove an open coded simple_open() function and replace file
> operations references to the function with simple_open()
> instead.
> 
> Generated by: scripts/coccinelle/api/simple_open.cocci
> 
> Signed-off-by: Wei Yongjun 

Applied.


Re: [PATCH] net: fec: set mac address unconditionally

2016-09-22 Thread Uwe Kleine-König
Hello,

just a few nitpicks:

On Wed, Sep 21, 2016 at 03:30:55PM +0200, Gavin Schenk wrote:
> Fixes: 9638d19e4816 ("net: fec: add netif status check before set mac address")

This line belongs in the S-o-B area below.

> If the mac address origin is not dt, you can only safe assign a

s/safe/safely/

> mac address after "link up" of the device. If the link is down the
> clocks are disabled and because of issues assigning registers when
> clocks are down the new mac address is discarded on some soc's. This fix

s/down/off/; s/is discarded/cannot be written in .ndo_set_mac_address()/

> sets the mac address unconditionally in fec_restart(...) and ensures
> consistens between fec registers and the network layer.

s/consistens/consistency/

Other than that:

Acked-by: Uwe Kleine-König 

Thanks
Uwe

-- 
Pengutronix e.K.   | Uwe Kleine-König|
Industrial Linux Solutions | http://www.pengutronix.de/  |


Re: [PATCH net] net: get rid of an signed integer overflow in ip_idents_reserve()

2016-09-22 Thread David Miller
From: Eric Dumazet 
Date: Tue, 20 Sep 2016 18:06:17 -0700

> From: Eric Dumazet 
> 
> Jiri Pirko reported an UBSAN warning happening in ip_idents_reserve()
> 
> [] UBSAN: Undefined behaviour in ./arch/x86/include/asm/atomic.h:156:11
> [] signed integer overflow:
> [] -2117905507 + -695755206 cannot be represented in type 'int'
> 
> Since we do not have uatomic_add_return() yet, use atomic_cmpxchg()
> so that the arithmetic can be done using unsigned int.
> 
> Fixes: 04ca6973f7c1 ("ip: make IP identifiers less predictable")
> Signed-off-by: Eric Dumazet 
> Reported-by: Jiri Pirko 
> ---
> David, Jiri, I removed the prandom_u32() stuff in favor of a traditional
> loop to meet stable requirements. Thanks !

Applied.
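The idea behind the fix can be shown with C11 atomics: read the counter, compute the new value in unsigned arithmetic (where wraparound is well defined), and retry the compare-and-exchange if another thread raced with us. This is a userspace sketch with a hypothetical `ident_reserve()`, not the kernel's ip_idents_reserve(). In C11, `atomic_fetch_add()` on an unsigned type would already be well defined; the explicit loop mirrors the atomic_cmpxchg() approach described in the patch.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Reserve 'segs' consecutive IDs: returns the first reserved ID and advances
 * the shared counter by 'segs'. All arithmetic is on uint32_t, so going past
 * UINT32_MAX wraps to 0 instead of being undefined behaviour. */
static uint32_t ident_reserve(_Atomic uint32_t *p_id, uint32_t segs)
{
	uint32_t old = atomic_load_explicit(p_id, memory_order_relaxed);

	/* On failure, 'old' is refreshed with the current value; just retry. */
	while (!atomic_compare_exchange_weak(p_id, &old, old + segs))
		;
	return old;
}
```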


[PATCH net-next 7/9] rxrpc: Obtain RTT data by requesting ACKs on DATA packets [ver #2]

2016-09-22 Thread David Howells
In addition to sending a PING ACK to gain RTT data, we can set the
RXRPC_REQUEST_ACK flag on a DATA packet and get a REQUESTED-ACK ACK.  The
ACK packet contains the serial number of the packet it is in response to,
so we can look through the Tx buffer for a matching DATA packet.
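The matching step can be sketched as follows: remember the serial number and transmission timestamp of each DATA packet, and when a REQUESTED-ACK ACK arrives, look up the serial it refers to and take the time difference as an RTT sample. The flat ring layout, `TX_RING_SIZE` and `rtt_from_ack()` are hypothetical simplifications, not the rxrpc data structures.

```c
#include <stdint.h>

#define TX_RING_SIZE 64	/* hypothetical Tx buffer size */

/* Hypothetical record kept for each transmitted DATA packet. */
struct tx_slot {
	uint32_t serial;	/* serial number the packet was sent with */
	int64_t tstamp_ns;	/* transmission time */
	int valid;
};

/* Return the RTT sample (ns) for the DATA packet whose serial the ACK
 * refers to, or -1 if no such packet is in the Tx buffer. */
static int64_t rtt_from_ack(const struct tx_slot *ring, uint32_t acked_serial,
			    int64_t now_ns)
{
	for (int i = 0; i < TX_RING_SIZE; i++)
		if (ring[i].valid && ring[i].serial == acked_serial)
			return now_ns - ring[i].tstamp_ns;
	return -1;
}
```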

This requires that the data packets be stamped with the time of
transmission as a ktime rather than having the resend_at time in jiffies.

This further requires the resend code to do the resend determination in
ktimes and convert to jiffies to set the timer.
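The ktime-based resend determination described above can be modelled like this: packets transmitted longer ago than the resend timeout are due for retransmission, the oldest packet still within the timeout determines when to look again, and only that final delay is converted to timer ticks. Timestamps are plain nanosecond integers here, and `SKETCH_HZ` and `resend_delay_ticks()` are hypothetical.

```c
#include <stdint.h>

#define SKETCH_HZ	250			/* hypothetical timer tick rate */
#define NSEC_PER_MSEC	1000000LL
#define NSEC_PER_SEC	1000000000LL

/* tx_ns[]: transmission times (ns) of unacknowledged packets.
 * Counts packets that are due for resend in *n_due and returns the number
 * of ticks until the next resend check. */
static int64_t resend_delay_ticks(const int64_t *tx_ns, int n, int64_t now_ns,
				  int64_t timeout_ms, int *n_due)
{
	int64_t timeout_ns = timeout_ms * NSEC_PER_MSEC;
	int64_t max_age = now_ns - timeout_ns;	/* sent at or before this: resend */
	int64_t oldest = now_ns;

	*n_due = 0;
	for (int i = 0; i < n; i++) {
		if (tx_ns[i] > max_age) {
			/* Still within the timeout: track the oldest such packet. */
			if (tx_ns[i] < oldest)
				oldest = tx_ns[i];
			continue;
		}
		(*n_due)++;
	}

	/* Time until the oldest in-flight packet expires, in ticks. */
	return (oldest + timeout_ns - now_ns) * SKETCH_HZ / NSEC_PER_SEC;
}
```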

Signed-off-by: David Howells 
---

 net/rxrpc/ar-internal.h |7 +++
 net/rxrpc/call_event.c  |   19 +--
 net/rxrpc/input.c   |   35 +++
 net/rxrpc/misc.c|6 --
 net/rxrpc/output.c  |7 +--
 net/rxrpc/sendmsg.c |1 -
 net/rxrpc/sysctl.c  |2 +-
 7 files changed, 57 insertions(+), 20 deletions(-)

diff --git a/net/rxrpc/ar-internal.h b/net/rxrpc/ar-internal.h
index 8b47f468eb9d..1c4597b2c6cd 100644
--- a/net/rxrpc/ar-internal.h
+++ b/net/rxrpc/ar-internal.h
@@ -142,10 +142,7 @@ struct rxrpc_host_header {
  */
 struct rxrpc_skb_priv {
union {
-   unsigned long   resend_at;  /* time in jiffies at which to resend */
-   struct {
-   u8  nr_jumbo;   /* Number of jumbo subpackets */
-   };
+   u8  nr_jumbo;   /* Number of jumbo subpackets */
};
union {
unsigned int    offset; /* offset into buffer of next read */
@@ -663,6 +660,7 @@ extern const char rxrpc_recvmsg_traces[rxrpc_recvmsg__nr_trace][5];
 
 enum rxrpc_rtt_tx_trace {
rxrpc_rtt_tx_ping,
+   rxrpc_rtt_tx_data,
rxrpc_rtt_tx__nr_trace
 };
 
@@ -670,6 +668,7 @@ extern const char rxrpc_rtt_tx_traces[rxrpc_rtt_tx__nr_trace][5];
 
 enum rxrpc_rtt_rx_trace {
rxrpc_rtt_rx_ping_response,
+   rxrpc_rtt_rx_requested_ack,
rxrpc_rtt_rx__nr_trace
 };
 
diff --git a/net/rxrpc/call_event.c b/net/rxrpc/call_event.c
index 34ad967f2d81..adb2ec61e21f 100644
--- a/net/rxrpc/call_event.c
+++ b/net/rxrpc/call_event.c
@@ -142,12 +142,14 @@ static void rxrpc_resend(struct rxrpc_call *call)
struct rxrpc_skb_priv *sp;
struct sk_buff *skb;
rxrpc_seq_t cursor, seq, top;
-   unsigned long resend_at, now;
+   ktime_t now = ktime_get_real(), max_age, oldest, resend_at;
int ix;
u8 annotation, anno_type;
 
_enter("{%d,%d}", call->tx_hard_ack, call->tx_top);
 
+   max_age = ktime_sub_ms(now, rxrpc_resend_timeout);
+
spin_lock_bh(&call->lock);
 
cursor = call->tx_hard_ack;
@@ -160,8 +162,7 @@ static void rxrpc_resend(struct rxrpc_call *call)
 * the packets in the Tx buffer we're going to resend and what the new
 * resend timeout will be.
 */
-   now = jiffies;
-   resend_at = now + rxrpc_resend_timeout;
+   oldest = now;
for (seq = cursor + 1; before_eq(seq, top); seq++) {
ix = seq & RXRPC_RXTX_BUFF_MASK;
annotation = call->rxtx_annotations[ix];
@@ -175,9 +176,9 @@ static void rxrpc_resend(struct rxrpc_call *call)
sp = rxrpc_skb(skb);
 
if (anno_type == RXRPC_TX_ANNO_UNACK) {
-   if (time_after(sp->resend_at, now)) {
-   if (time_before(sp->resend_at, resend_at))
-   resend_at = sp->resend_at;
+   if (ktime_after(skb->tstamp, max_age)) {
+   if (ktime_before(skb->tstamp, oldest))
+   oldest = skb->tstamp;
continue;
}
}
@@ -186,7 +187,8 @@ static void rxrpc_resend(struct rxrpc_call *call)
call->rxtx_annotations[ix] = RXRPC_TX_ANNO_RETRANS | annotation;
}
 
-   call->resend_at = resend_at;
+   resend_at = ktime_sub(ktime_add_ns(oldest, rxrpc_resend_timeout), now);
+   call->resend_at = jiffies + nsecs_to_jiffies(ktime_to_ns(resend_at));
 
/* Now go through the Tx window and perform the retransmissions.  We
 * have to drop the lock for each send.  If an ACK comes in whilst the
@@ -205,15 +207,12 @@ static void rxrpc_resend(struct rxrpc_call *call)
spin_unlock_bh(&call->lock);
 
if (rxrpc_send_data_packet(call, skb) < 0) {
-   call->resend_at = now + 2;
rxrpc_free_skb(skb, rxrpc_skb_tx_freed);
return;
}
 
if (rxrpc_is_client_call(call))
rxrpc_expose_client_call(call);
-   sp = rxrpc_skb(skb);
-   sp->resend_at = now + rxrpc_resend_timeout;
 
rxrpc_free_skb(skb, rxrpc_skb_tx_freed);

[PATCH net-next 3/9] rxrpc: Add per-peer RTT tracker [ver #2]

2016-09-22 Thread David Howells
Add a function to track the average RTT for a peer.  Sources of RTT data
will be added in subsequent patches.

The RTT data will be useful in the future for determining resend timeouts
and for handling the slow-start part of the Rx protocol.

Also add a pair of tracepoints, one to log transmissions to elicit a
response for RTT purposes and one to log responses that contribute RTT
data.
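One simple way to keep such a per-peer average is a small ring of recent samples with a running sum, which fits the `nr` and `avg` fields of the rxrpc_rtt_rx trace event. This is a hypothetical sketch (`struct peer_rtt`, `RTT_CACHE_SIZE`, `peer_add_rtt()`), not the rxrpc implementation, whose window size and averaging may differ. The structure must start zero-initialized.

```c
#include <stdint.h>

#define RTT_CACHE_SIZE 32	/* hypothetical window size */

struct peer_rtt {
	int64_t samples[RTT_CACHE_SIZE];	/* ring of recent RTTs (ns) */
	int64_t sum;				/* sum of cached samples */
	uint8_t cursor;				/* next slot to overwrite */
	uint8_t nr;				/* number of valid samples */
	int64_t avg;				/* current average (ns) */
};

static void peer_add_rtt(struct peer_rtt *p, int64_t rtt)
{
	/* Drop the sample being replaced (0 while the ring is filling). */
	p->sum -= p->samples[p->cursor];
	p->samples[p->cursor] = rtt;
	p->sum += rtt;
	p->cursor = (p->cursor + 1) % RTT_CACHE_SIZE;
	if (p->nr < RTT_CACHE_SIZE)
		p->nr++;
	p->avg = p->sum / p->nr;
}
```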

Signed-off-by: David Howells 
---

 include/trace/events/rxrpc.h |   61 ++
 net/rxrpc/ar-internal.h      |   25 ++---
 net/rxrpc/misc.c             |    8 ++
 net/rxrpc/peer_event.c       |   41 
 4 files changed, 131 insertions(+), 4 deletions(-)

diff --git a/include/trace/events/rxrpc.h b/include/trace/events/rxrpc.h
index 75a5d8bf50e1..e8f2afbbe0bf 100644
--- a/include/trace/events/rxrpc.h
+++ b/include/trace/events/rxrpc.h
@@ -353,6 +353,67 @@ TRACE_EVENT(rxrpc_recvmsg,
  __entry->ret)
);
 
+TRACE_EVENT(rxrpc_rtt_tx,
+   TP_PROTO(struct rxrpc_call *call, enum rxrpc_rtt_tx_trace why,
+rxrpc_serial_t send_serial),
+
+   TP_ARGS(call, why, send_serial),
+
+   TP_STRUCT__entry(
+   __field(struct rxrpc_call *,call)
+   __field(enum rxrpc_rtt_tx_trace,why )
+   __field(rxrpc_serial_t, send_serial )
+),
+
+   TP_fast_assign(
+   __entry->call = call;
+   __entry->why = why;
+   __entry->send_serial = send_serial;
+  ),
+
+   TP_printk("c=%p %s sr=%08x",
+ __entry->call,
+ rxrpc_rtt_tx_traces[__entry->why],
+ __entry->send_serial)
+   );
+
+TRACE_EVENT(rxrpc_rtt_rx,
+   TP_PROTO(struct rxrpc_call *call, enum rxrpc_rtt_rx_trace why,
+rxrpc_serial_t send_serial, rxrpc_serial_t resp_serial,
+s64 rtt, u8 nr, s64 avg),
+
+   TP_ARGS(call, why, send_serial, resp_serial, rtt, nr, avg),
+
+   TP_STRUCT__entry(
+   __field(struct rxrpc_call *,call)
+   __field(enum rxrpc_rtt_rx_trace,why )
+   __field(u8, nr  )
+   __field(rxrpc_serial_t, send_serial )
+   __field(rxrpc_serial_t, resp_serial )
+   __field(s64,rtt )
+   __field(u64,avg )
+),
+
+   TP_fast_assign(
+   __entry->call = call;
+   __entry->why = why;
+   __entry->send_serial = send_serial;
+   __entry->resp_serial = resp_serial;
+   __entry->rtt = rtt;
+   __entry->nr = nr;
+   __entry->avg = avg;
+  ),
+
+   TP_printk("c=%p %s sr=%08x rr=%08x rtt=%lld nr=%u avg=%lld",
+ __entry->call,
+ rxrpc_rtt_rx_traces[__entry->why],
+ __entry->send_serial,
+ __entry->resp_serial,
+ __entry->rtt,
+ __entry->nr,
+ __entry->avg)
+   );
+
 #endif /* _TRACE_RXRPC_H */
 
 /* This part must be outside protection */
diff --git a/net/rxrpc/ar-internal.h b/net/rxrpc/ar-internal.h
index dcf54e3fb478..79c671e552c3 100644
--- a/net/rxrpc/ar-internal.h
+++ b/net/rxrpc/ar-internal.h
@@ -258,10 +258,11 @@ struct rxrpc_peer {
 
/* calculated RTT cache */
 #define RXRPC_RTT_CACHE_SIZE 32
-   suseconds_t rtt; /* current RTT estimate (in uS) */
-   unsigned int rtt_point;  /* next entry at which to insert */
-   unsigned int rtt_usage;  /* amount of cache actually used */
-   suseconds_t rtt_cache[RXRPC_RTT_CACHE_SIZE]; /* calculated RTT cache */
+   u64 rtt; /* Current RTT estimate (in nS) */
+   u64 rtt_sum; /* Sum of cache contents */
+   u64 rtt_cache[RXRPC_RTT_CACHE_SIZE]; /* Determined RTT cache */
+   u8  rtt_cursor; /* next entry at which to insert */
+   u8  rtt_usage;  /* amount of cache actually used */
 };
 
 /*
@@ -657,6 +658,20 @@ enum rxrpc_recvmsg_trace {
 
 extern const char rxrpc_recvmsg_traces[rxrpc_recvmsg__nr_trace][5];
 
+enum rxrpc_rtt_tx_trace {
+   rxrpc_rtt_tx_ping,
+   rxrpc_rtt_tx__nr_trace
+};
+
+extern const char rxrpc_rtt_tx_traces[rxrpc_rtt_tx__nr_trace][5];
+
+enum rxrpc_rtt_rx_trace {
+   

[PATCH net-next 9/9] rxrpc: Reduce the number of PING ACKs sent [ver #2]

2016-09-22 Thread David Howells
We don't want to send a PING ACK for every new incoming call as that just
adds to the network traffic.  Instead, we send a PING ACK to the first
three that we receive and then once per second thereafter.

This could probably be made adjustable in future.

Signed-off-by: David Howells 
---

 net/rxrpc/call_event.c |    2 +-
 net/rxrpc/input.c      |    7 +--
 2 files changed, 6 insertions(+), 3 deletions(-)

diff --git a/net/rxrpc/call_event.c b/net/rxrpc/call_event.c
index adb2ec61e21f..6e2ea8f4ae75 100644
--- a/net/rxrpc/call_event.c
+++ b/net/rxrpc/call_event.c
@@ -142,7 +142,7 @@ static void rxrpc_resend(struct rxrpc_call *call)
struct rxrpc_skb_priv *sp;
struct sk_buff *skb;
rxrpc_seq_t cursor, seq, top;
-   ktime_t now = ktime_get_real(), max_age, oldest, resend_at;
+   ktime_t now = ktime_get_real(), max_age, oldest,  resend_at;
int ix;
u8 annotation, anno_type;
 
diff --git a/net/rxrpc/input.c b/net/rxrpc/input.c
index c121949de3c8..cbb5d53f09d7 100644
--- a/net/rxrpc/input.c
+++ b/net/rxrpc/input.c
@@ -44,9 +44,12 @@ static void rxrpc_send_ping(struct rxrpc_call *call, struct sk_buff *skb,
int skew)
 {
struct rxrpc_skb_priv *sp = rxrpc_skb(skb);
+   ktime_t now = skb->tstamp;
 
-   rxrpc_propose_ACK(call, RXRPC_ACK_PING, skew, sp->hdr.serial,
- true, true);
+   if (call->peer->rtt_usage < 3 ||
+   ktime_before(ktime_add_ms(call->peer->rtt_last_req, 1000), now))
+   rxrpc_propose_ACK(call, RXRPC_ACK_PING, skew, sp->hdr.serial,
+ true, true);
 }
 
 /*



[PATCH net-next 4/9] rxrpc: Send pings to get RTT data [ver #2]

2016-09-22 Thread David Howells
Send a PING ACK packet to the peer when we get a new incoming call from a
peer we don't have a record for.  The PING RESPONSE ACK packet will tell us
the following about the peer:

 (1) its receive window size

 (2) its MTU sizes

 (3) its support for jumbo DATA packets

 (4) if it supports slow start (similar to RFC 5681)

 (5) an estimate of the RTT

This is necessary because the peer won't normally send us an ACK until it
gets to the Rx phase and we send it a packet, but we would like to know
some of this information before we start sending packets.

A pair of tracepoints are added so that RTT determination can be observed.

Signed-off-by: David Howells 
---

 net/rxrpc/ar-internal.h |    7 +--
 net/rxrpc/input.c       |   48 ++-
 net/rxrpc/misc.c        |   11 ++-
 net/rxrpc/output.c      |   22 ++
 4 files changed, 80 insertions(+), 8 deletions(-)

diff --git a/net/rxrpc/ar-internal.h b/net/rxrpc/ar-internal.h
index 79c671e552c3..8b47f468eb9d 100644
--- a/net/rxrpc/ar-internal.h
+++ b/net/rxrpc/ar-internal.h
@@ -403,6 +403,7 @@ enum rxrpc_call_flag {
RXRPC_CALL_EXPOSED, /* The call was exposed to the world */
RXRPC_CALL_RX_LAST, /* Received the last packet (at rxtx_top) */
RXRPC_CALL_TX_LAST, /* Last packet in Tx buffer (at rxtx_top) */
+   RXRPC_CALL_PINGING, /* Ping in process */
 };
 
 /*
@@ -487,6 +488,8 @@ struct rxrpc_call {
u32 call_id; /* call ID on connection  */
u32 cid; /* connection ID plus channel index */
int debug_id;   /* debug ID for printks */
+   unsigned short  rx_pkt_offset;  /* Current recvmsg packet offset */
+   unsigned short  rx_pkt_len; /* Current recvmsg packet len */
 
/* Rx/Tx circular buffer, depending on phase.
 *
@@ -530,8 +533,8 @@ struct rxrpc_call {
u16 ackr_skew;  /* skew on packet being ACK'd */
rxrpc_serial_t  ackr_serial; /* serial of packet being ACK'd */
rxrpc_seq_t ackr_prev_seq;  /* previous sequence number received */
-   unsigned short  rx_pkt_offset;  /* Current recvmsg packet offset */
-   unsigned short  rx_pkt_len; /* Current recvmsg packet len */
+   rxrpc_serial_t  ackr_ping;  /* Last ping sent */
+   ktime_t ackr_ping_time; /* Time last ping sent */
 
/* transmission-phase ACK management */
rxrpc_serial_t  acks_latest; /* serial number of latest ACK received */
diff --git a/net/rxrpc/input.c b/net/rxrpc/input.c
index aa261df9fc9e..a0a5bd108c9e 100644
--- a/net/rxrpc/input.c
+++ b/net/rxrpc/input.c
@@ -37,6 +37,19 @@ static void rxrpc_proto_abort(const char *why,
 }
 
 /*
+ * Ping the other end to fill our RTT cache and to retrieve the rwind
+ * and MTU parameters.
+ */
+static void rxrpc_send_ping(struct rxrpc_call *call, struct sk_buff *skb,
+   int skew)
+{
+   struct rxrpc_skb_priv *sp = rxrpc_skb(skb);
+
+   rxrpc_propose_ACK(call, RXRPC_ACK_PING, skew, sp->hdr.serial,
+ true, true);
+}
+
+/*
  * Apply a hard ACK by advancing the Tx window.
  */
 static void rxrpc_rotate_tx_window(struct rxrpc_call *call, rxrpc_seq_t to)
@@ -343,6 +356,32 @@ ack:
 }
 
 /*
+ * Process a ping response.
+ */
+static void rxrpc_input_ping_response(struct rxrpc_call *call,
+ ktime_t resp_time,
+ rxrpc_serial_t orig_serial,
+ rxrpc_serial_t ack_serial)
+{
+   rxrpc_serial_t ping_serial;
+   ktime_t ping_time;
+
+   ping_time = call->ackr_ping_time;
+   smp_rmb();
+   ping_serial = call->ackr_ping;
+
if (!test_bit(RXRPC_CALL_PINGING, &call->flags) ||
before(orig_serial, ping_serial))
return;
clear_bit(RXRPC_CALL_PINGING, &call->flags);
+   if (after(orig_serial, ping_serial))
+   return;
+
+   rxrpc_peer_add_rtt(call, rxrpc_rtt_rx_ping_response,
+  orig_serial, ack_serial, ping_time, resp_time);
+}
+
+/*
  * Process the extra information that may be appended to an ACK packet
  */
 static void rxrpc_input_ackinfo(struct rxrpc_call *call, struct sk_buff *skb,
@@ -438,6 +477,7 @@ static void rxrpc_input_ack(struct rxrpc_call *call, struct sk_buff *skb,
struct rxrpc_ackinfo info;
u8 acks[RXRPC_MAXACKS];
} buf;
+   rxrpc_serial_t acked_serial;
rxrpc_seq_t first_soft_ack, hard_ack;
int nr_acks, offset;
 
@@ -449,6 +489,7 @@ static void rxrpc_input_ack(struct rxrpc_call *call, struct sk_buff *skb,
}
sp->offset += sizeof(buf.ack);
 
+   acked_serial = 

[PATCH net-next 5/9] rxrpc: Expedite ping response transmission [ver #2]

2016-09-22 Thread David Howells
Expedite the transmission of a response to a PING ACK by sending it from
sendmsg if one is pending.  We're most likely to see a PING ACK during the
client call Tx phase as the other side may use it to determine a number of
parameters, such as the client's receive window size, the RTT and whether
the client is doing slow start (similar to RFC5681).

If we don't expedite it, it's left to the background processing thread to
transmit.

Signed-off-by: David Howells 
---

 net/rxrpc/sendmsg.c |    4 
 1 file changed, 4 insertions(+)

diff --git a/net/rxrpc/sendmsg.c b/net/rxrpc/sendmsg.c
index 814b17f23971..3c969de3ef05 100644
--- a/net/rxrpc/sendmsg.c
+++ b/net/rxrpc/sendmsg.c
@@ -180,6 +180,10 @@ static int rxrpc_send_data(struct rxrpc_sock *rx,
 
copied = 0;
do {
+   /* Check to see if there's a ping ACK to reply to. */
+   if (call->ackr_reason == RXRPC_ACK_PING_RESPONSE)
+   rxrpc_send_call_packet(call, RXRPC_PACKET_TYPE_ACK);
+
if (!skb) {
size_t size, chunk, max, space;
 



[PATCH net-next 8/9] rxrpc: Reduce the number of ACK-Requests sent [ver #2]

2016-09-22 Thread David Howells
Reduce the number of ACK-Requests we set on DATA packets that we're sending
to reduce network traffic.  We set the flag on odd-numbered DATA packets to
start off the RTT cache until we have at least three entries in it and then
probe once per second thereafter to keep it topped up.

This could be made tunable in future.

Note that from this point, the RXRPC_REQUEST_ACK flag is set on DATA
packets as we transmit them and not stored statically in the sk_buff.

Signed-off-by: David Howells 
---

 net/rxrpc/ar-internal.h |    1 +
 net/rxrpc/output.c      |   13 +++--
 net/rxrpc/peer_object.c |    1 +
 net/rxrpc/sendmsg.c     |    2 --
 4 files changed, 13 insertions(+), 4 deletions(-)

diff --git a/net/rxrpc/ar-internal.h b/net/rxrpc/ar-internal.h
index 1c4597b2c6cd..b13754a6dd7a 100644
--- a/net/rxrpc/ar-internal.h
+++ b/net/rxrpc/ar-internal.h
@@ -255,6 +255,7 @@ struct rxrpc_peer {
 
/* calculated RTT cache */
 #define RXRPC_RTT_CACHE_SIZE 32
+   ktime_t rtt_last_req;   /* Time of last RTT request */
u64 rtt; /* Current RTT estimate (in nS) */
u64 rtt_sum; /* Sum of cache contents */
u64 rtt_cache[RXRPC_RTT_CACHE_SIZE]; /* Determined RTT cache */
diff --git a/net/rxrpc/output.c b/net/rxrpc/output.c
index db01fbb70d23..282cb1e36d06 100644
--- a/net/rxrpc/output.c
+++ b/net/rxrpc/output.c
@@ -270,6 +270,12 @@ int rxrpc_send_data_packet(struct rxrpc_call *call, struct sk_buff *skb)
msg.msg_controllen = 0;
msg.msg_flags = 0;
 
+   /* If our RTT cache needs working on, request an ACK. */
+   if ((call->peer->rtt_usage < 3 && sp->hdr.seq & 1) ||
+   ktime_before(ktime_add_ms(call->peer->rtt_last_req, 1000),
+ktime_get_real()))
+   whdr.flags |= RXRPC_REQUEST_ACK;
+
if (IS_ENABLED(CONFIG_AF_RXRPC_INJECT_LOSS)) {
static int lose;
if ((lose++ & 7) == 7) {
@@ -301,11 +307,14 @@ int rxrpc_send_data_packet(struct rxrpc_call *call, struct sk_buff *skb)
 
 done:
if (ret >= 0) {
-   skb->tstamp = ktime_get_real();
+   ktime_t now = ktime_get_real();
+   skb->tstamp = now;
smp_wmb();
sp->hdr.serial = serial;
-   if (whdr.flags & RXRPC_REQUEST_ACK)
+   if (whdr.flags & RXRPC_REQUEST_ACK) {
+   call->peer->rtt_last_req = now;
trace_rxrpc_rtt_tx(call, rxrpc_rtt_tx_data, serial);
+   }
}
_leave(" = %d [%u]", ret, call->peer->maxdata);
return ret;
diff --git a/net/rxrpc/peer_object.c b/net/rxrpc/peer_object.c
index f3e5766910fd..941b724d523b 100644
--- a/net/rxrpc/peer_object.c
+++ b/net/rxrpc/peer_object.c
@@ -244,6 +244,7 @@ static void rxrpc_init_peer(struct rxrpc_peer *peer, unsigned long hash_key)
peer->hash_key = hash_key;
rxrpc_assess_MTU_size(peer);
peer->mtu = peer->if_mtu;
+   peer->rtt_last_req = ktime_get_real();
 
switch (peer->srx.transport.family) {
case AF_INET:
diff --git a/net/rxrpc/sendmsg.c b/net/rxrpc/sendmsg.c
index 607223f4f871..ca7c3be60ad2 100644
--- a/net/rxrpc/sendmsg.c
+++ b/net/rxrpc/sendmsg.c
@@ -299,8 +299,6 @@ static int rxrpc_send_data(struct rxrpc_sock *rx,
else if (call->tx_top - call->tx_hard_ack <
 call->tx_winsize)
sp->hdr.flags |= RXRPC_MORE_PACKETS;
-   if (seq & 1)
-   sp->hdr.flags |= RXRPC_REQUEST_ACK;
 
ret = conn->security->secure_packet(
call, skb, skb->mark, skb->head);



[PATCH net-next 1/9] rxrpc: Don't store the rxrpc header in the Tx queue sk_buffs [ver #2]

2016-09-22 Thread David Howells
Don't store the rxrpc protocol header in sk_buffs on the transmit queue,
but rather generate it on the fly and pass it to kernel_sendmsg() as a
separate iov.  This reduces the amount of storage required.

Note that the security header is still stored in the sk_buff as it may get
encrypted along with the data (and doesn't change with each transmission).

Signed-off-by: David Howells 
---

 net/rxrpc/ar-internal.h |    5 +--
 net/rxrpc/call_event.c  |   11 +-
 net/rxrpc/conn_object.c |    1 -
 net/rxrpc/output.c      |   83 ---
 net/rxrpc/rxkad.c       |    8 ++---
 net/rxrpc/sendmsg.c     |   51 +
 6 files changed, 71 insertions(+), 88 deletions(-)

diff --git a/net/rxrpc/ar-internal.h b/net/rxrpc/ar-internal.h
index 034f525f2235..f021df4a6a22 100644
--- a/net/rxrpc/ar-internal.h
+++ b/net/rxrpc/ar-internal.h
@@ -385,10 +385,9 @@ struct rxrpc_connection {
int debug_id;   /* debug ID for printks */
atomic_t serial; /* packet serial number counter */
unsigned int hi_serial;  /* highest serial number received */
+   u32 security_nonce; /* response re-use preventer */
u8  size_align; /* data size alignment (for security) */
-   u8  header_size; /* rxrpc + security header size */
u8  security_size;  /* security header size */
-   u32 security_nonce; /* response re-use preventer */
u8  security_ix; /* security type */
u8  out_clientflag; /* RXRPC_CLIENT_INITIATED if we are client */
 };
@@ -946,7 +945,7 @@ extern const s8 rxrpc_ack_priority[];
  * output.c
  */
 int rxrpc_send_call_packet(struct rxrpc_call *, u8);
-int rxrpc_send_data_packet(struct rxrpc_connection *, struct sk_buff *);
+int rxrpc_send_data_packet(struct rxrpc_call *, struct sk_buff *);
 void rxrpc_reject_packets(struct rxrpc_local *);
 
 /*
diff --git a/net/rxrpc/call_event.c b/net/rxrpc/call_event.c
index 7d1b99824ed9..6247ce25eb21 100644
--- a/net/rxrpc/call_event.c
+++ b/net/rxrpc/call_event.c
@@ -139,7 +139,6 @@ void rxrpc_propose_ACK(struct rxrpc_call *call, u8 ack_reason,
  */
 static void rxrpc_resend(struct rxrpc_call *call)
 {
-   struct rxrpc_wire_header *whdr;
struct rxrpc_skb_priv *sp;
struct sk_buff *skb;
rxrpc_seq_t cursor, seq, top;
@@ -201,15 +200,8 @@ static void rxrpc_resend(struct rxrpc_call *call)
skb = call->rxtx_buffer[ix];
rxrpc_get_skb(skb, rxrpc_skb_tx_got);
spin_unlock_bh(&call->lock);
-   sp = rxrpc_skb(skb);
-
-   /* Each Tx packet needs a new serial number */
-   sp->hdr.serial = atomic_inc_return(&call->conn->serial);
 
-   whdr = (struct rxrpc_wire_header *)skb->head;
-   whdr->serial = htonl(sp->hdr.serial);
-
-   if (rxrpc_send_data_packet(call->conn, skb) < 0) {
+   if (rxrpc_send_data_packet(call, skb) < 0) {
call->resend_at = now + 2;
rxrpc_free_skb(skb, rxrpc_skb_tx_freed);
return;
@@ -217,6 +209,7 @@ static void rxrpc_resend(struct rxrpc_call *call)
 
if (rxrpc_is_client_call(call))
rxrpc_expose_client_call(call);
+   sp = rxrpc_skb(skb);
sp->resend_at = now + rxrpc_resend_timeout;
 
rxrpc_free_skb(skb, rxrpc_skb_tx_freed);
diff --git a/net/rxrpc/conn_object.c b/net/rxrpc/conn_object.c
index 3b55aee0c436..e1e83af47866 100644
--- a/net/rxrpc/conn_object.c
+++ b/net/rxrpc/conn_object.c
@@ -53,7 +53,6 @@ struct rxrpc_connection *rxrpc_alloc_connection(gfp_t gfp)
spin_lock_init(&conn->state_lock);
conn->debug_id = atomic_inc_return(_debug_id);
conn->size_align = 4;
-   conn->header_size = sizeof(struct rxrpc_wire_header);
conn->idle_timestamp = jiffies;
}
 
diff --git a/net/rxrpc/output.c b/net/rxrpc/output.c
index 16e18a94ffa6..817fb0e82d6a 100644
--- a/net/rxrpc/output.c
+++ b/net/rxrpc/output.c
@@ -208,19 +208,42 @@ out:
 /*
  * send a packet through the transport endpoint
  */
-int rxrpc_send_data_packet(struct rxrpc_connection *conn, struct sk_buff *skb)
+int rxrpc_send_data_packet(struct rxrpc_call *call, struct sk_buff *skb)
 {
-   struct kvec iov[1];
+   struct rxrpc_connection *conn = call->conn;
+   struct rxrpc_wire_header whdr;
+   struct rxrpc_skb_priv *sp = rxrpc_skb(skb);
struct msghdr msg;
+   struct kvec iov[2];
+   rxrpc_serial_t serial;
+   size_t len;
int ret, opt;
 
_enter(",{%d}", skb->len);
 
-   iov[0].iov_base = skb->head;
-   iov[0].iov_len = skb->len;
+   /* Each 

[PATCH net-next] net: ethernet: mediatek: remove superfluous local variable for phy address

2016-09-22 Thread sean.wang
From: Sean Wang 

Remove the unused variable for parsing the PHY address and the
related sanity-check logic, since all of this is already handled
when of_mdiobus_register() is called.

Reported-by: Nelson Chang 
Signed-off-by: Sean Wang 
---
 drivers/net/ethernet/mediatek/mtk_eth_soc.c | 10 +-
 1 file changed, 1 insertion(+), 9 deletions(-)

diff --git a/drivers/net/ethernet/mediatek/mtk_eth_soc.c b/drivers/net/ethernet/mediatek/mtk_eth_soc.c
index 6b7acf4..1918c39 100644
--- a/drivers/net/ethernet/mediatek/mtk_eth_soc.c
+++ b/drivers/net/ethernet/mediatek/mtk_eth_soc.c
@@ -226,17 +226,9 @@ static void mtk_phy_link_adjust(struct net_device *dev)
 static int mtk_phy_connect_node(struct mtk_eth *eth, struct mtk_mac *mac,
struct device_node *phy_node)
 {
-   const __be32 *_addr = NULL;
struct phy_device *phydev;
-   int phy_mode, addr;
+   int phy_mode;
 
-   _addr = of_get_property(phy_node, "reg", NULL);
-
-   if (!_addr || (be32_to_cpu(*_addr) >= 0x20)) {
-   pr_err("%s: invalid phy address\n", phy_node->name);
-   return -EINVAL;
-   }
-   addr = be32_to_cpu(*_addr);
phy_mode = of_get_phy_mode(phy_node);
if (phy_mode < 0) {
dev_err(eth->dev, "incorrect phy-mode %d\n", phy_mode);
-- 
1.9.1


