[PATCH net-next] ethtool: Macro definition for SFF-8436/8636 Memory map max sizes

2016-06-06 Thread vidya
From: Vidya Sagar Ravipati 

This patch provides macro definitions for the maximum possible
memory map size of QSFP+/QSFP28 EEPROMs as per SFF-8436/SFF-8636.

According to the SFF-8436/SFF-8636 specs, the common memory map for
managing external cable interfaces is arranged into a single
lower page address space of 128 bytes and multiple upper address pages
of 128 bytes each. The total size of the memory map is up to five
128-byte pages: the optional upper pages are advertised by a "pages
valid" register and switched via a "page select" register. Only 256
bytes can be memory mapped at a time, and QSFP+/QSFP28 drivers can
export up to 5*128 bytes of EEPROM dump.

QSFP+/QSFP28 Memory layout
   2-Wire Serial Address: 101x
   Lower Page 00h (128 bytes)
   =======================
   |                     |
   |Page Select Byte(127)|
   =======================
              |
              V
       -----------------------------------------
       |            |            |             |
       V            V            V             V
   ----------   ----------   ----------   ------------
  | Upper    | | Upper    | | Upper    | | Upper      |
  | Page 00h | | Page 01h | | Page 02h | | Page 03h   |
  |          | |(Optional)| |(Optional)| | (Optional) |
  |    ID    | |   AST    | |  User    | |  For       |
  |  Fields  | |  Table   | | EEPROM   | |  Cable     |
   ----------   ----------   ----------   ------------
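
As a rough illustration of the layout above, the following userspace sketch
maps a (page, byte address) pair to an offset in a flat 640-byte dump. It
assumes the dump is laid out as lower page 00h followed by upper pages 00h
to 03h in order; the function and macro names are hypothetical and are not
part of this patch or of ethtool.

```c
#include <assert.h>
#include <stddef.h>

#define SFF_PAGE_SIZE     128  /* bytes in the lower page and in each upper page */
#define SFF_8636_LEN      256  /* lower page + upper page 00h (value from the patch) */
#define SFF_8636_MAX_LEN  640  /* lower page + upper pages 00h..03h (value from the patch) */

/*
 * Hypothetical helper: translate a two-wire byte address (0..255) and the
 * value of the page select byte (0..3) into an offset in a flat dump.
 * Assumed dump layout: [lower page][upper 00h][upper 01h][upper 02h][upper 03h].
 * Returns -1 if the page or address is out of range.
 */
static long sff8636_dump_offset(unsigned int page, unsigned int addr)
{
	long off;

	if (addr >= 2 * SFF_PAGE_SIZE || page > 3)
		return -1;

	if (addr < SFF_PAGE_SIZE)
		off = (long)addr;	/* lower page is independent of page select */
	else
		off = (long)(page + 1) * SFF_PAGE_SIZE + (long)(addr - SFF_PAGE_SIZE);

	assert(off < SFF_8636_MAX_LEN);
	return off;
}
```

Under these assumptions, upper page 00h ends at offset 255 (the existing
256-byte length) and the last byte of upper page 03h lands at offset 639,
which is why the maximum length is 5*128 = 640.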

Signed-off-by: Vidya Sagar Ravipati 
---
 include/uapi/linux/ethtool.h | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/include/uapi/linux/ethtool.h b/include/uapi/linux/ethtool.h
index 5f030b4..ff6ccbd 100644
--- a/include/uapi/linux/ethtool.h
+++ b/include/uapi/linux/ethtool.h
@@ -1584,9 +1584,15 @@ static inline int ethtool_validate_duplex(__u8 duplex)
 #define ETH_MODULE_SFF_8472            0x2
 #define ETH_MODULE_SFF_8472_LEN        512
 #define ETH_MODULE_SFF_8636            0x3
+/* min, without optional page */
 #define ETH_MODULE_SFF_8636_LEN        256
+/* max, with all optional pages */
+#define ETH_MODULE_SFF_8636_MAX_LEN    640
 #define ETH_MODULE_SFF_8436            0x4
+/* min, without optional page */
 #define ETH_MODULE_SFF_8436_LEN        256
+/* max, with all optional pages */
+#define ETH_MODULE_SFF_8436_MAX_LEN    640
 
 /* Reset flags */
 /* The reset() operation must clear the flags for the components which
-- 
2.1.4



[PATCH net-next v2 1/4] bpf: fix missing header inclusion

2016-06-06 Thread Zi Shen Lim
Commit 0fc174dea545 ("ebpf: make internal bpf API independent of
CONFIG_BPF_SYSCALL ifdefs") introduced usage of ERR_PTR() in
bpf_prog_get(), however did not include linux/err.h.

Without this patch, when compiling arm64 BPF without CONFIG_BPF_SYSCALL:
...
In file included from arch/arm64/net/bpf_jit_comp.c:21:0:
include/linux/bpf.h: In function 'bpf_prog_get':
include/linux/bpf.h:235:9: error: implicit declaration of function 'ERR_PTR' 
[-Werror=implicit-function-declaration]
  return ERR_PTR(-EOPNOTSUPP);
 ^
include/linux/bpf.h:235:9: warning: return makes pointer from integer without a 
cast [-Wint-conversion]
In file included from include/linux/rwsem.h:17:0,
 from include/linux/mm_types.h:10,
 from include/linux/sched.h:27,
 from arch/arm64/include/asm/compat.h:25,
 from arch/arm64/include/asm/stat.h:23,
 from include/linux/stat.h:5,
 from include/linux/compat.h:12,
 from include/linux/filter.h:10,
 from arch/arm64/net/bpf_jit_comp.c:22:
include/linux/err.h: At top level:
include/linux/err.h:23:35: error: conflicting types for 'ERR_PTR'
 static inline void * __must_check ERR_PTR(long error)
   ^
In file included from arch/arm64/net/bpf_jit_comp.c:21:0:
include/linux/bpf.h:235:9: note: previous implicit declaration of 'ERR_PTR' was 
here
  return ERR_PTR(-EOPNOTSUPP);
 ^
...
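
For readers unfamiliar with the idiom behind this error, here is a minimal
userspace sketch of encoding an errno value in a pointer. The lowercase
helpers and the stub are hypothetical stand-ins written for illustration;
they are not the kernel's definitions from linux/err.h, although they follow
the same idea.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_MAX_ERRNO 4095	/* errors occupy the top 4095 pointer values */

/* Encode a negative errno value as a pointer. */
static inline void *err_ptr(long error)
{
	assert(error < 0 && error >= -MODEL_MAX_ERRNO);
	return (void *)(intptr_t)error;
}

/* Recover the errno value from an error pointer. */
static inline long ptr_err(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

/* True if the pointer lies in the range reserved for error codes. */
static inline int is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MODEL_MAX_ERRNO;
}

struct prog;	/* opaque, only used through pointers */

/* Stub in the style of a "feature compiled out" inline fallback. */
static inline struct prog *prog_get_stub(unsigned int ufd)
{
	(void)ufd;
	return err_ptr(-EOPNOTSUPP);
}
```

If the declaration of the encoding helper is not visible at the point of use,
the compiler falls back to an implicit declaration returning int, which is
what produces the errors shown in the log above.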

Fixes: 0fc174dea545 ("ebpf: make internal bpf API independent of 
CONFIG_BPF_SYSCALL ifdefs")
Suggested-by: Daniel Borkmann 
Signed-off-by: Zi Shen Lim 
Acked-by: Daniel Borkmann 
---
 include/linux/bpf.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 8ee27b8..1bcae82 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -11,6 +11,7 @@
 #include 
 #include 
 #include 
+#include <linux/err.h>
 
 struct bpf_map;
 
-- 
1.9.1



[PATCH net-next v2 4/4] arm64: bpf: optimize LD_ABS, LD_IND

2016-06-06 Thread Zi Shen Lim
Remove superfluous stack frame, saving us 3 instructions for every
LD_ABS or LD_IND.

Signed-off-by: Zi Shen Lim 
---
 arch/arm64/net/bpf_jit_comp.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/arch/arm64/net/bpf_jit_comp.c b/arch/arm64/net/bpf_jit_comp.c
index 7ae304e..b2fc97a 100644
--- a/arch/arm64/net/bpf_jit_comp.c
+++ b/arch/arm64/net/bpf_jit_comp.c
@@ -731,11 +731,8 @@ emit_cond_jmp:
emit_a64_mov_i64(r3, size, ctx);
emit(A64_SUB_I(1, r4, fp, STACK_SIZE), ctx);
emit_a64_mov_i64(r5, (unsigned long)bpf_load_pointer, ctx);
-   emit(A64_PUSH(A64_FP, A64_LR, A64_SP), ctx);
-   emit(A64_MOV(1, A64_FP, A64_SP), ctx);
emit(A64_BLR(r5), ctx);
emit(A64_MOV(1, r0, A64_R(0)), ctx);
-   emit(A64_POP(A64_FP, A64_LR, A64_SP), ctx);
 
jmp_offset = epilogue_offset(ctx);
check_imm19(jmp_offset);
-- 
1.9.1



[PATCH net-next v2 3/4] arm64: bpf: optimize JMP_CALL

2016-06-06 Thread Zi Shen Lim
Remove superfluous stack frame, saving us 3 instructions for
every JMP_CALL.

Signed-off-by: Zi Shen Lim 
---
 arch/arm64/net/bpf_jit_comp.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/arch/arm64/net/bpf_jit_comp.c b/arch/arm64/net/bpf_jit_comp.c
index 51abc97..7ae304e 100644
--- a/arch/arm64/net/bpf_jit_comp.c
+++ b/arch/arm64/net/bpf_jit_comp.c
@@ -578,11 +578,8 @@ emit_cond_jmp:
const u64 func = (u64)__bpf_call_base + imm;
 
emit_a64_mov_i64(tmp, func, ctx);
-   emit(A64_PUSH(A64_FP, A64_LR, A64_SP), ctx);
-   emit(A64_MOV(1, A64_FP, A64_SP), ctx);
emit(A64_BLR(tmp), ctx);
emit(A64_MOV(1, r0, A64_R(0)), ctx);
-   emit(A64_POP(A64_FP, A64_LR, A64_SP), ctx);
break;
}
/* tail call */
-- 
1.9.1



[PATCH net-next v2 0/4] arm64 BPF JIT updates

2016-06-06 Thread Zi Shen Lim
Updates for arm64 eBPF JIT.
The main addition here is implementation of bpf_tail_call.

#1: Fix missing header inclusion in linux/bpf.h.
#2: Add bpf_tail_call for arm64.
#3,4: Optimizations to reduce instruction count for jitted code.

Changes since v1:
 - Added patch #1 to address build error due to missing header inclusion
   in linux/bpf.h. (Thanks to suggestion and ack by Daniel Borkmann)
   Ordered it ahead of bpf_tail_call patch #2 so build error is not
   triggered.

Zi Shen Lim (4):
  bpf: fix missing header inclusion
  arm64: bpf: implement bpf_tail_call() helper
  arm64: bpf: optimize JMP_CALL
  arm64: bpf: optimize LD_ABS, LD_IND

 arch/arm64/net/bpf_jit.h  |   3 +-
 arch/arm64/net/bpf_jit_comp.c | 111 --
 include/linux/bpf.h   |   1 +
 3 files changed, 99 insertions(+), 16 deletions(-)

-- 
1.9.1



Re: [PATCH net-next 2/3] arm64: bpf: optimize JMP_CALL

2016-06-06 Thread Z Lim
Hi Will,

On Mon, Jun 6, 2016 at 10:05 AM, Will Deacon  wrote:
> On Sat, Jun 04, 2016 at 03:00:29PM -0700, Zi Shen Lim wrote:
>> Remove superfluous stack frame, saving us 3 instructions for
>> every JMP_CALL.
>>
>> Signed-off-by: Zi Shen Lim 
>> ---
>>  arch/arm64/net/bpf_jit_comp.c | 3 ---
>>  1 file changed, 3 deletions(-)
>>
>> diff --git a/arch/arm64/net/bpf_jit_comp.c b/arch/arm64/net/bpf_jit_comp.c
>> index 51abc97..7ae304e 100644
>> --- a/arch/arm64/net/bpf_jit_comp.c
>> +++ b/arch/arm64/net/bpf_jit_comp.c
>> @@ -578,11 +578,8 @@ emit_cond_jmp:
>>   const u64 func = (u64)__bpf_call_base + imm;
>>
>>   emit_a64_mov_i64(tmp, func, ctx);
>> - emit(A64_PUSH(A64_FP, A64_LR, A64_SP), ctx);
>> - emit(A64_MOV(1, A64_FP, A64_SP), ctx);
>>   emit(A64_BLR(tmp), ctx);
>>   emit(A64_MOV(1, r0, A64_R(0)), ctx);
>> - emit(A64_POP(A64_FP, A64_LR, A64_SP), ctx);
>>   break;
>>   }
>
> Is the jitted code intended to be unwindable by standard tools?

Before this patch:
bpf_prologue => push stack frame
...
jmp_call => push stack frame, call bpf_helper*, pop stack frame
...
bpf_epilogue => pop stack frame, ret

Now:
bpf_prologue => push stack frame
...
jmp_call => call bpf_helper*
...
bpf_epilogue => pop stack frame, ret

*Note: bpf_helpers in kernel/bpf/helper.c

So yes, it's still unwindable.
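
To make the argument concrete, here is a toy userspace model of frame
records linked through the saved frame pointer. It is only a sketch: the
struct, the function names and the fake return addresses are made up for
illustration, and a real unwinder does considerably more than count records.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model of a frame record: saved frame pointer plus return address. */
struct frame_record {
	const struct frame_record *prev_fp;	/* caller's frame record, NULL at the root */
	unsigned long lr;			/* fake return address, unused by the walk */
};

/* Walk the chain from fp and count records, giving up after limit steps. */
static size_t unwind_depth(const struct frame_record *fp, size_t limit)
{
	size_t depth = 0;

	while (fp && depth < limit) {
		depth++;
		fp = fp->prev_fp;
	}
	return depth;
}

/*
 * Depth seen from inside a helper. extra_call_frame != 0 models the old
 * code, where the call site pushed its own record before the call.
 */
static size_t depth_in_helper(int extra_call_frame)
{
	struct frame_record caller   = { NULL, 0x1000 };
	struct frame_record prologue = { &caller, 0x2000 };
	struct frame_record callsite = { &prologue, 0x3000 };
	struct frame_record helper   = {
		extra_call_frame ? &callsite : &prologue, 0x4000
	};

	assert(prologue.prev_fp == &caller);
	return unwind_depth(&helper, 16);
}
```

In this model the chain from the helper reaches the root in both cases; the
new code simply has one record fewer because the call site no longer pushes
one.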

>
> Will


[PATCH net-next v2 0/3] net: vrf: Add support for local traffic to local addresses

2016-06-06 Thread David Ahern
Add support for locally originated traffic to VRF-local addresses,
be it addresses on enslaved devices or addresses on the VRF device:

$ ip addr show dev red
33: red:  mtu 65536 qdisc pfifo_fast state UP group 
default qlen 1000
link/ether be:00:53:b5:e4:25 brd ff:ff:ff:ff:ff:ff
inet 1.1.1.1/32 scope global red
   valid_lft forever preferred_lft forever
inet6 :1::1/128 scope global
   valid_lft forever preferred_lft forever

$ ip addr show dev eth1
3: eth1:  mtu 1500 qdisc pfifo_fast master red 
state UP group default qlen 1000
link/ether 02:e0:f9:79:34:bd brd ff:ff:ff:ff:ff:ff
inet 10.100.1.1/24 brd 10.100.1.255 scope global eth1
   valid_lft forever preferred_lft forever
inet6 2100:1::1/120 scope global
   valid_lft forever preferred_lft forever
inet6 fe80::e0:f9ff:fe79:34bd/64 scope link
   valid_lft forever preferred_lft forever

$ ping -c1 -I red 10.100.1.1
ping: Warning: source address might be selected on device other than red.
PING 10.100.1.1 (10.100.1.1) from 10.100.1.1 red: 56(84) bytes of data.
64 bytes from 10.100.1.1: icmp_seq=1 ttl=64 time=0.057 ms

$ ping -c1 -I red 1.1.1.1
PING 1.1.1.1 (1.1.1.1) from 1.1.1.1 red: 56(84) bytes of data.
64 bytes from 1.1.1.1: icmp_seq=1 ttl=64 time=0.136 ms

--- 1.1.1.1 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.136/0.136/0.136/0.000 ms

$ ping6 -c1 -I red  2100:1::1
ping6: Warning: source address might be selected on device other than red.
PING 2100:1::1(2100:1::1) from 2100:1::1 red: 56 data bytes
64 bytes from 2100:1::1: icmp_seq=1 ttl=64 time=0.167 ms

--- 2100:1::1 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.167/0.167/0.167/0.000 ms

$ ping6 -c1 -I red ::1
PING ::1(::1) from :1::1 red: 56 data bytes
64 bytes from ::1: icmp_seq=1 ttl=64 time=0.187 ms

--- ::1 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.187/0.187/0.187/0.000 ms

This change also enables use of loopback address on the VRF device:
$ ip addr add dev red 127.0.0.1/8

$ ping -c1 -I red 127.0.0.1
PING 127.0.0.1 (127.0.0.1) from 127.0.0.1 red: 56(84) bytes of data.
64 bytes from 127.0.0.1: icmp_seq=1 ttl=64 time=0.058 ms

David Ahern (3):
  net: vrf: Minor refactoring for local address patches
  net: vrf: ipv4 support for local traffic to local addresses
  net: vrf: ipv6 support for local traffic to local addresses

 drivers/net/vrf.c| 234 +++
 net/ipv6/ip6_input.c |   1 +
 2 files changed, 202 insertions(+), 33 deletions(-)

-- 
2.1.4



[PATCH net-next 2/3] net: vrf: ipv4 support for local traffic to local addresses

2016-06-06 Thread David Ahern
Add support for locally originated traffic to VRF-local addresses. If
destination device for an skb is the loopback or VRF device then set
its dst to a local version of the VRF cached dst_entry and call netif_rx
to insert the packet onto the rx queue - similar to what is done for
loopback. This patch handles IPv4 support; a follow-on patch handles IPv6.

With this patch, ping, tcp and udp packets to a local IPv4 address are
successfully routed:

$ ip addr show dev eth1
4: eth1:  mtu 1500 qdisc pfifo_fast master 
red state UP group default qlen 1000
link/ether 02:e0:f9:1c:b9:74 brd ff:ff:ff:ff:ff:ff
inet 10.100.1.1/24 brd 10.100.1.255 scope global eth1
   valid_lft forever preferred_lft forever
inet6 2100:1::1/120 scope global
   valid_lft forever preferred_lft forever
inet6 fe80::e0:f9ff:fe1c:b974/64 scope link
   valid_lft forever preferred_lft forever

$ ping -c1 -I red 10.100.1.1
ping: Warning: source address might be selected on device other than red.
PING 10.100.1.1 (10.100.1.1) from 10.100.1.1 red: 56(84) bytes of data.
64 bytes from 10.100.1.1: icmp_seq=1 ttl=64 time=0.057 ms

This patch also enables use of IPv4 loopback address on the VRF device:
$ ip addr add dev red 127.0.0.1/8

$ ping -c1 -I red 127.0.0.1
PING 127.0.0.1 (127.0.0.1) from 127.0.0.1 red: 56(84) bytes of data.
64 bytes from 127.0.0.1: icmp_seq=1 ttl=64 time=0.058 ms

Signed-off-by: David Ahern 
---
 drivers/net/vrf.c | 100 --
 1 file changed, 98 insertions(+), 2 deletions(-)

diff --git a/drivers/net/vrf.c b/drivers/net/vrf.c
index ec0cb658d9ea..203d3c78a6e5 100644
--- a/drivers/net/vrf.c
+++ b/drivers/net/vrf.c
@@ -44,6 +44,7 @@
 
 struct net_vrf {
struct rtable __rcu *rth;
+   struct rtable __rcu *rth_local;
struct rt6_info __rcu   *rt6;
u32 tb_id;
 };
@@ -54,9 +55,20 @@ struct pcpu_dstats {
u64 tx_drps;
u64 rx_pkts;
u64 rx_bytes;
+   u64 rx_drps;
struct u64_stats_sync   syncp;
 };
 
+static void vrf_rx_stats(struct net_device *dev, int len)
+{
+   struct pcpu_dstats *dstats = this_cpu_ptr(dev->dstats);
+
+   u64_stats_update_begin(&dstats->syncp);
+   dstats->rx_pkts++;
+   dstats->rx_bytes += len;
+   u64_stats_update_end(&dstats->syncp);
+}
+
 static void vrf_tx_error(struct net_device *vrf_dev, struct sk_buff *skb)
 {
vrf_dev->stats.tx_errors++;
@@ -91,6 +103,34 @@ static struct rtnl_link_stats64 *vrf_get_stats64(struct 
net_device *dev,
return stats;
 }
 
+/* Local traffic destined to local address. Reinsert the packet to rx
+ * path, similar to loopback handling.
+ */
+static int vrf_local_xmit(struct sk_buff *skb, struct net_device *dev,
+ struct dst_entry *dst)
+{
+   int len = skb->len;
+
+   skb_orphan(skb);
+
+   skb_dst_set(skb, dst);
+   skb_dst_force(skb);
+
+   /* set pkt_type to avoid skb hitting packet taps twice -
+* once on Tx and again in Rx processing
+*/
+   skb->pkt_type = PACKET_LOOPBACK;
+
+   skb->protocol = eth_type_trans(skb, dev);
+
+   if (likely(netif_rx(skb) == NET_RX_SUCCESS))
+   vrf_rx_stats(dev, len);
+   else
+   this_cpu_inc(dev->dstats->rx_drps);
+
+   return NETDEV_TX_OK;
+}
+
 #if IS_ENABLED(CONFIG_IPV6)
 static netdev_tx_t vrf_process_v6_outbound(struct sk_buff *skb,
   struct net_device *dev)
@@ -169,6 +209,34 @@ static netdev_tx_t vrf_process_v4_outbound(struct sk_buff 
*skb,
}
 
skb_dst_drop(skb);
+
+   /* if dst.dev is loopback or the VRF device again this is locally
+* originated traffic destined to a local address. Short circuit
+* to Rx path using our local dst
+*/
+   if (rt->dst.dev == net->loopback_dev || rt->dst.dev == vrf_dev) {
+   struct net_vrf *vrf = netdev_priv(vrf_dev);
+   struct rtable *rth_local;
+   struct dst_entry *dst = NULL;
+
+   ip_rt_put(rt);
+
+   rcu_read_lock();
+
+   rth_local = rcu_dereference(vrf->rth_local);
+   if (likely(rth_local)) {
+   dst = &rth_local->dst;
+   dst_hold(dst);
+   }
+
+   rcu_read_unlock();
+
+   if (unlikely(!dst))
+   goto err;
+
+   return vrf_local_xmit(skb, vrf_dev, dst);
+   }
+
skb_dst_set(skb, &rt->dst);
 
/* strip the ethernet header added for pass through VRF device */
@@ -375,29 +443,48 @@ static int vrf_output(struct net *net, struct sock *sk, 
struct sk_buff *skb)
 static void vrf_rtable_release(struct net_vrf *vrf)
 {
struct rtable 

[PATCH net-next 1/3] net: vrf: Minor refactoring for local address patches

2016-06-06 Thread David Ahern
Move the stripping of the ethernet header from is_ip_tx_frame into the
ipv4 and ipv6 outbound functions and collapse vrf_send_v4_prep into
vrf_process_v4_outbound.

Signed-off-by: David Ahern 
---
 drivers/net/vrf.c | 45 ++---
 1 file changed, 18 insertions(+), 27 deletions(-)

diff --git a/drivers/net/vrf.c b/drivers/net/vrf.c
index d356f5d0f7b0..ec0cb658d9ea 100644
--- a/drivers/net/vrf.c
+++ b/drivers/net/vrf.c
@@ -119,6 +119,9 @@ static netdev_tx_t vrf_process_v6_outbound(struct sk_buff 
*skb,
skb_dst_drop(skb);
skb_dst_set(skb, dst);
 
+   /* strip the ethernet header added for pass through VRF device */
+   __skb_pull(skb, skb_network_offset(skb));
+
ret = ip6_local_out(net, skb->sk, skb);
if (unlikely(net_xmit_eval(ret)))
dev->stats.tx_errors++;
@@ -139,29 +142,6 @@ static netdev_tx_t vrf_process_v6_outbound(struct sk_buff 
*skb,
 }
 #endif
 
-static int vrf_send_v4_prep(struct sk_buff *skb, struct flowi4 *fl4,
-   struct net_device *vrf_dev)
-{
-   struct rtable *rt;
-   int err = 1;
-
-   rt = ip_route_output_flow(dev_net(vrf_dev), fl4, NULL);
-   if (IS_ERR(rt))
-   goto out;
-
-   /* TO-DO: what about broadcast ? */
-   if (rt->rt_type != RTN_UNICAST && rt->rt_type != RTN_LOCAL) {
-   ip_rt_put(rt);
-   goto out;
-   }
-
-   skb_dst_drop(skb);
-   skb_dst_set(skb, &rt->dst);
-   err = 0;
-out:
-   return err;
-}
-
 static netdev_tx_t vrf_process_v4_outbound(struct sk_buff *skb,
   struct net_device *vrf_dev)
 {
@@ -176,10 +156,24 @@ static netdev_tx_t vrf_process_v4_outbound(struct sk_buff 
*skb,
FLOWI_FLAG_SKIP_NH_OIF,
.daddr = ip4h->daddr,
};
+   struct net *net = dev_net(vrf_dev);
+   struct rtable *rt;
 
-   if (vrf_send_v4_prep(skb, &fl4, vrf_dev))
+   rt = ip_route_output_flow(net, &fl4, NULL);
+   if (IS_ERR(rt))
goto err;
 
+   if (rt->rt_type != RTN_UNICAST && rt->rt_type != RTN_LOCAL) {
+   ip_rt_put(rt);
+   goto err;
+   }
+
+   skb_dst_drop(skb);
+   skb_dst_set(skb, &rt->dst);
+
+   /* strip the ethernet header added for pass through VRF device */
+   __skb_pull(skb, skb_network_offset(skb));
+
if (!ip4h->saddr) {
ip4h->saddr = inet_select_addr(skb_dst(skb)->dev, 0,
   RT_SCOPE_LINK);
@@ -200,9 +194,6 @@ static netdev_tx_t vrf_process_v4_outbound(struct sk_buff 
*skb,
 
 static netdev_tx_t is_ip_tx_frame(struct sk_buff *skb, struct net_device *dev)
 {
-   /* strip the ethernet header added for pass through VRF device */
-   __skb_pull(skb, skb_network_offset(skb));
-
switch (skb->protocol) {
case htons(ETH_P_IP):
return vrf_process_v4_outbound(skb, dev);
-- 
2.1.4



[PATCH net-next v2 3/3] net: vrf: ipv6 support for local traffic to local addresses

2016-06-06 Thread David Ahern
Add support for locally originated traffic to VRF-local IPv6 addresses.
Similar to IPv4, a local dst is set on the skb and the packet is
reinserted with a call to netif_rx. With this patch, ping, tcp and udp
packets to a local IPv6 address are successfully routed:

$ ip addr show dev eth1
4: eth1:  mtu 1500 qdisc pfifo_fast master 
red state UP group default qlen 1000
link/ether 02:e0:f9:1c:b9:74 brd ff:ff:ff:ff:ff:ff
inet 10.100.1.1/24 brd 10.100.1.255 scope global eth1
   valid_lft forever preferred_lft forever
inet6 2100:1::1/120 scope global
   valid_lft forever preferred_lft forever
inet6 fe80::e0:f9ff:fe1c:b974/64 scope link
   valid_lft forever preferred_lft forever

$ ping6 -c1 -I red 2100:1::1
ping6: Warning: source address might be selected on device other than red.
PING 2100:1::1(2100:1::1) from 2100:1::1 red: 56 data bytes
64 bytes from 2100:1::1: icmp_seq=1 ttl=64 time=0.098 ms

ip6_input is exported so the VRF driver can use it for the dst input
function. The dst_alloc function for IPv4 defaults to setting the input and
output functions; IPv6's does not. VRF does not need to duplicate the Rx path
so just export the ipv6 input function.

Signed-off-by: David Ahern 
---
v2
- add export of ip6_input

 drivers/net/vrf.c| 89 +---
 net/ipv6/ip6_input.c |  1 +
 2 files changed, 86 insertions(+), 4 deletions(-)

diff --git a/drivers/net/vrf.c b/drivers/net/vrf.c
index 203d3c78a6e5..1b214ea4619a 100644
--- a/drivers/net/vrf.c
+++ b/drivers/net/vrf.c
@@ -46,6 +46,7 @@ struct net_vrf {
struct rtable __rcu *rth;
struct rtable __rcu *rth_local;
struct rt6_info __rcu   *rt6;
+   struct rt6_info __rcu   *rt6_local;
u32 tb_id;
 };
 
@@ -157,6 +158,46 @@ static netdev_tx_t vrf_process_v6_outbound(struct sk_buff 
*skb,
goto err;
 
skb_dst_drop(skb);
+
+   /* if dst.dev is loopback or the VRF device again this is locally
+* originated traffic destined to a local address. Short circuit
+* to Rx path using our local dst
+*/
+   if (dst->dev == net->loopback_dev || dst->dev == dev) {
+   struct net_vrf *vrf = netdev_priv(dev);
+   struct rt6_info *rt6_local;
+
+   /* release looked up dst and use cached local dst */
+   dst_release(dst);
+
+   rcu_read_lock();
+
+   rt6_local = rcu_dereference(vrf->rt6_local);
+   if (unlikely(!rt6_local)) {
+   rcu_read_unlock();
+   goto err;
+   }
+
+   /* Ordering issue: cached local dst is created on newlink
+* before the IPv6 initialization. Using the local dst
+* requires rt6i_idev to be set so make sure it is.
+*/
+   if (unlikely(!rt6_local->rt6i_idev)) {
+   rt6_local->rt6i_idev = in6_dev_get(dev);
+   if (!rt6_local->rt6i_idev) {
+   rcu_read_unlock();
+   goto err;
+   }
+   }
+
+   dst = &rt6_local->dst;
+   dst_hold(dst);
+
+   rcu_read_unlock();
+
+   return vrf_local_xmit(skb, dev, &rt6_local->dst);
+   }
+
skb_dst_set(skb, dst);
 
/* strip the ethernet header added for pass through VRF device */
@@ -336,27 +377,38 @@ static int vrf_output6(struct net *net, struct sock *sk, 
struct sk_buff *skb)
 static void vrf_rt6_release(struct net_vrf *vrf)
 {
struct rt6_info *rt6 = rtnl_dereference(vrf->rt6);
+   struct rt6_info *rt6_local = rtnl_dereference(vrf->rt6_local);
 
-   rcu_assign_pointer(vrf->rt6, NULL);
+   RCU_INIT_POINTER(vrf->rt6, NULL);
+   RCU_INIT_POINTER(vrf->rt6_local, NULL);
+   synchronize_rcu();
 
if (rt6)
dst_release(&rt6->dst);
+
+   if (rt6_local) {
+   if (rt6_local->rt6i_idev)
+   in6_dev_put(rt6_local->rt6i_idev);
+
+   dst_release(&rt6_local->dst);
+   }
 }
 
 static int vrf_rt6_create(struct net_device *dev)
 {
+   int flags = DST_HOST | DST_NOPOLICY | DST_NOXFRM | DST_NOCACHE;
struct net_vrf *vrf = netdev_priv(dev);
struct net *net = dev_net(dev);
struct fib6_table *rt6i_table;
-   struct rt6_info *rt6;
+   struct rt6_info *rt6, *rt6_local;
int rc = -ENOMEM;
 
rt6i_table = fib6_new_table(net, vrf->tb_id);
if (!rt6i_table)
goto out;
 
-   rt6 = ip6_dst_alloc(net, dev,
-   DST_HOST | DST_NOPOLICY | DST_NOXFRM | DST_NOCACHE);
+   /* create a dst for routing packets out a VRF device */
+   rt6 = ip6_dst_alloc(net, dev, flags);
if (!rt6)

[PATCH net-next v4 2/2] net: vrf: Add l3mdev rules on first device create

2016-06-06 Thread David Ahern
Add l3mdev rule per address family when the first VRF device is
created. The rules are installed with a default preference of 1000.
Users can replace the default rule as desired.
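
The rule is installed through a netlink message that carries a rule header
plus the FRA_L3MDEV and FRA_PRIORITY attributes. The arithmetic behind the
size of that payload can be sketched in userspace as follows. The struct and
macros below are local models written for this example (assuming 4-byte
netlink alignment and a 4-byte attribute header); they are not taken from
the kernel headers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_ALIGNTO      ((size_t)4)	/* assumed netlink alignment */
#define MODEL_ALIGN(len)   (((len) + MODEL_ALIGNTO - 1) & ~(MODEL_ALIGNTO - 1))
#define MODEL_ATTR_HDRLEN  ((size_t)4)	/* u16 length + u16 type */

/* Local model of the rule header: eight u8 fields followed by a u32. */
struct rule_hdr_model {
	uint8_t  family, dst_len, src_len, tos;
	uint8_t  table, res1, res2, action;
	uint32_t flags;
};

/* Space taken by one attribute: header plus payload, padded to alignment. */
static size_t attr_total_size(size_t payload)
{
	return MODEL_ALIGN(MODEL_ATTR_HDRLEN + payload);
}

/* Payload size for a rule with a u8 l3mdev flag and a u32 priority. */
static size_t l3mdev_rule_msg_size(void)
{
	size_t sz;

	sz  = MODEL_ALIGN(sizeof(struct rule_hdr_model));
	sz += attr_total_size(sizeof(uint8_t));		/* l3mdev flag */
	sz += attr_total_size(sizeof(uint32_t));	/* priority */

	assert(sz % MODEL_ALIGNTO == 0);
	return sz;
}
```

Note that a one-byte attribute still occupies eight bytes once the header
and padding are counted, so in this model the u8 and u32 attributes cost the
same amount of space. The netlink message header itself is not included in
the figure.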

Signed-off-by: David Ahern 
---
v4
- removed module parameter for pref
- changed num_vrfs to add_fib_rules and only add rules once on first
  device create. This allows the user to delete the default rule if
  there is a preference for a different priority

v3
- per Nik's comment changed num_vrfs from atomic to unsigned int; all
  accesses are with rtnl held

v2
- added EXCL flag and EEXIST check. Appropriate once the exclude fib rule
  patch is accepted
- changed 3rd arg to vrf_fib_rule from 0/1 to false/true

 drivers/net/vrf.c | 106 +-
 1 file changed, 105 insertions(+), 1 deletion(-)

diff --git a/drivers/net/vrf.c b/drivers/net/vrf.c
index d356f5d0f7b0..4bfac107710f 100644
--- a/drivers/net/vrf.c
+++ b/drivers/net/vrf.c
@@ -35,6 +35,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #define RT_FL_TOS(oldflp4) \
((oldflp4)->flowi4_tos & (IPTOS_RT_MASK | RTO_ONLINK))
@@ -42,6 +43,9 @@
 #define DRV_NAME   "vrf"
 #define DRV_VERSION"1.0"
 
+static bool add_fib_rules = true;
+static u32 rule_pref = 1000; /* default preference for FIB rules */
+
 struct net_vrf {
struct rtable __rcu *rth;
struct rt6_info __rcu   *rt6;
@@ -729,6 +733,91 @@ static const struct ethtool_ops vrf_ethtool_ops = {
.get_drvinfo= vrf_get_drvinfo,
 };
 
+static inline size_t vrf_fib_rule_nl_size(void)
+{
+   size_t sz;
+
+   sz  = NLMSG_ALIGN(sizeof(struct fib_rule_hdr));
+   sz += nla_total_size(sizeof(u8));   /* FRA_L3MDEV */
+   sz += nla_total_size(sizeof(u32));  /* FRA_PRIORITY */
+
+   return sz;
+}
+
+static int vrf_fib_rule(const struct net_device *dev, __u8 family, bool add_it)
+{
+   struct fib_rule_hdr *frh;
+   struct nlmsghdr *nlh;
+   struct sk_buff *skb;
+   int err;
+
+   skb = nlmsg_new(vrf_fib_rule_nl_size(), GFP_KERNEL);
+   if (!skb)
+   return -ENOMEM;
+
+   nlh = nlmsg_put(skb, 0, 0, 0, sizeof(*frh), 0);
+   if (!nlh)
+   goto nla_put_failure;
+
+   /* rule only needs to appear once */
+   nlh->nlmsg_flags |= NLM_F_EXCL;
+
+   frh = nlmsg_data(nlh);
+   memset(frh, 0, sizeof(*frh));
+   frh->family = family;
+   frh->action = FR_ACT_TO_TBL;
+
+   if (nla_put_u8(skb, FRA_L3MDEV, 1))
+   goto nla_put_failure;
+
+   if (nla_put_u32(skb, FRA_PRIORITY, rule_pref))
+   goto nla_put_failure;
+
+   nlmsg_end(skb, nlh);
+
+   /* fib_nl_{new,del}rule handling looks for net from skb->sk */
+   skb->sk = dev_net(dev)->rtnl;
+   if (add_it) {
+   err = fib_nl_newrule(skb, nlh);
+   if (err == -EEXIST)
+   err = 0;
+   } else {
+   err = fib_nl_delrule(skb, nlh);
+   if (err == -ENOENT)
+   err = 0;
+   }
+   nlmsg_free(skb);
+
+   return err;
+
+nla_put_failure:
+   nlmsg_free(skb);
+
+   return -EMSGSIZE;
+}
+
+static int vrf_add_fib_rules(const struct net_device *dev)
+{
+   int err;
+
+   err = vrf_fib_rule(dev, AF_INET,  true);
+   if (err < 0)
+   goto out_err;
+
+   err = vrf_fib_rule(dev, AF_INET6, true);
+   if (err < 0)
+   goto ipv6_err;
+
+   return 0;
+
+ipv6_err:
+   vrf_fib_rule(dev, AF_INET,  false);
+
+out_err:
+   netdev_err(dev, "Failed to add FIB rules.\n");
+   return err;
+}
+
 static void vrf_setup(struct net_device *dev)
 {
ether_setup(dev);
@@ -769,6 +858,7 @@ static int vrf_newlink(struct net *src_net, struct 
net_device *dev,
   struct nlattr *tb[], struct nlattr *data[])
 {
struct net_vrf *vrf = netdev_priv(dev);
+   int err;
 
if (!data || !data[IFLA_VRF_TABLE])
return -EINVAL;
@@ -777,7 +867,21 @@ static int vrf_newlink(struct net *src_net, struct 
net_device *dev,
 
dev->priv_flags |= IFF_L3MDEV_MASTER;
 
-   return register_netdevice(dev);
+   err = register_netdevice(dev);
+   if (err)
+   goto out;
+
+   if (add_fib_rules) {
+   err = vrf_add_fib_rules(dev);
+   if (err) {
+   unregister_netdevice(dev);
+   goto out;
+   }
+   add_fib_rules = false;
+   }
+
+out:
+   return err;
 }
 
 static size_t vrf_nl_getsize(const struct net_device *dev)
-- 
2.1.4



[PATCH net-next v4 0/2] net: vrf: Improve use of FIB rules

2016-06-06 Thread David Ahern
Currently, VRFs require 1 oif and 1 iif rule per address family per
VRF. As the number of VRF devices increases it brings scalability
issues with the increasing rule list. All of the VRF rules have the
same format with the exception of the specific table id to direct the
lookup. Since the table id is available from the oif or iif in the
lookup, the VRF rules can be consolidated to a single rule that pulls
the table from the VRF device.

This solution still allows a user to insert their own rules for VRFs,
including rules with additional attributes. Accordingly, it is backwards
compatible with existing setups and allows other policy routing as
desired.

David Ahern (2):
  net: Add l3mdev rule
  net: vrf: Add l3mdev rules on first device create

 drivers/net/vrf.c  | 106 -
 include/net/fib_rules.h|  24 +-
 include/net/l3mdev.h   |  12 +
 include/uapi/linux/fib_rules.h |   1 +
 net/core/fib_rules.c   |  33 +++--
 net/ipv4/fib_rules.c   |   6 ++-
 net/ipv6/fib6_rules.c  |   6 ++-
 net/l3mdev/l3mdev.c|  38 +++
 8 files changed, 214 insertions(+), 12 deletions(-)

-- 
2.1.4


[PATCH net-next v4 1/2] net: Add l3mdev rule

2016-06-06 Thread David Ahern
Currently, VRFs require 1 oif and 1 iif rule per address family per
VRF. As the number of VRF devices increases it brings scalability
issues with the increasing rule list. All of the VRF rules have the
same format with the exception of the specific table id to direct the
lookup. Since the table id is available from the oif or iif in the
lookup, the VRF rules can be consolidated to a single rule that pulls
the table from the VRF device.

This patch introduces a new rule attribute l3mdev. The l3mdev rule
means the table id used for the lookup is pulled from the L3 master
device (e.g., VRF) rather than being statically defined. With the
l3mdev rule all of the basic VRF FIB rules are reduced to 1 l3mdev
rule per address family (IPv4 and IPv6).
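
The effect of the l3mdev attribute on table selection can be modelled in a
few lines of userspace C. This is a simplified sketch with hypothetical
names: it returns the table of the first matching rule and ignores the
fall-through to later rules that a real lookup performs when a table has no
matching route.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified model of a FIB rule. */
struct rule_model {
	unsigned int pref;	/* rule priority, lower is checked first */
	int l3mdev;		/* non-zero: take the table from the L3 master device */
	unsigned int table;	/* static table id, used when l3mdev is zero */
};

/* Simplified model of the lookup context. */
struct lookup_model {
	unsigned int l3mdev_table;	/* table of the L3 master device, 0 if none */
};

/* An l3mdev rule only matches when the flow is tied to an L3 master device. */
static int rule_matches(const struct rule_model *r, const struct lookup_model *arg)
{
	return !r->l3mdev || arg->l3mdev_table != 0;
}

/* Table used by a matching rule: dynamic for l3mdev rules, static otherwise. */
static unsigned int rule_table(const struct rule_model *r, const struct lookup_model *arg)
{
	return r->l3mdev ? arg->l3mdev_table : r->table;
}

/* Return the table of the first matching rule, or 0 if no rule matches. */
static unsigned int select_table(const struct rule_model *rules, size_t n,
				 const struct lookup_model *arg)
{
	size_t i;

	assert(rules != NULL || n == 0);
	for (i = 0; i < n; i++) {
		if (rule_matches(&rules[i], arg))
			return rule_table(&rules[i], arg);
	}
	return 0;
}
```

With one l3mdev rule at preference 1000 ahead of the main table rule, flows
bound to vrf1 (table 1001) and vrf2 (table 1002) resolve to their own tables
without any per-VRF rule, while flows outside any VRF skip the l3mdev rule.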

If an admin wishes to insert higher priority rules for specific VRFs
those rules will co-exist with the l3mdev rule. This capability means
current VRF scripts will co-exist with this new simpler implementation.

Currently, the rules list for both ipv4 and ipv6 look like this:
$ ip  ru ls
1000:   from all oif vrf1 lookup 1001
1000:   from all iif vrf1 lookup 1001
1000:   from all oif vrf2 lookup 1002
1000:   from all iif vrf2 lookup 1002
1000:   from all oif vrf3 lookup 1003
1000:   from all iif vrf3 lookup 1003
1000:   from all oif vrf4 lookup 1004
1000:   from all iif vrf4 lookup 1004
1000:   from all oif vrf5 lookup 1005
1000:   from all iif vrf5 lookup 1005
1000:   from all oif vrf6 lookup 1006
1000:   from all iif vrf6 lookup 1006
1000:   from all oif vrf7 lookup 1007
1000:   from all iif vrf7 lookup 1007
1000:   from all oif vrf8 lookup 1008
1000:   from all iif vrf8 lookup 1008
...
32765:  from all lookup local
32766:  from all lookup main
32767:  from all lookup default

With the l3mdev rule the list is just the following regardless of the
number of VRFs:
$ ip ru ls
1000:   from all lookup [l3mdev table]
32765:  from all lookup local
32766:  from all lookup main
32767:  from all lookup default

(Note: the above pretty print of the rule is based on an iproute2
   prototype. Actual verbiage may change)

Signed-off-by: David Ahern 
---
v4
- no change to this patch

v3
- no change to this patch

v2
- if CONFIG_NET_L3_MASTER_DEV is not enabled changed the inline
  l3mdev_fib_rule_match function to return 1 rather than 0 allowing
  the compiler to completely drop the check:
 if (rule->l3mdev && !l3mdev_fib_rule_match())

- moved setting of tb_id down to its use fib4_rule_action which
  addresses Dave's comment about reverse xmas tree order. Same
  change for ipv6 version.

 include/net/fib_rules.h| 24 ++--
 include/net/l3mdev.h   | 12 
 include/uapi/linux/fib_rules.h |  1 +
 net/core/fib_rules.c   | 33 -
 net/ipv4/fib_rules.c   |  6 --
 net/ipv6/fib6_rules.c  |  6 --
 net/l3mdev/l3mdev.c| 38 ++
 7 files changed, 109 insertions(+), 11 deletions(-)

diff --git a/include/net/fib_rules.h b/include/net/fib_rules.h
index 59160de702b6..456e4a6006ab 100644
--- a/include/net/fib_rules.h
+++ b/include/net/fib_rules.h
@@ -17,7 +17,8 @@ struct fib_rule {
u32 flags;
u32 table;
u8  action;
-   /* 3 bytes hole, try to use */
+   u8  l3mdev;
+   /* 2 bytes hole, try to use */
u32 target;
__be64  tun_id;
struct fib_rule __rcu   *ctarget;
@@ -36,6 +37,7 @@ struct fib_lookup_arg {
void*lookup_ptr;
void*result;
struct fib_rule *rule;
+   u32 table;
int flags;
 #define FIB_LOOKUP_NOREF   1
 #define FIB_LOOKUP_IGNORE_LINKSTATE2
@@ -89,7 +91,8 @@ struct fib_rules_ops {
[FRA_TABLE] = { .type = NLA_U32 }, \
[FRA_SUPPRESS_PREFIXLEN] = { .type = NLA_U32 }, \
[FRA_SUPPRESS_IFGROUP] = { .type = NLA_U32 }, \
-   [FRA_GOTO]  = { .type = NLA_U32 }
+   [FRA_GOTO]  = { .type = NLA_U32 }, \
+   [FRA_L3MDEV]= { .type = NLA_U8 }
 
 static inline void fib_rule_get(struct fib_rule *rule)
 {
@@ -102,6 +105,20 @@ static inline void fib_rule_put(struct fib_rule *rule)
kfree_rcu(rule, rcu);
 }
 
+#ifdef CONFIG_NET_L3_MASTER_DEV
+static inline u32 fib_rule_get_table(struct fib_rule *rule,
+struct fib_lookup_arg *arg)
+{
+   return rule->l3mdev ? arg->table : rule->table;
+}
+#else
+static inline u32 fib_rule_get_table(struct fib_rule *rule,
+struct fib_lookup_arg *arg)
+{
+  

Re: [PATCH net-next v10 2/5] openvswitch: set skb protocol and mac_len when receiving on internal device

2016-06-06 Thread Simon Horman
On Thu, Jun 02, 2016 at 03:01:47PM -0700, pravin shelar wrote:
> On Wed, Jun 1, 2016 at 11:24 PM, Simon Horman
>  wrote:
> > * Set skb protocol based on contents of packet. I have observed this is
> >   necessary to get actual protocol of a packet when it is injected into an
> >   internal device e.g. by libnet in which case skb protocol will be set to
> >   ETH_P_ALL.
> >
> > * Set the mac_len which has been observed to not be set up correctly when
> >   an ARP packet is generated and sent via an openvswitch bridge.
> >   My test case is a scenario where there are two openvswitch bridges.
> >   One outputs to a tunnel port which egresses on the other.
> >
> > The motivation for this is that support for outputting to layer 3 (non-tap)
> > GRE tunnels as implemented by a subsequent patch depends on protocol and
> > mac_len being set correctly on receive.
> >
> > Signed-off-by: Simon Horman 
> >
> > ---
> > v10
> > * Set mac_len
> >
> > v9
> > * New patch
> > ---
> >  net/openvswitch/vport-internal_dev.c | 4 
> >  1 file changed, 4 insertions(+)
> >
> > diff --git a/net/openvswitch/vport-internal_dev.c 
> > b/net/openvswitch/vport-internal_dev.c
> > index 2ee48e447b72..f89b1efa88f1 100644
> > --- a/net/openvswitch/vport-internal_dev.c
> > +++ b/net/openvswitch/vport-internal_dev.c
> > @@ -48,6 +48,10 @@ static int internal_dev_xmit(struct sk_buff *skb, struct 
> > net_device *netdev)
> >  {
> > int len, err;
> >
> > +   skb->protocol = eth_type_trans(skb, netdev);
> > +   skb_push(skb, ETH_HLEN);
> > +   skb_reset_mac_len(skb);
> > +
> resetting mac-len breaks the assumption about mac_len for referencing
> MPLS header ref: skb_mpls_header().

Thanks, I had overlooked this. I think it is actually safe as
the mac_len is recalculated quite soon in key_extract() and IIRC
the most important thing is for mac_len to be 0 or non-zero
for the benefit of ovs_flow_key_extract(). Nonetheless it does
seem untidy and moreover inconsistent with the handling in
netdev_port_receive() by a later patch which does the following:

eth_type = eth_type_trans(skb, skb->dev);
skb->mac_len = skb->data - skb_mac_header(skb);
__skb_push(skb, skb->mac_len);

if (eth_type == htons(ETH_P_8021Q))
skb->mac_len += VLAN_HLEN;

Perhaps that logic ought to be in a helper used by both internal_dev_xmit()
and netdev_port_receive(). Or somehow centralised in ovs_vport_receive().
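
A minimal user-space sketch of what such a shared helper would compute is shown here. It is not kernel code: frame_mac_len() and the SKETCH_* constants are hypothetical names, and the function only models the arithmetic of the snippet quoted above (Ethernet header length, plus one VLAN tag when the ethertype is 802.1Q) on a raw frame buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Values mirror the kernel's ETH_HLEN, VLAN_HLEN and ETH_P_8021Q. */
#define SKETCH_ETH_HLEN    14
#define SKETCH_VLAN_HLEN   4
#define SKETCH_ETH_P_8021Q 0x8100

/*
 * Hypothetical helper: return the L2 header length of a raw Ethernet
 * frame, or 0 if the buffer is too short to hold that header.
 * A single 802.1Q tag adds SKETCH_VLAN_HLEN bytes.
 */
static size_t frame_mac_len(const uint8_t *frame, size_t len)
{
	size_t mac_len = SKETCH_ETH_HLEN;
	uint16_t eth_type;

	assert(frame != NULL);
	if (len < SKETCH_ETH_HLEN)
		return 0;

	/* The ethertype is stored big-endian in bytes 12 and 13. */
	eth_type = (uint16_t)((frame[12] << 8) | frame[13]);
	if (eth_type == SKETCH_ETH_P_8021Q)
		mac_len += SKETCH_VLAN_HLEN;

	/* A tagged frame must also be long enough to hold the tag. */
	return mac_len <= len ? mac_len : 0;
}
```

In this model a caller on either receive path would call frame_mac_len() once and store the result, rather than repeating the VLAN special case in two places.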


[PATCH v3 1/1] net: ethernet: Add TSE PCS support to dwmac-socfpga

2016-06-06 Thread thloh
From: Tien Hock Loh 

This adds support for TSE PCS that uses SGMII adapter when the phy-mode of
the dwmac is set to sgmii

Signed-off-by: Tien Hock Loh 

---
v2:
- Refactored the TSE PCS out from the dwmac-socfpga.c file
- Added binding documentation for TSE PCS sgmii adapter
v3:
- Added missing license header for new source files
- Updated tse_pcs.h include headers
- Standardize if statements
---
 .../devicetree/bindings/net/socfpga-dwmac.txt  |   4 +
 drivers/net/ethernet/stmicro/stmmac/Makefile   |   2 +-
 .../net/ethernet/stmicro/stmmac/dwmac-socfpga.c| 140 +--
 drivers/net/ethernet/stmicro/stmmac/tse_pcs.c  | 261 +
 drivers/net/ethernet/stmicro/stmmac/tse_pcs.h  |  36 +++
 5 files changed, 419 insertions(+), 24 deletions(-)
 create mode 100644 drivers/net/ethernet/stmicro/stmmac/tse_pcs.c
 create mode 100644 drivers/net/ethernet/stmicro/stmmac/tse_pcs.h

diff --git a/Documentation/devicetree/bindings/net/socfpga-dwmac.txt 
b/Documentation/devicetree/bindings/net/socfpga-dwmac.txt
index 3a9d679..2bc39f1 100644
--- a/Documentation/devicetree/bindings/net/socfpga-dwmac.txt
+++ b/Documentation/devicetree/bindings/net/socfpga-dwmac.txt
@@ -15,6 +15,8 @@ Required properties:
 Optional properties:
 altr,emac-splitter: Should be the phandle to the emac splitter soft IP node if
DWMAC controller is connected emac splitter.
+phy-mode: The phy mode the ethernet operates in
+altr,sgmii_to_sgmii_converter: phandle to the TSE SGMII converter
 
 Example:
 
@@ -28,4 +30,6 @@ gmac0: ethernet@ff70 {
mac-address = [00 00 00 00 00 00];/* Filled in by U-Boot */
clocks = <_0_clk>;
clock-names = "stmmaceth";
+   phy-mode = "sgmii";
+   altr,gmii_to_sgmii_converter = <_1_gmii_to_sgmii_converter_0>;
 };
diff --git a/drivers/net/ethernet/stmicro/stmmac/Makefile 
b/drivers/net/ethernet/stmicro/stmmac/Makefile
index 73c2715..29c1dee 100644
--- a/drivers/net/ethernet/stmicro/stmmac/Makefile
+++ b/drivers/net/ethernet/stmicro/stmmac/Makefile
@@ -6,7 +6,7 @@ stmmac-objs:= stmmac_main.o stmmac_ethtool.o stmmac_mdio.o 
ring_mode.o  \
 
 obj-$(CONFIG_STMMAC_PLATFORM) += stmmac-platform.o
 stmmac-platform-objs:= stmmac_platform.o dwmac-meson.o dwmac-sunxi.o   \
-  dwmac-sti.o dwmac-socfpga.o dwmac-rk.o
+  dwmac-sti.o dwmac-socfpga.o dwmac-rk.o tse_pcs.o
 
 obj-$(CONFIG_STMMAC_PCI) += stmmac-pci.o
 stmmac-pci-objs:= stmmac_pci.o
diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac-socfpga.c 
b/drivers/net/ethernet/stmicro/stmmac/dwmac-socfpga.c
index 3f9588e..88fba4e 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwmac-socfpga.c
+++ b/drivers/net/ethernet/stmicro/stmmac/dwmac-socfpga.c
@@ -27,6 +27,8 @@
 #include "stmmac.h"
 #include "stmmac_platform.h"
 
+#include "tse_pcs.h"
+
 #define SYSMGR_EMACGRP_CTRL_PHYSEL_ENUM_GMII_MII 0x0
 #define SYSMGR_EMACGRP_CTRL_PHYSEL_ENUM_RGMII 0x1
 #define SYSMGR_EMACGRP_CTRL_PHYSEL_ENUM_RMII 0x2
@@ -47,48 +49,60 @@ struct socfpga_dwmac {
struct regmap *sys_mgr_base_addr;
struct reset_control *stmmac_rst;
void __iomem *splitter_base;
+   struct tse_pcs pcs;
 };
 
 static void socfpga_dwmac_fix_mac_speed(void *priv, unsigned int speed)
 {
struct socfpga_dwmac *dwmac = (struct socfpga_dwmac *)priv;
void __iomem *splitter_base = dwmac->splitter_base;
+   void __iomem *tse_pcs_base = dwmac->pcs.tse_pcs_base;
+   void __iomem *sgmii_adapter_base = dwmac->pcs.sgmii_adapter_base;
+   struct device *dev = dwmac->dev;
+   struct net_device *ndev = dev_get_drvdata(dev);
+   struct phy_device *phy_dev = ndev->phydev;
u32 val;
 
-   if (!splitter_base)
-   return;
-
-   val = readl(splitter_base + EMAC_SPLITTER_CTRL_REG);
-   val &= ~EMAC_SPLITTER_CTRL_SPEED_MASK;
-
-   switch (speed) {
-   case 1000:
-   val |= EMAC_SPLITTER_CTRL_SPEED_1000;
-   break;
-   case 100:
-   val |= EMAC_SPLITTER_CTRL_SPEED_100;
-   break;
-   case 10:
-   val |= EMAC_SPLITTER_CTRL_SPEED_10;
-   break;
-   default:
-   return;
+   if (splitter_base) {
+   val = readl(splitter_base + EMAC_SPLITTER_CTRL_REG);
+   val &= ~EMAC_SPLITTER_CTRL_SPEED_MASK;
+
+   switch (speed) {
+   case 1000:
+   val |= EMAC_SPLITTER_CTRL_SPEED_1000;
+   break;
+   case 100:
+   val |= EMAC_SPLITTER_CTRL_SPEED_100;
+   break;
+   case 10:
+   val |= EMAC_SPLITTER_CTRL_SPEED_10;
+   break;
+   default:
+   return;
+   }
+   writel(val, splitter_base + EMAC_SPLITTER_CTRL_REG);
}
 
-   writel(val, splitter_base + 

Re: [PATCH net-next v10 3/5] openvswitch: add support to push and pop mpls for layer3 packets

2016-06-06 Thread Simon Horman
On Thu, Jun 02, 2016 at 03:02:00PM -0700, pravin shelar wrote:
> On Wed, Jun 1, 2016 at 11:24 PM, Simon Horman
>  wrote:
> > Allow push and pop mpls actions to act on layer 3 packets by teaching
> > them not to access non-existent L2 headers of such packets.
> >
> > Signed-off-by: Simon Horman 
> > ---
> > v10
> > * Limit scope of hdr in {push,pop}_mpls()
> >
> > v9
> > * New Patch
> > ---
> >  net/openvswitch/actions.c | 19 ---
> >  1 file changed, 12 insertions(+), 7 deletions(-)
> >
> > diff --git a/net/openvswitch/actions.c b/net/openvswitch/actions.c
> > index 9a3eb7a0ebf4..15f130e4c22b 100644
> > --- a/net/openvswitch/actions.c
> > +++ b/net/openvswitch/actions.c
> > @@ -172,7 +172,8 @@ static int push_mpls(struct sk_buff *skb, struct 
> > sw_flow_key *key,
> >
> > skb_postpush_rcsum(skb, new_mpls_lse, MPLS_HLEN);
> >
> > -   update_ethertype(skb, eth_hdr(skb), mpls->mpls_ethertype);
> > +   if (skb->mac_len)
> > +   update_ethertype(skb, eth_hdr(skb), mpls->mpls_ethertype);
> We can move all ethernet related code into this if block, for example memmove().

My assumption is that the memmove() does nothing if skb->mac_len is zero
and from my point of view it seems clean to leave it where it is unless
the code around it also moves.

Is there other code you think could/should be moved into the
if (skb->mac_len) block?
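
The assumption about memmove() can be illustrated with a small user-space model. This is only a sketch: push_lse() and SKETCH_MPLS_HLEN are hypothetical names, and a plain byte buffer with headroom stands in for the skb. It shows that a zero-length memmove() leaves the buffer untouched, so the same code path covers both layer 2 and layer 3 packets.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_MPLS_HLEN 4

/*
 * Hypothetical model of the header shuffle when pushing an MPLS label:
 * 'data' points at the start of the packet inside a larger buffer with
 * at least SKETCH_MPLS_HLEN bytes of headroom in front of it. The first
 * mac_len bytes (the L2 header, if any) are moved into the headroom and
 * the label stack entry is written into the gap behind them.
 * Returns the new start of the packet.
 */
static uint8_t *push_lse(uint8_t *data, size_t mac_len,
			 const uint8_t lse[SKETCH_MPLS_HLEN])
{
	uint8_t *new_data;

	assert(data != NULL && lse != NULL);
	new_data = data - SKETCH_MPLS_HLEN;

	/* With mac_len == 0 (layer 3 packet) this copies nothing. */
	memmove(new_data, data, mac_len);
	memcpy(new_data + mac_len, lse, SKETCH_MPLS_HLEN);

	return new_data;
}
```

Whether the real memmove() call belongs inside the if (skb->mac_len) block is a style question; the sketch only shows that leaving it outside does not change the result.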

> 
> > if (!skb->inner_protocol)
> > skb_set_inner_protocol(skb, skb->protocol);
> > skb->protocol = mpls->mpls_ethertype;
> > @@ -184,7 +185,6 @@ static int push_mpls(struct sk_buff *skb, struct 
> > sw_flow_key *key,
> >  static int pop_mpls(struct sk_buff *skb, struct sw_flow_key *key,
> > const __be16 ethertype)
> >  {
> > -   struct ethhdr *hdr;
> > int err;
> >
> > err = skb_ensure_writable(skb, skb->mac_len + MPLS_HLEN);
> > @@ -199,11 +199,16 @@ static int pop_mpls(struct sk_buff *skb, struct 
> > sw_flow_key *key,
> > __skb_pull(skb, MPLS_HLEN);
> > skb_reset_mac_header(skb);
> >
> > -   /* skb_mpls_header() is used to locate the ethertype
> > -* field correctly in the presence of VLAN tags.
> > -*/
> > -   hdr = (struct ethhdr *)(skb_mpls_header(skb) - ETH_HLEN);
> > -   update_ethertype(skb, hdr, ethertype);
> > +   if (skb->mac_len) {
> > +   struct ethhdr *hdr;
> > +
> > +   /* skb_mpls_header() is used to locate the ethertype
> > +* field correctly in the presence of VLAN tags.
> > +*/
> > +   hdr = (struct ethhdr *)(skb_mpls_header(skb) - ETH_HLEN);
> > +   update_ethertype(skb, hdr, ethertype);
> > +   }
> same here.


Re: [PATCH net-next v10 4/5] openvswitch: add layer 3 flow/port support

2016-06-06 Thread Simon Horman
On Thu, Jun 02, 2016 at 03:02:18PM -0700, pravin shelar wrote:
> On Wed, Jun 1, 2016 at 11:24 PM, Simon Horman
>  wrote:

[...]

> > diff --git a/net/openvswitch/actions.c b/net/openvswitch/actions.c
> > index 15f130e4c22b..5567529904fa 100644
> > --- a/net/openvswitch/actions.c
> > +++ b/net/openvswitch/actions.c
> > @@ -300,6 +300,51 @@ static int set_eth_addr(struct sk_buff *skb, struct 
> > sw_flow_key *flow_key,
> > return 0;
> >  }
> >
> > +static int pop_eth(struct sk_buff *skb, struct sw_flow_key *key)
> > +{
> > +   /* Pop outermost VLAN tag to skb metadata unless a VLAN tag
> > +* is already present there.
> > +*/
> > +   if ((skb->protocol == htons(ETH_P_8021Q) ||
> > +skb->protocol == htons(ETH_P_8021AD)) &&
> > +   !skb_vlan_tag_present(skb)) {
> > +   int err = skb_vlan_accel(skb);
> > +   if (unlikely(err))
> > +   return err;
> > +   }
> > +
>
> I do not think we can keep just the vlan tag and pop ethernet header.
> There are multiple issues with this.
> First, the networking stack can not handle such a packet. Second, even
> after this patch OVS can not parse this type of packet. Third, this
> patch does not allow the pop-eth action on a vlan tagged packet.
> There are already separate vlan related actions in OVS so let's keep it simple.

I wonder if the best solution is to simply omit handling VLAN tags
in pop_eth for now. As you mention pop_eth is not permitted on such packets.

[...]

> > diff --git a/net/openvswitch/flow.c b/net/openvswitch/flow.c
> > index 0ea128eeeab2..2d9777abcfc9 100644
> > --- a/net/openvswitch/flow.c
> > +++ b/net/openvswitch/flow.c
> > @@ -468,28 +468,31 @@ static int key_extract(struct sk_buff *skb, struct 
> > sw_flow_key *key)
> >
> > skb_reset_mac_header(skb);
> >
> > -   /* Link layer.  We are guaranteed to have at least the 14 byte 
> > Ethernet
> > -* header in the linear data area.
> > -*/
> > -   eth = eth_hdr(skb);
> > -   ether_addr_copy(key->eth.src, eth->h_source);
> > -   ether_addr_copy(key->eth.dst, eth->h_dest);
> > +   /* Link layer. */
> > +   key->eth.tci = 0;
> > +   if (key->phy.is_layer3) {
> > +   if (skb_vlan_tag_present(skb))
> > +   key->eth.tci = htons(skb->vlan_tci);
> > +   } else {
> > +   eth = eth_hdr(skb);
> eth can be moved to this block.

Thanks, done.

[...]

> > @@ -723,9 +730,19 @@ int ovs_flow_key_extract(const struct ip_tunnel_info 
> > *tun_info,
> > key->phy.skb_mark = skb->mark;
> > ovs_ct_fill_key(skb, key);
> > key->ovs_flow_hash = 0;
> > +   key->phy.is_layer3 = skb->mac_len == 0;
> > key->recirc_id = 0;
> >
> > -   return key_extract(skb, key);
> > +   err = key_extract(skb, key);
> > +   if (err < 0)
> > +   return err;
> > +
> > +   if (key->phy.is_layer3)
> > +   key->eth.type = skb->protocol;
> Now key->eth.type is set in three different functions, can you
> centralize it in key_extract()?

Sure, I think that the instance above can be trivially moved into
key_extract() and the one in ovs_flow_key_update() can be removed.

[...]

> > diff --git a/net/openvswitch/flow_netlink.c b/net/openvswitch/flow_netlink.c
> > index 0bb650f4f219..1e1392c3c0ed 100644
> > --- a/net/openvswitch/flow_netlink.c
> > +++ b/net/openvswitch/flow_netlink.c

[...]

> > @@ -355,6 +359,7 @@ static const struct ovs_len_tbl 
> > ovs_key_lens[OVS_KEY_ATTR_MAX + 1] = {
> > [OVS_KEY_ATTR_CT_ZONE]   = { .len = sizeof(u16) },
> > [OVS_KEY_ATTR_CT_MARK]   = { .len = sizeof(u32) },
> > [OVS_KEY_ATTR_CT_LABELS] = { .len = sizeof(struct 
> > ovs_key_ct_labels) },
> > +   [OVS_KEY_ATTR_PACKET_ETHERTYPE] = { .len = sizeof(__be16) },
> >  };
> >
> I do not see the need for OVS_KEY_ATTR_PACKET_ETHERTYPE, we can use
> existing OVS_KEY_ATTR_ETHERTYPE to serialize the flow key. If there is
> no OVS_KEY_ATTR_ETHERNET attribute then it's an l3 packet.

The idea of OVS_KEY_ATTR_PACKET_ETHERTYPE is to allow communication of
the L2 type of the packet which is not present in an L3 packet. In terms
of GRE (non-TEB) this correlates to the Protocol Type field in the GRE
header.


Re: [PATCH net-next 1/3] arm64: bpf: implement bpf_tail_call() helper

2016-06-06 Thread Zi Shen Lim
On Mon, Jun 6, 2016 at 1:11 AM, Daniel Borkmann  wrote:
> On 06/06/2016 06:56 AM, Z Lim wrote:
> [...]
>>
>> How about the attached patch? Fixes compilation error on build
>> !CONFIG_BPF_SYSCALL.
>>
>> Also, should this patch be sent to net or net-next (along with this
>> series)?
>
>
> Looks good, feel free to add:
>
> Acked-by: Daniel Borkmann 

Thanks Daniel!

>
> I think net-next along with your series should be fine since the issue
> first appeared there. Thanks, Zi!

Sounds good. I'll include this as patch 1/4 (so it doesn't trip
kbuildbot) when I send out v2.


RE: [PATCH][V2] net: fec: fix spelling mistakes and add missing newline

2016-06-06 Thread Fugang Duan
From: Colin King  Sent: Monday, June 06, 2016 4:22 PM
> To: Fugang Duan ; netdev@vger.kernel.org
> Cc: linux-ker...@vger.kernel.org
> Subject: [PATCH][V2] net: fec: fix spelling mistakes and add missing newline
> 
> From: Colin Ian King 
> 
> trivial fix to spelling mistakes and add missing newline in pr_err messages
> 
> Signed-off-by: Colin Ian King 
> ---
>  drivers/net/ethernet/freescale/fec_main.c | 8 
>  1 file changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/net/ethernet/freescale/fec_main.c
> b/drivers/net/ethernet/freescale/fec_main.c
> index 3c0255e..fea0f33 100644
> --- a/drivers/net/ethernet/freescale/fec_main.c
> +++ b/drivers/net/ethernet/freescale/fec_main.c
> @@ -2416,24 +2416,24 @@ fec_enet_set_coalesce(struct net_device *ndev,
> struct ethtool_coalesce *ec)
>   return -EOPNOTSUPP;
> 
>   if (ec->rx_max_coalesced_frames > 255) {
> - pr_err("Rx coalesced frames exceed hardware limiation");
> + pr_err("Rx coalesced frames exceed hardware limitation\n");
>   return -EINVAL;
>   }
> 
>   if (ec->tx_max_coalesced_frames > 255) {
> - pr_err("Tx coalesced frame exceed hardware limiation");
> + pr_err("Tx coalesced frame exceed hardware limitation\n");
>   return -EINVAL;
>   }
> 
>   cycle = fec_enet_us_to_itr_clock(ndev, fep->rx_time_itr);
>   if (cycle > 0x) {
> - pr_err("Rx coalesed usec exceeed hardware limiation");
> + pr_err("Rx coalesced usec exceed hardware limitation\n");
>   return -EINVAL;
>   }
> 
>   cycle = fec_enet_us_to_itr_clock(ndev, fep->tx_time_itr);
>   if (cycle > 0x) {
> - pr_err("Rx coalesed usec exceeed hardware limiation");
> + pr_err("Rx coalesced usec exceed hardware limitation\n");
>   return -EINVAL;
>   }
> 
> --
> 2.8.1

Acked-by: Fugang Duan 


RE: [PATCH v2 1/2] ARM: imx6: disable deeper idle states when FEC is active w/o HW workaround

2016-06-06 Thread Fugang Duan
From: Holger Schurig  Sent: Monday, June 06, 2016 7:04 PM
> To: Fugang Duan ; Lucas Stach
> ; Shawn Guo 
> Cc: devicet...@vger.kernel.org; netdev@vger.kernel.org; linux-arm-
> ker...@lists.infradead.org; ker...@pengutronix.de; patchwork-
> l...@pengutronix.de
> Subject: RE: [PATCH v2 1/2] ARM: imx6: disable deeper idle states when FEC is
> active w/o HW workaround
> 
> > You just config the gpio irq like below patches:
> > bc20a5d6da71 (ARM: dts: imx6qdl-sabreauto: use GPIO_6 for FEC
> > interrupt.) 6261c4c8f13e (ARM: dts: imx6qdl-sabrelite: use GPIO_6 for
> > FEC interrupt.)
> 
> But this is per-board, i.e. not board-agnostic?   Some boards might
> have used GPIO6 for other things, rendering it unusable for this ...

The NXP HW reference design guide covers this: it reserves GPIO6 for FEC.

The patch is fine with me, but I think it is unnecessary.


Re: [PATCH] net: ethernet: cavium: liquidio: response_manager: Remove create_workqueue

2016-06-06 Thread David Miller
From: Bhaktipriya Shridhar 
Date: Sat, 4 Jun 2016 20:21:40 +0530

> alloc_workqueue replaces deprecated create_workqueue().
> 
> A dedicated workqueue has been used since the workitem viz
> (>wk.work which maps to oct_poll_req_completion) is involved
> in normal device operation. WQ_MEM_RECLAIM has been set to guarantee
> forward progress under memory pressure, which is a requirement here.
> Since there are only a fixed number of work items, explicit concurrency
> limit is unnecessary.
> 
> flush_workqueue is unnecessary since destroy_workqueue() itself calls
> drain_workqueue() which flushes repeatedly till the workqueue
> becomes empty. Hence the call to flush_workqueue() has been dropped.
> 
> Signed-off-by: Bhaktipriya Shridhar 

Applied.


Re: [PATCH] net: ethernet: cavium: liquidio: request_manager: Remove create_workqueue

2016-06-06 Thread David Miller
From: Bhaktipriya Shridhar 
Date: Sat, 4 Jun 2016 20:54:00 +0530

> alloc_workqueue replaces deprecated create_workqueue().
> 
> A dedicated workqueue has been used since the workitem viz
> (_wq->wk.work which maps to check_db_timeout) is involved
> in normal device operation. WQ_MEM_RECLAIM has been set to guarantee
> forward progress under memory pressure, which is a requirement here.
> Since there are only a fixed number of work items, explicit concurrency
> limit is unnecessary.
> 
> flush_workqueue is unnecessary since destroy_workqueue() itself calls
> drain_workqueue() which flushes repeatedly till the workqueue
> becomes empty.
> 
> Signed-off-by: Bhaktipriya Shridhar 

Applied.


Re: [Patch net] net_sched: keep backlog updated with qlen

2016-06-06 Thread David Miller
From: Cong Wang 
Date: Fri,  3 Jun 2016 15:05:57 -0700

> For gso_skb we only update qlen, backlog should be updated too.
> 
> Note, it is correct to just update these stats at one layer,
> because the gso_skb is cached there.
> 
> Reported-by: Stas Nichiporovich 
> Fixes: 2f5fb43f ("net_sched: update hierarchical backlog too")
> Cc: Jamal Hadi Salim 
> Signed-off-by: Cong Wang 

Applied, thanks.


Re: [PATCH v3] virtio-net: Add initial MTU advice feature

2016-06-06 Thread David Miller
From: Aaron Conole 
Date: Fri,  3 Jun 2016 16:57:12 -0400

> This commit adds the feature bit and associated mtu device entry for the
> virtio network device.  When a virtio device comes up, it checks the
> feature bit for the VIRTIO_NET_F_MTU feature.  If such feature bit is
> enabled, the driver will read the advised MTU and use it as the initial
> value.
> 
> Signed-off-by: Aaron Conole 

Applied, thanks.


Re: [PATCH net-next 2/2] tcp: add NV congestion control

2016-06-06 Thread Lawrence Brakmo
On 6/6/16, 4:01 PM, "David Miller"  wrote:

>From: Lawrence Brakmo 
>Date: Fri, 3 Jun 2016 13:37:58 -0700
>
>> +module_param(nv_enable, int, 0644);
>> +MODULE_PARM_DESC(nv_enable, "enable NV (congestion avoidance)
>>behavior");
>> +module_param(nv_pad, int, 0644);
>> +MODULE_PARM_DESC(nv_pad, "extra packets above congestion level");
>> +module_param(nv_pad_buffer, int, 0644);
>> +MODULE_PARM_DESC(nv_pad_buffer, "no growth buffer zone");
>> +module_param(nv_reset_period, int, 0644);
>> +MODULE_PARM_DESC(nv_reset_period, "nv_min_rtt reset period (secs)");
>> +module_param(nv_min_cwnd, int, 0644);
>> +MODULE_PARM_DESC(nv_min_cwnd, "NV will not decrease cwnd below this
>>value"
>> + " without losses");
>> +module_param(nv_dec_eval_min_calls, int, 0644);
>> +MODULE_PARM_DESC(nv_dec_eval_min_calls, "Wait for this many data
>>points"
>> + " before declaring congestion");
>> +module_param(nv_inc_eval_min_calls, int, 0644);
>> +MODULE_PARM_DESC(nv_inc_eval_min_calls, "Wait for this many data
>>points"
>> + " before allowing cwnd growth");
>> +module_param(nv_stop_rtt_cnt, int, 0644);
>> +MODULE_PARM_DESC(nv_stop_rtt_cnt, "Wait for this many RTTs before
>>stopping"
>> + " cwnd growth");
>> +module_param(nv_ssthresh_eval_min_calls, int, 0644);
>> +MODULE_PARM_DESC(nv_ssthresh_eval_min_calls, "Wait for this many data
>>points"
>> + " before declaring congestion during initial slow-start");
>> +module_param(nv_rtt_min_cnt, int, 0644);
>> +MODULE_PARM_DESC(nv_rtt_min_cnt, "Wait for this many RTTs before
>>declaring"
>> + " congestion");
>> +module_param(nv_cong_dec_mult, int, 0644);
>> +MODULE_PARM_DESC(nv_cong_dec_mult, "Congestion decrease factor");
>> +module_param(nv_ssthresh_factor, int, 0644);
>> +MODULE_PARM_DESC(nv_ssthresh_factor, "ssthresh factor");
>> +module_param(nv_rtt_factor, int, 0644);
>> +MODULE_PARM_DESC(nv_rtt_factor, "rtt averaging factor");
>> +module_param(nv_rtt_cnt_dec_delta, int, 0644);
>> +MODULE_PARM_DESC(nv_rtt_cnt_dec_delta, "decrease cwnd for this many
>>RTTs"
>> + " every 100 RTTs");
>> +module_param(nv_dec_factor, int, 0644);
>> +MODULE_PARM_DESC(nv_dec_factor, "decrease cwnd every ~192 RTTS by
>>factor/8");
>> +module_param(nv_loss_dec_factor, int, 0644);
>> +MODULE_PARM_DESC(nv_loss_dec_factor, "on loss new cwnd = cwnd * this /
>>1024");
>> +module_param(nv_cwnd_growth_rate_neg, int, 0644);
>> +MODULE_PARM_DESC(nv_cwnd_growth_rate_neg, "Applies when current cwnd
>>growth"
>> + " rate < Reno");
>> +module_param(nv_cwnd_growth_rate_pos, int, 0644);
>> +MODULE_PARM_DESC(nv_cwnd_growth_rate_pos, "Applies when current cwnd
>>growth"
>> + " rate >= Reno");
>> +module_param(nv_min_min_rtt, int, 0644);
>> +MODULE_PARM_DESC(nv_min_min_rtt, "lower bound for ca->nv_min_rtt");
>> +module_param(nv_max_min_rtt, int, 0644);
>> +MODULE_PARM_DESC(nv_max_min_rtt, "upper bound for ca->nv_min_rtt");
>
>That's a disturbingly huge number of module parameters.  Even the first
>one "nv_enable" is superfluous, just hook up another congestion control
>algorithm.
>
>Please trim this down to something reasonable.  The more of these you
>have,
>the more of them people will start wanting to use and depend upon, and
>then
>you're stuck with them whether you like it or not.

I will trim them down to something much smaller. My original intent was to
support experimentation, but those interested can use this version of the
patch to play with a larger number of parameters/tunables.

In regards to the parameter "nv_enable", its function is to quickly allow
existing NV flows to revert to Reno behavior (by "echo 0 >
/sys/module/tcp_nv/parameters/nv_enable"). The reason why this may be
necessary, is that under some conditions NV flows competing with more
aggressive flows will perform poorly. People may be more willing to test
NV if there is a quick and simple way to revert if necessary (without
having to restart the flows).

As to why NV could perform poorly, it has to do with the inherent
unfairness between flows that only decrease their cwnd when there are
losses (congestion control) and flows that decrease their cwnd when they
detect queue buildup (congestion avoidance) unless there is network
support. This is especially true when the RTTs for both types are small
(traffic within a DC), but the unfairness decreases as the RTTs of the
congestion control flows increase.

So, is it okay if I leave the parameter "nv_enable" or would you still
prefer that I remove it?



Re: [ovs-dev] [PATCH net-next] NSH(Network Service Header) implementation

2016-06-06 Thread pravin shelar
On Mon, Jun 6, 2016 at 2:34 AM, Yi Yang  wrote:
> IETF defined NSH(Network Service Header) for Service
> Function Chaining, this is an IETF draft
>
> https://tools.ietf.org/html/draft-ietf-sfc-nsh-05
>
> It will be an IETF standard shortly; this patch implements
> NSH for Open vSwitch.
>
> Signed-off-by: Johnson Li 
> Signed-off-by: Yi Yang 
> ---
>  drivers/net/vxlan.c  |   7 ++
>  include/net/nsh.h| 117 +++
>  include/uapi/linux/openvswitch.h |  32 +++
>  net/openvswitch/actions.c|  68 +
>  net/openvswitch/flow.c   |  45 -
>  net/openvswitch/flow.h   |  15 +++
>  net/openvswitch/flow_netlink.c   | 202 
> ++-
>  net/openvswitch/vport-netdev.c   |   3 +-
>  net/openvswitch/vport-vxlan.c|  15 +++
>  9 files changed, 501 insertions(+), 3 deletions(-)
>  create mode 100644 include/net/nsh.h
>

...
...
> diff --git a/net/openvswitch/actions.c b/net/openvswitch/actions.c
> index 9a3eb7a..38e787c 100644
> --- a/net/openvswitch/actions.c
> +++ b/net/openvswitch/actions.c
> @@ -29,6 +29,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>
>  #include 
>  #include 
> @@ -38,6 +39,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>
>  #include "datapath.h"
>  #include "flow.h"
> @@ -259,6 +261,64 @@ static int push_vlan(struct sk_buff *skb, struct 
> sw_flow_key *key,
>  ntohs(vlan->vlan_tci) & ~VLAN_TAG_PRESENT);
>  }
>
...
...
> +
> +static int push_nsh(struct sk_buff *skb, struct sw_flow_key *key,
> +   const struct ovs_action_push_nsh *nsh)
> +{
> +   if (nsh->len > 0 && nsh->len <= 256) {
> +   struct nsh_hdr *nsh_hdr = NULL;
> +
> +   if (skb_cow_head(skb, nsh->len) < 0)
> +   return -ENOMEM;
> +
> +   skb_push(skb, nsh->len);
> +   nsh_hdr = (struct nsh_hdr *)(skb->data);
> +   memcpy(nsh_hdr, nsh->header, nsh->len);
> +
> +   if (!skb->inner_protocol)
> +   skb_set_inner_protocol(skb, skb->protocol);
> +
> +   skb->protocol = htons(ETH_P_NSH); /* 0x894F */
> +   key->eth.type = htons(ETH_P_NSH);
> +   } else {
> +   return -EINVAL;
> +   }
> +
> +   return 0;
> +}

The networking stack or OVS can not handle an arbitrary skb->protocol. For
example, what happens if OVS has a push vlan action or it sends this nsh
packet to a net device which can not handle nsh packets? Even the networking
stack can not parse such a packet for handling offloads in software.
...

> diff --git a/net/openvswitch/vport-vxlan.c b/net/openvswitch/vport-vxlan.c
> index 5eb7694..3d060c4 100644
> --- a/net/openvswitch/vport-vxlan.c
> +++ b/net/openvswitch/vport-vxlan.c
> @@ -52,6 +52,18 @@ static int vxlan_get_options(const struct vport *vport, 
> struct sk_buff *skb)
> return -EMSGSIZE;
>
> nla_nest_end(skb, exts);
> +   } else if (vxlan->flags & VXLAN_F_GPE) {
> +   struct nlattr *exts;
> +
> +   exts = nla_nest_start(skb, OVS_TUNNEL_ATTR_EXTENSION);
> +   if (!exts)
> +   return -EMSGSIZE;
> +
> +   if (vxlan->flags & VXLAN_F_GPE &&
> +   nla_put_flag(skb, OVS_VXLAN_EXT_GPE))
> +   return -EMSGSIZE;
> +
> +   nla_nest_end(skb, exts);
> }
>
> return 0;
> @@ -59,6 +71,7 @@ static int vxlan_get_options(const struct vport *vport, 
> struct sk_buff *skb)
>
>  static const struct nla_policy exts_policy[OVS_VXLAN_EXT_MAX + 1] = {
> [OVS_VXLAN_EXT_GBP] = { .type = NLA_FLAG, },
> +   [OVS_VXLAN_EXT_GPE] = { .type = NLA_FLAG, },
>  };
>
>  static int vxlan_configure_exts(struct vport *vport, struct nlattr *attr,
> @@ -76,6 +89,8 @@ static int vxlan_configure_exts(struct vport *vport, struct 
> nlattr *attr,
>
> if (exts[OVS_VXLAN_EXT_GBP])
> conf->flags |= VXLAN_F_GBP;
> +   else if (exts[OVS_VXLAN_EXT_GPE])
> +   conf->flags |= VXLAN_F_GPE;
>
This is compatibility code, no need to add new features to this code.
Now we should be directly using net devices.


Re: [PATCH net-next v2 5/5] net: dsa: bcm_sf2: Register our slave MDIO bus

2016-06-06 Thread Andrew Lunn
On Mon, Jun 06, 2016 at 04:14:55PM -0700, Florian Fainelli wrote:
> Register a slave MDIO bus which allows us to divert problematic
> read/writes towards the conflicting pseudo-PHY address (30). No longer
> rely on DSA's slave_mii_bus, but instead provide our own implementation
> which offers more flexibility as to what to do, and when to register it.
> 
> We need to register it by the time we are able to get access to our
> memory mapped registers, which is not until drv->setup() time. In order
> to avoid forward declarations, we need to re-order the function bodies a
> bit.
> 
> Signed-off-by: Florian Fainelli 

Reviewed-by: Andrew Lunn 

 Andrew


Re: [PATCH net-next v2 4/5] net: dsa: Initialize CPU port ethtool ops per tree

2016-06-06 Thread Andrew Lunn
On Mon, Jun 06, 2016 at 04:14:54PM -0700, Florian Fainelli wrote:
> Now that we can properly support multiple distinct trees in the system,
> using a global variable: dsa_cpu_port_ethtool_ops is getting clobbered
> as soon as the second switch tree gets probed, and we don't want that.
> 
> We need to move this to be dynamically allocated, and since we can't
> really be comparing addresses anymore to determine first time
> initialization versus any other times, just move this to dsa.c and
> dsa2.c where the remainder of the dst/ds initialization happens.
> 
> Signed-off-by: Florian Fainelli 
> ---
>  net/dsa/dsa.c  | 28 
>  net/dsa/dsa2.c |  4 
>  net/dsa/dsa_priv.h |  2 ++
>  net/dsa/slave.c| 10 --
>  4 files changed, 34 insertions(+), 10 deletions(-)
> 
> diff --git a/net/dsa/dsa.c b/net/dsa/dsa.c
> index ce3b942dce76..37026f04ee4d 100644
> --- a/net/dsa/dsa.c
> +++ b/net/dsa/dsa.c
> @@ -266,6 +266,30 @@ const struct dsa_device_ops 
> *dsa_resolve_tag_protocol(int tag_protocol)
>   return ops;
>  }
>  
> +int dsa_cpu_port_ethtool_setup(struct dsa_switch_tree *dst,
> +struct dsa_switch *ds)
> +{
> + struct net_device *master;
> + struct ethtool_ops *cpu_ops;
> +
> + master = ds->dst->master_netdev;
> + if (ds->master_netdev)
> + master = ds->master_netdev;
> +
> + cpu_ops = devm_kzalloc(ds->dev, sizeof(*cpu_ops), GFP_KERNEL);
> + if (!cpu_ops)
> + return -ENOMEM;
> +
> + memcpy(>master_ethtool_ops, master->ethtool_ops,
> +sizeof(struct ethtool_ops));
> + memcpy(cpu_ops, >master_ethtool_ops,
> +sizeof(struct ethtool_ops));
> + dsa_cpu_port_ethtool_init(cpu_ops);
> + master->ethtool_ops = cpu_ops;
> +
> + return 0;
> +}

Hi Florian

Why is there not a symmetrical dsa_cpu_port_ethtool_destroy method,
which will restore master->ethtool_ops when the switch module is
unloaded? I think at the moment, you end up with master->ethtool_ops
pointing at released memory.

 Andrew
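
The symmetry asked for above can be modelled in user space roughly as follows. This is a sketch under stated assumptions, not the DSA code: the sketch_* types and functions are hypothetical stand-ins, and malloc()/free() replace the device-managed allocation used in the patch. The point is only the ordering: setup saves the original pointer, and teardown restores it before the copy is released.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Stand-ins for the kernel structures; only what the sketch needs. */
struct sketch_ops { int id; };
struct sketch_netdev { const struct sketch_ops *ops; };
struct sketch_switch {
	struct sketch_netdev *master;
	const struct sketch_ops *saved_ops;	/* original master ops */
	struct sketch_ops *cpu_ops;		/* private, patched copy */
};

/* Install a private copy of the master's ops, remembering the original. */
static int sketch_cpu_ops_setup(struct sketch_switch *ds)
{
	struct sketch_ops *cpu_ops;

	assert(ds != NULL && ds->master != NULL);
	cpu_ops = malloc(sizeof(*cpu_ops));
	if (!cpu_ops)
		return -1;

	memcpy(cpu_ops, ds->master->ops, sizeof(*cpu_ops));
	ds->saved_ops = ds->master->ops;
	ds->cpu_ops = cpu_ops;
	ds->master->ops = cpu_ops;
	return 0;
}

/* Undo the setup: restore first, then free, so the master never
 * points at released memory.
 */
static void sketch_cpu_ops_destroy(struct sketch_switch *ds)
{
	assert(ds != NULL && ds->master != NULL);
	ds->master->ops = ds->saved_ops;
	free(ds->cpu_ops);
	ds->cpu_ops = NULL;
	ds->saved_ops = NULL;
}
```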


Re: [PATCH net-next v2 2/5] net: dsa: Initialize ds->enabled_port_mask and ds->phys_mii_mask

2016-06-06 Thread Andrew Lunn
> @@ -304,6 +312,18 @@ static int dsa_ds_apply(struct dsa_switch_tree *dst, 
> struct dsa_switch *ds)
>   if (err < 0)
>   return err;
>  
> + if (!ds->slave_mii_bus && ds->drv->phy_read) {
> + ds->slave_mii_bus = devm_mdiobus_alloc(ds->dev);
> + if (!ds->slave_mii_bus)
> + return err;
> +
> + dsa_slave_mii_bus_init(ds);
> +
> + err = mdiobus_register(ds->slave_mii_bus);
> + if (err < 0)
> + return err;
> + }
> +
>   for (index = 0; index < DSA_MAX_PORTS; index++) {
>   port = ds->ports[index].dn;
>   if (!port)

Hi Florian

This hunk does not seem to fit in this patch.

It is also missing the unregister in dsa_ds_unapply().

   Andrew


Re: [PATCH net-next 2/2] net: sched: do not acquire qdisc spinlock in qdisc/class stats dump

2016-06-06 Thread Eric Dumazet
On Mon, 2016-06-06 at 16:15 -0700, Cong Wang wrote:
> On Mon, Jun 6, 2016 at 9:37 AM, Eric Dumazet  wrote:
> >  void
> > -__gnet_stats_copy_basic(struct gnet_stats_basic_packed *bstats,
> > +__gnet_stats_copy_basic(const seqcount_t *running,
> > +   struct gnet_stats_basic_packed *bstats,
> > struct gnet_stats_basic_cpu __percpu *cpu,
> > struct gnet_stats_basic_packed *b)
> >  {
> > +   unsigned int seq;
> > +
> > if (cpu) {
> > __gnet_stats_copy_basic_cpu(bstats, cpu);
> > -   } else {
> > +   return;
> > +   }
> > +   do {
> > +   if (running)
> > +   seq = read_seqcount_begin(running);
> > bstats->bytes = b->bytes;
> > bstats->packets = b->packets;
> > -   }
> > +   } while (running && read_seqcount_retry(running, seq));
> >  }
> 
> Why only these basic stats need to get read seqlock?

(seqcount)

> Queue stats (gnet_stats_copy_queue()) too, right?

All these values are 32bit values, right ?

struct gnet_stats_queue {
__u32   qlen;
__u32   backlog;
__u32   drops;
__u32   requeues;
__u32   overlimits;
};

Really sounds overkill to care about these, as probably no one needs to
get a 'consistent view of all these counters in a snapshot'.

Even as of today, the qlen/backlog pair is wrong. No one ever used these
values in an SNMP agent.

Note that qlen/backlog is changed both by enqueue/dequeue, so the
seqcount protection would not work.

With the percpu stats thing, stats can not be fetched in a 'consistent'
way.
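
For readers unfamiliar with the read-side retry loop in the quoted hunk, here is a minimal user-space sketch of the same idea using C11 atomics. The names `seq_stats`, `stats_add` and `stats_read` are hypothetical and this is not the kernel's seqcount_t implementation; it only shows how a reader retries until it sees an even, unchanged sequence number. It assumes a single writer, and the plain field accesses would formally be a data race under C11 if reader and writer really ran concurrently.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the bytes/packets pair copied in the patch. */
struct basic_stats {
	uint64_t bytes;
	uint32_t packets;
};

struct seq_stats {
	atomic_uint seq;	/* odd while the writer is mid-update */
	struct basic_stats s;
};

/* Writer side: bump the sequence before and after touching the counters. */
static void stats_add(struct seq_stats *st, uint64_t bytes, uint32_t packets)
{
	assert(st != NULL);
	atomic_fetch_add_explicit(&st->seq, 1, memory_order_acq_rel);
	st->s.bytes += bytes;
	st->s.packets += packets;
	atomic_fetch_add_explicit(&st->seq, 1, memory_order_release);
}

/* Reader side: retry until the sequence is even and did not change. */
static struct basic_stats stats_read(struct seq_stats *st)
{
	struct basic_stats snap;
	unsigned int begin, end;

	assert(st != NULL);
	do {
		begin = atomic_load_explicit(&st->seq, memory_order_acquire);
		snap = st->s;
		atomic_thread_fence(memory_order_acquire);
		end = atomic_load_explicit(&st->seq, memory_order_relaxed);
	} while ((begin & 1u) || begin != end);

	return snap;
}
```

The retry only buys a consistent bytes/packets pair (and an untorn 64-bit value on 32-bit machines); it does nothing for counters that are updated outside the writer's sequence window, which is the point made above about qlen/backlog.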





Re: [PATCH net-next v3 0/2] net: vrf: Improve use of FIB rules

2016-06-06 Thread David Ahern

On 6/6/16 4:47 PM, David Miller wrote:
> I hate module parameters.
>
> And you don't even need one in this situation, just use a default preference
> of 1000 and add a newlink netlink attribute that can change it.

The intent is to generate default rules similar to what is done for 
local and main tables. Users are free to delete and move rules if they 
don't like the defaults. I'll drop the module parameter and just do the 
rule add once on first device create. From there user intentions reign 
if they move it.


Re: [PATCH][RT] netpoll: Always take poll_lock when doing polling

2016-06-06 Thread Alison Chaiken
Steven Rostedt suggests in reference to "[PATCH][RT] netpoll: Always
take poll_lock when doing polling"
>> >> [ Alison, can you try this patch ]
>>
 Sebastian follows up:
>> >Alison, did you try it?

I wrote:
>> I did try that patch, but it hasn't made much difference.   Let me
>> back up and restate the problem I'm trying to solve, which is that a
>> DRA7X OMAP5 SOC system running a patched 4.1.18-ti-rt kernel has a
>> main event loop in user space that misses latency deadlines under the
>> test condition where I ping-flood it from another box.   While in
>> production, the system would not be expected to support high rates of
>> network traffic, but the instability with the ping-flood makes me
>> wonder if there aren't underlying configuration problems.

Clark asked:
> What sort of tunings have you applied, regarding thread and interrupt 
> affinity?
> Also, what scheduler policy/priority are you using for the user-space 
> application?

We have the most critical hard IRQs (CAN, UART) pinned to one core,
scheduled with FIFO, and running at highest RT priority.  The less
critical IRQs (ethernet, MMC, DMA) are pinned to the other core and
are running at lower FIFO priority.   Next in FIFO priority we have
the ktimersoftd threads.   Then  we have our critical userspace
application running under RR with slightly lower priority and no
pinning.

When there is not much network traffic, the userspace event_loop makes
its deadlines, but when there is a lot of network traffic, the two
network hard IRQs shoot to the top of the process table, with one of
them using about 80% of one core.   This behavior persists whether the
kernel includes "net: provide a way to delegate processing a softirq
to ksoftirqd", "softirq: Perform softirqs in local_bh_enable() for a
limited amount of time", or reverts c10d73671 "softirq: reduce
latencies".

It's hard to see how a *hard* IRQ could take so much processor time.
I guess this gets back to
http://article.gmane.org/gmane.linux.kernel/2219110:
> From: Rik van Riel
> Subject: Re: [RFC PATCH 0/2] net: threadable napi poll loop
> I need to get back to fixing irq & softirq time accounting,
> which does not currently work correctly in all time keeping
> modes...

So most likely the softirq budget is getting charged to the hard IRQ
that raises it.

>If you have not, you might try isolating one of your cores and just run the 
>user-space application on that core, with interrupt threads running on the 
>other core. You could use the 'tuna' application like this:
> $ sudo tuna --cpus=1 --isolate
> This will move all the threads that *can* be moved off of cpu1 (probably to 
> cpu0 since I believe the OMAP5 is a dual-core processor?).

Thanks, I installed tuna and gave that a try, but it actually makes
things worse.   I also tried lowering the priority of the ethernet
hard IRQ below that of the most critical userspace application, to no
avail.

Perhaps expecting an RT system to survive a ping-flood is just
unreasonable?   It would be nice to deliver a system that I didn't
know how to bring down.   At least in our real use case, the critical
system will be NAT'ed and packets will not be forwarded to it.

Thanks,
Alison


Re: [PATCH net-next 2/2] net: sched: do not acquire qdisc spinlock in qdisc/class stats dump

2016-06-06 Thread Cong Wang
On Mon, Jun 6, 2016 at 9:37 AM, Eric Dumazet  wrote:
>  void
> -__gnet_stats_copy_basic(struct gnet_stats_basic_packed *bstats,
> +__gnet_stats_copy_basic(const seqcount_t *running,
> +   struct gnet_stats_basic_packed *bstats,
> struct gnet_stats_basic_cpu __percpu *cpu,
> struct gnet_stats_basic_packed *b)
>  {
> +   unsigned int seq;
> +
> if (cpu) {
> __gnet_stats_copy_basic_cpu(bstats, cpu);
> -   } else {
> +   return;
> +   }
> +   do {
> +   if (running)
> +   seq = read_seqcount_begin(running);
> bstats->bytes = b->bytes;
> bstats->packets = b->packets;
> -   }
> +   } while (running && read_seqcount_retry(running, seq));
>  }

Why only these basic stats need to get read seqlock?
Queue stats (gnet_stats_copy_queue()) too, right?


[PATCH net-next v2 0/5] net: dsa: misc improvements

2016-06-06 Thread Florian Fainelli
Hi all,

This patch series builds on top of Andrew's "New DSA bind, switches as devices"
patch set and does the following:

- add a few helper functions/goodies for net/dsa/dsa2.c to be as close as
  possible to net/dsa/dsa.c in terms of what drivers can expect, in
  particular the slave MDIO bus and the enabled_port_mask and phys_mii_mask

- fix the CPU port ethtool ops to work in a multiple tree setup since we can
  no longer assume a single tree is supported

- make the bcm_sf2 driver register its own MDIO bus, yet assign it to
  ds->slave_mii_bus for everything to work in net/dsa/slave.c wrt. PHY probing,
  this is a tad cleaner than what we have now

Most of the previous patches have been dropped to just keep the relevant ones
now.

Florian Fainelli (5):
  net: dsa: Provide unique DSA slave MII bus names
  net: dsa: Initialize ds->enabled_port_mask and ds->phys_mii_mask
  net: dsa: Add initialization helper for CPU port ethtool_ops
  net: dsa: Initialize CPU port ethtool ops per tree
  net: dsa: bcm_sf2: Register our slave MDIO bus

 drivers/net/dsa/bcm_sf2.c | 215 +-
 drivers/net/dsa/bcm_sf2.h |   6 ++
 net/dsa/dsa.c |  28 ++
 net/dsa/dsa2.c|  31 +++
 net/dsa/dsa_priv.h|   3 +
 net/dsa/slave.c   |  25 ++
 6 files changed, 211 insertions(+), 97 deletions(-)

-- 
2.7.4



[PATCH net-next v2 1/5] net: dsa: Provide unique DSA slave MII bus names

2016-06-06 Thread Florian Fainelli
In case we have multiple trees and switches with the same index, we
need to add another discriminating id: the switch tree.

Reviewed-by: Andrew Lunn 
Reviewed-by: Vivien Didelot 
Signed-off-by: Florian Fainelli 
---
 net/dsa/slave.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/net/dsa/slave.c b/net/dsa/slave.c
index 15a492261895..a51dfedf0014 100644
--- a/net/dsa/slave.c
+++ b/net/dsa/slave.c
@@ -49,7 +49,8 @@ void dsa_slave_mii_bus_init(struct dsa_switch *ds)
ds->slave_mii_bus->name = "dsa slave smi";
ds->slave_mii_bus->read = dsa_slave_phy_read;
ds->slave_mii_bus->write = dsa_slave_phy_write;
-   snprintf(ds->slave_mii_bus->id, MII_BUS_ID_SIZE, "dsa-%d", ds->index);
+   snprintf(ds->slave_mii_bus->id, MII_BUS_ID_SIZE, "dsa-%d.%d",
+ds->dst->tree, ds->index);
ds->slave_mii_bus->parent = ds->dev;
ds->slave_mii_bus->phy_mask = ~ds->phys_mii_mask;
 }
-- 
2.7.4
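
As a rough user-space illustration of why the extra tree number matters, the sketch below formats bus ids both ways. `BUS_ID_SIZE`, `format_bus_id_old`, `format_bus_id` and `bus_ids_collide` are hypothetical names, and the buffer size is an assumed stand-in for the kernel's `MII_BUS_ID_SIZE`; only the two format strings come from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define BUS_ID_SIZE 17	/* assumed stand-in for MII_BUS_ID_SIZE */

/* Old scheme: only the switch index, so two trees can produce the same id. */
static int format_bus_id_old(char *id, size_t len, int index)
{
	assert(id != NULL && len > 0);
	return snprintf(id, len, "dsa-%d", index);
}

/* New scheme from the patch: tree number first, then switch index. */
static int format_bus_id(char *id, size_t len, int tree, int index)
{
	assert(id != NULL && len > 0);
	return snprintf(id, len, "dsa-%d.%d", tree, index);
}

/* Returns non-zero when two ids would clash on registration. */
static int bus_ids_collide(const char *a, const char *b)
{
	return strcmp(a, b) == 0;
}
```

With switch index 0 in tree 0 and in tree 1, the old format yields "dsa-0" twice, while the new one yields "dsa-0.0" and "dsa-1.0".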



[PATCH net-next v2 4/5] net: dsa: Initialize CPU port ethtool ops per tree

2016-06-06 Thread Florian Fainelli
Now that we can properly support multiple distinct trees in the system,
the global variable dsa_cpu_port_ethtool_ops gets clobbered as soon as
the second switch tree is probed, and we don't want that.

We need to move this to be dynamically allocated, and since we can't
really be comparing addresses anymore to determine first time
initialization versus any other times, just move this to dsa.c and
dsa2.c where the remainder of the dst/ds initialization happens.

Signed-off-by: Florian Fainelli 
---
 net/dsa/dsa.c  | 28 
 net/dsa/dsa2.c |  4 
 net/dsa/dsa_priv.h |  2 ++
 net/dsa/slave.c| 10 --
 4 files changed, 34 insertions(+), 10 deletions(-)

diff --git a/net/dsa/dsa.c b/net/dsa/dsa.c
index ce3b942dce76..37026f04ee4d 100644
--- a/net/dsa/dsa.c
+++ b/net/dsa/dsa.c
@@ -266,6 +266,30 @@ const struct dsa_device_ops *dsa_resolve_tag_protocol(int tag_protocol)
return ops;
 }
 
+int dsa_cpu_port_ethtool_setup(struct dsa_switch_tree *dst,
+  struct dsa_switch *ds)
+{
+   struct net_device *master;
+   struct ethtool_ops *cpu_ops;
+
+   master = ds->dst->master_netdev;
+   if (ds->master_netdev)
+   master = ds->master_netdev;
+
+   cpu_ops = devm_kzalloc(ds->dev, sizeof(*cpu_ops), GFP_KERNEL);
+   if (!cpu_ops)
+   return -ENOMEM;
+
+   memcpy(&dst->master_ethtool_ops, master->ethtool_ops,
+  sizeof(struct ethtool_ops));
+   memcpy(cpu_ops, &dst->master_ethtool_ops,
+  sizeof(struct ethtool_ops));
+   dsa_cpu_port_ethtool_init(cpu_ops);
+   master->ethtool_ops = cpu_ops;
+
+   return 0;
+}
+
 static int dsa_switch_setup_one(struct dsa_switch *ds, struct device *parent)
 {
struct dsa_switch_driver *drv = ds->drv;
@@ -379,6 +403,10 @@ static int dsa_switch_setup_one(struct dsa_switch *ds, struct device *parent)
ret = 0;
}
 
+   ret = dsa_cpu_port_ethtool_setup(dst, ds);
+   if (ret)
+   return ret;
+
 #ifdef CONFIG_NET_DSA_HWMON
/* If the switch provides a temperature sensor,
 * register with hardware monitoring subsystem.
diff --git a/net/dsa/dsa2.c b/net/dsa/dsa2.c
index 5ae45210a936..6e912745e43d 100644
--- a/net/dsa/dsa2.c
+++ b/net/dsa/dsa2.c
@@ -391,6 +391,10 @@ static int dsa_dst_apply(struct dsa_switch_tree *dst)
return err;
}
 
+   err = dsa_cpu_port_ethtool_setup(dst, dst->ds[0]);
+   if (err)
+   return err;
+
/* If we use a tagging format that doesn't have an ethertype
 * field, make sure that all packets from this point on get
 * sent to the tag format's receive function.
diff --git a/net/dsa/dsa_priv.h b/net/dsa/dsa_priv.h
index 106a9f067f94..3bb88b2fb580 100644
--- a/net/dsa/dsa_priv.h
+++ b/net/dsa/dsa_priv.h
@@ -54,6 +54,8 @@ int dsa_cpu_dsa_setup(struct dsa_switch *ds, struct device *dev,
  struct device_node *port_dn, int port);
 void dsa_cpu_dsa_destroy(struct device_node *port_dn);
 const struct dsa_device_ops *dsa_resolve_tag_protocol(int tag_protocol);
+int dsa_cpu_port_ethtool_setup(struct dsa_switch_tree *dst,
+  struct dsa_switch *ds);
 
 /* slave.c */
 extern const struct dsa_device_ops notag_netdev_ops;
diff --git a/net/dsa/slave.c b/net/dsa/slave.c
index 8d159932e082..7236eb26dc97 100644
--- a/net/dsa/slave.c
+++ b/net/dsa/slave.c
@@ -892,8 +892,6 @@ static const struct ethtool_ops dsa_slave_ethtool_ops = {
.get_eee= dsa_slave_get_eee,
 };
 
-static struct ethtool_ops dsa_cpu_port_ethtool_ops;
-
 static const struct net_device_ops dsa_slave_netdev_ops = {
.ndo_open   = dsa_slave_open,
.ndo_stop   = dsa_slave_close,
@@ -1126,14 +1124,6 @@ int dsa_slave_create(struct dsa_switch *ds, struct device *parent,
 
slave_dev->features = master->vlan_features;
slave_dev->ethtool_ops = &dsa_slave_ethtool_ops;
-   if (master->ethtool_ops != &dsa_cpu_port_ethtool_ops) {
-   memcpy(&dst->master_ethtool_ops, master->ethtool_ops,
-  sizeof(struct ethtool_ops));
-   memcpy(&dsa_cpu_port_ethtool_ops, &dst->master_ethtool_ops,
-  sizeof(struct ethtool_ops));
-   dsa_cpu_port_ethtool_init(&dsa_cpu_port_ethtool_ops);
-   master->ethtool_ops = &dsa_cpu_port_ethtool_ops;
-   }
eth_hw_addr_inherit(slave_dev, master);
slave_dev->priv_flags |= IFF_NO_QUEUE;
slave_dev->netdev_ops = &dsa_slave_netdev_ops;
-- 
2.7.4
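
The copy-and-override pattern this patch moves to a per-tree allocation can be sketched in plain user-space C as follows. Everything here (`struct ops`, `struct master`, `struct tree`, the `nic_*` fixtures, `tree_setup_cpu_ops`, `tree_restore_master_ops`) is a hypothetical model, not the DSA code; the restore helper is an assumption of this sketch and is not part of the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

struct ops {
	int (*get_sset_count)(void);
	int (*get_link)(void);
};

struct master {
	const struct ops *ops;
};

struct tree {
	struct master *master;
	struct ops saved_ops;		/* copy of the master's original ops */
	const struct ops *orig_ops;	/* pointer to put back on teardown */
	struct ops *cpu_ops;		/* this tree's private, patched copy */
};

/* Fixtures standing in for a NIC driver's own operations. */
static int nic_get_sset_count(void) { return 3; }
static int nic_get_link(void) { return 1; }
static const struct ops nic_ops = { nic_get_sset_count, nic_get_link };

/* Override installed for the CPU port; the value is arbitrary. */
static int cpu_port_get_sset_count(void) { return 7; }

/* Give this tree its own ops copy instead of sharing one global. */
static int tree_setup_cpu_ops(struct tree *t)
{
	assert(t != NULL && t->master != NULL && t->master->ops != NULL);

	t->cpu_ops = malloc(sizeof(*t->cpu_ops));
	if (!t->cpu_ops)
		return -1;

	t->orig_ops = t->master->ops;
	memcpy(&t->saved_ops, t->master->ops, sizeof(t->saved_ops));
	memcpy(t->cpu_ops, &t->saved_ops, sizeof(*t->cpu_ops));
	t->cpu_ops->get_sset_count = cpu_port_get_sset_count;
	t->master->ops = t->cpu_ops;
	return 0;
}

/* Point the master back at its original ops before freeing the copy. */
static void tree_restore_master_ops(struct tree *t)
{
	assert(t != NULL && t->master != NULL);
	t->master->ops = t->orig_ops;
	free(t->cpu_ops);
	t->cpu_ops = NULL;
}
```

Because each tree owns its copy, setting up a second tree no longer overwrites the first tree's table, and only the overridden member differs from the master's original operations.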



[PATCH net-next v2 3/5] net: dsa: Add initialization helper for CPU port ethtool_ops

2016-06-06 Thread Florian Fainelli
Add a helper function: dsa_cpu_port_ethtool_init() which initializes a
custom ethtool_ops structure with custom DSA ethtool operations for CPU
ports. This is a preliminary change to move the initialization outside
of net/dsa/slave.c.

Signed-off-by: Florian Fainelli 
---
 net/dsa/dsa_priv.h |  1 +
 net/dsa/slave.c| 14 --
 2 files changed, 9 insertions(+), 6 deletions(-)

diff --git a/net/dsa/dsa_priv.h b/net/dsa/dsa_priv.h
index b42f1a5f95f3..106a9f067f94 100644
--- a/net/dsa/dsa_priv.h
+++ b/net/dsa/dsa_priv.h
@@ -58,6 +58,7 @@ const struct dsa_device_ops *dsa_resolve_tag_protocol(int tag_protocol);
 /* slave.c */
 extern const struct dsa_device_ops notag_netdev_ops;
 void dsa_slave_mii_bus_init(struct dsa_switch *ds);
+void dsa_cpu_port_ethtool_init(struct ethtool_ops *ops);
 int dsa_slave_create(struct dsa_switch *ds, struct device *parent,
 int port, const char *name);
 void dsa_slave_destroy(struct net_device *slave_dev);
diff --git a/net/dsa/slave.c b/net/dsa/slave.c
index a51dfedf0014..8d159932e082 100644
--- a/net/dsa/slave.c
+++ b/net/dsa/slave.c
@@ -865,6 +865,13 @@ static void dsa_slave_poll_controller(struct net_device *dev)
 }
 #endif
 
+void dsa_cpu_port_ethtool_init(struct ethtool_ops *ops)
+{
+   ops->get_sset_count = dsa_cpu_port_get_sset_count;
+   ops->get_ethtool_stats = dsa_cpu_port_get_ethtool_stats;
+   ops->get_strings = dsa_cpu_port_get_strings;
+}
+
 static const struct ethtool_ops dsa_slave_ethtool_ops = {
.get_settings   = dsa_slave_get_settings,
.set_settings   = dsa_slave_set_settings,
@@ -1124,12 +1131,7 @@ int dsa_slave_create(struct dsa_switch *ds, struct device *parent,
   sizeof(struct ethtool_ops));
memcpy(&dsa_cpu_port_ethtool_ops, &dst->master_ethtool_ops,
   sizeof(struct ethtool_ops));
-   dsa_cpu_port_ethtool_ops.get_sset_count =
-   dsa_cpu_port_get_sset_count;
-   dsa_cpu_port_ethtool_ops.get_ethtool_stats =
-   dsa_cpu_port_get_ethtool_stats;
-   dsa_cpu_port_ethtool_ops.get_strings =
-   dsa_cpu_port_get_strings;
+   dsa_cpu_port_ethtool_init(&dsa_cpu_port_ethtool_ops);
master->ethtool_ops = &dsa_cpu_port_ethtool_ops;
}
eth_hw_addr_inherit(slave_dev, master);
-- 
2.7.4



[PATCH net-next v2 5/5] net: dsa: bcm_sf2: Register our slave MDIO bus

2016-06-06 Thread Florian Fainelli
Register a slave MDIO bus which allows us to divert problematic
reads/writes towards the conflicting pseudo-PHY address (30). No longer
rely on DSA's slave_mii_bus, but instead provide our own implementation,
which offers more flexibility as to what to do, and when to register it.

We need to register it by the time we are able to get access to our
memory mapped registers, which is not until drv->setup() time. In order
to avoid forward declarations, we need to re-order the function bodies a
bit.
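
The diversion described above can be modelled as a small routing decision on the MDIO address. This is only a sketch: `mdio_route` and `enum mdio_path` are hypothetical names, and the only values taken from this message are the pseudo-PHY address (30) and the idea of a per-address mask.

```c
#include <assert.h>
#include <stdint.h>

#define PSEUDO_PHY_ADDR 30	/* pseudo-PHY address named in the commit message */

enum mdio_path {
	MDIO_PATH_INDIRECT,	/* handled by the driver's own workaround */
	MDIO_PATH_MASTER	/* forwarded to the master MDIO bus controller */
};

/* Pick the path for one access, given the mask of addresses to intercept. */
static enum mdio_path mdio_route(uint32_t indir_phy_mask, int addr)
{
	assert(addr >= 0 && addr < 32);

	if (addr == PSEUDO_PHY_ADDR &&
	    (indir_phy_mask & (UINT32_C(1) << addr)))
		return MDIO_PATH_INDIRECT;

	return MDIO_PATH_MASTER;
}
```

With the mask bit for address 30 set, only accesses to that address are intercepted; with an empty mask, every access goes to the master bus.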

Signed-off-by: Florian Fainelli 
---
 drivers/net/dsa/bcm_sf2.c | 215 +-
 drivers/net/dsa/bcm_sf2.h |   6 ++
 2 files changed, 140 insertions(+), 81 deletions(-)

diff --git a/drivers/net/dsa/bcm_sf2.c b/drivers/net/dsa/bcm_sf2.c
index 73df91bb0466..8026fc21c4fb 100644
--- a/drivers/net/dsa/bcm_sf2.c
+++ b/drivers/net/dsa/bcm_sf2.c
@@ -22,6 +22,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -836,6 +837,66 @@ static int bcm_sf2_sw_fdb_dump(struct dsa_switch *ds, int port,
return 0;
 }
 
+static int bcm_sf2_sw_indir_rw(struct bcm_sf2_priv *priv, int op, int addr,
+  int regnum, u16 val)
+{
+   int ret = 0;
+   u32 reg;
+
+   reg = reg_readl(priv, REG_SWITCH_CNTRL);
+   reg |= MDIO_MASTER_SEL;
+   reg_writel(priv, reg, REG_SWITCH_CNTRL);
+
+   /* Page << 8 | offset */
+   reg = 0x70;
+   reg <<= 2;
+   core_writel(priv, addr, reg);
+
+   /* Page << 8 | offset */
+   reg = 0x80 << 8 | regnum << 1;
+   reg <<= 2;
+
+   if (op)
+   ret = core_readl(priv, reg);
+   else
+   core_writel(priv, val, reg);
+
+   reg = reg_readl(priv, REG_SWITCH_CNTRL);
+   reg &= ~MDIO_MASTER_SEL;
+   reg_writel(priv, reg, REG_SWITCH_CNTRL);
+
+   return ret & 0xffff;
+}
+
+static int bcm_sf2_sw_mdio_read(struct mii_bus *bus, int addr, int regnum)
+{
+   struct bcm_sf2_priv *priv = bus->priv;
+
+   /* Intercept reads from Broadcom pseudo-PHY address, else, send
+* them to our master MDIO bus controller
+*/
+   if (addr == BRCM_PSEUDO_PHY_ADDR && priv->indir_phy_mask & BIT(addr))
+   return bcm_sf2_sw_indir_rw(priv, 1, addr, regnum, 0);
+   else
+   return mdiobus_read(priv->master_mii_bus, addr, regnum);
+}
+
+static int bcm_sf2_sw_mdio_write(struct mii_bus *bus, int addr, int regnum,
+u16 val)
+{
+   struct bcm_sf2_priv *priv = bus->priv;
+
+   /* Intercept writes to the Broadcom pseudo-PHY address, else,
+* send them to our master MDIO bus controller
+*/
+   if (addr == BRCM_PSEUDO_PHY_ADDR && priv->indir_phy_mask & BIT(addr))
+   bcm_sf2_sw_indir_rw(priv, 0, addr, regnum, val);
+   else
+   mdiobus_write(priv->master_mii_bus, addr, regnum, val);
+
+   return 0;
+}
+
 static irqreturn_t bcm_sf2_switch_0_isr(int irq, void *dev_id)
 {
struct bcm_sf2_priv *priv = dev_id;
@@ -932,6 +993,72 @@ static void bcm_sf2_identify_ports(struct bcm_sf2_priv *priv,
}
 }
 
+static int bcm_sf2_mdio_register(struct dsa_switch *ds)
+{
+   struct bcm_sf2_priv *priv = ds_to_priv(ds);
+   struct device_node *dn;
+   static int index;
+   int err;
+
+   /* Find our integrated MDIO bus node */
+   dn = of_find_compatible_node(NULL, NULL, "brcm,unimac-mdio");
+   priv->master_mii_bus = of_mdio_find_bus(dn);
+   if (!priv->master_mii_bus)
+   return -EPROBE_DEFER;
+
+   get_device(&priv->master_mii_bus->dev);
+   priv->master_mii_dn = dn;
+
+   priv->slave_mii_bus = devm_mdiobus_alloc(ds->dev);
+   if (!priv->slave_mii_bus)
+   return -ENOMEM;
+
+   priv->slave_mii_bus->priv = priv;
+   priv->slave_mii_bus->name = "sf2 slave mii";
+   priv->slave_mii_bus->read = bcm_sf2_sw_mdio_read;
+   priv->slave_mii_bus->write = bcm_sf2_sw_mdio_write;
+   snprintf(priv->slave_mii_bus->id, MII_BUS_ID_SIZE, "sf2-%d",
+index++);
+   priv->slave_mii_bus->dev.of_node = dn;
+
+   /* Include the pseudo-PHY address to divert reads towards our
+* workaround. This is only required for 7445D0, since 7445E0
+* disconnects the internal switch pseudo-PHY such that we can use the
+* regular SWITCH_MDIO master controller instead.
+*
+* Here we flag the pseudo PHY as needing special treatment and would
+* otherwise make all other PHY read/writes go to the master MDIO bus
+* controller that comes with this switch backed by the "mdio-unimac"
+* driver.
+*/
+   if (of_machine_is_compatible("brcm,bcm7445d0"))
+   priv->indir_phy_mask |= (1 << BRCM_PSEUDO_PHY_ADDR);
+   else
+   priv->indir_phy_mask = 0;
+
+   ds->phys_mii_mask = priv->indir_phy_mask;
+   ds->slave_mii_bus = 

[PATCH net-next v2 2/5] net: dsa: Initialize ds->enabled_port_mask and ds->phys_mii_mask

2016-06-06 Thread Florian Fainelli
Some drivers rely on these two bitmasks to contain the correct values
for them to successfully probe and initialize at drv->setup() time, so
calculate the correct values for both masks as early as possible:
enabled_port_mask in dsa_parse_ports_dn() and phys_mii_mask at the top
of dsa_ds_apply().

Signed-off-by: Florian Fainelli 
---
 net/dsa/dsa2.c | 27 +++
 1 file changed, 27 insertions(+)

diff --git a/net/dsa/dsa2.c b/net/dsa/dsa2.c
index 80dfe08db825..5ae45210a936 100644
--- a/net/dsa/dsa2.c
+++ b/net/dsa/dsa2.c
@@ -283,6 +283,7 @@ static void dsa_user_port_unapply(struct device_node *port, u32 index,
if (ds->ports[index].netdev) {
dsa_slave_destroy(ds->ports[index].netdev);
ds->ports[index].netdev = NULL;
+   ds->enabled_port_mask &= ~(1 << index);
}
 }
 
@@ -292,6 +293,13 @@ static int dsa_ds_apply(struct dsa_switch_tree *dst, struct dsa_switch *ds)
u32 index;
int err;
 
+   /* Initialize ds->phys_mii_mask before registering the slave MDIO bus
+* driver and before drv->setup() has run, since the switch drivers and
+* the slave MDIO bus driver rely on these values for probing PHY
+* devices or not
+*/
+   ds->phys_mii_mask = ds->enabled_port_mask;
+
err = ds->drv->setup(ds);
if (err < 0)
return err;
@@ -304,6 +312,18 @@ static int dsa_ds_apply(struct dsa_switch_tree *dst, struct dsa_switch *ds)
if (err < 0)
return err;
 
+   if (!ds->slave_mii_bus && ds->drv->phy_read) {
+   ds->slave_mii_bus = devm_mdiobus_alloc(ds->dev);
+   if (!ds->slave_mii_bus)
+   return err;
+
+   dsa_slave_mii_bus_init(ds);
+
+   err = mdiobus_register(ds->slave_mii_bus);
+   if (err < 0)
+   return err;
+   }
+
for (index = 0; index < DSA_MAX_PORTS; index++) {
port = ds->ports[index].dn;
if (!port)
@@ -511,6 +531,13 @@ static int dsa_parse_ports_dn(struct device_node *ports, struct dsa_switch *ds)
return -EINVAL;
 
ds->ports[reg].dn = port;
+
+   /* Initialize enabled_port_mask now for drv->setup()
+* to have access to a correct value, just like what
+* net/dsa/dsa.c::dsa_switch_setup_one does.
+*/
+   if (!dsa_port_is_cpu(port))
+   ds->enabled_port_mask |= 1 << reg;
}
 
return 0;
-- 
2.7.4



[PATCH net-next] gue: Implement direction IP encapsulation

2016-06-06 Thread Tom Herbert
This patch implements direct encapsulation of IPv4 and IPv6 packets
in UDP. This is done as version "1" of GUE, as explained in I-D
draft-ietf-nvo3-gue-03.

Changes here are only in the receive path; fou with IPxIPx already
supports the transmit side. Both the normal receive path and the
GRO path are modified to check the GUE version and, when the GUE
version is "1", the IP version.
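
The version check described above can be sketched on a raw byte buffer. This assumes, following the draft, that the GUE version occupies the top two bits of the first payload byte, so that an IPv4 (0100) or IPv6 (0110) version nibble reads as GUE version 1. `gue_classify` and `enum gue_payload` are hypothetical names for illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum gue_payload {
	GUE_FULL_HEADER,	/* version 0: a full GUE header follows */
	GUE_DIRECT_IPV4,	/* version 1, inner packet is IPv4 */
	GUE_DIRECT_IPV6,	/* version 1, inner packet is IPv6 */
	GUE_DROP		/* anything else */
};

/* Classify the bytes that follow the UDP header. */
static enum gue_payload gue_classify(const uint8_t *payload, size_t len)
{
	assert(payload != NULL);

	if (len < 1)
		return GUE_DROP;

	switch (payload[0] >> 6) {		/* GUE version, top two bits */
	case 0:
		return GUE_FULL_HEADER;
	case 1:
		switch (payload[0] >> 4) {	/* IP version nibble */
		case 4:
			return GUE_DIRECT_IPV4;
		case 6:
			return GUE_DIRECT_IPV6;
		default:
			return GUE_DROP;
		}
	default:
		return GUE_DROP;
	}
}
```

A first byte of 0x45 (a typical IPv4 header start) or 0x60 (IPv6) is therefore recognised as a directly encapsulated IP packet without any GUE header in front of it.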

Tested:

IPIP with direct GUE encap
  1 TCP_STREAM
4530 Mbps
  200 TCP_RR
1297625 tps
135/232/444 90/95/99% latencies

IP4IP6 with direct GUE encap
  1 TCP_STREAM
4903 Mbps
  200 TCP_RR
1184481 tps
149/253/473 90/95/99% latencies

IP6IP6 direct GUE encap
  1 TCP_STREAM
5146 Mbps
  200 TCP_RR
1202879 tps
146/251/472 90/95/99% latencies

SIT with direct GUE encap
  1 TCP_STREAM
6111 Mbps
  200 TCP_RR
1250337 tps
139/241/467 90/95/99% latencies

Signed-off-by: Tom Herbert 
---
 net/ipv4/fou.c | 81 ++
 1 file changed, 76 insertions(+), 5 deletions(-)

diff --git a/net/ipv4/fou.c b/net/ipv4/fou.c
index 5f9207c..321d57f 100644
--- a/net/ipv4/fou.c
+++ b/net/ipv4/fou.c
@@ -129,6 +129,36 @@ static int gue_udp_recv(struct sock *sk, struct sk_buff *skb)
 
guehdr = (struct guehdr *)&udp_hdr(skb)[1];
 
+   switch (guehdr->version) {
+   case 0: /* Full GUE header present */
+   break;
+
+   case 1: {
+   /* Direct encapsulation of IPv4 or IPv6 */
+
+   int prot;
+
+   switch (((struct iphdr *)guehdr)->version) {
+   case 4:
+   prot = IPPROTO_IPIP;
+   break;
+   case 6:
+   prot = IPPROTO_IPV6;
+   break;
+   default:
+   goto drop;
+   }
+
+   if (fou_recv_pull(skb, fou, sizeof(struct udphdr)))
+   goto drop;
+
+   return -prot;
+   }
+
+   default: /* Undefined version */
+   goto drop;
+   }
+
optlen = guehdr->hlen << 2;
len += optlen;
 
@@ -289,6 +319,7 @@ static struct sk_buff **gue_gro_receive(struct sock *sk,
int flush = 1;
struct fou *fou = fou_from_sock(sk);
struct gro_remcsum grc;
+   u8 proto;
 
skb_gro_remcsum_init(&grc);
 
@@ -302,6 +333,25 @@ static struct sk_buff **gue_gro_receive(struct sock *sk,
goto out;
}
 
+   switch (guehdr->version) {
+   case 0:
+   break;
+   case 1:
+   switch (((struct iphdr *)guehdr)->version) {
+   case 4:
+   proto = IPPROTO_IPIP;
+   break;
+   case 6:
+   proto = IPPROTO_IPV6;
+   break;
+   default:
+   goto out;
+   }
+   goto next_proto;
+   default:
+   goto out;
+   }
+
optlen = guehdr->hlen << 2;
len += optlen;
 
@@ -370,6 +420,10 @@ static struct sk_buff **gue_gro_receive(struct sock *sk,
}
}
 
+   proto = guehdr->proto_ctype;
+
+next_proto:
+
/* We can clear the encap_mark for GUE as we are essentially doing
 * one of two possible things.  We are either adding an L4 tunnel
 * header to the outer L3 tunnel header, or we are are simply
@@ -383,7 +437,7 @@ static struct sk_buff **gue_gro_receive(struct sock *sk,
 
rcu_read_lock();
offloads = NAPI_GRO_CB(skb)->is_ipv6 ? inet6_offloads : inet_offloads;
-   ops = rcu_dereference(offloads[guehdr->proto_ctype]);
+   ops = rcu_dereference(offloads[proto]);
if (WARN_ON_ONCE(!ops || !ops->callbacks.gro_receive))
goto out_unlock;
 
@@ -404,13 +458,30 @@ static int gue_gro_complete(struct sock *sk, struct sk_buff *skb, int nhoff)
const struct net_offload **offloads;
struct guehdr *guehdr = (struct guehdr *)(skb->data + nhoff);
const struct net_offload *ops;
-   unsigned int guehlen;
+   unsigned int guehlen = 0;
u8 proto;
int err = -ENOENT;
 
-   proto = guehdr->proto_ctype;
-
-   guehlen = sizeof(*guehdr) + (guehdr->hlen << 2);
+   switch (guehdr->version) {
+   case 0:
+   proto = guehdr->proto_ctype;
+   guehlen = sizeof(*guehdr) + (guehdr->hlen << 2);
+   break;
+   case 1:
+   switch (((struct iphdr *)guehdr)->version) {
+   case 4:
+   proto = IPPROTO_IPIP;
+   break;
+   case 6:
+   proto = IPPROTO_IPV6;
+   break;
+   default:
+   return err;
+   }
+   break;
+   default:
+   return err;
+   }
 
rcu_read_lock();

Re: [PATCH net-next 0/3] net: vrf: Add support for local traffic to local addresses

2016-06-06 Thread David Ahern

On 6/6/16 4:56 PM, David Miller wrote:
> From: David Miller
> Date: Mon, 06 Jun 2016 15:12:28 -0700 (PDT)
>
>> From: David Ahern
>> Date: Thu,  2 Jun 2016 13:15:09 -0700
>>
>>> Add support for locally originated traffic to VRF-local addresses,
>>> be it addresses on enslaved devices or addresses on the VRF device:
>>  ...
>>
>> Series applied, but I've been wondering what happens to hw offloads
>> when these VRF devices sit in the middle.
>>
>> Does TSO et al. still occur properly?

It should, but I will double check and adjust if needed.

> Actually I have to revert this series, ip6_input() is not an exported
> module symbol.
>
> If you're only build testing things like this with everything "=y",
> please reconsider.

As I noted in the commit message "ip6_input is exported so the VRF
driver can use it for the dst input function." It was dropped somewhere
along the way of refactoring for upstream. Will send a v2.


Re: [PATCH net-next 2/2] tcp: add NV congestion control

2016-06-06 Thread David Miller
From: Lawrence Brakmo 
Date: Fri, 3 Jun 2016 13:37:58 -0700

> +module_param(nv_enable, int, 0644);
> +MODULE_PARM_DESC(nv_enable, "enable NV (congestion avoidance) behavior");
> +module_param(nv_pad, int, 0644);
> +MODULE_PARM_DESC(nv_pad, "extra packets above congestion level");
> +module_param(nv_pad_buffer, int, 0644);
> +MODULE_PARM_DESC(nv_pad_buffer, "no growth buffer zone");
> +module_param(nv_reset_period, int, 0644);
> +MODULE_PARM_DESC(nv_reset_period, "nv_min_rtt reset period (secs)");
> +module_param(nv_min_cwnd, int, 0644);
> +MODULE_PARM_DESC(nv_min_cwnd, "NV will not decrease cwnd below this value"
> +  " without losses");
> +module_param(nv_dec_eval_min_calls, int, 0644);
> +MODULE_PARM_DESC(nv_dec_eval_min_calls, "Wait for this many data points"
> +  " before declaring congestion");
> +module_param(nv_inc_eval_min_calls, int, 0644);
> +MODULE_PARM_DESC(nv_inc_eval_min_calls, "Wait for this many data points"
> +  " before allowing cwnd growth");
> +module_param(nv_stop_rtt_cnt, int, 0644);
> +MODULE_PARM_DESC(nv_stop_rtt_cnt, "Wait for this many RTTs before stopping"
> +  " cwnd growth");
> +module_param(nv_ssthresh_eval_min_calls, int, 0644);
> +MODULE_PARM_DESC(nv_ssthresh_eval_min_calls, "Wait for this many data points"
> +  " before declaring congestion during initial slow-start");
> +module_param(nv_rtt_min_cnt, int, 0644);
> +MODULE_PARM_DESC(nv_rtt_min_cnt, "Wait for this many RTTs before declaring"
> +  " congestion");
> +module_param(nv_cong_dec_mult, int, 0644);
> +MODULE_PARM_DESC(nv_cong_dec_mult, "Congestion decrease factor");
> +module_param(nv_ssthresh_factor, int, 0644);
> +MODULE_PARM_DESC(nv_ssthresh_factor, "ssthresh factor");
> +module_param(nv_rtt_factor, int, 0644);
> +MODULE_PARM_DESC(nv_rtt_factor, "rtt averaging factor");
> +module_param(nv_rtt_cnt_dec_delta, int, 0644);
> +MODULE_PARM_DESC(nv_rtt_cnt_dec_delta, "decrease cwnd for this many RTTs"
> +  " every 100 RTTs");
> +module_param(nv_dec_factor, int, 0644);
> +MODULE_PARM_DESC(nv_dec_factor, "decrease cwnd every ~192 RTTS by factor/8");
> +module_param(nv_loss_dec_factor, int, 0644);
> +MODULE_PARM_DESC(nv_loss_dec_factor, "on loss new cwnd = cwnd * this / 1024");
> +module_param(nv_cwnd_growth_rate_neg, int, 0644);
> +MODULE_PARM_DESC(nv_cwnd_growth_rate_neg, "Applies when current cwnd growth"
> +  " rate < Reno");
> +module_param(nv_cwnd_growth_rate_pos, int, 0644);
> +MODULE_PARM_DESC(nv_cwnd_growth_rate_pos, "Applies when current cwnd growth"
> +  " rate >= Reno");
> +module_param(nv_min_min_rtt, int, 0644);
> +MODULE_PARM_DESC(nv_min_min_rtt, "lower bound for ca->nv_min_rtt");
> +module_param(nv_max_min_rtt, int, 0644);
> +MODULE_PARM_DESC(nv_max_min_rtt, "upper bound for ca->nv_min_rtt");

That's a disturbingly huge number of module parameters.  Even the first
one "nv_enable" is superfluous, just hook up another congestion control
algorithm.

Please trim this down to something reasonable.  The more of these you have,
the more of them people will start wanting to use and depend upon, and then
you're stuck with them whether you like it or not.


Re: [PATCH net-next 0/3] net: vrf: Add support for local traffic to local addresses

2016-06-06 Thread David Miller
From: David Miller 
Date: Mon, 06 Jun 2016 15:12:28 -0700 (PDT)

> From: David Ahern 
> Date: Thu,  2 Jun 2016 13:15:09 -0700
> 
>> Add support for locally originated traffic to VRF-local addresses,
>> be it addresses on enslaved devices or addresses on the VRF device:
>  ...
> 
> Series applied, but I've been wondering what happens to hw offloads
> when these VRF devices sit in the middle.
> 
> Does TSO et al. still occur properly?

Actually I have to revert this series, ip6_input() is not an exported
module symbol.

If you're only build testing things like this with everything "=y",
please reconsider.


Re: [PATCH -next] cbq: remove only caller of qdisc->drop()

2016-06-06 Thread Florian Westphal
Eric Dumazet  wrote:
> >  static void cbq_ovl_drop(struct cbq_class *cl)
> >  {
> > -   if (cl->q->ops->drop)
> > -   if (cl->q->ops->drop(cl->q))
> > -   cl->qdisc->q.qlen--;
> > +   struct sk_buff *skb = cl->q->ops->dequeue(cl->q);
> > +
> > +   if (skb) {
> > +   cl->deficit -= qdisc_pkt_len(skb);
> > +   cl->qdisc->q.qlen--;
> > +   qdisc_drop(skb, cl->qdisc);
> > +   }
> > +
> > cl->xstats.overactions++;
> > cbq_ovl_classic(cl);
> >  }
> 
> A drop() is not equivalent to a dequeue() followed by qdisc_drop() for
> statistics.
> 
> dequeue() will update stats of _sent_ packets/bytes, while drop() should
> not.

Well, I could send a patch to just remove cbq_ovl_drop completely,
you can't configure this facility with iproute2.

You are right of course, but is it really worth having this?

Not calling cl->q->ops->drop() in cbq would allow removal of ~300 LOC
in qdiscs...
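
To make the accounting difference raised in the quoted review concrete, here is a toy FIFO with separate "sent" and "drop" counters. The `toy_*` names and the structure layout are hypothetical and have nothing to do with the real qdisc structures; packet lengths are assumed to be non-zero so that 0 can mean "queue empty".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define QCAP 8

struct toy_qdisc {
	uint32_t pkt_len[QCAP];	/* FIFO of packet lengths */
	unsigned int head, qlen;
	uint64_t sent_bytes;
	uint32_t sent_packets;
	uint32_t drops;
};

static int toy_enqueue(struct toy_qdisc *q, uint32_t len)
{
	assert(q != NULL && len > 0);
	if (q->qlen == QCAP)
		return -1;
	q->pkt_len[(q->head + q->qlen) % QCAP] = len;
	q->qlen++;
	return 0;
}

/* Remove the head packet without touching any statistics. */
static uint32_t toy_unlink(struct toy_qdisc *q)
{
	uint32_t len;

	if (q->qlen == 0)
		return 0;
	len = q->pkt_len[q->head];
	q->head = (q->head + 1) % QCAP;
	q->qlen--;
	return len;
}

/* dequeue(): the packet counts as sent. */
static uint32_t toy_dequeue(struct toy_qdisc *q)
{
	uint32_t len = toy_unlink(q);

	if (len) {
		q->sent_bytes += len;
		q->sent_packets++;
	}
	return len;
}

/* drop(): the packet counts only as dropped. */
static uint32_t toy_drop(struct toy_qdisc *q)
{
	uint32_t len = toy_unlink(q);

	if (len)
		q->drops++;
	return len;
}

/* The replacement under discussion: dequeue first, then count a drop. */
static uint32_t toy_dequeue_then_drop(struct toy_qdisc *q)
{
	uint32_t len = toy_dequeue(q);

	if (len)
		q->drops++;
	return len;
}
```

In this model the second variant reports the same packet both as sent and as dropped, which is the statistics mismatch being pointed out.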


Re: [PATCH net-next v3 0/2] net: vrf: Improve use of FIB rules

2016-06-06 Thread David Miller
From: David Ahern 
Date: Fri,  3 Jun 2016 12:36:54 -0700

> Currently, VRFs require 1 oif and 1 iif rule per address family per
> VRF. As the number of VRF devices increases it brings scalability
> issues with the increasing rule list. All of the VRF rules have the
> same format with the exception of the specific table id to direct the
> lookup. Since the table id is available from the oif or iif in the
> lookup, the VRF rules can be consolidated to a single rule that pulls
> the table from the VRF device.
> 
> This solution still allows a user to insert their own rules for VRFs,
> including rules with additional attributes. Accordingly, it is backwards
> compatible with existing setups and allows other policy routing as
> desired.

I hate module parameters.

And you don't even need one in this situation, just use a default preference
of 1000 and add a newlink netlink attribute that can change it.

Thanks.


[PATCH 2/2] ipvs: update real-server binding of outgoing connections in SIP-pe

2016-06-06 Thread Pablo Neira Ayuso
From: Marco Angaroni 

The previous patch that introduced handling of outgoing packets in the SIP
persistent-engine did not call ip_vs_check_template() when a packet
matched a connection template. The assumption was that the real-server was
healthy, since it was sending a packet at that very moment.

There are however real-server fault conditions requiring that association
between call-id and real-server (represented by connection template)
gets updated. Here is an example of the sequence of events:
  1) RS1 is a back2back user agent that handled call-id1 and call-id2
  2) RS1 is down and was marked as unavailable
  3) new message from outside comes to IPVS with call-id1
  4) IPVS reschedules the message to RS2, which becomes new call handler
  5) RS2 forwards the message outside, translating call-id1 to call-id2
  6) inside pe->conn_out() IPVS matches call-id2 with existing template
  7) IPVS does not change association call-id2 <-> RS1
  8) new message comes from client with call-id2
  9) IPVS reschedules the message to a real-server potentially different
 from RS2, which is now the correct destination

This patch introduces an ip_vs_check_template() call in the handling of
outgoing packets for SIP-pe. It also introduces a second, optional
argument for ip_vs_check_template() that allows checking whether the dest
associated with a connection template is the same dest that was identified
as the source of the packet. This changes the real-server bound to a
particular call-id independently of its availability status: the idea
is that it's more reliable, for the in->out direction (where the internal
network can be considered trusted), to always associate a call-id with
the last real-server that used it in one of its messages. Consider the
above sequence of events where, just after step 5, RS1 becomes available
again.

Comparison of dests is done by simply comparing pointers to struct
ip_vs_dest; there should be no cases where struct ip_vs_dest keeps its
memory address, but represents a different real-server in terms of
ip-address / port.
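
As a rough illustration of the check described above, here is a minimal
user-space sketch. The toy_* types and names are hypothetical stand-ins,
not the kernel's struct ip_vs_conn / struct ip_vs_dest, and the
quiescent-template expiry test is left out for brevity.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel structures. */
struct toy_dest {
    bool available;            /* models IP_VS_DEST_F_AVAILABLE */
};

struct toy_template {
    struct toy_dest *dest;     /* real-server bound to this template */
};

/*
 * Returns 1 if the template may still be used, 0 otherwise.
 * cdest is optional: when non-NULL, the template is also rejected
 * if it is bound to a different dest than the one passed in.
 * Dests are compared by pointer only, as in the patch description.
 */
static int toy_check_template(struct toy_template *ct,
                              const struct toy_dest *cdest)
{
    const struct toy_dest *dest = ct->dest;

    if (dest == NULL || !dest->available ||
        (cdest != NULL && dest != cdest)) {
        ct->dest = NULL;       /* stand-in for invalidating the template */
        return 0;
    }
    return 1;
}
```

With a template bound to RS1, passing NULL or RS1 as cdest keeps the
template, while passing RS2 invalidates it so that a new binding can be
created for the real-server that actually sent the packet.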

Fixes: 39b972231536 ("ipvs: handle connections started by real-servers")
Signed-off-by: Marco Angaroni 
Acked-by: Julian Anastasov 
Signed-off-by: Simon Horman 
---
 include/net/ip_vs.h | 2 +-
 net/netfilter/ipvs/ip_vs_conn.c | 5 +++--
 net/netfilter/ipvs/ip_vs_core.c | 5 +++--
 3 files changed, 7 insertions(+), 5 deletions(-)

diff --git a/include/net/ip_vs.h b/include/net/ip_vs.h
index af4c10e..cd6018a 100644
--- a/include/net/ip_vs.h
+++ b/include/net/ip_vs.h
@@ -1232,7 +1232,7 @@ void ip_vs_conn_expire_now(struct ip_vs_conn *cp);
 const char *ip_vs_state_name(__u16 proto, int state);
 
 void ip_vs_tcp_conn_listen(struct ip_vs_conn *cp);
-int ip_vs_check_template(struct ip_vs_conn *ct);
+int ip_vs_check_template(struct ip_vs_conn *ct, struct ip_vs_dest *cdest);
 void ip_vs_random_dropentry(struct netns_ipvs *ipvs);
 int ip_vs_conn_init(void);
 void ip_vs_conn_cleanup(void);
diff --git a/net/netfilter/ipvs/ip_vs_conn.c b/net/netfilter/ipvs/ip_vs_conn.c
index 2cb3c62..096a451 100644
--- a/net/netfilter/ipvs/ip_vs_conn.c
+++ b/net/netfilter/ipvs/ip_vs_conn.c
@@ -762,7 +762,7 @@ static int expire_quiescent_template(struct netns_ipvs *ipvs,
  * If available, return 1, otherwise invalidate this connection
  * template and return 0.
  */
-int ip_vs_check_template(struct ip_vs_conn *ct)
+int ip_vs_check_template(struct ip_vs_conn *ct, struct ip_vs_dest *cdest)
 {
struct ip_vs_dest *dest = ct->dest;
struct netns_ipvs *ipvs = ct->ipvs;
@@ -772,7 +772,8 @@ int ip_vs_check_template(struct ip_vs_conn *ct)
 */
if ((dest == NULL) ||
!(dest->flags & IP_VS_DEST_F_AVAILABLE) ||
-   expire_quiescent_template(ipvs, dest)) {
+   expire_quiescent_template(ipvs, dest) ||
+   (cdest && (dest != cdest))) {
IP_VS_DBG_BUF(9, "check_template: dest not available for "
  "protocol %s s:%s:%d v:%s:%d "
  "-> d:%s:%d\n",
diff --git a/net/netfilter/ipvs/ip_vs_core.c b/net/netfilter/ipvs/ip_vs_core.c
index 1207f20..2c1b498 100644
--- a/net/netfilter/ipvs/ip_vs_core.c
+++ b/net/netfilter/ipvs/ip_vs_core.c
@@ -321,7 +321,7 @@ ip_vs_sched_persist(struct ip_vs_service *svc,
 
/* Check if a template already exists */
ct = ip_vs_ct_in_get();
-   if (!ct || !ip_vs_check_template(ct)) {
+   if (!ct || !ip_vs_check_template(ct, NULL)) {
struct ip_vs_scheduler *sched;
 
/*
@@ -1154,7 +1154,8 @@ struct ip_vs_conn *ip_vs_new_conn_out(struct ip_vs_service *svc,
  vport, ) < 0)
return NULL;
ct = ip_vs_ct_in_get();
-   if (!ct) {
+   /* check if template exists and points to the same dest */
+   if (!ct || 

[PATCH 1/2] netfilter: x_tables: don't reject valid target size on some architectures

2016-06-06 Thread Pablo Neira Ayuso
From: Florian Westphal 

Quoting John Stultz:
  In updating a 32bit arm device from 4.6 to Linus' current HEAD, I
  noticed I was having some trouble with networking, and realized that
  /proc/net/ip_tables_names was suddenly empty.
  Digging through the registration process, it seems we're catching on the:

   if (strcmp(t->u.user.name, XT_STANDARD_TARGET) == 0 &&
   target_offset + sizeof(struct xt_standard_target) != next_offset)
 return -EINVAL;

  Where next_offset seems to be 4 bytes larger than the
  offset + standard_target struct size.

next_offset needs to be aligned via XT_ALIGN (so we can access all members
of ip(6)t_entry struct).

This problem didn't show up on i686 as it only needs 4-byte alignment for
u64, but iptables userspace on other 32bit arches does insert extra padding.

Reported-by: John Stultz 
Tested-by: John Stultz 
Fixes: 7ed2abddd20cf ("netfilter: x_tables: check standard target size too")
Signed-off-by: Florian Westphal 
Signed-off-by: Pablo Neira Ayuso 
---
 net/netfilter/x_tables.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/netfilter/x_tables.c b/net/netfilter/x_tables.c
index c69c892..2675d58 100644
--- a/net/netfilter/x_tables.c
+++ b/net/netfilter/x_tables.c
@@ -612,7 +612,7 @@ int xt_compat_check_entry_offsets(const void *base, const char *elems,
return -EINVAL;
 
if (strcmp(t->u.user.name, XT_STANDARD_TARGET) == 0 &&
-   target_offset + sizeof(struct compat_xt_standard_target) != next_offset)
+   COMPAT_XT_ALIGN(target_offset + sizeof(struct compat_xt_standard_target)) != next_offset)
return -EINVAL;
 
/* compat_xt_entry match has less strict aligment requirements,
@@ -694,7 +694,7 @@ int xt_check_entry_offsets(const void *base,
return -EINVAL;
 
if (strcmp(t->u.user.name, XT_STANDARD_TARGET) == 0 &&
-   target_offset + sizeof(struct xt_standard_target) != next_offset)
+   XT_ALIGN(target_offset + sizeof(struct xt_standard_target)) != next_offset)
return -EINVAL;
 
return xt_check_entry_match(elems, base + target_offset,
-- 
2.1.4
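
To make the alignment issue concrete, here is a small sketch of the
rounding involved. align_up() and standard_target_size_ok() are
hypothetical helpers written for this illustration (they are not
XT_ALIGN or the kernel's check), and the offsets and sizes used with
them are made-up numbers.

```c
#include <stddef.h>

/* Round x up to the next multiple of a; a must be a power of two. */
static size_t align_up(size_t x, size_t a)
{
    return (x + a - 1) & ~(a - 1);
}

/*
 * Simplified form of the fixed check: the end of the standard target,
 * rounded up to the required alignment, must equal next_offset.
 */
static int standard_target_size_ok(size_t target_offset, size_t target_size,
                                   size_t next_offset, size_t alignment)
{
    return align_up(target_offset + target_size, alignment) == next_offset;
}
```

Assuming a target that ends at byte 148, 4-byte alignment leaves the
end at 148, while 8-byte alignment moves it to 152. That is the kind of
4-byte difference described in the report: an unaligned comparison
rejects an entry whose next_offset is 152, the aligned one accepts it.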



[PATCH 0/2] Netfilter/IPVS fixes for net

2016-06-06 Thread Pablo Neira Ayuso
Hi David,

The following patchset contains two Netfilter/IPVS fixes for your net
tree, they are:

1) Fix missing alignment in next offset calculation for standard
   targets, introduced in the previous merge window, patch from
   Florian Westphal.

2) Fix to correct the handling of outgoing connections which use the
   SIP-pe such that the binding of a real-server is updated when needed.
   This was an omission from changes introduced by Marco Angaroni in
   the previous merge window too, to allow handling of outgoing
   connections by the SIP-pe. Patch and report came via Simon Horman.

You can pull these changes from:

  git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf.git

Thanks!



The following changes since commit 14b84e8654c89ed59f433654e6bb64c886d095cd:

  qed: fix qed_fill_link() error handling (2016-06-01 22:04:54 -0700)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf.git HEAD

for you to fetch changes up to 3ec10d3a2ba591c87da94219c1e46b02ae97757a:

  ipvs: update real-server binding of outgoing connections in SIP-pe (2016-06-06 09:47:25 +0900)


Florian Westphal (1):
  netfilter: x_tables: don't reject valid target size on some architectures

Marco Angaroni (1):
  ipvs: update real-server binding of outgoing connections in SIP-pe

 include/net/ip_vs.h | 2 +-
 net/netfilter/ipvs/ip_vs_conn.c | 5 +++--
 net/netfilter/ipvs/ip_vs_core.c | 5 +++--
 net/netfilter/x_tables.c| 4 ++--
 4 files changed, 9 insertions(+), 7 deletions(-)


Re: [PATCH net-next] cxgb4: Reduce resource allocation in kdump kernel

2016-06-06 Thread David Miller
From: Yuval Mintz 
Date: Sat, 4 Jun 2016 13:24:43 +

>> When is_kdump_kernel() is true, reduce our memory footprint by only using a
>> single "Queue Set" and Forcing Master so we can reinitialize the 
>> Firmware/Chip.
>> 
>> Signed-off-by: Hariprasad Shenai 
> ...
>> if (q10g > netif_get_num_default_rss_queues())
>> q10g = netif_get_num_default_rss_queues();
>> 
>> +   /* Reduce memory usage in kdump environment by using only one queue
>> +* and disable all offload.
>> +*/
>> +   if (is_kdump_kernel()) {
>> +   q10g = 1;
>> +   adap->params.offload = 0;
>> +   }
>> +
> 
> Sounds like a common issue that might interest other devices as well.
> Perhaps we should change netif_get_num_default_rss_queues() to return 1
> when called from a kdump kernel?

Yeah that might make sense.
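
A rough sketch of what such a helper could look like. This is a toy
model rather than the kernel function: toy_default_rss_queues(), its
parameters, and the cap of 8 are assumptions made for illustration.

```c
#include <stdbool.h>

/* Assumed upper bound on default RSS queues, for illustration only. */
#define TOY_DEFAULT_MAX_RSS_QUEUES 8

/*
 * Pick a default number of RSS queues. In a kdump kernel memory is
 * scarce, so a single queue is returned regardless of the CPU count.
 */
static int toy_default_rss_queues(bool in_kdump_kernel, int online_cpus)
{
    if (in_kdump_kernel || online_cpus < 1)
        return 1;
    if (online_cpus > TOY_DEFAULT_MAX_RSS_QUEUES)
        return TOY_DEFAULT_MAX_RSS_QUEUES;
    return online_cpus;
}
```

Drivers that already clamp their queue count to this default would
then shrink automatically in a kdump kernel without per-driver checks.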


Re: [PATCH] netfilter/nflog: nflog-range does not truncate packets

2016-06-06 Thread Pablo Neira Ayuso
On Wed, Jun 01, 2016 at 08:23:54PM -0400, Vishwanath Pai wrote:
> netfilter/nflog: nflog-range does not truncate packets
> 
> The --nflog-range parameter from userspace is ignored in the kernel and
> the entire packet is sent to the userspace. The per-instance parameter
> copy_range still works, with this change --nflog-range will have
> preference over copy_range.

I think it's reasonable to assume that --nflog-range from the rule
applies globally to any instance.

However, per-instance copy_range has prevailed over --nflog-range
since the beginning, so I would follow a more conservative approach,
i.e. keep copy_range taking precedence over --nflog-range.

So I'd suggest you invert this logic.

Let me know, thanks.
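
One possible reading of the suggested precedence, as a toy sketch.
toy_copy_len() and the convention that 0 means "not configured" are
assumptions made for this illustration; the kernel's actual defaults
and data structures differ.

```c
#include <stddef.h>

/*
 * Decide how many bytes of a packet to copy to userspace.
 * instance_copy_range: per-instance setting (0 = not configured here).
 * rule_nflog_range:    per-rule --nflog-range (0 = not configured here).
 * The per-instance value wins when both are set; the result never
 * exceeds the packet length.
 */
static size_t toy_copy_len(size_t pkt_len, size_t instance_copy_range,
                           size_t rule_nflog_range)
{
    size_t len = pkt_len;

    if (instance_copy_range != 0)
        len = instance_copy_range;
    else if (rule_nflog_range != 0)
        len = rule_nflog_range;

    return len < pkt_len ? len : pkt_len;
}
```

Under this reading, --nflog-range only takes effect when the instance
has no copy_range of its own, which preserves the long-standing
behaviour for existing setups.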


Re: [PATCH v2] soreuseport: add compat case for setsockopt SO_ATTACH_REUSEPORT_CBPF

2016-06-06 Thread David Miller
From: Helge Deller 
Date: Fri, 3 Jun 2016 23:49:17 +0200

> Commit 538950a1b752 ("soreuseport: setsockopt SO_ATTACH_REUSEPORT_[CE]BPF")
> failed to add the compat case for the SO_ATTACH_REUSEPORT_CBPF option.
> 
> Signed-off-by: Helge Deller 

Applied.


Re: [PATCH] soreuseport: Fix reuseport_bpf testcase on 32bit architectures

2016-06-06 Thread David Miller
From: Helge Deller 
Date: Fri, 3 Jun 2016 19:19:20 +0200

> This fixes the following compiler warnings when compiling the
> reuseport_bpf testcase on a 32 bit platform:
> 
> reuseport_bpf.c: In function ‘attach_ebpf’:
> reuseport_bpf.c:114:15: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
> 
> Signed-off-by: Helge Deller 

Applied.


Re: [PATCH net-next 00/15] net/smc: Shared Memory Communications - RDMA

2016-06-06 Thread David Miller
From: Ursula Braun 
Date: Fri,  3 Jun 2016 17:26:59 +0200

> It is transparent to most existing TCP connection load balancers
> that are commonly used in the enterprise data center environment for
> multi-tier application workloads.

I think a better word would be "bypass".  And likewise the data stream
bypasses our packet scheduler, netfilter, classifiers, and just about
every other interesting facility in the kernel.


Re: [PATCH -next] cbq: remove only caller of qdisc->drop()

2016-06-06 Thread Eric Dumazet
On Mon, 2016-06-06 at 23:20 +0200, Florian Westphal wrote:
> since initial revision of cbq in 2004 iproute2 never implemented
> support for TCA_CBQ_OVL_STRATEGY, which is what needs to be set to
> activate the class->drop() call (TC_CBQ_OVL_DROP strategy must be
> set by userspace).
> 
> So let's remove this.  We can even do this in a backwards compatible
> way by switching ovl_drop to perform a dequeue+drop on the leaf.
> 
> A followup commit can then remove all .drop qdisc methods since this
> was the only caller.
> 
> Signed-off-by: Florian Westphal 
> ---
>  On a related note, iproute2 doesn't support the TCA_CBQ_POLICE
>  attribute either.  If we'd remove that too we could then get rid
>  of __parent and reshape_fail in struct Qdisc.
> 
>  However, in TCA_CBQ_POLICE case there is no way to replace
>  the functionality, i.e. we'd have to -EOPNOTSUPP or ignore
>  TCA_CBQ_POLICE if some other non-iproute2 tool presents it to us.
> 
>  AFAICS TCA_CBQ_POLICE doesn't work all that well even if userspace
>  would set it, one needs to:
> 
>   add a class
>   attach a qdisc to the class (default pfifo doesn't work as
>   q->handle is 0 and cbq_set_police() is a no-op in this case)
>   re-'add' the same class (tc class change ...) again
> 
>   user must also specify a defmap (e.g. 'split 1:0 defmap 3f'), since
>   this 'police' feature relies on its presence
>   the added qdisc (or the leaves) must be one of bfifo, pfifo, tbf or
>   netem as only these implement qdisc_reshape_fail() call.
> 
>   If all of these conditions are met, then instead of drop the leaf
>   qdiscs mentioned above would attempt to re-enqueue the skb in the
>   cbq TC_PRIO_BESTEFFORT class if their limit is reached.
> 
>   I think it would be safe to just ignore TCA_CBQ_POLICE and change the
>   qdiscs to drop right away.
> 
>   Does anyone have a reason to leave this in the tree?  Thanks!
> 
> diff --git a/net/sched/sch_cbq.c b/net/sched/sch_cbq.c
> index baafddf..df79791 100644
> --- a/net/sched/sch_cbq.c
> +++ b/net/sched/sch_cbq.c
> @@ -542,9 +542,14 @@ static void cbq_ovl_lowprio(struct cbq_class *cl)
>  
>  static void cbq_ovl_drop(struct cbq_class *cl)
>  {
> - if (cl->q->ops->drop)
> - if (cl->q->ops->drop(cl->q))
> - cl->qdisc->q.qlen--;
> + struct sk_buff *skb = cl->q->ops->dequeue(cl->q);
> +
> + if (skb) {
> + cl->deficit -= qdisc_pkt_len(skb);
> + cl->qdisc->q.qlen--;
> + qdisc_drop(skb, cl->qdisc);
> + }
> +
>   cl->xstats.overactions++;
>   cbq_ovl_classic(cl);
>  }

A drop() is not equivalent to a dequeue() followed by qdisc_drop() for
statistics.

dequeue() will update stats of _sent_ packets/bytes, while drop() should
not.
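
The accounting difference can be shown with a toy model. struct
toy_qdisc and the toy_* functions below are hypothetical and only track
the counters under discussion; they are not the kernel's struct Qdisc
or its ops.

```c
/* Toy queue: only the counters relevant to the statistics argument. */
struct toy_qdisc {
    unsigned int qlen;     /* packets currently queued */
    unsigned int sent;     /* packets accounted as transmitted */
    unsigned int drops;    /* packets accounted as dropped */
};

/* dequeue(): removes a packet and accounts it as sent. */
static int toy_dequeue(struct toy_qdisc *q)
{
    if (q->qlen == 0)
        return 0;
    q->qlen--;
    q->sent++;
    return 1;
}

/* drop(): removes a packet and only bumps the drop counter. */
static int toy_drop(struct toy_qdisc *q)
{
    if (q->qlen == 0)
        return 0;
    q->qlen--;
    q->drops++;
    return 1;
}

/* The replacement from the patch: dequeue first, then account a drop. */
static int toy_dequeue_then_drop(struct toy_qdisc *q)
{
    if (!toy_dequeue(q))
        return 0;
    q->drops++;
    return 1;
}
```

Starting from the same queue, toy_drop() leaves sent unchanged, while
toy_dequeue_then_drop() counts the packet as both sent and dropped.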





Re: [PATCH net-next 0/3] net: vrf: Add support for local traffic to local addresses

2016-06-06 Thread David Miller
From: David Ahern 
Date: Thu,  2 Jun 2016 13:15:09 -0700

> Add support for locally originated traffic to VRF-local addresses,
> be it addresses on enslaved devices or addresses on the VRF device:
 ...

Series applied, but I've been wondering what happens to hw offloads
when these VRF devices sit in the middle.

Does TSO et al. still occur properly?


[PATCH net] tcp: record TLP and ER timer stats in v6 stats

2016-06-06 Thread Yuchung Cheng
The v6 TCP stats scan does not report TLP and ER timer information
correctly, unlike the v4 version. This patch fixes that.

Fixes: 6ba8a3b19e76 ("tcp: Tail loss probe (TLP)")
Fixes: eed530b6c676 ("tcp: early retransmit")
Signed-off-by: Yuchung Cheng 
Signed-off-by: Neal Cardwell 
---
 net/ipv6/tcp_ipv6.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c
index 4ad8edb..d1fc453 100644
--- a/net/ipv6/tcp_ipv6.c
+++ b/net/ipv6/tcp_ipv6.c
@@ -1707,7 +1707,9 @@ static void get_tcp6_sock(struct seq_file *seq, struct sock *sp, int i)
destp = ntohs(inet->inet_dport);
srcp  = ntohs(inet->inet_sport);
 
-   if (icsk->icsk_pending == ICSK_TIME_RETRANS) {
+   if (icsk->icsk_pending == ICSK_TIME_RETRANS ||
+   icsk->icsk_pending == ICSK_TIME_EARLY_RETRANS ||
+   icsk->icsk_pending == ICSK_TIME_LOSS_PROBE) {
timer_active= 1;
timer_expires   = icsk->icsk_timeout;
} else if (icsk->icsk_pending == ICSK_TIME_PROBE0) {
-- 
2.8.0.rc3.226.g39d4020
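
The effect of the changed condition can be sketched in isolation. The
enum and function names below are hypothetical stand-ins for the
ICSK_TIME_* values and the check in get_tcp6_sock(); only the shape of
the condition is taken from the hunk above.

```c
#include <stdbool.h>

/* Stand-ins for the pending-timer kinds named in the patch. */
enum toy_pending {
    TOY_TIME_NONE,
    TOY_TIME_RETRANS,
    TOY_TIME_PROBE0,
    TOY_TIME_EARLY_RETRANS,
    TOY_TIME_LOSS_PROBE,
};

/* Old v6 behaviour: only a plain retransmit timer is reported. */
static bool toy_reports_retrans_before(enum toy_pending p)
{
    return p == TOY_TIME_RETRANS;
}

/* New behaviour: ER and TLP timers are reported the same way. */
static bool toy_reports_retrans_after(enum toy_pending p)
{
    return p == TOY_TIME_RETRANS ||
           p == TOY_TIME_EARLY_RETRANS ||
           p == TOY_TIME_LOSS_PROBE;
}
```

A socket with a pending loss-probe timer is missed by the old check and
picked up by the new one; the zero-window probe case is unaffected.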



[PATCH -next] cbq: remove only caller of qdisc->drop()

2016-06-06 Thread Florian Westphal
since initial revision of cbq in 2004 iproute2 never implemented
support for TCA_CBQ_OVL_STRATEGY, which is what needs to be set to
activate the class->drop() call (TC_CBQ_OVL_DROP strategy must be
set by userspace).

So let's remove this.  We can even do this in a backwards compatible
way by switching ovl_drop to perform a dequeue+drop on the leaf.

A followup commit can then remove all .drop qdisc methods since this
was the only caller.

Signed-off-by: Florian Westphal 
---
 On a related note, iproute2 doesn't support the TCA_CBQ_POLICE
 attribute either.  If we'd remove that too we could then get rid
 of __parent and reshape_fail in struct Qdisc.

 However, in TCA_CBQ_POLICE case there is no way to replace
 the functionality, i.e. we'd have to -EOPNOTSUPP or ignore
 TCA_CBQ_POLICE if some other non-iproute2 tool presents it to us.

 AFAICS TCA_CBQ_POLICE doesn't work all that well even if userspace
 would set it, one needs to:

  add a class
  attach a qdisc to the class (default pfifo doesn't work as
  q->handle is 0 and cbq_set_police() is a no-op in this case)
  re-'add' the same class (tc class change ...) again

  user must also specify a defmap (e.g. 'split 1:0 defmap 3f'), since
  this 'police' feature relies on its presence
  the added qdisc (or the leaves) must be one of bfifo, pfifo, tbf or
  netem as only these implement qdisc_reshape_fail() call.

  If all of these conditions are met, then instead of drop the leaf
  qdiscs mentioned above would attempt to re-enqueue the skb in the
  cbq TC_PRIO_BESTEFFORT class if their limit is reached.

  I think it would be safe to just ignore TCA_CBQ_POLICE and change the
  qdiscs to drop right away.

  Does anyone have a reason to leave this in the tree?  Thanks!

diff --git a/net/sched/sch_cbq.c b/net/sched/sch_cbq.c
index baafddf..df79791 100644
--- a/net/sched/sch_cbq.c
+++ b/net/sched/sch_cbq.c
@@ -542,9 +542,14 @@ static void cbq_ovl_lowprio(struct cbq_class *cl)
 
 static void cbq_ovl_drop(struct cbq_class *cl)
 {
-   if (cl->q->ops->drop)
-   if (cl->q->ops->drop(cl->q))
-   cl->qdisc->q.qlen--;
+   struct sk_buff *skb = cl->q->ops->dequeue(cl->q);
+
+   if (skb) {
+   cl->deficit -= qdisc_pkt_len(skb);
+   cl->qdisc->q.qlen--;
+   qdisc_drop(skb, cl->qdisc);
+   }
+
cl->xstats.overactions++;
cbq_ovl_classic(cl);
 }
-- 
2.7.3



Re: [PATCH] netlabel: add address family checks to netlbl_{sock, req}_delattr()

2016-06-06 Thread David Miller
From: Paul Moore 
Date: Mon, 6 Jun 2016 15:37:56 -0400

> On Mon, Jun 6, 2016 at 3:35 PM, Paul Moore  wrote:
>> From: Paul Moore 
>>
>> It seems risky to always rely on the caller to ensure the socket's
>> address family is correct before passing it to the NetLabel kAPI,
>> especially since we see at least one LSM which didn't. Add address
>> family checks to the *_delattr() functions to help prevent future
>> problems.
>>
>> Cc: 
>> Reported-by: Maninder Singh 
>> Signed-off-by: Paul Moore 
>> ---
>>  net/netlabel/netlabel_kapi.c |   12 ++--
>>  1 file changed, 10 insertions(+), 2 deletions(-)
> 
> DaveM, since this is such a trivial fix I'm adding it into my
> selinux#next branch right now, but if you would prefer to carry it via
> netdev#next let me know.

That's fine.


[PATCH] brcmfmac: drop unused pm_block vif attribute

2016-06-06 Thread Rafał Miłecki
This attribute was added 3 years ago by
commit 3eacf866559c ("brcmfmac: introduce brcmf_cfg80211_vif structure")
but it remains unused since then. It seems we can safely drop it.

Signed-off-by: Rafał Miłecki 
---
 drivers/net/wireless/broadcom/brcm80211/brcmfmac/cfg80211.c | 9 +++--
 drivers/net/wireless/broadcom/brcm80211/brcmfmac/cfg80211.h | 5 +
 drivers/net/wireless/broadcom/brcm80211/brcmfmac/p2p.c  | 5 ++---
 3 files changed, 6 insertions(+), 13 deletions(-)

diff --git a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/cfg80211.c b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/cfg80211.c
index ce35ada..4894eb7 100644
--- a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/cfg80211.c
+++ b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/cfg80211.c
@@ -603,7 +603,7 @@ struct wireless_dev *brcmf_ap_add_vif(struct wiphy *wiphy, const char *name,
 
brcmf_dbg(INFO, "Adding vif \"%s\"\n", name);
 
-   vif = brcmf_alloc_vif(cfg, NL80211_IFTYPE_AP, false);
+   vif = brcmf_alloc_vif(cfg, NL80211_IFTYPE_AP);
if (IS_ERR(vif))
return (struct wireless_dev *)vif;
 
@@ -5190,8 +5190,7 @@ static struct cfg80211_ops brcmf_cfg80211_ops = {
 };
 
 struct brcmf_cfg80211_vif *brcmf_alloc_vif(struct brcmf_cfg80211_info *cfg,
-  enum nl80211_iftype type,
-  bool pm_block)
+  enum nl80211_iftype type)
 {
struct brcmf_cfg80211_vif *vif_walk;
struct brcmf_cfg80211_vif *vif;
@@ -5206,8 +5205,6 @@ struct brcmf_cfg80211_vif *brcmf_alloc_vif(struct brcmf_cfg80211_info *cfg,
vif->wdev.wiphy = cfg->wiphy;
vif->wdev.iftype = type;
 
-   vif->pm_block = pm_block;
-
brcmf_init_prof(&vif->profile);
 
if (type == NL80211_IFTYPE_AP) {
@@ -6850,7 +6847,7 @@ struct brcmf_cfg80211_info *brcmf_cfg80211_attach(struct brcmf_pub *drvr,
init_vif_event(&cfg->vif_event);
INIT_LIST_HEAD(&cfg->vif_list);
 
-   vif = brcmf_alloc_vif(cfg, NL80211_IFTYPE_STATION, false);
+   vif = brcmf_alloc_vif(cfg, NL80211_IFTYPE_STATION);
if (IS_ERR(vif))
goto wiphy_out;
 
diff --git a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/cfg80211.h b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/cfg80211.h
index 95e35bc..c6f9986 100644
--- a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/cfg80211.h
+++ b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/cfg80211.h
@@ -167,7 +167,6 @@ struct vif_saved_ie {
  * @wdev: wireless device.
  * @profile: profile information.
  * @sme_state: SME state using enum brcmf_vif_status bits.
- * @pm_block: power-management blocked.
  * @list: linked list.
  * @mgmt_rx_reg: registered rx mgmt frame types.
  * @mbss: Multiple BSS type, set if not first AP (not relevant for P2P).
@@ -177,7 +176,6 @@ struct brcmf_cfg80211_vif {
struct wireless_dev wdev;
struct brcmf_cfg80211_profile profile;
unsigned long sme_state;
-   bool pm_block;
struct vif_saved_ie saved_ie;
struct list_head list;
u16 mgmt_rx_reg;
@@ -388,8 +386,7 @@ s32 brcmf_cfg80211_down(struct net_device *ndev);
 enum nl80211_iftype brcmf_cfg80211_get_iftype(struct brcmf_if *ifp);
 
 struct brcmf_cfg80211_vif *brcmf_alloc_vif(struct brcmf_cfg80211_info *cfg,
-  enum nl80211_iftype type,
-  bool pm_block);
+  enum nl80211_iftype type);
 void brcmf_free_vif(struct brcmf_cfg80211_vif *vif);
 
 s32 brcmf_vif_set_mgmt_ie(struct brcmf_cfg80211_vif *vif, s32 pktflag,
diff --git a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/p2p.c b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/p2p.c
index 1652a48..fd49cdf 100644
--- a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/p2p.c
+++ b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/p2p.c
@@ -2076,8 +2076,7 @@ static struct wireless_dev *brcmf_p2p_create_p2pdev(struct brcmf_p2p_info *p2p,
if (p2p->bss_idx[P2PAPI_BSSCFG_DEVICE].vif)
return ERR_PTR(-ENOSPC);
 
-   p2p_vif = brcmf_alloc_vif(p2p->cfg, NL80211_IFTYPE_P2P_DEVICE,
- false);
+   p2p_vif = brcmf_alloc_vif(p2p->cfg, NL80211_IFTYPE_P2P_DEVICE);
if (IS_ERR(p2p_vif)) {
brcmf_err("could not create discovery vif\n");
return (struct wireless_dev *)p2p_vif;
@@ -2177,7 +2176,7 @@ struct wireless_dev *brcmf_p2p_add_vif(struct wiphy *wiphy, const char *name,
return ERR_PTR(-EOPNOTSUPP);
}
 
-   vif = brcmf_alloc_vif(cfg, type, false);
+   vif = brcmf_alloc_vif(cfg, type);
if (IS_ERR(vif))
return (struct wireless_dev *)vif;
brcmf_cfg80211_arm_vif_event(cfg, vif);
-- 
1.8.4.5



Re: [PATCH] net: stmmac: dwmac-rk: keep PHY up for WoL

2016-06-06 Thread Vincent Palatin
On Mon, Jun 6, 2016 at 1:45 PM, Heiko Stübner  wrote:
> Hi,
>
> Am Freitag, 3. Juni 2016, 10:29:20 schrieb Vincent Palatin:
>> Do not shutdown the PHY if Wake-on-Lan is enabled, else it cannot wake
>> us up.
>>
>> Signed-off-by: Vincent Palatin 
>> ---
>>  drivers/net/ethernet/stmicro/stmmac/dwmac-rk.c | 8 
>>  1 file changed, 8 insertions(+)
>>
>> diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac-rk.c b/drivers/net/ethernet/stmicro/stmmac/dwmac-rk.c
>> index 0cd3ecf..2e45e75 100644
>> --- a/drivers/net/ethernet/stmicro/stmmac/dwmac-rk.c
>> +++ b/drivers/net/ethernet/stmicro/stmmac/dwmac-rk.c
>> @@ -534,6 +534,10 @@ static int rk_gmac_init(struct platform_device *pdev, void *priv)
>>   struct rk_priv_data *bsp_priv = priv;
>>   int ret;
>>
>> + /* Keep the PHY up if we use Wake-on-Lan. */
>> + if (device_may_wakeup(&pdev->dev))
>> + return 0;
>> +
>
> Hmm, this looks like it would also block the initial setup of clocks and phy?

Yes, that's bad. Doug told me so but I forgot to CC him on the
previous submission.
I will do another version.

> platform_device + device struct are created before probe gets called, so
> something could set the wakeup flag before the driver initially probes?

The device tree 'wakeup' attribute likely does it.

-- 
Vincent


[PATCH net] net: sched: fix tc_should_offload for specific clsact classes

2016-06-06 Thread Daniel Borkmann
When offloading classifiers such as u32 or flower to hardware, and the
qdisc is clsact (TC_H_CLSACT), then we need to differentiate its classes,
since not all of them handle ingress, therefore we must leave those in
software path. Add a .tcf_cl_offload() callback, so we can generically
handle them, tested on ixgbe.
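
The decision described above can be sketched as a stand-alone model.
All toy_* names are hypothetical; the class minor numbers are
illustrative values chosen for this sketch, and the boolean parameters
stand in for the device feature, flag, and ndo checks in the real
helper.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Per-qdisc class operations; the callback is optional. */
struct toy_class_ops {
    bool (*cl_offload)(uint32_t classid);
};

#define TOY_MINOR(classid)  ((classid) & 0xFFFFu)
#define TOY_MIN_INGRESS     0xFFF2u   /* illustrative minor number */
#define TOY_MIN_EGRESS      0xFFF3u   /* illustrative minor number */

/* clsact-style callback: only the ingress class may be offloaded. */
static bool toy_clsact_cl_offload(uint32_t classid)
{
    return TOY_MINOR(classid) == TOY_MIN_INGRESS;
}

/*
 * Generic check: device capability and flags first, then let the
 * qdisc veto offload for the class the filter is attached to.
 */
static bool toy_should_offload(bool dev_has_hw_tc, bool skip_hw,
                               bool has_setup_tc,
                               const struct toy_class_ops *cops,
                               uint32_t classid)
{
    if (!dev_has_hw_tc || skip_hw || !has_setup_tc)
        return false;
    if (cops != NULL && cops->cl_offload != NULL)
        return cops->cl_offload(classid);
    return true;
}
```

Qdiscs that do not provide the callback keep the previous behaviour,
while a clsact-like qdisc can keep its egress class in the software
path.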

Fixes: 10cbc6843446 ("net/sched: cls_flower: Hardware offloaded filters statistics support")
Fixes: 5b33f48842fa ("net/flower: Introduce hardware offload support")
Fixes: a1b7c5fd7fe9 ("net: sched: add cls_u32 offload hooks for netdevs")
Signed-off-by: Daniel Borkmann 
---
 include/net/pkt_cls.h | 10 +++---
 include/net/sch_generic.h |  1 +
 net/sched/cls_flower.c|  6 +++---
 net/sched/cls_u32.c   |  8 
 net/sched/sch_ingress.c   | 12 
 5 files changed, 27 insertions(+), 10 deletions(-)

diff --git a/include/net/pkt_cls.h b/include/net/pkt_cls.h
index 0f7efa8..3722dda 100644
--- a/include/net/pkt_cls.h
+++ b/include/net/pkt_cls.h
@@ -392,16 +392,20 @@ struct tc_cls_u32_offload {
};
 };
 
-static inline bool tc_should_offload(struct net_device *dev, u32 flags)
+static inline bool tc_should_offload(const struct net_device *dev,
+const struct tcf_proto *tp, u32 flags)
 {
+   const struct Qdisc *sch = tp->q;
+   const struct Qdisc_class_ops *cops = sch->ops->cl_ops;
+
if (!(dev->features & NETIF_F_HW_TC))
return false;
-
if (flags & TCA_CLS_FLAGS_SKIP_HW)
return false;
-
if (!dev->netdev_ops->ndo_setup_tc)
return false;
+   if (cops && cops->tcf_cl_offload)
+   return cops->tcf_cl_offload(tp->classid);
 
return true;
 }
diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index a1fd76c..6a01fc5 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -168,6 +168,7 @@ struct Qdisc_class_ops {
 
/* Filter manipulation */
struct tcf_proto __rcu ** (*tcf_chain)(struct Qdisc *, unsigned long);
+   bool(*tcf_cl_offload)(u32 classid);
unsigned long   (*bind_tcf)(struct Qdisc *, unsigned long,
u32 classid);
void(*unbind_tcf)(struct Qdisc *, unsigned long);
diff --git a/net/sched/cls_flower.c b/net/sched/cls_flower.c
index 730aaca..b3b7978 100644
--- a/net/sched/cls_flower.c
+++ b/net/sched/cls_flower.c
@@ -171,7 +171,7 @@ static void fl_hw_destroy_filter(struct tcf_proto *tp, unsigned long cookie)
struct tc_cls_flower_offload offload = {0};
struct tc_to_netdev tc;
 
-   if (!tc_should_offload(dev, 0))
+   if (!tc_should_offload(dev, tp, 0))
return;
 
offload.command = TC_CLSFLOWER_DESTROY;
@@ -194,7 +194,7 @@ static void fl_hw_replace_filter(struct tcf_proto *tp,
struct tc_cls_flower_offload offload = {0};
struct tc_to_netdev tc;
 
-   if (!tc_should_offload(dev, flags))
+   if (!tc_should_offload(dev, tp, flags))
return;
 
offload.command = TC_CLSFLOWER_REPLACE;
@@ -216,7 +216,7 @@ static void fl_hw_update_stats(struct tcf_proto *tp, struct cls_fl_filter *f)
struct tc_cls_flower_offload offload = {0};
struct tc_to_netdev tc;
 
-   if (!tc_should_offload(dev, 0))
+   if (!tc_should_offload(dev, tp, 0))
return;
 
offload.command = TC_CLSFLOWER_STATS;
diff --git a/net/sched/cls_u32.c b/net/sched/cls_u32.c
index 079b43b..a63272c 100644
--- a/net/sched/cls_u32.c
+++ b/net/sched/cls_u32.c
@@ -440,7 +440,7 @@ static void u32_remove_hw_knode(struct tcf_proto *tp, u32 handle)
offload.type = TC_SETUP_CLSU32;
offload.cls_u32 = _offload;
 
-   if (tc_should_offload(dev, 0)) {
+   if (tc_should_offload(dev, tp, 0)) {
offload.cls_u32->command = TC_CLSU32_DELETE_KNODE;
offload.cls_u32->knode.handle = handle;
dev->netdev_ops->ndo_setup_tc(dev, tp->q->handle,
@@ -460,7 +460,7 @@ static int u32_replace_hw_hnode(struct tcf_proto *tp,
offload.type = TC_SETUP_CLSU32;
offload.cls_u32 = _offload;
 
-   if (tc_should_offload(dev, flags)) {
+   if (tc_should_offload(dev, tp, flags)) {
offload.cls_u32->command = TC_CLSU32_NEW_HNODE;
offload.cls_u32->hnode.divisor = h->divisor;
offload.cls_u32->hnode.handle = h->handle;
@@ -484,7 +484,7 @@ static void u32_clear_hw_hnode(struct tcf_proto *tp, struct tc_u_hnode *h)
offload.type = TC_SETUP_CLSU32;
offload.cls_u32 = _offload;
 
-   if (tc_should_offload(dev, 0)) {
+   if (tc_should_offload(dev, tp, 0)) {
offload.cls_u32->command = TC_CLSU32_DELETE_HNODE;
offload.cls_u32->hnode.divisor = h->divisor;
offload.cls_u32->hnode.handle = h->handle;
@@ -507,7 +507,7 @@ static int 

Re: [PATCH] net: stmmac: dwmac-rk: keep PHY up for WoL

2016-06-06 Thread Heiko Stübner
Hi,

Am Freitag, 3. Juni 2016, 10:29:20 schrieb Vincent Palatin:
> Do not shutdown the PHY if Wake-on-Lan is enabled, else it cannot wake
> us up.
> 
> Signed-off-by: Vincent Palatin 
> ---
>  drivers/net/ethernet/stmicro/stmmac/dwmac-rk.c | 8 
>  1 file changed, 8 insertions(+)
> 
> diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac-rk.c b/drivers/net/ethernet/stmicro/stmmac/dwmac-rk.c
> index 0cd3ecf..2e45e75 100644
> --- a/drivers/net/ethernet/stmicro/stmmac/dwmac-rk.c
> +++ b/drivers/net/ethernet/stmicro/stmmac/dwmac-rk.c
> @@ -534,6 +534,10 @@ static int rk_gmac_init(struct platform_device *pdev, void *priv)
>   struct rk_priv_data *bsp_priv = priv;
>   int ret;
> 
> + /* Keep the PHY up if we use Wake-on-Lan. */
> + if (device_may_wakeup(&pdev->dev))
> + return 0;
> +

Hmm, this looks like it would also block the initial setup of clocks and phy?
platform_device + device struct are created before probe gets called, so 
something could set the wakeup flag before the driver initially probes?


Confused,
Heiko


linux-next: UBSAN whine and BUG in net/ipv4/fib_trie.c

2016-06-06 Thread Valdis Kletnieks
Seeing this in next-20160606 (next-20160530 is fine), does it ring
any bells before I spend a long evening doing a bisect?  The Google
doesn't seem to have seen this traceback in the past week.

[  226.938222] 

[  226.938231] UBSAN: Undefined behaviour in net/ipv4/fib_trie.c:1573:14
[  226.938235] shift exponent 136 is too large for 64-bit type 'long unsigned int'

[  226.938403] 

[  226.938406] UBSAN: Undefined behaviour in net/ipv4/fib_trie.c:1589:22
[  226.938409] shift exponent 136 is too large for 64-bit type 'long unsigned int'

[  226.938434] Call Trace:
[  226.938437]  [] dump_stack+0x7b/0xd1
[  226.938441]  [] ubsan_epilogue+0xd/0x40
[  226.938445]  [] __ubsan_handle_shift_out_of_bounds+0xf9/0x150
[  226.938449]  [] ? cpuacct_account_field+0x251/0x2b0
[  226.938453]  [] ? bh_lru_install+0x244/0x2c0
[  226.938456]  [] leaf_walk_rcu+0x302/0x440
[  226.938460]  [] fib_table_dump+0x6b/0x440
[  226.938464]  [] ? inet_dump_fib+0x74/0x370
[  226.938468]  [] inet_dump_fib+0x142/0x370
[  226.938471]  [] ? inet_dump_fib+0x74/0x370
[  226.938475]  [] rtnl_dump_all+0x12c/0x350
[  226.938479]  [] ? __alloc_skb+0x96/0x2c0
[  226.938482]  [] netlink_dump+0x174/0x3e0
[  226.938486]  [] __netlink_dump_start+0x190/0x240
[  226.938490]  [] rtnetlink_rcv_msg+0x1c0/0x640
[  226.938493]  [] ? trace_hardirqs_on_caller+0x16/0x2c0
[  226.938497]  [] ? fdb_vid_parse+0x90/0x90
[  226.938500]  [] ? fdb_vid_parse+0x90/0x90
[  226.938504]  [] ? rtnl_link_unregister+0x140/0x140
[  226.938508]  [] netlink_rcv_skb+0x87/0xc0
[  226.938511]  [] rtnetlink_rcv+0x2a/0x40
[  226.938515]  [] netlink_unicast+0x200/0x300
[  226.938518]  [] netlink_sendmsg+0x402/0x670
[  226.938523]  [] sock_sendmsg+0x5b/0xd0
[  226.938526]  [] SYSC_sendto+0x153/0x1f0
[  226.938531]  [] ? selinux_socket_setsockopt+0x45/0x60
[  226.938535]  [] ? entry_SYSCALL_64_fastpath+0x5/0xa8
[  226.938538]  [] ? trace_hardirqs_on_caller+0x16/0x2c0
[  226.938541]  [] ? trace_hardirqs_on_thunk+0x1a/0x1c
[  226.938545]  [] SyS_sendto+0xe/0x10
[  226.938549]  [] entry_SYSCALL_64_fastpath+0x18/0xa8
[  226.938553]  [] ? trace_hardirqs_off_caller+0x1f/0xf0

followed by a not-surprising BUG while we pagefault because we went off
the deep end:

[  226.938555] 

[  226.938559] BUG: sleeping function called from invalid context at arch/x86/mm/fault.c:1309
[  226.938563] in_atomic(): 0, irqs_disabled(): 0, pid: 4577, name: geoclue
[  226.938565] INFO: lockdep is turned off.

[  226.938591] Call Trace:
[  226.938595]  [] dump_stack+0x7b/0xd1
[  226.938599]  [] ___might_sleep+0x196/0x2f0
[  226.938603]  [] __might_sleep+0x65/0x1f0
[  226.938607]  [] __do_page_fault+0x5b6/0x7d0
[  226.938611]  [] do_page_fault+0xc/0x10
[  226.938614]  [] page_fault+0x22/0x30
[  226.938619]  [] ? leaf_walk_rcu+0x195/0x440
[  226.938622]  [] ? leaf_walk_rcu+0x175/0x440
[  226.938626]  [] fib_table_dump+0x6b/0x440
[  226.938630]  [] ? inet_dump_fib+0x74/0x370
[  226.938633]  [] inet_dump_fib+0x142/0x370
[  226.938637]  [] ? inet_dump_fib+0x74/0x370
[  226.938641]  [] rtnl_dump_all+0x12c/0x350
[  226.938644]  [] ? __alloc_skb+0x96/0x2c0
[  226.938648]  [] netlink_dump+0x174/0x3e0
[  226.938651]  [] __netlink_dump_start+0x190/0x240
[  226.938655]  [] rtnetlink_rcv_msg+0x1c0/0x640
[  226.938658]  [] ? trace_hardirqs_on_caller+0x16/0x2c0
[  226.938662]  [] ? fdb_vid_parse+0x90/0x90
[  226.938666]  [] ? fdb_vid_parse+0x90/0x90
[  226.938669]  [] ? rtnl_link_unregister+0x140/0x140
[  226.938673]  [] netlink_rcv_skb+0x87/0xc0
[  226.938677]  [] rtnetlink_rcv+0x2a/0x40
[  226.938680]  [] netlink_unicast+0x200/0x300
[  226.938684]  [] netlink_sendmsg+0x402/0x670
[  226.938688]  [] sock_sendmsg+0x5b/0xd0
[  226.938692]  [] SYSC_sendto+0x153/0x1f0
[  226.938696]  [] ? selinux_socket_setsockopt+0x45/0x60
[  226.938700]  [] ? entry_SYSCALL_64_fastpath+0x5/0xa8
[  226.938703]  [] ? trace_hardirqs_on_caller+0x16/0x2c0
[  226.938706]  [] ? trace_hardirqs_on_thunk+0x1a/0x1c
[  226.938710]  [] SyS_sendto+0xe/0x10
[  226.938714]  [] entry_SYSCALL_64_fastpath+0x18/0xa8
[  226.938718]  [] ? trace_hardirqs_off_caller+0x1f/0xf0

and then the wheels come totally off the bus:

[  226.938728] BUG: unable to handle kernel paging request at 000f6105
[  226.938733] IP: [] leaf_walk_rcu+0x195/0x440
[  226.938738] PGD 0
[  226.938742] Oops:  [#1] PREEMPT SMP

[  226.938845] Call Trace:
[  226.938849]  [] fib_table_dump+0x6b/0x440
[  226.938853]  [] ? inet_dump_fib+0x74/0x370
[  226.938857]  [] inet_dump_fib+0x142/0x370
[  226.938860]  [] ? inet_dump_fib+0x74/0x370
[  226.938864]  [] rtnl_dump_all+0x12c/0x350
[  226.938867]  [] ? __alloc_skb+0x96/0x2c0
[  226.938871]  [] netlink_dump+0x174/0x3e0
[  226.938874]  [] __netlink_dump_start+0x190/0x240
[  226.938878
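
For reference, a minimal user-space sketch of the class of bug UBSAN is
flagging in the report above: in C, shifting a value by an exponent that is
greater than or equal to the width of its type is undefined. The helper name
shift_left_checked() is hypothetical and is not taken from fib_trie.c; it
only illustrates checking the exponent before shifting.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper, not from net/ipv4/fib_trie.c.
 * Shifts val left by bits, but only when the exponent is smaller than
 * the width of unsigned long; otherwise the shift would be undefined.
 */
static bool shift_left_checked(unsigned long val, unsigned int bits,
			       unsigned long *out)
{
	if (bits >= sizeof(unsigned long) * CHAR_BIT)
		return false;	/* e.g. the exponent 136 from the report */

	*out = val << bits;
	return true;
}
```

On failure the output argument is left untouched, so a caller can fall back
to whatever "no more keys" handling it already has.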

Re: [PATCH net-next] net, cls: allow for deleting all filters for given parent

2016-06-06 Thread Daniel Borkmann

On 06/06/2016 09:52 PM, Cong Wang wrote:
> On Mon, Jun 6, 2016 at 12:25 PM, Daniel Borkmann  wrote:
>> On 06/06/2016 07:12 PM, Cong Wang wrote:
>>> On Sat, Jun 4, 2016 at 9:24 AM, Daniel Borkmann wrote:
>>>>
>>>> +   if (n->nlmsg_type == RTM_DELTFILTER && prio == 0) {
>>>> +           tcf_destroy_chain(chain);
>>>> +           err = 0;
>>>> +           goto errout;
>>>> +   }
>>>
>>> We need to notify users we removed which filters, right?
>>
>> As far as I know, most such use cases that listen on this are bypasses
>> that mirror kernel configs from user space ... but well, sure, I can add
>> a notification if people care. Would do this as a separate patch.
>
> This is fundamental for libnl to update caches.

I see, makes sense then. Thanks!

> I don't understand why it should be separated, since notification is
> not a feature, we already have notifications in other paths.
>
>> Looking into this, I would probably make this a single notification that
>> denotes this 'wild-card' removal for that parent instead of calling
>> tfilter_notify() for each filter separately (which allocs skb, dumps it,
>> etc), qdisc del doesn't loop through it either, so probably fine this way.
>
> Makes sense.
>
> Thanks.





Re: [PATCH net-next 2/9] net: dsa: Add support for parsing the old binding

2016-06-06 Thread Florian Fainelli
On 06/05/2016 08:19 PM, Andrew Lunn wrote:
>> How much support do we want to have for the old binding for in tree
>> platforms? Is the plan to migrate them all to the new binding?
> 
> I think there are three cases to consider.
> 
> 1) There are some old boards using setup.c files which have a platform
>device, platform data, etc. I've never used DSA in this way, and it
>could be all the recent additions have broken this. We might want
>to test this, and if it is in fact broken, and has been for a
>while, it indicates nobody uses those boards any more. We might
>suggest removing them. Even if they do work, i doubt anybody is
>interested in converting them to device tree. So we might have to
>keep the platform data support around.

We had a report a while ago of breakage, which got addressed and fixed
upstream, so if it breaks again, it will get fixed again.

> 
> 2) In tree devices using the DT binding. We can update them all to the
>new binding. The kirkwood boards don't have a u-boot which is DT
>capable. Some of the armada boards do have a DT capable uboot, but
>all these boards have been added by the community, so i suspect
>they are not flashed never to be changed again.
> 
> 3) Out of tree devices using the DT binding. As far as i can see,
>there is no in three board actually using the Broadcom SF2 driver
>and its odd binding. However from talking to you, i know there are
>devices out in the wild using this binding, and their DT blob is
>fixed, never to be changed again.

The concept of an "in-tree" board does not make much sense once the
bootloader provides a blob to the kernel, and synchronizing the Device
Tree sources with what a bootloader provides is just a pain with no
reward as long as the binding remains standard and works.

> 
> It actually seems odd to me that we have a nice new binding and an
> implement which is reasonably clean, and we want to add code to
> support a legacy binding for an out of tree board.
> 
> I need to think on this for a while. However, i don't see the old code
> and binding going away anytime soon. It will take a few cycles to
> determine if the old platform device/platform data still works, and to
> remove the old boards if not. We can update the in tree devices to the
> new binding, but we should keep the old binding a while to aid the
> transition.

I do not see the need for platform data going away actually, there are
tons of devices out there that are not supported using Device Tree, yet
feature Ethernet switches that could easily be supported should we want
to add support for that, and clearly an answer along the lines of "let's
add Device Tree support for these platforms" is not going to fly.

> 
> I'm tempted to say you should keep using the old code to support your
> out of tree devices. You should define a new binding for SF2 which
> conforms to the device tree binding document which just got accepted,
> and add it to SF 2 alongside the legacy binding. And it would be great
> if you could go the last step and actually add a boards device tree
> file using it.

I suppose I could do that.

> 
> I'm hesitant to add legacy binding support for SF2 to the new DSA2
> code. We should try to keep it free of cruft, and set a good example
> for others to follow when they bring along there new drivers.

What if this code was moved to bcm_sf2.c, where it matters, and only
the bottom part of dsa_register_switch() was exposed instead?
-- 
Florian


Re: [RFC PATCH 0/4] Make inotify instance/watches be accounted per userns

2016-06-06 Thread Eric W. Biederman
Nikolay Borisov  writes:

> On 06/03/2016 11:41 PM, Eric W. Biederman wrote:
>> Nikolay Borisov  writes:
>> 
>>> On 06/02/2016 07:58 PM, Eric W. Biederman wrote:

>>>> Nikolay please see my question for you at the end.
>> [snip]
>>>> All of that said there is definitely a practical question that needs to
>>>> be asked.  Nikolay how did you get into this situation?  A typical user
>>>> namespace configuration will set up uid and gid maps with the help of a
>>>> privileged program and not map the uid of the user who created the user
>>>> namespace.  Thus avoiding exhausting the limits of the user who created
>>>> the container.
>>>
>>> Right but imagine having multiple containers with identical uid/gid maps
>>> for LXC-based setups imagine this:
>>>
>>> lxc.id_map = u 0 1337 65536
>> 
>> So I am only moderately concerned when the containers have overlapping
>> ids.  Because at some level overlapping ids means they are the same
>> user.  This is certainly true for file permissions and for other
>> permissions.  To isolate one container from another it fundamentally
>> needs to have separate uids and gids on the host system.
>> 
>>> Now all processes which are running with the same user on different
>>> containers will actually share the underlying user_struct thus the
>>> inotify limits. In such cases even running multiple instances of 'tail'
>>> in one container will eventually use all allowed inotify/mark instances.
>>> For this to happen you needn't also have complete overlap of the uid
>>> map, it's enough to have at least one UID between 2 containers overlap.
>>>
>>>
>>> So the risk of exhaustion doesn't apply to the privileged user that
>>> created the container and the uid mapping, but rather the users under
>>> which the various processes in the container are running. Does that make
>>> it clear?
>> 
>> Yes.  That is clear.
>> 
>>>> Which makes me personally more worried about escaping the existing
>>>> limits than exhausting the limits of a particular user.
>>>
>>> So I thought bit about it and I guess a solution can be concocted which
>>> utilize the hierarchical nature of page counter, and the inotify limits
>>> are set per namespace if you have capable(CAP_SYS_ADMIN). That way the
>>> admin can set one fairly large on the init_user_ns and then in every
>>> namespace created one can set smaller limits. That way for a branch in
>>> the tree (in the nomenclature you used in your previous reply to me) you
>>> will really be upper-bound to the limit set in the namespace which have
>>> ->level = 1. For the width of the tree, you will be bound by the
>>> "global" init_user_ns limits. How does that sound?
>> 
>> As an addendum to that design.  I think there should be an additional
>> sysctl or two that specifies how much the limit decreases when creating
>> a new user namespace and when creating a new user in that user
>> namespace.  That way with a good selection of limits and a limit
>> decrease people can use the kernel defaults without needing to change
>> them.
>
> I agree that a sysctl which controls how the limits are set for new
> namespaces is a good idea. I think it's best if this is in % rather than
> some absolute value. Also I'm not sure about the sysctl when a user is
> added in a namespace since just adding a new user should fall under the
> limits of the current userns.

My hunch is that a reserve per namespace as an absolute number will be
easier to implement and analyze but I don't much care.

I meant that we have a tree where we track created inotify things
that looks like:

uns0:
              +---------//\\---------+
             /      /--/    \--\      \
          user1   user2    user3    user4
       +----//\\----+
      /   /--/  \--\  \
   uns1  uns2  uns3  uns4
 +---//\\---+
/  /--/  \--\  \
user5 user6 user7 user8


Allowing a hierarchical tracking of things per user and per user
namespace.
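
A minimal user-space sketch of such hierarchical tracking, in the spirit of
the page counter mentioned earlier in the thread: a charge walks from a node
up to the root and is rolled back if any level is already at its limit. The
names limit_node, limit_charge() and limit_uncharge() are hypothetical and
are not an existing kernel interface.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a per-user / per-user-namespace counter node. */
struct limit_node {
	struct limit_node *parent;	/* NULL at the root (uns0 in the tree) */
	unsigned long count;		/* objects currently charged here */
	unsigned long limit;		/* maximum allowed at this level */
};

/* Charge one object to node and every ancestor.
 * Fails, leaving all counters unchanged, if any level is at its limit.
 */
static bool limit_charge(struct limit_node *node)
{
	struct limit_node *n, *undo;

	for (n = node; n; n = n->parent) {
		if (n->count >= n->limit)
			goto fail;
		n->count++;
	}
	return true;

fail:
	/* Roll back the levels that were already charged. */
	for (undo = node; undo != n; undo = undo->parent)
		undo->count--;
	return false;
}

/* Release one previously charged object from node and its ancestors. */
static void limit_uncharge(struct limit_node *node)
{
	for (; node; node = node->parent)
		node->count--;
}
```

With this shape, a branch of the tree is bounded by the tightest limit on
its path to the root, and the width of the tree by the root's limit.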

The limits programmed with the sysctl would look something like they do
today.
  
> Also should those sysctls be global or should they be per-namespace? At
> this point I'm more inclined to have global sysctl and maybe refine it
> in the future if the need arises?

I think at the end of the day per-namespace is interesting.  We
certainly need to track the values as if they were per namespace.

However given that this should be a setup and forget kind of operation
we don't need to worry about how to implement the sysctl settings as per
namespace until everything else is sorted.

>> Having default settings that are good enough 99% of the time and that
>> people don't need to tune, would be my biggest requirement (aside from
>> being light-weight) for merging something like this.
>> 
>> If things are set and forget and even the container case does not need to
>> be aware then I think we have a design sufficiently 

Re: [PATCH net-next] net, cls: allow for deleting all filters for given parent

2016-06-06 Thread Cong Wang
On Mon, Jun 6, 2016 at 12:25 PM, Daniel Borkmann  wrote:
> On 06/06/2016 07:12 PM, Cong Wang wrote:
>>
>> On Sat, Jun 4, 2016 at 9:24 AM, Daniel Borkmann 
>> wrote:
>>>
>>> +   if (n->nlmsg_type == RTM_DELTFILTER && prio == 0) {
>>> +   tcf_destroy_chain(chain);
>>> +   err = 0;
>>> +   goto errout;
>>> +   }
>>
>>
>> We need to notify users we removed which filters, right?
>
>
> As far as I know, most such use cases that listen on this are bypasses
> that mirror kernel configs from user space ... but well, sure, I can add
> a notification if people care. Would do this as a separate patch.

This is fundamental for libnl to update caches.

I don't understand why it should be separated, since notification is
not a feature, we already have notifications in other paths.

>
> Looking into this, I would probably make this a single notification that
> denotes this 'wild-card' removal for that parent instead of calling
> tfilter_notify() for each filter separately (which allocs skb, dumps it,
> etc), qdisc del doesn't loop through it either, so probably fine this way.

Makes sense.

Thanks.


Re: [PATCH] netlabel: add address family checks to netlbl_{sock, req}_delattr()

2016-06-06 Thread Paul Moore
On Mon, Jun 6, 2016 at 3:35 PM, Paul Moore  wrote:
> From: Paul Moore 
>
> It seems risky to always rely on the caller to ensure the socket's
> address family is correct before passing it to the NetLabel kAPI,
> especially since we see at least one LSM which didn't. Add address
> family checks to the *_delattr() functions to help prevent future
> problems.
>
> Cc: 
> Reported-by: Maninder Singh 
> Signed-off-by: Paul Moore 
> ---
>  net/netlabel/netlabel_kapi.c |   12 ++++++++++--
>  1 file changed, 10 insertions(+), 2 deletions(-)

DaveM, since this is such a trivial fix I'm adding it into my
selinux#next branch right now, but if you would prefer to carry it via
netdev#next let me know.

> diff --git a/net/netlabel/netlabel_kapi.c b/net/netlabel/netlabel_kapi.c
> index 1325776..bd007a9 100644
> --- a/net/netlabel/netlabel_kapi.c
> +++ b/net/netlabel/netlabel_kapi.c
> @@ -824,7 +824,11 @@ socket_setattr_return:
>   */
>  void netlbl_sock_delattr(struct sock *sk)
>  {
> -   cipso_v4_sock_delattr(sk);
> +   switch (sk->sk_family) {
> +   case AF_INET:
> +   cipso_v4_sock_delattr(sk);
> +   break;
> +   }
>  }
>
>  /**
> @@ -987,7 +991,11 @@ req_setattr_return:
>  */
>  void netlbl_req_delattr(struct request_sock *req)
>  {
> -   cipso_v4_req_delattr(req);
> +   switch (req->rsk_ops->family) {
> +   case AF_INET:
> +   cipso_v4_req_delattr(req);
> +   break;
> +   }
>  }
>
>  /**
>

-- 
paul moore
security @ redhat


[PATCH] netlabel: add address family checks to netlbl_{sock, req}_delattr()

2016-06-06 Thread Paul Moore
From: Paul Moore 

It seems risky to always rely on the caller to ensure the socket's
address family is correct before passing it to the NetLabel kAPI,
especially since we see at least one LSM which didn't. Add address
family checks to the *_delattr() functions to help prevent future
problems.

Cc: 
Reported-by: Maninder Singh 
Signed-off-by: Paul Moore 
---
 net/netlabel/netlabel_kapi.c |   12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/net/netlabel/netlabel_kapi.c b/net/netlabel/netlabel_kapi.c
index 1325776..bd007a9 100644
--- a/net/netlabel/netlabel_kapi.c
+++ b/net/netlabel/netlabel_kapi.c
@@ -824,7 +824,11 @@ socket_setattr_return:
  */
 void netlbl_sock_delattr(struct sock *sk)
 {
-   cipso_v4_sock_delattr(sk);
+   switch (sk->sk_family) {
+   case AF_INET:
+   cipso_v4_sock_delattr(sk);
+   break;
+   }
 }
 
 /**
@@ -987,7 +991,11 @@ req_setattr_return:
 */
 void netlbl_req_delattr(struct request_sock *req)
 {
-   cipso_v4_req_delattr(req);
+   switch (req->rsk_ops->family) {
+   case AF_INET:
+   cipso_v4_req_delattr(req);
+   break;
+   }
 }
 
 /**



Re: [PATCH net-next] net, cls: allow for deleting all filters for given parent

2016-06-06 Thread Daniel Borkmann

On 06/06/2016 07:12 PM, Cong Wang wrote:
> On Sat, Jun 4, 2016 at 9:24 AM, Daniel Borkmann  wrote:
>>
>> +   if (n->nlmsg_type == RTM_DELTFILTER && prio == 0) {
>> +           tcf_destroy_chain(chain);
>> +           err = 0;
>> +           goto errout;
>> +   }
>
> We need to notify users we removed which filters, right?

As far as I know, most such use cases that listen on this are bypasses
that mirror kernel configs from user space ... but well, sure, I can add
a notification if people care. Would do this as a separate patch.

Looking into this, I would probably make this a single notification that
denotes this 'wild-card' removal for that parent instead of calling
tfilter_notify() for each filter separately (which allocs skb, dumps it,
etc), qdisc del doesn't loop through it either, so probably fine this way.

Thanks,
Daniel
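
A toy user-space model of the trade-off discussed in this message: all
filters under a parent are destroyed and a single event is recorded for the
parent, instead of one event per filter. The types and functions below
(struct chain, chain_add(), chain_destroy_all()) are hypothetical and only
stand in for the kernel's filter chain and its netlink notification.

```c
#include <stdlib.h>

/* Hypothetical stand-in for one classifier attached to a parent. */
struct filter {
	struct filter *next;
};

/* Hypothetical stand-in for the chain of filters of one parent. */
struct chain {
	struct filter *head;
	unsigned int notifications;	/* events sent to listeners */
};

/* Prepend one filter; returns 0 on success, -1 on allocation failure. */
static int chain_add(struct chain *c)
{
	struct filter *f = malloc(sizeof(*f));

	if (!f)
		return -1;
	f->next = c->head;
	c->head = f;
	return 0;
}

/* Remove every filter, then record one notification for the parent. */
static unsigned int chain_destroy_all(struct chain *c)
{
	unsigned int removed = 0;

	while (c->head) {
		struct filter *f = c->head;

		c->head = f->next;
		free(f);
		removed++;
	}
	c->notifications++;	/* one wild-card event, not one per filter */
	return removed;
}
```

The cost of the notification path is then independent of the number of
filters removed, at the price of listeners having to understand the
wild-card event.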


Re: [EDT][Patch 1/1] socket family check in netlabel APIs

2016-06-06 Thread Paul Moore
On Thu, May 7, 2015 at 1:53 AM, Maninder Singh  wrote:
> Hi Paul,
>
>>On Monday, March 30, 2015 11:09:00 AM Maninder Singh wrote:
>>> Dear All,
>>> we found One Kernel Crash issue in cipso_v4_sock_delattr :-
>>> As Cipso supports only inet sockets so cipso_v4_sock_delattr will crash when
>>> try to access any other socket type.  cipso_v4_sock_delattr access
>>> sk_inet->inet_opt which may contain not NULL but invalid address. we found
>>> this issue with netlink socket.(reproducible by trinity using sendto system
>>> call .)
>
>>Hello,
>
>>First, please go read the Documentation/SubmittingPatches from the kernel
>>sources; your patch needs to be resubmitted and the instructions in that file
>>will show you how to do it correctly next time.
>
>>Second, this appears to only affect Smack based systems, yes?  SELinux based
>>systems should have the proper checking in place to prevent this (the checks
>>are handled in the LSM).  That said, it probably wouldn't hurt to add the
>>extra checking to netlbl_sock_delattr().  If you properly resubmit your patch
>>I'll ACK it.
>
>>-Paul
>
>>--
>>paul moore
>>www.paul-moore.com
>
> As suggested resubmitting the patch .

I was delayed in responding because your patch is still not in a form
that makes it easy to merge/review upstream, it appears to be MIME
encoded and not in plain text.  You should be able to save your raw
email message and apply it directly to the kernel source tree using
the patch command.

When you send MIME encoded emails with patches, we have to apply the
patches manually, line by line, which is both time consuming and error
prone.

-- 
paul moore
www.paul-moore.com


Re: [PATCH net 3/3] RDS: TCP: fix race windows in send-path quiescence by rds_tcp_accept_one()

2016-06-06 Thread Santosh Shilimkar

On 6/4/2016 2:00 PM, Sowmini Varadhan wrote:

> The send path needs to be quiesced before resetting callbacks from
> rds_tcp_accept_one(), and commit eb192840266f ("RDS:TCP: Synchronize
> rds_tcp_accept_one with rds_send_xmit when resetting t_sock") achieves
> this using the c_state and RDS_IN_XMIT bit following the pattern
> used by rds_conn_shutdown(). However this leaves the possibility
> of a race window as shown in the sequence below
>     take t_conn_lock in rds_tcp_conn_connect
>     send outgoing syn to peer
>     drop t_conn_lock in rds_tcp_conn_connect
>     incoming from peer triggers rds_tcp_accept_one, conn is
>         marked CONNECTING
>     wait for RDS_IN_XMIT to quiesce any rds_send_xmit threads
>     call rds_tcp_reset_callbacks
>     [.. race-window where incoming syn-ack can cause the conn
>         to be marked UP from rds_tcp_state_change ..]
>     lock_sock called from rds_tcp_reset_callbacks, and we set
>         t_sock to null
> As soon as the conn is marked UP in the race-window above, rds_send_xmit()
> threads will proceed to rds_tcp_xmit and may encounter a null-pointer
> deref on the t_sock.
>
> Given that rds_tcp_state_change() is invoked in softirq context, whereas
> rds_tcp_reset_callbacks() is in workq context, and testing for RDS_IN_XMIT
> after lock_sock could result in a deadlock with tcp_sendmsg, this
> commit fixes the race by using a new c_state, RDS_TCP_RESETTING, which
> will prevent a transition to RDS_CONN_UP from rds_tcp_state_change().
>
> Signed-off-by: Sowmini Varadhan 
> ---

As we discussed off-list, this patch is fine as an immediate fix.
For the dual sync/ack issue we now have both 'is_outgoing' and
'RDS_TCP_RESETTING', so it would be good to address that later.

Acked-by: Santosh Shilimkar 
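
A minimal user-space sketch of why an extra state can close such a window,
assuming the transition to "up" is done as a compare-and-exchange from the
"connecting" state: while the connection sits in the resetting state, that
exchange fails. The enum values and conn_transition() are hypothetical
stand-ins; only the idea of a dedicated resetting state comes from the
patch description.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Illustrative states; CONN_RESETTING stands in for RDS_TCP_RESETTING
 * and CONN_UP for RDS_CONN_UP. The numeric values are assumptions.
 */
enum conn_state {
	CONN_CONNECTING,
	CONN_UP,
	CONN_RESETTING,
};

/* Move *state to new_state only if it currently equals old_state. */
static bool conn_transition(atomic_int *state, int old_state, int new_state)
{
	return atomic_compare_exchange_strong(state, &old_state, new_state);
}
```

A state-change callback that only attempts CONNECTING -> UP therefore cannot
mark the connection up while the accept path holds it in the resetting
state; the accept path moves it back once the callbacks have been switched.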




Re: [PATCH net 2/3] RDS: TCP: Retransmit half-sent datagrams when switching sockets in rds_tcp_reset_callbacks

2016-06-06 Thread Santosh Shilimkar

On 6/4/2016 1:59 PM, Sowmini Varadhan wrote:

> When we switch a connection's sockets in rds_tcp_reset_callbacks,
> any partially sent datagram must be retransmitted on the new
> socket so that the receiver can correctly reassemble the RDS
> datagram. Use rds_send_reset() which is designed for this purpose.
>
> Signed-off-by: Sowmini Varadhan 
> ---

Acked-by: Santosh Shilimkar 


Re: [PATCH net 1/3] RDS: TCP: Add/use rds_tcp_reset_callbacks to reset tcp socket safely

2016-06-06 Thread Santosh Shilimkar

On 6/4/2016 1:59 PM, Sowmini Varadhan wrote:

> When rds_tcp_accept_one() has to replace the existing tcp socket
> with a newer tcp socket (duelling-syn resolution), it must lock_sock()
> to suppress the rds_tcp_data_recv() path while callbacks are being
> changed.  Also, existing RDS datagram reassembly state must be reset,
> so that the next datagram on the new socket does not have corrupted
> state. Similarly when resetting the newly accepted socket, appropriate
> locks and synchronization are needed.
>
> This commit ensures correct synchronization by invoking
> kernel_sock_shutdown to reset a newly accepted sock, and by taking
> appropriate lock_sock()s (for old and new sockets) when resetting
> existing callbacks.
>
> Signed-off-by: Sowmini Varadhan 
> ---

Acked-by: Santosh Shilimkar 


Re: [PATCH net v2] sfc: report supported link speeds on SFP connections

2016-06-06 Thread Jarod Wilson
On Mon, Jun 06, 2016 at 05:29:30PM +0100, Bert Kenward wrote:
> 7000-series SFC NICs connected with an SFP+ module currently fail to
> report any supported link speeds.
> 
> Reported-by: Jarod Wilson 
> Signed-off-by: Bert Kenward 

Had a feeling my cut might not have been quite right. Looks good to me.

Reviewed-by: Jarod Wilson 

-- 
Jarod Wilson
ja...@redhat.com



Re: [PATCH v4 7/7] phy: Add Northstar2 PCI Phy support

2016-06-06 Thread Florian Fainelli
On 06/06/2016 05:41 AM, Pramod Kumar wrote:
> Add PCI Phy support for Broadcom Northstar2 SoCs.  This driver uses the
> interface from the iproc mdio mux driver to enable the devices
> respective phys.
> 
> Reviewed-by: Andrew Lunn 
> Signed-off-by: Jon Mason 
> Signed-off-by: Pramod Kumar 
> ---
>  drivers/phy/Kconfig|   8 +++
>  drivers/phy/Makefile   |   2 +-
>  drivers/phy/phy-bcm-ns2-pcie.c | 115 
> +
>  3 files changed, 124 insertions(+), 1 deletion(-)
>  create mode 100644 drivers/phy/phy-bcm-ns2-pcie.c
> 
> diff --git a/drivers/phy/Kconfig b/drivers/phy/Kconfig
> index b869b98..01fb93b 100644
> --- a/drivers/phy/Kconfig
> +++ b/drivers/phy/Kconfig
> @@ -434,4 +434,12 @@ config PHY_CYGNUS_PCIE
>  
>  source "drivers/phy/tegra/Kconfig"
>  
> +config PHY_NS2_PCIE
> + tristate "Broadcom Northstar2 PCIe PHY driver"
> + depends on OF && MDIO_BUS_MUX_BCM_IPROC
> + select GENERIC_PHY
> + default ARCH_BCM_IPROC

Aren't you missing a dependency on PHYLIB too, to provide
mdio_module_register() etc. (at least to make it build)?

-- 
Florian


Re: [PATCH v4 5/7] net: mdio-mux: Add MDIO mux driver for iProc SoCs

2016-06-06 Thread Florian Fainelli
On 06/06/2016 05:41 AM, Pramod Kumar wrote:
> iProc based SoCs supports the integrated mdio multiplexer which
> has the bus selection as well as mdio transaction generation logic
> inside.
> 
> This multiplexer has child buses for PCIe, SATA, USB and ETH. These
> buses could be internal or external to SOC where PHYs are attached.
> These buses could use C-45 or C-22 mdio transaction.
> 
> Signed-off-by: Pramod Kumar 

Reviewed-by: Florian Fainelli 
-- 
Florian


Re: [PATCH v4 4/7] dt: mdio-mux: Add mdio multiplexer driver node

2016-06-06 Thread Florian Fainelli
On 06/06/2016 05:41 AM, Pramod Kumar wrote:
> Add integrated MDIO multiplexer driver node which contains
> two mux PCIe bus and one ethernet bus along with phys
> lying on these bus.
> 
> Signed-off-by: Pramod Kumar 
> ---
> + mdio_mux_iproc: mdio-mux@6602023c {
> + compatible = "brcm,mdio-mux-iproc";
> + reg = <0x6602023c 0x14>;
> + #address-cells = <1>;
> + #size-cells = <0>;
> +
> + mdio@0 {
> + reg = <0x0>;
> + #address-cells = <1>;
> + #size-cells = <0>;
> +
> + pci_phy0: pci-phy@0 {
> + compatible = "brcm,ns2-pcie-phy";
> + reg = <0x0>;
> + #phy-cells = <0>;
> + };
> + };
> +
> + mdio@7 {
> + reg = <0x7>;
> + #address-cells = <1>;
> + #size-cells = <0>;
> +
> + pci_phy1: pci-phy@0 {
> + compatible = "brcm,ns2-pcie-phy";
> + reg = <0x0>;
> + #phy-cells = <0>;
> + };

Are these two PHYs always available in the NS2 SoC, or does that depend
on interfaces exposed at the board level? Shouldn't they be flagged
with a disabled status property by default and enabled in their
respective board files?
-- 
Florian


Re: [PATCH v4 3/7] binding: mdio-mux: Add DT binding doc for Broadcom MDIO bus multiplexer

2016-06-06 Thread Florian Fainelli
On 06/06/2016 05:41 AM, Pramod Kumar wrote:
> Add DT binding doc for Broadcom MDIO bus multiplexer driver.
> 
> Reviewed-by: Andrew Lunn 
> Signed-off-by: Pramod Kumar 

Reviewed-by: Florian Fainelli 


> +for example:
> + mdio_mux_iproc: mdio-mux@6602023c {

I think Rob wanted you to drop the underscores here in favor of dashes,
there are more below, not critical imho.

> + compatible = "brcm,mdio-mux-iproc";
> + reg = <0x6602023c 0x14>;
> + #address-cells = <1>;
> + #size-cells = <0>;
> +
> + mdio@0 {
> + reg = <0x0>;
> + #address-cells = <1>;
> + #size-cells = <0>;
> +
> + pci_phy0: pci-phy@0 {


-- 
Florian


Re: [PATCH v4 1/7] mdio: mux: Enhanced MDIO mux framework for integrated multiplexers

2016-06-06 Thread Florian Fainelli
On 06/06/2016 05:41 AM, Pramod Kumar wrote:
> An integrated multiplexer uses same address space for
> "muxed bus selection" and "generation of mdio transaction"
> hence its good to register parent bus from mux driver.
> 
> Hence added a mechanism where mux driver could register a
> parent bus and pass it down to framework via mdio_mux_init api.
> 
> Signed-off-by: Pramod Kumar 

Reviewed-by: Florian Fainelli 


> diff --git a/include/linux/mdio-mux.h b/include/linux/mdio-mux.h
> index a243dbb..61f5b21 100644
> --- a/include/linux/mdio-mux.h
> +++ b/include/linux/mdio-mux.h
> @@ -10,11 +10,13 @@
>  #ifndef __LINUX_MDIO_MUX_H
>  #define __LINUX_MDIO_MUX_H
>  #include 
> +#include 

You could have added just a forward declaration, this is a pointer to
the structure so you don't need the compiler to have full knowledge of
the storage type. Not a biggie.
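
A small self-contained sketch of the forward-declaration point: a header
that only passes pointers around needs nothing more than "struct mii_bus;".
The function example_mux_has_parent() and the stand-in structure body are
hypothetical; the real struct mii_bus is defined in the kernel headers.

```c
#include <stddef.h>

/* Header side: a forward declaration is enough to declare pointers. */
struct mii_bus;

/* Hypothetical prototype standing in for an API that takes the bus. */
static int example_mux_has_parent(const struct mii_bus *parent_bus);

/* Only code that creates or dereferences the object needs the full
 * definition. This body is an assumption made for the sketch.
 */
struct mii_bus {
	int id;
};

static int example_mux_has_parent(const struct mii_bus *parent_bus)
{
	return parent_bus != NULL;
}
```

The benefit is mostly fewer header dependencies for users of mdio-mux.h.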
-- 
Florian


Re: [PATCH net-next] tcp: accept RST if SEQ matches right edge of right-most SACK block

2016-06-06 Thread Neal Cardwell
The functionality seems OK to me, though there are some
style/formatting issues, which checkpatch.pl picks up:

> ./scripts/checkpatch.pl net-next-tcp-accept-RST-if-SEQ-matches-right-edge-of-right-most-SACK-block.patch
WARNING: line over 80 characters
#73: FILE: net/ipv4/tcp_input.c:5199:
+   /* RFC 5961 3.2 (extended to match against SACK too if available):

WARNING: line over 80 characters
#74: FILE: net/ipv4/tcp_input.c:5200:
+    * If seq num exactly matches RCV.NXT or the right-most SACK block,

WARNING: line over 80 characters
#88: FILE: net/ipv4/tcp_input.c:5213:
+   for (this_sack = 1; this_sack < tp->rx_opt.num_sacks; ++this_sack) {

WARNING: line over 80 characters
#89: FILE: net/ipv4/tcp_input.c:5214:
+   max_sack = after(sp[this_sack].end_seq, max_sack) ?

WARNING: line over 80 characters
#90: FILE: net/ipv4/tcp_input.c:5215:
+   sp[this_sack].end_seq : max_sack;
total: 0 errors, 5 warnings, 0 checks, 40 lines checked

neal
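
As an illustration only, here is a user-space sketch of the right-most SACK
edge computation from the flagged lines, written so that each line stays
under 80 columns. struct sack_block, seq_after() and sack_right_edge() are
hypothetical stand-ins for the kernel's SACK block type and its after()
helper; the wrap-safe comparison uses a signed 32-bit difference.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for a SACK block; end_seq follows the quoted patch. */
struct sack_block {
	uint32_t start_seq;
	uint32_t end_seq;
};

/* True if seq1 is after seq2 in 32-bit sequence space (wrap-safe). */
static bool seq_after(uint32_t seq1, uint32_t seq2)
{
	return (int32_t)(seq2 - seq1) < 0;
}

/* Return the right-most end_seq among num blocks (num must be >= 1). */
static uint32_t sack_right_edge(const struct sack_block *sp, int num)
{
	uint32_t max_sack = sp[0].end_seq;
	int i;

	for (i = 1; i < num; i++) {
		if (seq_after(sp[i].end_seq, max_sack))
			max_sack = sp[i].end_seq;
	}
	return max_sack;
}
```

Pulling the loop into a helper, or using a plain if instead of the ternary,
is one way to address the line-length warnings.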


Re: [PATCH 1/5] ethernet: add sun8i-emac driver

2016-06-06 Thread Florian Fainelli
On 06/03/2016 02:56 AM, LABBE Corentin wrote:
> This patch add support for sun8i-emac ethernet MAC hardware.
> It could be found in Allwinner H3/A83T/A64 SoCs.
> 
> It supports 10/100/1000 Mbit/s speed with half/full duplex.
> It can use an internal PHY (MII 10/100) or an external PHY
> via RGMII/RMII.
> 
> Signed-off-by: LABBE Corentin 
> ---

[snip]

> +
> +struct ethtool_str {
> + char name[ETH_GSTRING_LEN];

You might as well drop the encapsulating structure and just use an array
of strings?

> +};
> +

[snip]

> +
> +/* The datasheet said that each descriptor can transfers up to 4096bytes
> + * But latter, a register documentation reduce that value to 2048
> + * Anyway using 2048 cause strange behaviours and even BSP driver use 2047
> + */
> +#define DESC_BUF_MAX 2044
> +#if (DESC_BUF_MAX < (ETH_FRAME_LEN + 4))
> +#error "DESC_BUF_MAX must be set at minimum to ETH_FRAME_LEN + 4"
> +#endif

You can probably drop that, it would not make much sense to enable
fragments and a buffer size smaller than ETH_FRAME_LEN + ETH_FCS_LEN anyway.

> +
> +/* MAGIC value for knowing if a descriptor is available or not */
> +#define DCLEAN (BIT(16) | BIT(14) | BIT(12) | BIT(10) | BIT(9))
> +
> +/* Structure of DMA descriptor used by the hardware  */
> +struct dma_desc {
> + u32 status; /* status of the descriptor */
> + u32 st; /* Information on the frame */
> + u32 buf_addr; /* physical address of the frame data */
> + u32 next; /* physical address of next dma_desc */
> +} __packed __aligned(4);

This has been noticed in other emails, no need for the __packed here,
they are all naturally aligned.
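
A quick way to convince oneself of this is a compile-time check on an
unpacked copy of the structure. The sketch below uses uint32_t in place of
the kernel's u32 so that it builds in user space; the field names are those
of the quoted patch.

```c
#include <stddef.h>
#include <stdint.h>

/* Same layout as the quoted dma_desc, with uint32_t standing in for u32
 * and no packing attribute.
 */
struct dma_desc {
	uint32_t status;
	uint32_t st;
	uint32_t buf_addr;
	uint32_t next;
};

/* Four naturally aligned 32-bit fields leave no room for padding. */
_Static_assert(sizeof(struct dma_desc) == 16, "unexpected padding");
_Static_assert(offsetof(struct dma_desc, st) == 4, "st misplaced");
_Static_assert(offsetof(struct dma_desc, buf_addr) == 8, "buf_addr misplaced");
_Static_assert(offsetof(struct dma_desc, next) == 12, "next misplaced");
```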

> +
> +/* Benched on OPIPC with 100M, setting more than 256 does not give any
> + * perf boost
> + */
> +static int nbdesc_tx = 256;
> +module_param(nbdesc_tx, int, S_IRUGO | S_IWUSR);
> +MODULE_PARM_DESC(nbdesc_tx, "Number of descriptors in the TX list");
> +static int nbdesc_rx = 128;
> +module_param(nbdesc_rx, int, S_IRUGO | S_IWUSR);
> +MODULE_PARM_DESC(nbdesc_rx, "Number of descriptors in the RX list");

This needs to be statically defined for the driver to begin operation
with; you can then implement the ethtool operations to re-size the
rings, should you want that.

[snip]

> +/* Return the number of contiguous free descriptors
> + * starting from tx_slot
> + */
> +static int rb_tx_numfreedesc(struct net_device *ndev)
> +{
> + struct sun8i_emac_priv *priv = netdev_priv(ndev);
> +
> + if (priv->tx_slot < priv->tx_dirty)
> + return priv->tx_dirty - priv->tx_slot;

Does this work if tx_dirty wraps around?

> +
> + return (nbdesc_tx - priv->tx_slot) + priv->tx_dirty;
> +}
> +
> +/* Allocate a skb in a DMA descriptor
> + *
> + * @i index of slot to fill
> +*/
> +static int sun8i_emac_rx_sk(struct net_device *ndev, int i)
> +{
> + struct sun8i_emac_priv *priv = netdev_priv(ndev);
> + struct dma_desc *ddesc;
> + struct sk_buff *sk;

The networking stack typically uses "sk" for a socket and "skb" for a
socket buffer.

> +
> + ddesc = priv->dd_rx + i;
> +
> + ddesc->st = 0;
> +
> + sk = netdev_alloc_skb_ip_align(ndev, DESC_BUF_MAX);
> + if (!sk)
> + return -ENOMEM;
> +
> + /* should not happen */
> + if (unlikely(priv->rx_sk[i]))
> + dev_warn(priv->dev, "BUG: Leaking a skbuff\n");
> +
> + priv->rx_sk[i] = sk;
> +
> + ddesc->buf_addr = dma_map_single(priv->dev, sk->data,
> +  DESC_BUF_MAX, DMA_FROM_DEVICE);
> + if (dma_mapping_error(priv->dev, ddesc->buf_addr)) {
> + dev_err(priv->dev, "ERROR: Cannot dma_map RX buffer\n");
> + dev_kfree_skb(sk);
> + return -EFAULT;
> + }
> + ddesc->st |= DESC_BUF_MAX;
> + ddesc->status = BIT(31);

You are missing a lightweight barrier here to ensure there is no
re-ordering done by the compiler in how you write to the descriptors in
DRAM, even though they are allocated from dma_alloc_coherent().
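
A rough userspace analogue of the ordering being asked for: fill in the
descriptor fields first, then hand the descriptor over with a store that
earlier writes cannot be reordered after. In the kernel this would usually
be a dma_wmb() placed before the write to the status word; the C11 release
store below only models that ordering. DESC_OWN_HW, rx_desc_sketch and
rx_desc_publish() are hypothetical names, and treating BIT(31) as the
"owned by hardware" flag is an assumption based on the quoted code.

```c
#include <stdatomic.h>
#include <stdint.h>

#define DESC_OWN_HW (UINT32_C(1) << 31) /* hypothetical name for BIT(31) */

struct rx_desc_sketch {
	_Atomic uint32_t status; /* written last: hands the descriptor over */
	uint32_t st;             /* buffer size and frame information */
	uint32_t buf_addr;       /* DMA address of the receive buffer */
	uint32_t next;           /* DMA address of the next descriptor */
};

/* Fill the descriptor, then publish it with release semantics. */
static inline void rx_desc_publish(struct rx_desc_sketch *d,
				   uint32_t buf_addr, uint32_t buf_len)
{
	d->st = buf_len;
	d->buf_addr = buf_addr;
	/* Release store: the two writes above cannot move below this one. */
	atomic_store_explicit(&d->status, DESC_OWN_HW, memory_order_release);
}
```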

[snip]

> +static void sun8i_emac_set_link_mode(struct sun8i_emac_priv *priv)
> +{
> + u32 v;
> +
> + v = readl(priv->base + SUN8I_EMAC_BASIC_CTL0);
> +
> + if (priv->duplex)
> + v |= BIT(0);
> + else
> + v &= ~BIT(0);
> +
> + v &= ~0x0C;
> + switch (priv->speed) {
> + case 1000:
> + break;
> + case 100:
> + v |= BIT(2);
> + v |= BIT(3);
> + break;
> + case 10:
> + v |= BIT(3);
> + break;
> + }

Proper defines for all of these bits and masks?
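
As a sketch of what such defines could look like: the bit positions below
are taken from the quoted code, but every macro name and the helper
emac_ctl0_link_mode() are invented for illustration and do not come from the
datasheet or the driver.

```c
#include <stdint.h>

/* Hypothetical names; values mirror the magic numbers in the quoted code. */
#define EMAC_CTL0_DUPLEX_FULL (UINT32_C(1) << 0)
#define EMAC_CTL0_SPEED_MASK  (UINT32_C(3) << 2) /* was 0x0C */
#define EMAC_CTL0_SPEED_1000  (UINT32_C(0) << 2)
#define EMAC_CTL0_SPEED_10    (UINT32_C(2) << 2) /* was BIT(3) */
#define EMAC_CTL0_SPEED_100   (UINT32_C(3) << 2) /* was BIT(2) | BIT(3) */

/* Compute the new register value from the old one; no hardware access. */
static inline uint32_t emac_ctl0_link_mode(uint32_t v, int speed,
					   int full_duplex)
{
	if (full_duplex)
		v |= EMAC_CTL0_DUPLEX_FULL;
	else
		v &= ~EMAC_CTL0_DUPLEX_FULL;

	v &= ~EMAC_CTL0_SPEED_MASK;
	switch (speed) {
	case 1000:
		v |= EMAC_CTL0_SPEED_1000;
		break;
	case 100:
		v |= EMAC_CTL0_SPEED_100;
		break;
	case 10:
		v |= EMAC_CTL0_SPEED_10;
		break;
	}
	return v;
}
```

Keeping the computation separate from readl()/writel() also makes the bit
handling easy to check on its own.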

> +
> + writel(v, priv->base + SUN8I_EMAC_BASIC_CTL0);
> +}
> +
> +static void sun8i_emac_flow_ctrl(struct sun8i_emac_priv *priv, int duplex,
> +  int fc, int pause)
> +{
> + u32 flow = 0;

pause is unused (outside of printing it) here

> +
> + netif_dbg(priv, link, priv->ndev, "%s %d %d %d\n", __func__,
> +   duplex, fc, pause);
> +
> + flow = readl(priv->base + 

[PATCH v3 3/5] drivers: net: phy: Add MDIO driver

2016-06-06 Thread Iyappan Subramanian
Currently, SGMII-based 1G interfaces rely on the hardware registers for link
state, which is sometimes unreliable.  To get the most accurate link state,
this interface has to use the MDIO bus to poll the PHY.

On the X-Gene SoC, the MDIO bus is shared across RGMII- and SGMII-based 1G
interfaces, so this patch adds a driver to manage the MDIO bus.  The driver
registers the MDIO bus and the PHYs connected to it.

Signed-off-by: Iyappan Subramanian 
Tested-by: Fushen Chen 
Tested-by: Toan Le 
Tested-by: Matthias Brugger 
---
 drivers/net/ethernet/apm/xgene/Kconfig |   1 +
 drivers/net/phy/Kconfig|   7 +
 drivers/net/phy/Makefile   |   1 +
 drivers/net/phy/mdio-xgene.c   | 532 +
 drivers/net/phy/mdio-xgene.h   | 140 +
 5 files changed, 681 insertions(+)
 create mode 100644 drivers/net/phy/mdio-xgene.c
 create mode 100644 drivers/net/phy/mdio-xgene.h

diff --git a/drivers/net/ethernet/apm/xgene/Kconfig 
b/drivers/net/ethernet/apm/xgene/Kconfig
index 19e38af..300e3b5 100644
--- a/drivers/net/ethernet/apm/xgene/Kconfig
+++ b/drivers/net/ethernet/apm/xgene/Kconfig
@@ -3,6 +3,7 @@ config NET_XGENE
depends on HAS_DMA
depends on ARCH_XGENE || COMPILE_TEST
select PHYLIB
+   select MDIO_XGENE
help
  This is the Ethernet driver for the on-chip ethernet interface on the
  APM X-Gene SoC.
diff --git a/drivers/net/phy/Kconfig b/drivers/net/phy/Kconfig
index 6dad9a9..193f418 100644
--- a/drivers/net/phy/Kconfig
+++ b/drivers/net/phy/Kconfig
@@ -271,6 +271,13 @@ config MDIO_BCM_IPROC
  This module provides a driver for the MDIO busses found in the
  Broadcom iProc SoC's.
 
+config MDIO_XGENE
+   tristate "APM X-Gene SoC MDIO bus controller"
+   help
+ This module provides a driver for the MDIO busses found in the
+ APM X-Gene SoC's.
+
+
 endif # PHYLIB
 
 config MICREL_KS8995MA
diff --git a/drivers/net/phy/Makefile b/drivers/net/phy/Makefile
index fcdbb92..9cbd2af 100644
--- a/drivers/net/phy/Makefile
+++ b/drivers/net/phy/Makefile
@@ -44,3 +44,4 @@ obj-$(CONFIG_MDIO_MOXART) += mdio-moxart.o
 obj-$(CONFIG_MDIO_BCM_UNIMAC)  += mdio-bcm-unimac.o
 obj-$(CONFIG_MICROCHIP_PHY)+= microchip.o
 obj-$(CONFIG_MDIO_BCM_IPROC)   += mdio-bcm-iproc.o
+obj-$(CONFIG_MDIO_XGENE)   += mdio-xgene.o
diff --git a/drivers/net/phy/mdio-xgene.c b/drivers/net/phy/mdio-xgene.c
new file mode 100644
index 000..48273e3
--- /dev/null
+++ b/drivers/net/phy/mdio-xgene.c
@@ -0,0 +1,532 @@
+/* Applied Micro X-Gene SoC MDIO Driver
+ *
+ * Copyright (c) 2016, Applied Micro Circuits Corporation
+ * Author: Iyappan Subramanian 
+ *
+ * This program is free software; you can redistribute  it and/or modify it
+ * under  the terms of  the GNU General  Public License as published by the
+ * Free Software Foundation;  either version 2 of the  License, or (at your
+ * option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include "mdio-xgene.h"
+
+static bool xgene_mdio_status;
+
+static bool xgene_enet_rd_indirect(void __iomem *addr, void __iomem *rd,
+  void __iomem *cmd, void __iomem *cmd_done,
+  u32 rd_addr, u32 *rd_data)
+{
+   u32 done;
+   u8 wait = 10;
+
+   iowrite32(rd_addr, addr);
+   iowrite32(XGENE_ENET_RD_CMD, cmd);
+
+   /* wait for read command to complete */
+   while (!(done = ioread32(cmd_done)) && wait--)
+   udelay(1);
+
+   if (!done)
+   return false;
+
+   *rd_data = ioread32(rd);
+   iowrite32(0, cmd);
+
+   return true;
+}
+
+static void xgene_enet_rd_mcx_mac(struct xgene_mdio_pdata *pdata,
+ u32 rd_addr, u32 *rd_data)
+{
+   void __iomem *addr, *rd, *cmd, *cmd_done;
+
+   addr = pdata->mac_csr_addr + MAC_ADDR_REG_OFFSET;
+   rd = pdata->mac_csr_addr + MAC_READ_REG_OFFSET;
+   cmd = pdata->mac_csr_addr + MAC_COMMAND_REG_OFFSET;
+   cmd_done = pdata->mac_csr_addr + MAC_COMMAND_DONE_REG_OFFSET;
+
+   if (!xgene_enet_rd_indirect(addr, rd, cmd, cmd_done, rd_addr, rd_data))
+   dev_err(pdata->dev, "MCX mac read failed, addr: 0x%04x\n",
+   rd_addr);
+}
+
+static bool xgene_enet_wr_indirect(void __iomem *addr, void __iomem *wr,
+ 

[PATCH v3 2/5] drivers: net: xgene: Backward compatibility with older firmware

2016-06-06 Thread Iyappan Subramanian
This patch checks for CONFIG_MDIO_XGENE and, based on the phy-handle DT/ACPI
fields, sets the mdio_driver flag.  The rest of the driver uses this flag
for any MDIO management, to stay backward compatible with older firmware.
It also cleans up some code around mdio configuration/remove.

Signed-off-by: Iyappan Subramanian 
Tested-by: Fushen Chen 
Tested-by: Toan Le 
Tested-by: Matthias Brugger 
---
 drivers/net/ethernet/apm/xgene/xgene_enet_hw.c|  60 +++-
 drivers/net/ethernet/apm/xgene/xgene_enet_main.c  | 165 +++---
 drivers/net/ethernet/apm/xgene/xgene_enet_main.h  |   2 +
 drivers/net/ethernet/apm/xgene/xgene_enet_sgmac.c |  13 +-
 4 files changed, 148 insertions(+), 92 deletions(-)

diff --git a/drivers/net/ethernet/apm/xgene/xgene_enet_hw.c 
b/drivers/net/ethernet/apm/xgene/xgene_enet_hw.c
index 5d6d14b..38d6ee4 100644
--- a/drivers/net/ethernet/apm/xgene/xgene_enet_hw.c
+++ b/drivers/net/ethernet/apm/xgene/xgene_enet_hw.c
@@ -476,9 +476,13 @@ static void xgene_gmac_reset(struct xgene_enet_pdata 
*pdata)
 static void xgene_enet_configure_clock(struct xgene_enet_pdata *pdata)
 {
struct device *dev = &pdata->pdev->dev;
+   struct clk *parent;
 
if (dev->of_node) {
-   struct clk *parent = clk_get_parent(pdata->clk);
+   if (IS_ERR(pdata->clk))
+   return;
+
+   parent = clk_get_parent(pdata->clk);
 
switch (pdata->phy_speed) {
case SPEED_10:
@@ -572,6 +576,9 @@ static void xgene_gmac_init(struct xgene_enet_pdata *pdata)
 {
u32 value;
 
+   if (!pdata->mdio_driver)
+   xgene_gmac_reset(pdata);
+
xgene_gmac_set_speed(pdata);
xgene_gmac_set_mac_addr(pdata);
 
@@ -677,7 +684,14 @@ static int xgene_enet_reset(struct xgene_enet_pdata *pdata)
if (!xgene_ring_mgr_init(pdata))
return -ENODEV;
 
-   xgene_enet_ecc_init(pdata);
+   if (!pdata->mdio_driver) {
+   if (!IS_ERR(pdata->clk)) {
+   clk_prepare_enable(pdata->clk);
+   clk_disable_unprepare(pdata->clk);
+   clk_prepare_enable(pdata->clk);
+   xgene_enet_ecc_init(pdata);
+   }
+   }
xgene_enet_config_ring_if_assoc(pdata);
 
return 0;
@@ -800,27 +814,9 @@ static int xgene_mdiobus_register(struct xgene_enet_pdata 
*pdata,
  struct mii_bus *mdio)
 {
struct device *dev = &pdata->pdev->dev;
-   struct device_node *mdio_np = NULL;
-   struct device_node *child_np;
-   u32 phyid;
 
if (dev->of_node) {
-   for_each_child_of_node(dev->of_node, child_np) {
-   if (of_device_is_compatible(child_np,
-   "apm,xgene-mdio")) {
-   mdio_np = child_np;
-   break;
-   }
-   }
-
-   if (!mdio_np) {
-   mdiobus_free(mdio);
-   return 0;
-   }
-
-   pdata->mdio_driver = false;
-
-   return of_mdiobus_register(mdio, mdio_np);
+   return of_mdiobus_register(mdio, pdata->mdio_np);
} else {
 #ifdef CONFIG_ACPI
struct phy_device *phy;
@@ -839,13 +835,7 @@ static int xgene_mdiobus_register(struct xgene_enet_pdata 
*pdata,
if (ret)
return ret;
 
-   ret = device_property_read_u32(dev, "phy-channel", &phyid);
-   if (ret)
-   ret = device_property_read_u32(dev, "phy-addr", &phyid);
-   if (ret)
-   return -EINVAL;
-
-   phy = get_phy_device(mdio, phyid, false);
+   phy = get_phy_device(mdio, pdata->phy_id, false);
if (IS_ERR(phy))
return -EIO;
 
@@ -858,6 +848,8 @@ static int xgene_mdiobus_register(struct xgene_enet_pdata 
*pdata,
return ret;
 #endif
}
+
+   return 0;
 }
 
 int xgene_enet_mdio_config(struct xgene_enet_pdata *pdata)
@@ -885,10 +877,6 @@ int xgene_enet_mdio_config(struct xgene_enet_pdata *pdata)
if (mdio_bus->state == MDIOBUS_REGISTERED)
mdiobus_unregister(pdata->mdio_bus);
mdiobus_free(mdio_bus);
-   if (pdata->mdio_driver) {
-   ret = xgene_enet_phy_connect(ndev);
-   return 0;
-   }
return ret;
}
pdata->mdio_bus = mdio_bus;
@@ -911,11 +899,9 @@ void xgene_enet_mdio_remove(struct xgene_enet_pdata *pdata)
if (pdata->phy_dev)
phy_disconnect(pdata->phy_dev);
 
-   if (!pdata->mdio_driver) {
-   mdiobus_unregister(pdata->mdio_bus);
-   

[PATCH v3 5/5] drivers: net: xgene: Fix module load/unload crash

2016-06-06 Thread Iyappan Subramanian
When the driver is built as a kernel module and is unloaded and then
reloaded, a kernel crash was observed due to incomplete hardware cleanup.
This patch addresses the issue with the following changes:

- Reordered mac enable and disable
- Added hardware prefetch buffer cleanup
- Added Tx completion ring free
- Fixed delete bufpool
- Moved down delete_desc_rings after ring cleanup

- Moved register_netdev call after hardware is ready
- Deleted napi_del function call, since it's called by free_netdev
- Calling netif_tx_start_queues and netif_tx_stop_queues
- Corrected setting irq_name before calling request_irq
- Calling dev_close() within remove
- Added shutdown callback
- Changed to use dmam_ APIs

Signed-off-by: Iyappan Subramanian 
Tested-by: Fushen Chen 
Tested-by: Toan Le 
Tested-by: Matthias Brugger 
---
 drivers/net/ethernet/apm/xgene/xgene_enet_hw.c|  71 +++-
 drivers/net/ethernet/apm/xgene/xgene_enet_hw.h|  14 +-
 drivers/net/ethernet/apm/xgene/xgene_enet_main.c  | 199 +-
 drivers/net/ethernet/apm/xgene/xgene_enet_main.h  |  32 +---
 drivers/net/ethernet/apm/xgene/xgene_enet_sgmac.c |  90 --
 drivers/net/ethernet/apm/xgene/xgene_enet_xgmac.c |  66 ++-
 6 files changed, 329 insertions(+), 143 deletions(-)

diff --git a/drivers/net/ethernet/apm/xgene/xgene_enet_hw.c 
b/drivers/net/ethernet/apm/xgene/xgene_enet_hw.c
index 38d6ee4..8e3827f 100644
--- a/drivers/net/ethernet/apm/xgene/xgene_enet_hw.c
+++ b/drivers/net/ethernet/apm/xgene/xgene_enet_hw.c
@@ -684,21 +684,75 @@ static int xgene_enet_reset(struct xgene_enet_pdata 
*pdata)
if (!xgene_ring_mgr_init(pdata))
return -ENODEV;
 
-   if (!pdata->mdio_driver) {
-   if (!IS_ERR(pdata->clk)) {
-   clk_prepare_enable(pdata->clk);
-   clk_disable_unprepare(pdata->clk);
-   clk_prepare_enable(pdata->clk);
-   xgene_enet_ecc_init(pdata);
+   if (pdata->mdio_driver) {
+   xgene_enet_config_ring_if_assoc(pdata);
+   return 0;
+   }
+   if (!IS_ERR(pdata->clk)) {
+   clk_prepare_enable(pdata->clk);
+   clk_disable_unprepare(pdata->clk);
+   clk_prepare_enable(pdata->clk);
+   } else {
+#ifdef CONFIG_ACPI
+   if (acpi_has_method(ACPI_HANDLE(&pdata->pdev->dev), "_RST")) {
+   acpi_evaluate_object(ACPI_HANDLE(&pdata->pdev->dev),
+"_RST", NULL, NULL);
+   } else if (acpi_has_method(ACPI_HANDLE(&pdata->pdev->dev),
+"_INI")) {
+   acpi_evaluate_object(ACPI_HANDLE(&pdata->pdev->dev),
+"_INI", NULL, NULL);
}
+#endif
}
+
+   xgene_enet_ecc_init(pdata);
xgene_enet_config_ring_if_assoc(pdata);
 
return 0;
 }
 
+static void xgene_enet_clear(struct xgene_enet_pdata *pdata,
+struct xgene_enet_desc_ring *ring)
+{
+   u32 addr, val, data;
+
+   val = xgene_enet_ring_bufnum(ring->id);
+
+   if (xgene_enet_is_bufpool(ring->id)) {
+   addr = ENET_CFGSSQMIFPRESET_ADDR;
+   data = BIT(val - 0x20);
+   } else {
+   addr = ENET_CFGSSQMIWQRESET_ADDR;
+   data = BIT(val);
+   }
+
+   xgene_enet_wr_ring_if(pdata, addr, data);
+}
+
 static void xgene_gport_shutdown(struct xgene_enet_pdata *pdata)
 {
+   struct xgene_enet_desc_ring *ring;
+   u32 pb, val;
+   int i;
+
+   pb = 0;
+   for (i = 0; i < pdata->rxq_cnt; i++) {
+   ring = pdata->rx_ring[i]->buf_pool;
+
+   val = xgene_enet_ring_bufnum(ring->id);
+   pb |= BIT(val - 0x20);
+   }
+   xgene_enet_wr_ring_if(pdata, ENET_CFGSSQMIFPRESET_ADDR, pb);
+
+   pb = 0;
+   for (i = 0; i < pdata->txq_cnt; i++) {
+   ring = pdata->tx_ring[i];
+
+   val = xgene_enet_ring_bufnum(ring->id);
+   pb |= BIT(val);
+   }
+   xgene_enet_wr_ring_if(pdata, ENET_CFGSSQMIWQRESET_ADDR, pb);
+
if (!IS_ERR(pdata->clk))
clk_disable_unprepare(pdata->clk);
 }
@@ -748,7 +802,7 @@ static void xgene_enet_adjust_link(struct net_device *ndev)
 }
 
 #ifdef CONFIG_ACPI
-static struct acpi_device *acpi_phy_find_device(struct device *dev)
+struct acpi_device *acpi_phy_find_device(struct device *dev)
 {
struct acpi_reference_args args;
struct fwnode_handle *fw_node;
@@ -869,7 +923,7 @@ int xgene_enet_mdio_config(struct xgene_enet_pdata *pdata)
 ndev->name);
 
mdio_bus->priv = pdata;
-   mdio_bus->parent = &ndev->dev;
+   mdio_bus->parent = &pdata->pdev->dev;
 
ret = xgene_mdiobus_register(pdata, mdio_bus);
if (ret) {
@@ -917,6 +971,7 @@ const struct xgene_mac_ops 

[PATCH v3 0/5] drivers: net: xgene: Fix 1G hot-plug and module support

2016-06-06 Thread Iyappan Subramanian
This patchset addresses the following issues:

1. Hot-plug issue on the SGMII 1G interface
- by adding a driver for MDIO management
2. Kernel crash when the driver is loaded as a kernel module
- by fixing hardware cleanups and rearranging kernel API calls

Signed-off-by: Iyappan Subramanian 
Tested-by: Matthias Brugger 
---
v3: Address review comments from v2
- Add comment about hardware clock reset sequence on xgene_mdio_reset

v2: Address review comments from v1
- Fixed patch 1 compilation error
- Fixed mdio@1f61 xge0clk reference
- Squashed dtb patches
- Added PORT_OFFSET macro

v1:
- Initial version
---

Iyappan Subramanian (5):
  drivers: net: xgene: MAC and PHY configuration changes
  drivers: net: xgene: Backward compatibility with older firmware
  drivers: net: phy: Add MDIO driver
  dtb: xgene: Add MDIO node
  drivers: net: xgene: Fix module load/unload crash

 arch/arm64/boot/dts/apm/apm-merlin.dts|   9 +
 arch/arm64/boot/dts/apm/apm-mustang.dts   |  12 +
 arch/arm64/boot/dts/apm/apm-shadowcat.dtsi|  11 +
 arch/arm64/boot/dts/apm/apm-storm.dtsi|  38 +-
 drivers/net/ethernet/apm/xgene/Kconfig|   1 +
 drivers/net/ethernet/apm/xgene/xgene_enet_hw.c| 241 ++
 drivers/net/ethernet/apm/xgene/xgene_enet_hw.h|  19 +-
 drivers/net/ethernet/apm/xgene/xgene_enet_main.c  | 325 -
 drivers/net/ethernet/apm/xgene/xgene_enet_main.h  |  36 +-
 drivers/net/ethernet/apm/xgene/xgene_enet_sgmac.c | 191 +++-
 drivers/net/ethernet/apm/xgene/xgene_enet_sgmac.h |   8 +
 drivers/net/ethernet/apm/xgene/xgene_enet_xgmac.c |  66 ++-
 drivers/net/ethernet/apm/xgene/xgene_enet_xgmac.h |   4 +
 drivers/net/phy/Kconfig   |   7 +
 drivers/net/phy/Makefile  |   1 +
 drivers/net/phy/mdio-xgene.c  | 532 ++
 drivers/net/phy/mdio-xgene.h  | 140 ++
 17 files changed, 1374 insertions(+), 267 deletions(-)
 create mode 100644 drivers/net/phy/mdio-xgene.c
 create mode 100644 drivers/net/phy/mdio-xgene.h

-- 
1.9.1



[PATCH v3 4/5] dtb: xgene: Add MDIO node

2016-06-06 Thread Iyappan Subramanian
Added mdio node for mdio driver.  Also added phy-handle
reference to the ethernet nodes.

Removed unused mdio subnode within storm menet ethernet node.
Removed unused clock node from storm sgenet1.

Signed-off-by: Iyappan Subramanian 
Tested-by: Fushen Chen 
Tested-by: Toan Le 
Tested-by: Matthias Brugger 
---
 arch/arm64/boot/dts/apm/apm-merlin.dts |  9 +++
 arch/arm64/boot/dts/apm/apm-mustang.dts| 12 ++
 arch/arm64/boot/dts/apm/apm-shadowcat.dtsi | 11 +
 arch/arm64/boot/dts/apm/apm-storm.dtsi | 38 +++---
 4 files changed, 46 insertions(+), 24 deletions(-)

diff --git a/arch/arm64/boot/dts/apm/apm-merlin.dts 
b/arch/arm64/boot/dts/apm/apm-merlin.dts
index 387c6a8..c765f26 100644
--- a/arch/arm64/boot/dts/apm/apm-merlin.dts
+++ b/arch/arm64/boot/dts/apm/apm-merlin.dts
@@ -83,3 +83,12 @@
status = "ok";
};
 };
+
+ {
+   sgenet0phy: phy@3 {
+   reg = <0x0>;
+   };
+   sgenet1phy: phy@2 {
+   reg = <0x2>;
+   };
+};
diff --git a/arch/arm64/boot/dts/apm/apm-mustang.dts 
b/arch/arm64/boot/dts/apm/apm-mustang.dts
index 44db32e..c4e2bc4 100644
--- a/arch/arm64/boot/dts/apm/apm-mustang.dts
+++ b/arch/arm64/boot/dts/apm/apm-mustang.dts
@@ -79,3 +79,15 @@
  {
status = "ok";
 };
+
+ {
+   menetphy: phy@3 {
+   reg = <0x3>;
+   };
+   sgenet0phy: phy@4 {
+   reg = <0x4>;
+   };
+   sgenet1phy: phy@5 {
+   reg = <0x5>;
+   };
+};
diff --git a/arch/arm64/boot/dts/apm/apm-shadowcat.dtsi 
b/arch/arm64/boot/dts/apm/apm-shadowcat.dtsi
index c569f76..17333fa 100644
--- a/arch/arm64/boot/dts/apm/apm-shadowcat.dtsi
+++ b/arch/arm64/boot/dts/apm/apm-shadowcat.dtsi
@@ -625,6 +625,15 @@
apm,irq-start = <8>;
};
 
+   mdio: mdio@0x1f61 {
+   compatible = "apm,xgene-mdio-sgmii";
+   #address-cells = <1>;
+   #size-cells = <0>;
+   reg = <0x0 0x1f61 0x0 0x100>,
+ <0x0 0X1f61d000 0x0 0X100>;
+   clocks = < 0>;
+   };
+
sgenet0: ethernet@1f61 {
compatible = "apm,xgene2-sgenet";
status = "disabled";
@@ -637,6 +646,7 @@
clocks = < 0>;
local-mac-address = [00 01 73 00 00 01];
phy-connection-type = "sgmii";
+   phy-handle = <>;
};
 
xgenet1: ethernet@1f62 {
@@ -659,6 +669,7 @@
clocks = < 0>;
local-mac-address = [00 01 73 00 00 02];
phy-connection-type = "xgmii";
+   phy-handle = <>;
};
 
rng: rng@1052 {
diff --git a/arch/arm64/boot/dts/apm/apm-storm.dtsi 
b/arch/arm64/boot/dts/apm/apm-storm.dtsi
index 5147d76..f631fe4 100644
--- a/arch/arm64/boot/dts/apm/apm-storm.dtsi
+++ b/arch/arm64/boot/dts/apm/apm-storm.dtsi
@@ -237,20 +237,11 @@
clocks = < 0>;
reg = <0x0 0x1f21c000 0x0 0x1000>;
reg-names = "csr-reg";
-   csr-mask = <0x3>;
+   csr-mask = <0xa>;
+   enable-mask = <0xf>;
clock-output-names = "sge0clk";
};
 
-   sge1clk: sge1clk@1f21c000 {
-   compatible = "apm,xgene-device-clock";
-   #clock-cells = <1>;
-   clocks = < 0>;
-   reg = <0x0 0x1f21c000 0x0 0x1000>;
-   reg-names = "csr-reg";
-   csr-mask = <0xc>;
-   clock-output-names = "sge1clk";
-   };
-
xge0clk: xge0clk@1f61c000 {
compatible = "apm,xgene-device-clock";
#clock-cells = <1>;
@@ -921,6 +912,14 @@
clocks = < 0>;
};
 
+   mdio: mdio@0x1702 {
+   compatible = "apm,xgene-mdio-rgmii";
+   #address-cells = <1>;
+   #size-cells = <0>;
+   reg = <0x0 0x1702 0x0 0xd100>;
+   clocks = < 0>;
+   };
+
menet: ethernet@1702 {
compatible = "apm,xgene-enet";
status = "disabled";
@@ -930,21 +929,11 @@
reg-names = "enet_csr", "ring_csr", "ring_cmd";
interrupts = <0x0 0x3c 0x4>;
  

[PATCH v3 1/5] drivers: net: xgene: MAC and PHY configuration changes

2016-06-06 Thread Iyappan Subramanian
This patch fixes the MAC configuration to support 10/100 Mbps for SGMII and
the link_state callback. It also sets the pdata->mdio_driver flag based on
the ethernet mdio subnode, in preparation for MDIO driver support.

In summary, the changes are:

- Added set_speed function pointer in mac_ops
- Changed link_state to call the set_speed
- Add 10/100 support for SGMII based 1G
- Fixed mac_init for 1G

- Call mac_ops rx_enable/disable and tx_enable/disable function pointers
- Add acpi_phy_find_device to find PHY using phy-handle reference object
- Changing phy_start and phy_stop calls based on phy_dev object existence
- Calling phy_connect based on pdata->mdio_driver flag

Signed-off-by: Iyappan Subramanian 
Tested-by: Fushen Chen 
Tested-by: Toan Le 
Tested-by: Matthias Brugger 
---
 drivers/net/ethernet/apm/xgene/xgene_enet_hw.c| 190 +-
 drivers/net/ethernet/apm/xgene/xgene_enet_hw.h|   5 +
 drivers/net/ethernet/apm/xgene/xgene_enet_main.c  |  41 +++--
 drivers/net/ethernet/apm/xgene/xgene_enet_main.h  |   2 +
 drivers/net/ethernet/apm/xgene/xgene_enet_sgmac.c | 106 +++-
 drivers/net/ethernet/apm/xgene/xgene_enet_sgmac.h |   8 +
 drivers/net/ethernet/apm/xgene/xgene_enet_xgmac.h |   4 +
 7 files changed, 259 insertions(+), 97 deletions(-)

diff --git a/drivers/net/ethernet/apm/xgene/xgene_enet_hw.c 
b/drivers/net/ethernet/apm/xgene/xgene_enet_hw.c
index 2f5638f..5d6d14b 100644
--- a/drivers/net/ethernet/apm/xgene/xgene_enet_hw.c
+++ b/drivers/net/ethernet/apm/xgene/xgene_enet_hw.c
@@ -512,14 +512,11 @@ static void xgene_enet_configure_clock(struct 
xgene_enet_pdata *pdata)
 #endif
 }
 
-static void xgene_gmac_init(struct xgene_enet_pdata *pdata)
+static void xgene_gmac_set_speed(struct xgene_enet_pdata *pdata)
 {
struct device *dev = &pdata->pdev->dev;
-   u32 value, mc2;
-   u32 intf_ctl, rgmii;
-   u32 icm0, icm2;
-
-   xgene_gmac_reset(pdata);
+   u32 icm0, icm2, mc2;
+   u32 intf_ctl, rgmii, value;
 
xgene_enet_rd_mcx_csr(pdata, ICM_CONFIG0_REG_0_ADDR, &icm0);
xgene_enet_rd_mcx_csr(pdata, ICM_CONFIG2_REG_0_ADDR, &icm2);
@@ -564,7 +561,18 @@ static void xgene_gmac_init(struct xgene_enet_pdata *pdata)
mc2 |= FULL_DUPLEX2 | PAD_CRC;
xgene_enet_wr_mcx_mac(pdata, MAC_CONFIG_2_ADDR, mc2);
xgene_enet_wr_mcx_mac(pdata, INTERFACE_CONTROL_ADDR, intf_ctl);
+   xgene_enet_wr_csr(pdata, RGMII_REG_0_ADDR, rgmii);
+   xgene_enet_configure_clock(pdata);
+
+   xgene_enet_wr_mcx_csr(pdata, ICM_CONFIG0_REG_0_ADDR, icm0);
+   xgene_enet_wr_mcx_csr(pdata, ICM_CONFIG2_REG_0_ADDR, icm2);
+}
 
+static void xgene_gmac_init(struct xgene_enet_pdata *pdata)
+{
+   u32 value;
+
+   xgene_gmac_set_speed(pdata);
xgene_gmac_set_mac_addr(pdata);
 
/* Adjust MDC clock frequency */
@@ -579,15 +587,10 @@ static void xgene_gmac_init(struct xgene_enet_pdata 
*pdata)
 
/* Rtype should be copied from FP */
xgene_enet_wr_csr(pdata, RSIF_RAM_DBG_REG0_ADDR, 0);
-   xgene_enet_wr_csr(pdata, RGMII_REG_0_ADDR, rgmii);
-   xgene_enet_configure_clock(pdata);
 
/* Rx-Tx traffic resume */
xgene_enet_wr_csr(pdata, CFG_LINK_AGGR_RESUME_0_ADDR, TX_PORT0);
 
-   xgene_enet_wr_mcx_csr(pdata, ICM_CONFIG0_REG_0_ADDR, icm0);
-   xgene_enet_wr_mcx_csr(pdata, ICM_CONFIG2_REG_0_ADDR, icm2);
-
xgene_enet_rd_mcx_csr(pdata, RX_DV_GATE_REG_0_ADDR, &value);
value &= ~TX_DV_GATE_EN0;
value &= ~RX_DV_GATE_EN0;
@@ -671,25 +674,12 @@ bool xgene_ring_mgr_init(struct xgene_enet_pdata *p)
 
 static int xgene_enet_reset(struct xgene_enet_pdata *pdata)
 {
-   u32 val;
-
if (!xgene_ring_mgr_init(pdata))
return -ENODEV;
 
-   if (!IS_ERR(pdata->clk)) {
-   clk_prepare_enable(pdata->clk);
-   clk_disable_unprepare(pdata->clk);
-   clk_prepare_enable(pdata->clk);
-   xgene_enet_ecc_init(pdata);
-   }
+   xgene_enet_ecc_init(pdata);
xgene_enet_config_ring_if_assoc(pdata);
 
-   /* Enable auto-incr for scanning */
-   xgene_enet_rd_mcx_mac(pdata, MII_MGMT_CONFIG_ADDR, &val);
-   val |= SCAN_AUTO_INCR;
-   MGMT_CLOCK_SEL_SET(&val, 1);
-   xgene_enet_wr_mcx_mac(pdata, MII_MGMT_CONFIG_ADDR, val);
-
return 0;
 }
 
@@ -724,29 +714,49 @@ static int xgene_enet_mdio_write(struct mii_bus *bus, int 
mii_id, int regnum,
 static void xgene_enet_adjust_link(struct net_device *ndev)
 {
struct xgene_enet_pdata *pdata = netdev_priv(ndev);
+   const struct xgene_mac_ops *mac_ops = pdata->mac_ops;
struct phy_device *phydev = pdata->phy_dev;
 
if (phydev->link) {
if (pdata->phy_speed != phydev->speed) {
pdata->phy_speed = phydev->speed;
-   xgene_gmac_init(pdata);
-   xgene_gmac_rx_enable(pdata);
-   

Re: [PATCH 3/5] ARM: sun8i: dt: Add DT bindings documentation for Allwinner sun8i-emac

2016-06-06 Thread Corentin LABBE
On 06/06/2016 16:14, Rob Herring wrote:
> On Fri, Jun 03, 2016 at 11:56:28AM +0200, LABBE Corentin wrote:
>> This patch adds documentation for Device-Tree bindings for the
>> Allwinner sun8i-emac driver.
>>
>> Signed-off-by: LABBE Corentin 
>> ---
>>  .../bindings/net/allwinner,sun8i-emac.txt  | 64 
>> ++
>>  1 file changed, 64 insertions(+)
>>  create mode 100644 
>> Documentation/devicetree/bindings/net/allwinner,sun8i-emac.txt
>>
>> diff --git a/Documentation/devicetree/bindings/net/allwinner,sun8i-emac.txt 
>> b/Documentation/devicetree/bindings/net/allwinner,sun8i-emac.txt
>> new file mode 100644
>> index 000..cf71a71
>> --- /dev/null
>> +++ b/Documentation/devicetree/bindings/net/allwinner,sun8i-emac.txt
>> @@ -0,0 +1,64 @@
>> +* Allwinner sun8i EMAC ethernet controller
>> +
>> +Required properties:
>> +- compatible: "allwinner,sun8i-a83t-emac", "allwinner,sun8i-h3-emac",
>> +or "allwinner,sun50i-a64-emac"
>> +- reg: address and length of the register sets for the device.
>> +- reg-names: should be "emac" and "syscon", matching the register sets
> 
> Is syscon shared with other devices? Your example only has 1 reg 
> address.
> 

The example is wrong: emac and syscon are two distinct register spaces.
I will correct the example.

>> +- interrupts: interrupt for the device
>> +- clocks: A phandle to the reference clock for this device
>> +- clock-names: should be "ahb"
>> +- resets: A phandle to the reset control for this device
>> +- reset-names: should be "ahb"
>> +- phy-mode: See ethernet.txt
>> +- phy or phy-handle: See ethernet.txt
>> +- #address-cells: shall be 1
>> +- #size-cells: shall be 0
>> +
>> +"allwinner,sun8i-h3-emac" also requires:
>> +- clocks: an extra phandle to the reference clock for the EPHY
>> +- clock-names: an extra "ephy" entry matching the clocks property
>> +- resets: an extra phandle to the reset control for the EPHY
>> +- resets-names: an extra "ephy" entry matching the resets property
>> +
>> +See ethernet.txt in the same directory for generic bindings for ethernet
>> +controllers.
>> +
>> +The device node referenced by "phy" or "phy-handle" should be a child node
>> +of this node. See phy.txt for the generic PHY bindings.
>> +
>> +Optional properties:
>> +- phy-supply: phandle to a regulator if the PHY needs one
>> +- phy-io-supply: phandle to a regulator if the PHY needs another one for I/O.
>> + This is sometimes found with RGMII PHYs, which use a second
>> + regulator for the lower I/O voltage.
> 
> These should go in the phy's node.
> 

In fact, I forgot to remove them: for the moment, the driver as sent does
not have any regulator support.

Thanks



RE: [EXT] [PATCH v4] r8152: Add support for setting pass through MAC address on RTL8153-AD

2016-06-06 Thread Mario_Limonciello
> -Original Message-
> From: Konstantin Shkolnyy [mailto:konstantin.shkol...@silabs.com]
> Sent: Monday, June 6, 2016 12:43 PM
> To: Limonciello, Mario ;
> hayesw...@realtek.com
> Cc: LKML ; Netdev
> ; Linux USB ;
> pali.ro...@gmail.com; anthony.w...@canonical.com; Greg KH
> 
> Subject: RE: [EXT] [PATCH v4] r8152: Add support for setting pass through
> MAC address on RTL8153-AD
> 
> > -Original Message-
> > From: linux-usb-ow...@vger.kernel.org [mailto:linux-usb-
> > ow...@vger.kernel.org] On Behalf Of Mario Limonciello
> > Sent: Monday, June 06, 2016 12:19
> > To: hayesw...@realtek.com
> > Cc: LKML; Netdev; Linux USB; pali.ro...@gmail.com;
> > anthony.w...@canonical.com; Greg KH; Mario Limonciello
> > Subject: [EXT] [PATCH v4] r8152: Add support for setting pass through MAC
> > address on RTL8153-AD
> >
> > The RTL8153-AD supports a persistent system specific MAC address.
> > This means a device plugged into two different systems with host side
> > support will show different (but persistent) MAC addresses.
> >
> > This information for the system's persistent MAC address is burned in when
> > the system HW is built and available under _SB\AMAC in the DSDT at
> > runtime.
> > This technology is currently implemented in the Dell TB15 and WD15 Type-C
> > docks.  More information is available here:
> > http://www.dell.com/support/article/us/en/04/SLN301147
> 
> What is going to happen if I connect multiple dongles? Will they all get the
> same address?

If you connect a dongle without an RTL8153-AD, or without that bit set, it
will get the MAC that was burned into the dongle.

If you connect multiple docks that have this Realtek chip with this bit set,
then yes, they will all get the same address.
I confirmed that this is what happens on Windows too.



Re: [net-next PATCH 1/1] net sched: indentation and other OCD stylistic fixes

2016-06-06 Thread Cong Wang
On Sun, Jun 5, 2016 at 7:41 AM, Jamal Hadi Salim  wrote:
> From: Jamal Hadi Salim 
>
> Signed-off-by: Jamal Hadi Salim 

Looks good,

Acked-by: Cong Wang 

Thanks!


RE: [EXT] [PATCH v4] r8152: Add support for setting pass through MAC address on RTL8153-AD

2016-06-06 Thread Konstantin Shkolnyy
> -Original Message-
> From: linux-usb-ow...@vger.kernel.org [mailto:linux-usb-
> ow...@vger.kernel.org] On Behalf Of Mario Limonciello
> Sent: Monday, June 06, 2016 12:19
> To: hayesw...@realtek.com
> Cc: LKML; Netdev; Linux USB; pali.ro...@gmail.com;
> anthony.w...@canonical.com; Greg KH; Mario Limonciello
> Subject: [EXT] [PATCH v4] r8152: Add support for setting pass through MAC
> address on RTL8153-AD
> 
> The RTL8153-AD supports a persistent system specific MAC address.
> This means a device plugged into two different systems with host side
> support will show different (but persistent) MAC addresses.
> 
> This information for the system's persistent MAC address is burned in when
> the system HW is built and available under _SB\AMAC in the DSDT at
> runtime.
> 
> This technology is currently implemented in the Dell TB15 and WD15 Type-C
> docks.  More information is available here:
> http://www.dell.com/support/article/us/en/04/SLN301147

What is going to happen if I connect multiple dongles? Will they all get the 
same address?



Re: [PATCH v3] r8152: Add support for setting pass through MAC address on RTL8153-AD

2016-06-06 Thread Greg KH
On Mon, Jun 06, 2016 at 05:24:57PM +, mario_limoncie...@dell.com wrote:
> That said, I would be highly surprised if Realtek decided to implement
> with another OEM differently.  It would increase their code complexity
> on Windows as well since this is part of the generic driver.

Ah, it's refreshing to see people who haven't dealt with BIOS and system
vendors for very long, your good attitude is a wonderful sign :)

Seriously, if there is ANY chance that it could be broken or changed, it
will be.  I place the odds that your next hardware product will do just
this for no obvious reason at all and am willing to buy the beer if it
doesn't happen.

thanks,

greg k-h


Re: [PATCH v2 net-next 0/3] net sched action timestamp improvements

2016-06-06 Thread Cong Wang
On Mon, Jun 6, 2016 at 3:32 AM, Jamal Hadi Salim  wrote:
> From: Jamal Hadi Salim 
>
> Various aggregations of duplicated code, fixes and introduction of firstused
> timestamp
>
> v2: add const for source time info per suggestion from Cong

For this whole series,

Acked-by: Cong Wang 


Thanks!


Re: [PATCH net 2/2] net: cls_u32: be more strict about skip-sw flag

2016-06-06 Thread Samudrala, Sridhar



On 6/6/2016 8:16 AM, Jakub Kicinski wrote:

Return an error if the user requested skip-sw and the underlying
hardware cannot handle tc offloads (or offloads are disabled).

Signed-off-by: Jakub Kicinski 


looks good. I think we need similar checks in u32_replace_hw_knode() too.



---
  net/sched/cls_u32.c | 21 +++--
  1 file changed, 11 insertions(+), 10 deletions(-)

diff --git a/net/sched/cls_u32.c b/net/sched/cls_u32.c
index b17e090f2fe1..fe05449537a3 100644
--- a/net/sched/cls_u32.c
+++ b/net/sched/cls_u32.c
@@ -457,20 +457,21 @@ static int u32_replace_hw_hnode(struct tcf_proto *tp,
 	struct tc_to_netdev offload;
 	int err;
 
+	if (!tc_should_offload(dev, flags))
+		return tc_skip_sw(flags) ? -EINVAL : 0;
+
 	offload.type = TC_SETUP_CLSU32;
 	offload.cls_u32 = &u32_offload;
 
-	if (tc_should_offload(dev, flags)) {
-		offload.cls_u32->command = TC_CLSU32_NEW_HNODE;
-		offload.cls_u32->hnode.divisor = h->divisor;
-		offload.cls_u32->hnode.handle = h->handle;
-		offload.cls_u32->hnode.prio = h->prio;
+	offload.cls_u32->command = TC_CLSU32_NEW_HNODE;
+	offload.cls_u32->hnode.divisor = h->divisor;
+	offload.cls_u32->hnode.handle = h->handle;
+	offload.cls_u32->hnode.prio = h->prio;
 
-		err = dev->netdev_ops->ndo_setup_tc(dev, tp->q->handle,
-						    tp->protocol, &offload);
-		if (tc_skip_sw(flags))
-			return err;
-	}
+	err = dev->netdev_ops->ndo_setup_tc(dev, tp->q->handle,
+					    tp->protocol, &offload);
+	if (tc_skip_sw(flags))
+		return err;
 
 	return 0;
 }
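
For readers following the skip-sw discussion, the control flow of the patched
u32_replace_hw_hnode() can be modelled in plain userspace C.  This is only a
sketch: the model_* names, the flag bit values and the hw_can_offload/hw_err
parameters are assumptions made for illustration, not kernel API.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel flag bits (values assumed). */
#define MODEL_SKIP_HW 0x1u
#define MODEL_SKIP_SW 0x2u

static bool model_skip_sw(unsigned int flags)
{
	return (flags & MODEL_SKIP_SW) != 0;
}

/* Offload only if the device is capable and the user did not opt out. */
static bool model_should_offload(bool hw_can_offload, unsigned int flags)
{
	return hw_can_offload && (flags & MODEL_SKIP_HW) == 0;
}

/* hw_err stands for whatever the driver callback would return. */
static int model_replace_hw_hnode(bool hw_can_offload, unsigned int flags,
				  int hw_err)
{
	/* No offload possible: an error only if software is skipped too. */
	if (!model_should_offload(hw_can_offload, flags))
		return model_skip_sw(flags) ? -EINVAL : 0;

	/* Offload attempted: its result matters only with skip-sw. */
	if (model_skip_sw(flags))
		return hw_err;

	return 0;
}
```

With skip-sw set and no offload available the model returns -EINVAL, which is
the behaviour the patch description asks for; without skip-sw a hardware
failure is ignored because the software path still handles the filter.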




RE: [PATCH v3] r8152: Add support for setting pass through MAC address on RTL8153-AD

2016-06-06 Thread Mario_Limonciello
> > Realtek has this in their Windows driver that all OEMs will be taking.
> > Another OEM would just need to burn the right information into the SPI at
> > manufacturing and expose it to the DSDT.
> 
> Where is the match up for the Realtek bit to correspond with this
> specific ACPI field?  If it's not in the ACPI spec, then vendors _WILL_
> do this in different ways.
> 
> Again, document it, in the code, what is going on here, that's all I'm
> asking.  I'm not asking you to change the logic at all!

I've added additional comments for v4.  I strongly believe that even if
another vendor implements \\_SB.AMAC differently, this code will be safe
to run.

Everything returned by the field is checked against exactly what the field
should look like.

That said, I would be highly surprised if Realtek decided to implement
with another OEM differently.  It would increase their code complexity
on Windows as well since this is part of the generic driver.
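
The checks described above (length, header, hex digits, valid unicast address)
can be sketched in userspace C as follows.  This is an illustration only:
parse_auxmac() and hex_nibble() are hypothetical names, and the sketch assumes
the 0x17 length used in the patch counts a trailing NUL byte after
"_AUXMAC_#AABBCCDDEEFF#".

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define AUXMAC_PREFIX "_AUXMAC_#"
#define AUXMAC_LEN    0x17 /* assumed: 22 characters plus a trailing NUL */

/* Convert one hex digit to its value, or -1 if it is not a hex digit. */
static int hex_nibble(char c)
{
	if (c >= '0' && c <= '9')
		return c - '0';
	c = (char)tolower((unsigned char)c);
	if (c >= 'a' && c <= 'f')
		return c - 'a' + 10;
	return -1;
}

/* Returns true if buf holds a well-formed pass-through MAC.
 * mac[] is only meaningful when true is returned.
 */
static bool parse_auxmac(const char *buf, size_t len, unsigned char mac[6])
{
	const size_t prefix_len = sizeof(AUXMAC_PREFIX) - 1;
	unsigned char any = 0;

	assert(buf != NULL && mac != NULL);

	if (len != AUXMAC_LEN)
		return false;
	if (strncmp(buf, AUXMAC_PREFIX, prefix_len) != 0)
		return false;

	for (size_t i = 0; i < 6; i++) {
		int hi = hex_nibble(buf[prefix_len + 2 * i]);
		int lo = hex_nibble(buf[prefix_len + 2 * i + 1]);

		if (hi < 0 || lo < 0)
			return false;
		mac[i] = (unsigned char)(hi << 4 | lo);
		any |= mac[i];
	}

	/* Reject all-zero and multicast addresses. */
	return any != 0 && (mac[0] & 0x01) == 0;
}
```

Any buffer that fails one of these tests is rejected, so a vendor that lays
out the field differently would simply fall back to the address read from the
device.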


[PATCH v4] r8152: Add support for setting pass through MAC address on RTL8153-AD

2016-06-06 Thread Mario Limonciello
The RTL8153-AD supports a persistent system specific MAC address.
This means a device plugged into two different systems with host side
support will show different (but persistent) MAC addresses.

The system's persistent MAC address is burned in when the system HW is
built and is available under \_SB.AMAC in the DSDT at runtime.

This technology is currently implemented in the Dell TB15 and WD15 Type-C
docks.  More information is available here:
http://www.dell.com/support/article/us/en/04/SLN301147

Signed-off-by: Mario Limonciello 
---
Changes from v3:
 * Add additional comments about functions and what they're doing
 * Adjust warning calls to use netif instead
 * Rename function

 drivers/net/usb/r8152.c | 70 +++--
 1 file changed, 68 insertions(+), 2 deletions(-)

diff --git a/drivers/net/usb/r8152.c b/drivers/net/usb/r8152.c
index 3f9f6ed..b2339d3 100644
--- a/drivers/net/usb/r8152.c
+++ b/drivers/net/usb/r8152.c
@@ -26,6 +26,7 @@
 #include 
 #include 
 #include 
+#include <linux/acpi.h>
 
 /* Information for net-next */
 #define NETNEXT_VERSION		"08"
@@ -455,6 +456,11 @@
 /* SRAM_IMPEDANCE */
 #define RX_DRIVING_MASK		0x6000
 
+/* MAC PASSTHRU */
+#define AD_MASK			0xfee0
+#define EFUSE			0xcfdb
+#define PASS_THRU_MASK		0x1
+
 enum rtl_register_content {
_1000bps= 0x10,
_100bps = 0x08,
@@ -1030,6 +1036,59 @@ out1:
return ret;
 }
 
+/* Devices containing RTL8153-AD can support a persistent
+ * host system provided MAC address.
+ * Examples of this are Dell TB15 and Dell WD15 docks
+ */
+static int get_vendor_mac_passthru_addr(struct r8152 *tp, struct sockaddr *sa)
+{
+   acpi_status status;
+   struct acpi_buffer buffer = { ACPI_ALLOCATE_BUFFER, NULL };
+   union acpi_object *obj;
+   int ret = -EINVAL;
+   u32 ocp_data;
+   unsigned char buf[6];
+
+   /* test for -AD variant of RTL8153 */
+   ocp_data = ocp_read_word(tp, MCU_TYPE_USB, USB_MISC_0);
+   if ((ocp_data & AD_MASK) != 0x1000)
+   return -ENODEV;
+
+   /* test for MAC address pass-through bit */
+   ocp_data = ocp_read_byte(tp, MCU_TYPE_USB, EFUSE);
+   if ((ocp_data & PASS_THRU_MASK) != 1)
+   return -ENODEV;
+
+   /* returns _AUXMAC_#AABBCCDDEEFF# */
+   status = acpi_evaluate_object(NULL, "\\_SB.AMAC", NULL, &buffer);
+   obj = (union acpi_object *)buffer.pointer;
+   if (ACPI_SUCCESS(status)) {
+   if (obj->type != ACPI_TYPE_BUFFER ||
+   obj->string.length != 0x17) {
+   netif_warn(tp, probe, tp->netdev, "Invalid buffer\n");
+   goto amacout;
+   }
+   if (strncmp(obj->string.pointer, "_AUXMAC_#", 9) != 0) {
+   netif_warn(tp, probe, tp->netdev, "Invalid header\n");
+   goto amacout;
+   }
+   ret = hex2bin(buf, obj->string.pointer + 9, 6);
+   if (ret < 0 || !is_valid_ether_addr(buf)) {
+   netif_warn(tp, probe, tp->netdev, "Invalid MAC\n");
+   goto amacout;
+   }
+   memcpy(sa->sa_data, buf, 6);
+   ether_addr_copy(tp->netdev->dev_addr, sa->sa_data);
+   netif_info(tp, probe, tp->netdev,
+  "Using pass-through MAC addr %pM\n", sa->sa_data);
+   ret = 0;
+   }
+
+amacout:
+   kfree(obj);
+   return ret;
+}
+
 static int set_ethernet_addr(struct r8152 *tp)
 {
struct net_device *dev = tp->netdev;
@@ -1038,8 +1097,15 @@ static int set_ethernet_addr(struct r8152 *tp)
 
if (tp->version == RTL_VER_01)
ret = pla_ocp_read(tp, PLA_IDR, 8, sa.sa_data);
-   else
-   ret = pla_ocp_read(tp, PLA_BACKUP, 8, sa.sa_data);
+   else {
+   /* if this is not an RTL8153-AD, no eFuse mac pass thru is set,
+    * or the system doesn't provide a valid _SB.AMAC, this is
+    * expected to return non-zero
+    */
+   ret = get_vendor_mac_passthru_addr(tp, &sa);
+   if (ret < 0)
+   ret = pla_ocp_read(tp, PLA_BACKUP, 8, sa.sa_data);
+   }
 
if (ret < 0) {
netif_err(tp, probe, dev, "Get ether addr fail\n");
-- 
2.7.4



Re: [PATCH net 1/2] net: cls_u32: fix error code for invalid flags

2016-06-06 Thread Samudrala, Sridhar



On 6/6/2016 8:16 AM, Jakub Kicinski wrote:

'err' variable is not set in this test, we would return whatever
previous test set 'err' to.

Signed-off-by: Jakub Kicinski 


Acked-by: Sridhar Samudrala 


---
  net/sched/cls_u32.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/sched/cls_u32.c b/net/sched/cls_u32.c
index 079b43b3c5d2..b17e090f2fe1 100644
--- a/net/sched/cls_u32.c
+++ b/net/sched/cls_u32.c
@@ -863,7 +863,7 @@ static int u32_change(struct net *net, struct sk_buff *in_skb,
if (tb[TCA_U32_FLAGS]) {
flags = nla_get_u32(tb[TCA_U32_FLAGS]);
if (!tc_flags_valid(flags))
-   return err;
+   return -EINVAL;
}
  
  	n = (struct tc_u_knode *)*arg;
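
The bug fixed here is the familiar "stale err" pattern.  A minimal userspace
sketch, with hypothetical demo_* names and assumed flag bit values, shows why
returning 'err' could report success for invalid flags:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical flag bits for the model (values assumed). */
#define DEMO_SKIP_HW 0x1u
#define DEMO_SKIP_SW 0x2u

static bool demo_flags_valid(unsigned int flags)
{
	if (flags & ~(DEMO_SKIP_HW | DEMO_SKIP_SW))
		return false;	/* unknown bits set */
	if (flags == (DEMO_SKIP_HW | DEMO_SKIP_SW))
		return false;	/* cannot skip both paths */
	return true;
}

/* Before the fix: err still holds whatever an earlier step left in it. */
static int demo_change_before(unsigned int flags, int earlier_err)
{
	int err = earlier_err;

	if (!demo_flags_valid(flags))
		return err;
	return 0;
}

/* After the fix: invalid flags always yield -EINVAL. */
static int demo_change_after(unsigned int flags, int earlier_err)
{
	(void)earlier_err;

	if (!demo_flags_valid(flags))
		return -EINVAL;
	return 0;
}
```

If the earlier step succeeded (earlier_err == 0), the "before" variant
returns 0 for invalid flags, while the "after" variant rejects them.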




Re: [PATCH v2 3/5] drivers: net: phy: Add MDIO driver

2016-06-06 Thread Iyappan Subramanian
Hi Andrew,

Thanks for the review.

On Tue, May 31, 2016 at 6:11 PM, Andrew Lunn  wrote:
> On Tue, May 31, 2016 at 05:10:38PM -0700, Iyappan Subramanian wrote:
>> +static int xgene_mdio_reset(struct xgene_mdio_pdata *pdata)
>> +{
>> + int ret;
>> +
>> + if (pdata->mdio_id == XGENE_MDIO_RGMII) {
>> + if (pdata->dev->of_node) {
>> + clk_prepare_enable(pdata->clk);
>> + clk_disable_unprepare(pdata->clk);
>> + clk_prepare_enable(pdata->clk);
>
> Hi Iyappan
>
> Is that a workaround for a hardware problem? If so, i would suggest
> adding a comment, to stop people submitting a patch simplifying it.

The hardware expects this clock sequence.  I'll add a comment as you suggested.

>
>
>> +static int xgene_mdio_probe(struct platform_device *pdev)
>> +{
>> + struct device *dev = &pdev->dev;
>> + struct mii_bus *mdio_bus;
>> + const struct of_device_id *of_id;
>> + struct resource *res;
>> + struct xgene_mdio_pdata *pdata;
>> + void __iomem *csr_addr;
>> + int mdio_id = 0, ret = 0;
>> +
>
>
>> + of_id = of_match_device(xgene_mdio_of_match, &pdev->dev);
>> + if (mdio_id == XGENE_MDIO_RGMII) {
>> + mdio_bus->read = xgene_mdio_rgmii_read;
>> + mdio_bus->write = xgene_mdio_rgmii_write;
>> + } else {
>> + mdio_bus->read = xgene_xfi_mdio_read;
>> + mdio_bus->write = xgene_xfi_mdio_write;
>> + }
>
>> +static const struct of_device_id xgene_mdio_of_match[] = {
>> + {
>> + .compatible = "apm,xgene-mdio-rgmii",
>> + .data = (void *)XGENE_MDIO_RGMII
>> + },
>> + {
>> + .compatible = "apm,xgene-mdio-xfi",
>> + .data = (void *)XGENE_MDIO_XFI},
>> + {},
>> +};
>
>
> This all makes me think you should have two separate MDIO drivers, one
> for each compatible string. There is not that much shared code.

I would like to keep the driver consistent with the ethernet driver.
Only the mdio read and write functions are hardware specific, and even
those are selected through function pointers.  The rest of the code is
shared and much cleaner that way.

>
> Andrew
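
The single-driver approach defended above, where a match table selects the
hardware-specific accessors and everything else is shared, can be sketched in
userspace C.  All demo_* names and the dummy return values are hypothetical;
only the two compatible strings come from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum demo_mdio_id { DEMO_MDIO_RGMII = 1, DEMO_MDIO_XFI };

/* Shared bus object; only the accessors differ per hardware flavour. */
struct demo_bus {
	int (*read)(int phy, int reg);
	int (*write)(int phy, int reg, int val);
};

struct demo_match {
	const char *compatible;
	enum demo_mdio_id id;
};

static const struct demo_match demo_matches[] = {
	{ "apm,xgene-mdio-rgmii", DEMO_MDIO_RGMII },
	{ "apm,xgene-mdio-xfi",   DEMO_MDIO_XFI },
};

/* Dummy accessors: the return values only identify which one ran. */
static int demo_rgmii_read(int phy, int reg) { (void)phy; (void)reg; return 0x1000; }
static int demo_rgmii_write(int phy, int reg, int val) { (void)phy; (void)reg; (void)val; return 0; }
static int demo_xfi_read(int phy, int reg) { (void)phy; (void)reg; return 0x2000; }
static int demo_xfi_write(int phy, int reg, int val) { (void)phy; (void)reg; (void)val; return 0; }

/* Look up the compatible string and wire up the matching accessors. */
static int demo_probe(const char *compatible, struct demo_bus *bus)
{
	assert(compatible != NULL && bus != NULL);

	for (size_t i = 0; i < sizeof(demo_matches) / sizeof(demo_matches[0]); i++) {
		if (strcmp(compatible, demo_matches[i].compatible) != 0)
			continue;
		if (demo_matches[i].id == DEMO_MDIO_RGMII) {
			bus->read = demo_rgmii_read;
			bus->write = demo_rgmii_write;
		} else {
			bus->read = demo_xfi_read;
			bus->write = demo_xfi_write;
		}
		return 0;
	}
	return -1; /* no matching compatible string */
}
```

The trade-off raised in the review is that, when little else is shared, two
small drivers may be simpler than one driver with this kind of dispatch.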

