date:20160630

On 16-06-30 08:23 AM, Saeed Mahameed wrote:
> From: Or Gerlitz 
> 
> Add the commands to set and show the mode of the SR-IOV E-Switch. Two
> modes are supported:
> 
> * legacy: operating in the "old" L2-based mode (DMAC --> VF vport)
> 
> * switchdev: the E-Switch is referred to as a whitebox switch, configured
> using standard tools such as tc, bridge, openvswitch, etc. To allow
> working with these tools, a VF representor netdevice is created for each
> VF by the E-Switch manager vendor device driver instance (e.g. the PF).
> 
> Signed-off-by: Or Gerlitz 
> Signed-off-by: Saeed Mahameed 
> ---
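For illustration only, a user-space tool could map the two mode names above onto values roughly as follows. The enum and helper names are hypothetical and are not the kernel UAPI. The real constants live in the devlink headers and are not reproduced here.

```c
#include <string.h>

/* Hypothetical local enum mirroring the two E-Switch modes described
 * in the patch; NOT the kernel's devlink UAPI definition. */
enum eswitch_mode {
	ESWITCH_MODE_INVALID = -1,
	ESWITCH_MODE_LEGACY,    /* "old" L2-based mode: DMAC --> VF vport */
	ESWITCH_MODE_SWITCHDEV, /* whitebox switch with VF representors */
};

/* Map a user-supplied mode string to the enum; reject anything else. */
static enum eswitch_mode eswitch_mode_parse(const char *s)
{
	if (strcmp(s, "legacy") == 0)
		return ESWITCH_MODE_LEGACY;
	if (strcmp(s, "switchdev") == 0)
		return ESWITCH_MODE_SWITCHDEV;
	return ESWITCH_MODE_INVALID;
}

/* Inverse mapping, as a "show" command might print the current mode. */
static const char *eswitch_mode_name(enum eswitch_mode m)
{
	switch (m) {
	case ESWITCH_MODE_LEGACY:
		return "legacy";
	case ESWITCH_MODE_SWITCHDEV:
		return "switchdev";
	default:
		return "unknown";
	}
}
```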

OK, I can't come up with a better name, and Jiri/Or convinced me this
should work OK, so this works for me.

One question, though, going forward. We have devices with multiple
"switches" in them; how does this work in a devlink environment? Do
we need some way to enumerate the switches and identify them? In
which case, this attribute would be a global setting.

Thanks,
John

Re: [PATCH iproute2] bridge: man: fix STP LISTENING description

On Wed, 29 Jun 2016 19:26:29 +
Vivien Didelot  wrote:

> Correct the unclear and poorly conjugated STP LISTENING documentation.
> 
> Signed-off-by: Vivien Didelot 

Applied

Re: [PATCH] ip route: timeout for routes has to be set in seconds

On Tue, 28 Jun 2016 23:27:14 +
Andrey Vagin  wrote:

> From: Andrew Vagin 
> 
> Currently a timeout is multiplied by HZ in user space and
> then multiplied by HZ again in kernel space.
> 
> $ ./ip/ip r add 2002::0/64 dev veth1 expires 10
> $ ./ip/ip -6 r
> 2002::/64 dev veth1  metric 1024 linkdown  expires 996sec pref medium
> 
> Cc: Xin Long 
> Cc: Hangbin Liu 
> Cc: Stephen Hemminger 
> Fixes: 68eede250500 ("route: allow routes to be configured with expire values")
> Signed-off-by: Andrew Vagin 

Applied.
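Here is a small arithmetic sketch of the double scaling this patch fixes. It assumes a user-space HZ of 100 and a kernel HZ of 250. Both values are illustrative, and `expires_attr()` and the other helpers are hypothetical names, not iproute2 functions.

```c
#include <stdint.h>

#define USER_HZ_ASSUMED   100u /* illustrative user-space tick rate */
#define KERNEL_HZ_ASSUMED 250u /* illustrative CONFIG_HZ */

/* Value placed in the route's expires attribute. Before the fix, ip
 * pre-scaled seconds by HZ; after it, seconds are passed through. */
static uint32_t expires_attr(uint32_t seconds, int buggy)
{
	return buggy ? seconds * USER_HZ_ASSUMED : seconds;
}

/* The kernel treats the attribute as seconds and scales it to jiffies. */
static uint64_t kernel_expiry_jiffies(uint32_t attr)
{
	return (uint64_t)attr * KERNEL_HZ_ASSUMED;
}

/* Lifetime in seconds that the kernel ends up applying. */
static uint64_t reported_seconds(uint32_t seconds, int buggy)
{
	return kernel_expiry_jiffies(expires_attr(seconds, buggy)) /
	       KERNEL_HZ_ASSUMED;
}
```

The kernel HZ cancels out, so only the user-space factor matters. Under these assumptions `reported_seconds(10, 1)` is 1000, consistent with the `expires 996sec` seen a few seconds after adding the route. The fixed path yields 10.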

Re: [PATCH iproute2] bridge: man: fix BPUD typo

On Wed, 29 Jun 2016 19:26:10 +
Vivien Didelot  wrote:

> s/BPUD/BPDU/ in guard description.
> 
> Signed-off-by: Vivien Didelot 

Applied

Re: [PATCHv2 net-next 1/3] net: Add provision to specify pf number while assigning VF mac

2016-06-30 Thread Yuval Mintz

> Chelsio T4/T5 cards have SR-IOV Capabilities on Physical Functions
> 0..3 and the administrative Driver(cxgb4) attaches to Physical Function 4.
> Each of the Physical Functions 0..3 can support up to 16 Virtual
> Functions. With the current Linux APIs, a 2-Port card would only be
> able to use the Virtual Functions on Physical Functions 0..1 and not
> allow the Virtual Functions on Physical Functions 2..3 to be used since
> there are no Ports 2..3 on a 2-Port card.
>
> Also, the current ip commands take a netdev as one of their arguments
> and assume a 1-to-1 mapping of Network Ports, Physical Functions and the
> SR-IOV Virtual Functions of those Physical Functions. That is not
> true in our case, so it won't work for us.
> 
> Add a new argument to specify the PF number associated with the VF, to
> fix this.

I don't get it - what's the exact definition of 'Physical Function'?
Are we talking about PCI functions? Logical partitions? Something else?

Re: [PATCH iproute2 net-next] bridge: vlan: add support to display per-vlan statistics

On Tue, 21 Jun 2016 18:11:59 +0200
Nikolay Aleksandrov  wrote:

> >> Thanks, this is a useful tool, but I think the formatting of output may
> >> need to be reworked. The bridge tool works similarly to the ip command,
> >> and in the ip command the -s flag causes additional lines but does not
> >> change the output format.
> > 
> > Indeed, I agree that it needs refinement.
> >   
> 
> Or alternatively I can make it:
> $ bridge vlan stats
> a subcommand instead of using the "-s" argument in order to be consistent.
> So it can have its own format.

Why not:

$ bridge -s vlan show

to be consistent with:

$ ip -s li show

Re: [Patch net] net_sched: fix mirrored packets checksum

On Thu, Jun 30, 2016 at 4:26 PM, Cong Wang  wrote:
> On Thu, Jun 30, 2016 at 4:11 PM, Daniel Borkmann  wrote:
>> On 07/01/2016 12:42 AM, Cong Wang wrote:
>>>
>>> On Thu, Jun 30, 2016 at 12:50 PM, Daniel Borkmann 
>>> wrote:


>>>> Maybe makes sense to move skb_push_rcsum() but /also/ skb_pull_rcsum()
>>>> to the header then? Both seem similarly small at least (could be split
>>>> f.e into two patches then, first for the move, second for the actual
>>>> fix).
>>>
>>>
>>> No objection from me. Please feel free to send a patch. ;)
>>
>>
>> Shrug, I actually meant this as feedback to your patch, since you move that
>> helper and not as a note to myself. ;)
>
> Interesting, my patch only moves what it needs, why does it need
> to do more?

In case you missed the context:
http://marc.info/?l=linux-netdev=146730654005424=2

This patch should be backported to stable too, which is another
reason why we should keep it as small as possible.

Here at Twitter, we have already backported it to the 4.1 kernel for testing.

(The reason I don't have a Fixes: tag is that I haven't identified an
offending commit to blame yet.)

Re: [Patch net] net_sched: fix mirrored packets checksum

On Thu, Jun 30, 2016 at 4:11 PM, Daniel Borkmann  wrote:
> On 07/01/2016 12:42 AM, Cong Wang wrote:
>>
>> On Thu, Jun 30, 2016 at 12:50 PM, Daniel Borkmann 
>> wrote:
>>>
>>>
>>> Maybe makes sense to move skb_push_rcsum() but /also/ skb_pull_rcsum()
>>> to the header then? Both seem similarly small at least (could be split
>>> f.e into two patches then, first for the move, second for the actual
>>> fix).
>>
>>
>> No objection from me. Please feel free to send a patch. ;)
>
>
> Shrug, I actually meant this as feedback to your patch, since you move that
> helper and not as a note to myself. ;)

Interesting, my patch only moves what it needs, why does it need
to do more?

Again, I am not against your idea, it's just that 1) it doesn't belong in
my patch, and 2) I am too lazy to create a patch for it; I am also
perfectly fine with not moving it ;)

Re: [PATCH 5/7] net: ethernet: bgmac: Add platform device support

[snip]

> +
> + return 0;
> +
> +err2:
> + devm_iounmap(&pdev->dev, bgmac->plat.idm_base);
> +err1:
> + devm_iounmap(&pdev->dev, bgmac->plat.base);
> +err:
> + devm_kfree(&pdev->dev, bgmac);


This is not needed actually, now that you use the device managed helper
functions.

> +
> + return rc;
> +}
> +
> +static int bgmac_remove(struct platform_device *pdev)
> +{
> + struct bgmac *bgmac = platform_get_drvdata(pdev);
> +
> + bgmac_enet_remove(bgmac);
> + devm_iounmap(&pdev->dev, bgmac->plat.idm_base);
> + devm_iounmap(&pdev->dev, bgmac->plat.base);
> + devm_kfree(&pdev->dev, bgmac);

Same here.
-- 
Florian
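This is a user-space analogue of the review point above, not the kernel devres API. A managed "device" records each allocation and releases all of them in one place when it goes away, so per-resource cleanup labels in probe and remove become redundant. `struct mdev`, `mdev_alloc()` and `mdev_release()` are hypothetical names.

```c
#include <stdlib.h>

#define MDEV_MAX_RES 8

/* Hypothetical device that owns managed allocations, loosely modelled
 * on devres: resources are released automatically on detach. */
struct mdev {
	void *res[MDEV_MAX_RES];
	int nres;
	int freed; /* number of resources released, for illustration */
};

/* Allocate memory whose lifetime is tied to the device. */
static void *mdev_alloc(struct mdev *d, size_t size)
{
	void *p;

	if (d->nres == MDEV_MAX_RES)
		return NULL;
	p = malloc(size);
	if (p)
		d->res[d->nres++] = p;
	return p;
}

/* Called once on probe failure or on remove: frees everything, newest
 * first, so the driver needs no explicit per-resource frees. */
static void mdev_release(struct mdev *d)
{
	while (d->nres > 0) {
		free(d->res[--d->nres]);
		d->freed++;
	}
}
```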

[PATCH net-next 5/9] RDS: TCP: make ->sk_user_data point to a rds_conn_path

The socket callbacks should all operate on a struct rds_conn_path,
in preparation for a MP capable RDS-TCP.

Acked-by: Santosh Shilimkar 
Signed-off-by: Sowmini Varadhan 
---
 net/rds/tcp.c |   25 +
 net/rds/tcp.h |4 ++--
 net/rds/tcp_connect.c |   16 
 net/rds/tcp_listen.c  |   12 ++--
 net/rds/tcp_recv.c|   12 ++--
 net/rds/tcp_send.c|   12 ++--
 6 files changed, 41 insertions(+), 40 deletions(-)

diff --git a/net/rds/tcp.c b/net/rds/tcp.c
index b327727..5658f3e 100644
--- a/net/rds/tcp.c
+++ b/net/rds/tcp.c
@@ -136,9 +136,9 @@ void rds_tcp_restore_callbacks(struct socket *sock,
  * from being called while it isn't set.
  */
 void rds_tcp_reset_callbacks(struct socket *sock,
-struct rds_connection *conn)
+struct rds_conn_path *cp)
 {
-   struct rds_tcp_connection *tc = conn->c_transport_data;
+   struct rds_tcp_connection *tc = cp->cp_transport_data;
struct socket *osock = tc->t_sock;
 
if (!osock)
@@ -148,8 +148,8 @@ void rds_tcp_reset_callbacks(struct socket *sock,
 * We have an outstanding SYN to this peer, which may
 * potentially have transitioned to the RDS_CONN_UP state,
 * so we must quiesce any send threads before resetting
-* c_transport_data. We quiesce these threads by setting
-* c_state to something other than RDS_CONN_UP, and then
+* cp_transport_data. We quiesce these threads by setting
+* cp_state to something other than RDS_CONN_UP, and then
 * waiting for any existing threads in rds_send_xmit to
 * complete release_in_xmit(). (Subsequent threads entering
 * rds_send_xmit() will bail on !rds_conn_up().
@@ -164,8 +164,8 @@ void rds_tcp_reset_callbacks(struct socket *sock,
 * RDS_CONN_RESETTTING, to ensure that rds_tcp_state_change
 * cannot mark rds_conn_path_up() in the window before lock_sock()
 */
-   atomic_set(&conn->c_state, RDS_CONN_RESETTING);
-   wait_event(conn->c_waitq, !test_bit(RDS_IN_XMIT, &conn->c_flags));
+   atomic_set(&cp->cp_state, RDS_CONN_RESETTING);
+   wait_event(cp->cp_waitq, !test_bit(RDS_IN_XMIT, &cp->cp_flags));
lock_sock(osock->sk);
/* reset receive side state for rds_tcp_data_recv() for osock  */
if (tc->t_tinc) {
@@ -186,11 +186,12 @@ void rds_tcp_reset_callbacks(struct socket *sock,
release_sock(osock->sk);
sock_release(osock);
 newsock:
-   rds_send_path_reset(&conn->c_path[0]);
+   rds_send_path_reset(cp);
lock_sock(sock->sk);
write_lock_bh(&sock->sk->sk_callback_lock);
tc->t_sock = sock;
-   sock->sk->sk_user_data = conn;
+   tc->t_cpath = cp;
+   sock->sk->sk_user_data = cp;
sock->sk->sk_data_ready = rds_tcp_data_ready;
sock->sk->sk_write_space = rds_tcp_write_space;
sock->sk->sk_state_change = rds_tcp_state_change;
@@ -203,9 +204,9 @@ void rds_tcp_reset_callbacks(struct socket *sock,
  * above rds_tcp_reset_callbacks for notes about synchronization
  * with data path
  */
-void rds_tcp_set_callbacks(struct socket *sock, struct rds_connection *conn)
+void rds_tcp_set_callbacks(struct socket *sock, struct rds_conn_path *cp)
 {
-   struct rds_tcp_connection *tc = conn->c_transport_data;
+   struct rds_tcp_connection *tc = cp->cp_transport_data;
 
rdsdebug("setting sock %p callbacks to tc %p\n", sock, tc);
write_lock_bh(&sock->sk->sk_callback_lock);
@@ -221,12 +222,12 @@ void rds_tcp_set_callbacks(struct socket *sock, struct rds_connection *conn)
sock->sk->sk_data_ready = sock->sk->sk_user_data;
 
tc->t_sock = sock;
-   tc->t_cpath = &conn->c_path[0];
+   tc->t_cpath = cp;
tc->t_orig_data_ready = sock->sk->sk_data_ready;
tc->t_orig_write_space = sock->sk->sk_write_space;
tc->t_orig_state_change = sock->sk->sk_state_change;
 
-   sock->sk->sk_user_data = conn;
+   sock->sk->sk_user_data = cp;
sock->sk->sk_data_ready = rds_tcp_data_ready;
sock->sk->sk_write_space = rds_tcp_write_space;
sock->sk->sk_state_change = rds_tcp_state_change;
diff --git a/net/rds/tcp.h b/net/rds/tcp.h
index e1ff169..151b09d 100644
--- a/net/rds/tcp.h
+++ b/net/rds/tcp.h
@@ -49,8 +49,8 @@ struct rds_tcp_statistics {
 /* tcp.c */
 void rds_tcp_tune(struct socket *sock);
 void rds_tcp_nonagle(struct socket *sock);
-void rds_tcp_set_callbacks(struct socket *sock, struct rds_connection *conn);
-void rds_tcp_reset_callbacks(struct socket *sock, struct rds_connection *conn);
+void rds_tcp_set_callbacks(struct socket *sock, struct rds_conn_path *cp);
+void rds_tcp_reset_callbacks(struct socket *sock, struct rds_conn_path *cp);
 void rds_tcp_restore_callbacks(struct socket *sock,
   struct rds_tcp_connection *tc);
 u32 rds_tcp_snd_nxt(struct

[PATCH net-next 6/9] RDS: TCP: make receive path use the rds_conn_path

The ->sk_user_data contains a pointer to the rds_conn_path
for the socket. Use this consistently in the rds_tcp_data_ready
callbacks to get the rds_conn_path for rds_recv_incoming.
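The following reduced user-space model shows the pointer chain this patch relies on. The socket's opaque user-data pointer holds the path, and the connection is reached through the path's back-pointer. All struct layouts and function names here are hypothetical simplifications, not the real RDS or socket definitions.

```c
#include <stddef.h>

struct conn;

/* Reduced stand-in for struct rds_conn_path. */
struct conn_path {
	struct conn *cp_conn; /* back-pointer to the owning connection */
	int cp_index;
};

/* Reduced stand-in for struct rds_connection with two paths. */
struct conn {
	struct conn_path c_path[2];
};

/* Reduced stand-in for struct sock. */
struct fake_sock {
	void *sk_user_data; /* holds the path once callbacks are installed */
};

/* Install callbacks: store the path, not the connection. */
static void set_callbacks(struct fake_sock *sk, struct conn_path *cp)
{
	sk->sk_user_data = cp;
}

/* data_ready-style callback: recover the path, then derive the conn. */
static struct conn *data_ready(struct fake_sock *sk, struct conn_path **out_cp)
{
	struct conn_path *cp = sk->sk_user_data;

	if (!cp)
		return NULL; /* callbacks already torn down */
	*out_cp = cp;
	return cp->cp_conn;
}
```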

Acked-by: Santosh Shilimkar 
Signed-off-by: Sowmini Varadhan 
---
 net/rds/ib.c   |2 +-
 net/rds/ib.h   |2 +-
 net/rds/ib_recv.c  |3 ++-
 net/rds/loop.c |4 ++--
 net/rds/rds.h  |2 +-
 net/rds/tcp.c  |2 +-
 net/rds/tcp.h  |2 +-
 net/rds/tcp_recv.c |   29 -
 net/rds/threads.c  |2 +-
 9 files changed, 26 insertions(+), 22 deletions(-)

diff --git a/net/rds/ib.c b/net/rds/ib.c
index 1b29ec9..e6ba856 100644
--- a/net/rds/ib.c
+++ b/net/rds/ib.c
@@ -385,7 +385,7 @@ struct rds_transport rds_ib_transport = {
.xmit   = rds_ib_xmit,
.xmit_rdma  = rds_ib_xmit_rdma,
.xmit_atomic= rds_ib_xmit_atomic,
-   .recv   = rds_ib_recv,
+   .recv_path  = rds_ib_recv_path,
.conn_alloc = rds_ib_conn_alloc,
.conn_free  = rds_ib_conn_free,
.conn_connect   = rds_ib_conn_connect,
diff --git a/net/rds/ib.h b/net/rds/ib.h
index 2051f4b..579de7e 100644
--- a/net/rds/ib.h
+++ b/net/rds/ib.h
@@ -354,7 +354,7 @@ void rds_ib_mr_cqe_handler(struct rds_ib_connection *ic, struct ib_wc *wc);
 /* ib_recv.c */
 int rds_ib_recv_init(void);
 void rds_ib_recv_exit(void);
-int rds_ib_recv(struct rds_connection *conn);
+int rds_ib_recv_path(struct rds_conn_path *conn);
 int rds_ib_recv_alloc_caches(struct rds_ib_connection *ic);
 void rds_ib_recv_free_caches(struct rds_ib_connection *ic);
 void rds_ib_recv_refill(struct rds_connection *conn, int prefill, gfp_t gfp);
diff --git a/net/rds/ib_recv.c b/net/rds/ib_recv.c
index 4ea8cb1..606a11f 100644
--- a/net/rds/ib_recv.c
+++ b/net/rds/ib_recv.c
@@ -1009,8 +1009,9 @@ void rds_ib_recv_cqe_handler(struct rds_ib_connection *ic,
rds_ib_recv_refill(conn, 0, GFP_NOWAIT);
 }
 
-int rds_ib_recv(struct rds_connection *conn)
+int rds_ib_recv_path(struct rds_conn_path *cp)
 {
+   struct rds_connection *conn = cp->cp_conn;
struct rds_ib_connection *ic = conn->c_transport_data;
int ret = 0;
 
diff --git a/net/rds/loop.c b/net/rds/loop.c
index 318c21d..20284a4 100644
--- a/net/rds/loop.c
+++ b/net/rds/loop.c
@@ -102,7 +102,7 @@ static void rds_loop_inc_free(struct rds_incoming *inc)
 }
 
 /* we need to at least give the thread something to succeed */
-static int rds_loop_recv(struct rds_connection *conn)
+static int rds_loop_recv_path(struct rds_conn_path *cp)
 {
return 0;
 }
@@ -185,7 +185,7 @@ void rds_loop_exit(void)
  */
 struct rds_transport rds_loop_transport = {
.xmit   = rds_loop_xmit,
-   .recv   = rds_loop_recv,
+   .recv_path  = rds_loop_recv_path,
.conn_alloc = rds_loop_conn_alloc,
.conn_free  = rds_loop_conn_free,
.conn_connect   = rds_loop_conn_connect,
diff --git a/net/rds/rds.h b/net/rds/rds.h
index 5bbad08..0faca30 100644
--- a/net/rds/rds.h
+++ b/net/rds/rds.h
@@ -462,7 +462,7 @@ struct rds_transport {
unsigned int hdr_off, unsigned int sg, unsigned int off);
int (*xmit_rdma)(struct rds_connection *conn, struct rm_rdma_op *op);
int (*xmit_atomic)(struct rds_connection *conn, struct rm_atomic_op *op);
-   int (*recv)(struct rds_connection *conn);
+   int (*recv_path)(struct rds_conn_path *cp);
int (*inc_copy_to_user)(struct rds_incoming *inc, struct iov_iter *to);
void (*inc_free)(struct rds_incoming *inc);
 
diff --git a/net/rds/tcp.c b/net/rds/tcp.c
index 5658f3e..7bc136c 100644
--- a/net/rds/tcp.c
+++ b/net/rds/tcp.c
@@ -359,7 +359,7 @@ struct rds_transport rds_tcp_transport = {
.xmit_path_prepare  = rds_tcp_xmit_path_prepare,
.xmit_path_complete = rds_tcp_xmit_path_complete,
.xmit   = rds_tcp_xmit,
-   .recv   = rds_tcp_recv,
+   .recv_path  = rds_tcp_recv_path,
.conn_alloc = rds_tcp_conn_alloc,
.conn_free  = rds_tcp_conn_free,
.conn_connect   = rds_tcp_conn_connect,
diff --git a/net/rds/tcp.h b/net/rds/tcp.h
index 151b09d..5a5f91a 100644
--- a/net/rds/tcp.h
+++ b/net/rds/tcp.h
@@ -75,7 +75,7 @@ int rds_tcp_keepalive(struct socket *sock);
 int rds_tcp_recv_init(void);
 void rds_tcp_recv_exit(void);
 void rds_tcp_data_ready(struct sock *sk);
-int rds_tcp_recv(struct rds_connection *conn);
+int rds_tcp_recv_path(struct rds_conn_path *cp);
 void rds_tcp_inc_free(struct rds_incoming *inc);
 int rds_tcp_inc_copy_to_user(struct rds_incoming *inc, struct iov_iter *to);
 
diff --git a/net/rds/tcp_recv.c b/net/rds/tcp_recv.c
index aa7a79a..ad4892e 100644
--- a/net/rds/tcp_recv.c

[PATCH net-next 1/9] RDS: Rework path specific indirections

Refactor code to avoid separate indirections for single-path
and multipath transports. All transports (both single and mp-capable)
will get a pointer to the rds_conn_path, and can trivially derive
the rds_connection from the ->cp_conn.
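The shape of this refactor can be sketched with a reduced ops table. Every hook takes the path, and a transport that only cares about the connection derives it from `cp_conn`, so core code never branches on multipath capability. The names below are illustrative stand-ins, not the real `struct rds_transport`.

```c
#define NPATHS 4 /* stand-in for RDS_MPATH_WORKERS */

struct connection;

/* Reduced stand-in for struct rds_conn_path. */
struct conn_path {
	struct connection *cp_conn;
};

/* Reduced stand-in for struct rds_connection. */
struct connection {
	struct conn_path c_path[NPATHS];
	int shutdowns; /* counts hook invocations, for illustration */
};

/* One indirection per operation; no separate single-path variant. */
struct transport_ops {
	void (*conn_path_shutdown)(struct conn_path *cp);
};

/* A single-path transport derives the connection from the path. */
static void sp_shutdown(struct conn_path *cp)
{
	struct connection *conn = cp->cp_conn;

	conn->shutdowns++;
}

static const struct transport_ops sp_ops = {
	.conn_path_shutdown = sp_shutdown,
};

/* Generic core code calls the hook without checking t_mp_capable. */
static void core_shutdown(const struct transport_ops *ops,
			  struct conn_path *cp)
{
	ops->conn_path_shutdown(cp);
}
```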

Acked-by: Santosh Shilimkar 
Signed-off-by: Sowmini Varadhan 
---
 net/rds/connection.c  |5 +
 net/rds/ib.c  |4 ++--
 net/rds/ib.h  |4 ++--
 net/rds/ib_cm.c   |3 ++-
 net/rds/ib_send.c |3 ++-
 net/rds/loop.c|4 ++--
 net/rds/rds.h |3 ---
 net/rds/send.c|   16 
 net/rds/tcp.c |6 +++---
 net/rds/tcp.h |6 +++---
 net/rds/tcp_connect.c |7 ---
 net/rds/tcp_send.c|8 
 12 files changed, 29 insertions(+), 40 deletions(-)

diff --git a/net/rds/connection.c b/net/rds/connection.c
index a4b07c8..17c2f25 100644
--- a/net/rds/connection.c
+++ b/net/rds/connection.c
@@ -326,10 +326,7 @@ void rds_conn_shutdown(struct rds_conn_path *cp)
wait_event(cp->cp_waitq,
   !test_bit(RDS_RECV_REFILL, &cp->cp_flags));
 
-   if (!conn->c_trans->t_mp_capable)
-   conn->c_trans->conn_shutdown(conn);
-   else
-   conn->c_trans->conn_path_shutdown(cp);
+   conn->c_trans->conn_path_shutdown(cp);
rds_conn_path_reset(cp);
 
if (!rds_conn_path_transition(cp, RDS_CONN_DISCONNECTING,
diff --git a/net/rds/ib.c b/net/rds/ib.c
index 44946a6..1b29ec9 100644
--- a/net/rds/ib.c
+++ b/net/rds/ib.c
@@ -381,7 +381,7 @@ void rds_ib_exit(void)
 
 struct rds_transport rds_ib_transport = {
.laddr_check= rds_ib_laddr_check,
-   .xmit_complete  = rds_ib_xmit_complete,
+   .xmit_path_complete = rds_ib_xmit_path_complete,
.xmit   = rds_ib_xmit,
.xmit_rdma  = rds_ib_xmit_rdma,
.xmit_atomic= rds_ib_xmit_atomic,
@@ -389,7 +389,7 @@ struct rds_transport rds_ib_transport = {
.conn_alloc = rds_ib_conn_alloc,
.conn_free  = rds_ib_conn_free,
.conn_connect   = rds_ib_conn_connect,
-   .conn_shutdown  = rds_ib_conn_shutdown,
+   .conn_path_shutdown = rds_ib_conn_path_shutdown,
.inc_copy_to_user   = rds_ib_inc_copy_to_user,
.inc_free   = rds_ib_inc_free,
.cm_initiate_connect= rds_ib_cm_initiate_connect,
diff --git a/net/rds/ib.h b/net/rds/ib.h
index 627fb79..2051f4b 100644
--- a/net/rds/ib.h
+++ b/net/rds/ib.h
@@ -329,7 +329,7 @@ extern struct list_head ib_nodev_conns;
 int rds_ib_conn_alloc(struct rds_connection *conn, gfp_t gfp);
 void rds_ib_conn_free(void *arg);
 int rds_ib_conn_connect(struct rds_connection *conn);
-void rds_ib_conn_shutdown(struct rds_connection *conn);
+void rds_ib_conn_path_shutdown(struct rds_conn_path *cp);
 void rds_ib_state_change(struct sock *sk);
 int rds_ib_listen_init(void);
 void rds_ib_listen_stop(void);
@@ -384,7 +384,7 @@ u32 rds_ib_ring_completed(struct rds_ib_work_ring *ring, u32 wr_id, u32 oldest);
 extern wait_queue_head_t rds_ib_ring_empty_wait;
 
 /* ib_send.c */
-void rds_ib_xmit_complete(struct rds_connection *conn);
+void rds_ib_xmit_path_complete(struct rds_conn_path *cp);
 int rds_ib_xmit(struct rds_connection *conn, struct rds_message *rm,
unsigned int hdr_off, unsigned int sg, unsigned int off);
 void rds_ib_send_cqe_handler(struct rds_ib_connection *ic, struct ib_wc *wc);
diff --git a/net/rds/ib_cm.c b/net/rds/ib_cm.c
index e48bb1b..e34ea0b 100644
--- a/net/rds/ib_cm.c
+++ b/net/rds/ib_cm.c
@@ -731,8 +731,9 @@ int rds_ib_conn_connect(struct rds_connection *conn)
  * so that it can be called at any point during startup.  In fact it
  * can be called multiple times for a given connection.
  */
-void rds_ib_conn_shutdown(struct rds_connection *conn)
+void rds_ib_conn_path_shutdown(struct rds_conn_path *cp)
 {
+   struct rds_connection *conn = cp->cp_conn;
struct rds_ib_connection *ic = conn->c_transport_data;
int err = 0;
 
diff --git a/net/rds/ib_send.c b/net/rds/ib_send.c
index 6e4110a..84d90c9 100644
--- a/net/rds/ib_send.c
+++ b/net/rds/ib_send.c
@@ -980,8 +980,9 @@ int rds_ib_xmit_rdma(struct rds_connection *conn, struct rm_rdma_op *op)
return ret;
 }
 
-void rds_ib_xmit_complete(struct rds_connection *conn)
+void rds_ib_xmit_path_complete(struct rds_conn_path *cp)
 {
+   struct rds_connection *conn = cp->cp_conn;
struct rds_ib_connection *ic = conn->c_transport_data;
 
/* We may have a pending ACK or window update we were unable
diff --git a/net/rds/loop.c b/net/rds/loop.c
index 15f83db..318c21d 100644
--- a/net/rds/loop.c
+++ b/net/rds/loop.c
@@ -156,7 +156,7 @@ static int rds_loop_conn_connect(struct rds_connection *conn)
return 0;
 }
 
-static void

[PATCH net-next 2/9] RDS: TCP: Remove dead logic around c_passive in rds-tcp

The c_passive bit is only intended for the IB transport and will
never be encountered in rds-tcp, so remove the dead logic that
predicates on this bit.

Acked-by: Santosh Shilimkar 
Signed-off-by: Sowmini Varadhan 
---
 net/rds/tcp.c |7 +--
 1 files changed, 1 insertions(+), 6 deletions(-)

diff --git a/net/rds/tcp.c b/net/rds/tcp.c
index b139630..c56fff2 100644
--- a/net/rds/tcp.c
+++ b/net/rds/tcp.c
@@ -329,11 +329,8 @@ static void rds_tcp_destroy_conns(void)
INIT_LIST_HEAD(&rds_tcp_conn_list);
spin_unlock_irq(&rds_tcp_conn_lock);
 
-   list_for_each_entry_safe(tc, _tc, &tmp_list, t_tcp_node) {
-   if (tc->conn->c_passive)
-   rds_conn_destroy(tc->conn->c_passive);
+   list_for_each_entry_safe(tc, _tc, &tmp_list, t_tcp_node)
rds_conn_destroy(tc->conn);
-   }
 }
 
 static void rds_tcp_exit(void);
@@ -512,8 +509,6 @@ static void rds_tcp_kill_sock(struct net *net)
sk = tc->t_sock->sk;
sk->sk_prot->disconnect(sk, 0);
tcp_done(sk);
-   if (tc->conn->c_passive)
-   rds_conn_destroy(tc->conn->c_passive);
rds_conn_destroy(tc->conn);
}
 }
-- 
1.7.1

[PATCH net-next 9/9] RDS: Do not send a pong to an incoming ping with 0 src port

RDS ping messages are sent with a non-zero src port to a zero
dst port, so that the rds pong messages can be sent back to the
originator's src port. However, if a confused/malicious sender
sends a ping with a 0 src port, we'd have an infinite ping-pong
loop. To avoid this, the receiver should ignore ping messages
with a 0 src port.
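A minimal sketch of the receive-side rule, reduced to a pure function over the two header ports. The `ping_enable` parameter stands in for `rds_sysctl_ping_enable`. The enum and function names are hypothetical.

```c
#include <stdint.h>

enum ping_action {
	NORMAL_RECV, /* not a ping: continue the ordinary receive path */
	SEND_PONG,   /* valid ping: reply to the sender's source port */
	DROP_PING,   /* ping from port 0: replying risks a ping-pong loop */
};

/* Classify an incoming message by its source and destination ports. */
static enum ping_action classify(int ping_enable, uint16_t sport,
				 uint16_t dport)
{
	if (ping_enable && dport == 0) {
		if (sport == 0)
			return DROP_PING;
		return SEND_PONG;
	}
	return NORMAL_RECV;
}
```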

Acked-by: Santosh Shilimkar 
Signed-off-by: Sowmini Varadhan 
---
 net/rds/recv.c |4 
 1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/net/rds/recv.c b/net/rds/recv.c
index b58f505..fed53a6 100644
--- a/net/rds/recv.c
+++ b/net/rds/recv.c
@@ -226,6 +226,10 @@ void rds_recv_incoming(struct rds_connection *conn, __be32 saddr, __be32 daddr,
cp->cp_next_rx_seq = be64_to_cpu(inc->i_hdr.h_sequence) + 1;
 
if (rds_sysctl_ping_enable && inc->i_hdr.h_dport == 0) {
+   if (inc->i_hdr.h_sport == 0) {
+   rdsdebug("ignore ping with 0 sport from 0x%x\n", saddr);
+   goto out;
+   }
rds_stats_inc(s_recv_ping);
rds_send_pong(cp, inc->i_hdr.h_sport);
goto out;
-- 
1.7.1

[PATCH net-next 8/9] RDS: TCP: Simplify reconnect to avoid duelling reconnect attempts

When reconnecting, the peer with the smaller IP address will initiate
the reconnect, to avoid needless duelling SYN issues.
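Both peers evaluate the same pair of addresses, so a strict comparison lets exactly one side queue the reconnect. The following is a minimal sketch with a hypothetical function name. It compares host-order `uint32_t` values for clarity, whereas the patch compares the raw `__be32` fields directly.

```c
#include <stdbool.h>
#include <stdint.h>

/* Returns true if this end should queue the reconnect. */
static bool should_queue_reconnect(bool is_tcp, uint32_t laddr,
				   uint32_t faddr)
{
	/* The rule applies only to the TCP transport. */
	if (!is_tcp)
		return true;
	/* The patch returns early only when laddr > faddr, so the side with
	 * the smaller (or equal) address initiates; the other side waits
	 * for the incoming connection. */
	return laddr <= faddr;
}
```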

Acked-by: Santosh Shilimkar 
Signed-off-by: Sowmini Varadhan 
---
 net/rds/connection.c |4 +---
 net/rds/threads.c|5 +
 2 files changed, 6 insertions(+), 3 deletions(-)

diff --git a/net/rds/connection.c b/net/rds/connection.c
index 1b0c2a7..19a4fee 100644
--- a/net/rds/connection.c
+++ b/net/rds/connection.c
@@ -355,9 +355,7 @@ void rds_conn_shutdown(struct rds_conn_path *cp)
rcu_read_lock();
if (!hlist_unhashed(&conn->c_hash_node)) {
rcu_read_unlock();
-   if (conn->c_trans->t_type != RDS_TRANS_TCP ||
-   cp->cp_outgoing == 1)
-   rds_queue_reconnect(cp);
+   rds_queue_reconnect(cp);
} else {
rcu_read_unlock();
}
diff --git a/net/rds/threads.c b/net/rds/threads.c
index e8f0941..bc97d67 100644
--- a/net/rds/threads.c
+++ b/net/rds/threads.c
@@ -125,6 +125,11 @@ void rds_queue_reconnect(struct rds_conn_path *cp)
  conn, &conn->c_laddr, &conn->c_faddr,
  cp->cp_reconnect_jiffies);
 
+   /* let peer with smaller addr initiate reconnect, to avoid duels */
+   if (conn->c_trans->t_type == RDS_TRANS_TCP &&
+   conn->c_laddr > conn->c_faddr)
+   return;
+
set_bit(RDS_RECONNECT_PENDING, &cp->cp_flags);
if (cp->cp_reconnect_jiffies == 0) {
cp->cp_reconnect_jiffies = rds_sysctl_reconnect_min_jiffies;
-- 
1.7.1

[PATCH net-next 4/9] RDS: TCP: Refactor connection destruction to handle multiple paths

A single rds_connection may have multiple rds_conn_paths that have
to be carefully and correctly destroyed, for both rmmod and
netns-delete cases.

For both cases, we extract a single rds_tcp_connection for
each conn into a temporary list, and then invoke rds_conn_destroy()
which iteratively dismantles every path in the rds_connection.
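Here is a reduced model of the `list_has_conn()` filter introduced below. Given per-path entries, it keeps only the first entry for each connection, so the destroy step runs once per connection even though several paths reference it. Arrays replace the kernel's `list_head`, and all names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* One entry per path; conn_id identifies the owning connection. */
struct path_entry {
	int conn_id;
	int path_index;
};

/* True if some entry in list[0..n) already belongs to conn_id. */
static bool has_conn(const struct path_entry *list, size_t n, int conn_id)
{
	for (size_t i = 0; i < n; i++)
		if (list[i].conn_id == conn_id)
			return true;
	return false;
}

/* Copy at most one entry per connection from in[] to out[] and return
 * how many connections need destroying. out must hold n_in entries. */
static size_t collect_unique_conns(const struct path_entry *in, size_t n_in,
				   struct path_entry *out)
{
	size_t n_out = 0;

	for (size_t i = 0; i < n_in; i++)
		if (!has_conn(out, n_out, in[i].conn_id))
			out[n_out++] = in[i];
	return n_out;
}
```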

For the netns deletion case, we additionally have to make sure
that we do not leave a socket in TIME_WAIT state, as this will
hold up the netns deletion. Thus we call rds_tcp_conn_paths_destroy()
to reset state quickly.
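The patch resets each path's socket in-kernel via `sk->sk_prot->disconnect()` and `tcp_done()`. For comparison only, the closest user-space idiom is to enable `SO_LINGER` with a zero timeout. This makes `close()` abort the connection with a RST instead of leaving it in TIME_WAIT. It is a different mechanism from the patch, shown only to illustrate the effect, and the helper names are hypothetical.

```c
#include <sys/socket.h>
#include <unistd.h>

/* Configure fd so that close() aborts the TCP connection with a RST
 * instead of a FIN, leaving no TIME_WAIT state behind.
 * Returns 0 on success, -1 on error (errno set by setsockopt). */
static int set_abortive_close(int fd)
{
	struct linger lg = { .l_onoff = 1, .l_linger = 0 };

	return setsockopt(fd, SOL_SOCKET, SO_LINGER, &lg, sizeof(lg));
}

/* Abortively close a TCP socket; returns close()'s result, or -1 if
 * the linger option could not be set (the fd is still closed). */
static int reset_and_close(int fd)
{
	if (set_abortive_close(fd) < 0) {
		close(fd);
		return -1;
	}
	return close(fd);
}
```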

Acked-by: Santosh Shilimkar 
Signed-off-by: Sowmini Varadhan 
---
 net/rds/tcp.c |   46 +++---
 1 files changed, 39 insertions(+), 7 deletions(-)

diff --git a/net/rds/tcp.c b/net/rds/tcp.c
index c6b47f6..b327727 100644
--- a/net/rds/tcp.c
+++ b/net/rds/tcp.c
@@ -323,6 +323,17 @@ static void rds_tcp_conn_free(void *arg)
kmem_cache_free(rds_tcp_conn_slab, tc);
 }
 
+static bool list_has_conn(struct list_head *list, struct rds_connection *conn)
+{
+   struct rds_tcp_connection *tc, *_tc;
+
+   list_for_each_entry_safe(tc, _tc, list, t_tcp_node) {
+   if (tc->t_cpath->cp_conn == conn)
+   return true;
+   }
+   return false;
+}
+
 static void rds_tcp_destroy_conns(void)
 {
struct rds_tcp_connection *tc, *_tc;
@@ -330,8 +341,10 @@ static void rds_tcp_destroy_conns(void)
 
/* avoid calling conn_destroy with irqs off */
spin_lock_irq(&rds_tcp_conn_lock);
-   list_splice(&rds_tcp_conn_list, &tmp_list);
-   INIT_LIST_HEAD(&rds_tcp_conn_list);
+   list_for_each_entry_safe(tc, _tc, &rds_tcp_conn_list, t_tcp_node) {
+   if (!list_has_conn(&tmp_list, tc->t_cpath->cp_conn))
+   list_move_tail(&tc->t_tcp_node, &tmp_list);
+   }
spin_unlock_irq(&rds_tcp_conn_lock);
 
list_for_each_entry_safe(tc, _tc, &tmp_list, t_tcp_node)
@@ -491,10 +504,30 @@ static struct pernet_operations rds_tcp_net_ops = {
.size = sizeof(struct rds_tcp_net),
 };
 
+/* explicitly send a RST on each socket, thereby releasing any socket refcnts
+ * that may otherwise hold up netns deletion.
+ */
+static void rds_tcp_conn_paths_destroy(struct rds_connection *conn)
+{
+   struct rds_conn_path *cp;
+   struct rds_tcp_connection *tc;
+   int i;
+   struct sock *sk;
+
+   for (i = 0; i < RDS_MPATH_WORKERS; i++) {
+   cp = &conn->c_path[i];
+   tc = cp->cp_transport_data;
+   if (!tc->t_sock)
+   continue;
+   sk = tc->t_sock->sk;
+   sk->sk_prot->disconnect(sk, 0);
+   tcp_done(sk);
+   }
+}
+
 static void rds_tcp_kill_sock(struct net *net)
 {
struct rds_tcp_connection *tc, *_tc;
-   struct sock *sk;
LIST_HEAD(tmp_list);
struct rds_tcp_net *rtn = net_generic(net, rds_tcp_netid);
 
@@ -507,13 +540,12 @@ static void rds_tcp_kill_sock(struct net *net)
 
if (net != c_net || !tc->t_sock)
continue;
-   list_move_tail(&tc->t_tcp_node, &tmp_list);
+   if (!list_has_conn(&tmp_list, tc->t_cpath->cp_conn))
+   list_move_tail(&tc->t_tcp_node, &tmp_list);
}
spin_unlock_irq(&rds_tcp_conn_lock);
list_for_each_entry_safe(tc, _tc, &tmp_list, t_tcp_node) {
-   sk = tc->t_sock->sk;
-   sk->sk_prot->disconnect(sk, 0);
-   tcp_done(sk);
+   rds_tcp_conn_paths_destroy(tc->t_cpath->cp_conn);
rds_conn_destroy(tc->t_cpath->cp_conn);
}
 }
-- 
1.7.1

[PATCH net-next 7/9] RDS: TCP: Hooks to set up a single connection path

This patch adds ->conn_path_connect callbacks in the rds_transport
that are used to set up a single connection path.

Acked-by: Santosh Shilimkar 
Signed-off-by: Sowmini Varadhan 
---
 net/rds/ib.c  |2 +-
 net/rds/ib.h  |2 +-
 net/rds/ib_cm.c   |3 ++-
 net/rds/loop.c|6 +++---
 net/rds/rds.h |2 +-
 net/rds/tcp.c |2 +-
 net/rds/tcp.h |4 ++--
 net/rds/tcp_connect.c |   11 ++-
 net/rds/threads.c |5 +++--
 9 files changed, 20 insertions(+), 17 deletions(-)

diff --git a/net/rds/ib.c b/net/rds/ib.c
index e6ba856..7eaf887 100644
--- a/net/rds/ib.c
+++ b/net/rds/ib.c
@@ -388,7 +388,7 @@ struct rds_transport rds_ib_transport = {
.recv_path  = rds_ib_recv_path,
.conn_alloc = rds_ib_conn_alloc,
.conn_free  = rds_ib_conn_free,
-   .conn_connect   = rds_ib_conn_connect,
+   .conn_path_connect  = rds_ib_conn_path_connect,
.conn_path_shutdown = rds_ib_conn_path_shutdown,
.inc_copy_to_user   = rds_ib_inc_copy_to_user,
.inc_free   = rds_ib_inc_free,
diff --git a/net/rds/ib.h b/net/rds/ib.h
index 579de7e..046f750 100644
--- a/net/rds/ib.h
+++ b/net/rds/ib.h
@@ -328,7 +328,7 @@ extern struct list_head ib_nodev_conns;
 /* ib_cm.c */
 int rds_ib_conn_alloc(struct rds_connection *conn, gfp_t gfp);
 void rds_ib_conn_free(void *arg);
-int rds_ib_conn_connect(struct rds_connection *conn);
+int rds_ib_conn_path_connect(struct rds_conn_path *cp);
 void rds_ib_conn_path_shutdown(struct rds_conn_path *cp);
 void rds_ib_state_change(struct sock *sk);
 int rds_ib_listen_init(void);
diff --git a/net/rds/ib_cm.c b/net/rds/ib_cm.c
index e34ea0b..5b2ab95 100644
--- a/net/rds/ib_cm.c
+++ b/net/rds/ib_cm.c
@@ -685,8 +685,9 @@ int rds_ib_cm_initiate_connect(struct rdma_cm_id *cm_id)
return ret;
 }
 
-int rds_ib_conn_connect(struct rds_connection *conn)
+int rds_ib_conn_path_connect(struct rds_conn_path *cp)
 {
+   struct rds_connection *conn = cp->cp_conn;
struct rds_ib_connection *ic = conn->c_transport_data;
struct sockaddr_in src, dest;
int ret;
diff --git a/net/rds/loop.c b/net/rds/loop.c
index 20284a4..f2bf78d 100644
--- a/net/rds/loop.c
+++ b/net/rds/loop.c
@@ -150,9 +150,9 @@ static void rds_loop_conn_free(void *arg)
kfree(lc);
 }
 
-static int rds_loop_conn_connect(struct rds_connection *conn)
+static int rds_loop_conn_path_connect(struct rds_conn_path *cp)
 {
-   rds_connect_complete(conn);
+   rds_connect_complete(cp->cp_conn);
return 0;
 }
 
@@ -188,7 +188,7 @@ struct rds_transport rds_loop_transport = {
.recv_path  = rds_loop_recv_path,
.conn_alloc = rds_loop_conn_alloc,
.conn_free  = rds_loop_conn_free,
-   .conn_connect   = rds_loop_conn_connect,
+   .conn_path_connect  = rds_loop_conn_path_connect,
.conn_path_shutdown = rds_loop_conn_path_shutdown,
.inc_copy_to_user   = rds_message_inc_copy_to_user,
.inc_free   = rds_loop_inc_free,
diff --git a/net/rds/rds.h b/net/rds/rds.h
index 0faca30..6ef07bd 100644
--- a/net/rds/rds.h
+++ b/net/rds/rds.h
@@ -454,7 +454,7 @@ struct rds_transport {
int (*laddr_check)(struct net *net, __be32 addr);
int (*conn_alloc)(struct rds_connection *conn, gfp_t gfp);
void (*conn_free)(void *data);
-   int (*conn_connect)(struct rds_connection *conn);
+   int (*conn_path_connect)(struct rds_conn_path *cp);
void (*conn_path_shutdown)(struct rds_conn_path *conn);
void (*xmit_path_prepare)(struct rds_conn_path *cp);
void (*xmit_path_complete)(struct rds_conn_path *cp);
diff --git a/net/rds/tcp.c b/net/rds/tcp.c
index 7bc136c..d278432 100644
--- a/net/rds/tcp.c
+++ b/net/rds/tcp.c
@@ -362,7 +362,7 @@ struct rds_transport rds_tcp_transport = {
.recv_path  = rds_tcp_recv_path,
.conn_alloc = rds_tcp_conn_alloc,
.conn_free  = rds_tcp_conn_free,
-   .conn_connect   = rds_tcp_conn_connect,
+   .conn_path_connect  = rds_tcp_conn_path_connect,
.conn_path_shutdown = rds_tcp_conn_path_shutdown,
.inc_copy_to_user   = rds_tcp_inc_copy_to_user,
.inc_free   = rds_tcp_inc_free,
diff --git a/net/rds/tcp.h b/net/rds/tcp.h
index 5a5f91a..1c3160f 100644
--- a/net/rds/tcp.h
+++ b/net/rds/tcp.h
@@ -13,7 +13,7 @@ struct rds_tcp_connection {
struct list_headt_tcp_node;
struct rds_conn_path*t_cpath;
/* t_conn_path_lock synchronizes the connection establishment between
-* rds_tcp_accept_one and rds_tcp_conn_connect
+* rds_tcp_accept_one and rds_tcp_conn_path_connect
 */
struct mutext_conn_path_lock;
struct socket

[PATCH net-next 3/9] RDS: TCP: Make rds_tcp_connection track the rds_conn_path

The struct rds_tcp_connection is the transport-specific private
data structure that tracks TCP information per rds_conn_path.
Modify this structure to have a back-pointer to the rds_conn_path
for which it is the ->cp_transport_data.

Acked-by: Santosh Shilimkar 
Signed-off-by: Sowmini Varadhan 
---
 net/rds/connection.c  |   30 +++---
 net/rds/tcp.c |   44 +---
 net/rds/tcp.h |6 +++---
 net/rds/tcp_connect.c |6 +++---
 net/rds/tcp_listen.c  |4 ++--
 5 files changed, 48 insertions(+), 42 deletions(-)

diff --git a/net/rds/connection.c b/net/rds/connection.c
index 17c2f25..1b0c2a7 100644
--- a/net/rds/connection.c
+++ b/net/rds/connection.c
@@ -253,9 +253,12 @@ static struct rds_connection *__rds_conn_create(struct net *net,
 
for (i = 0; i < RDS_MPATH_WORKERS; i++) {
cp = &conn->c_path[i];
-   trans->conn_free(cp->cp_transport_data);
-   if (!trans->t_mp_capable)
-   break;
+   /* The ->conn_alloc invocation may have
+* allocated resource for all paths, so all
+* of them may have to be freed here.
+*/
+   if (cp->cp_transport_data)
+   trans->conn_free(cp->cp_transport_data);
}
kmem_cache_free(rds_conn_slab, conn);
conn = found;
@@ -367,6 +370,9 @@ static void rds_conn_path_destroy(struct rds_conn_path *cp)
 {
struct rds_message *rm, *rtmp;
 
+   if (!cp->cp_transport_data)
+   return;
+
rds_conn_path_drop(cp);
flush_work(&cp->cp_down_w);
 
@@ -398,6 +404,8 @@ static void rds_conn_path_destroy(struct rds_conn_path *cp)
 void rds_conn_destroy(struct rds_connection *conn)
 {
unsigned long flags;
+   int i;
+   struct rds_conn_path *cp;
 
rdsdebug("freeing conn %p for %pI4 -> "
 "%pI4\n", conn, &conn->c_laddr,
@@ -410,18 +418,10 @@ void rds_conn_destroy(struct rds_connection *conn)
synchronize_rcu();
 
/* shut the connection down */
-   if (!conn->c_trans->t_mp_capable) {
-   rds_conn_path_destroy(&conn->c_path[0]);
-   BUG_ON(!list_empty(&conn->c_path[0].cp_retrans));
-   } else {
-   int i;
-   struct rds_conn_path *cp;
-
-   for (i = 0; i < RDS_MPATH_WORKERS; i++) {
-   cp = &conn->c_path[i];
-   rds_conn_path_destroy(cp);
-   BUG_ON(!list_empty(&cp->cp_retrans));
-   }
+   for (i = 0; i < RDS_MPATH_WORKERS; i++) {
+   cp = &conn->c_path[i];
+   rds_conn_path_destroy(cp);
+   BUG_ON(!list_empty(&cp->cp_retrans));
}
 
/*
diff --git a/net/rds/tcp.c b/net/rds/tcp.c
index c56fff2..c6b47f6 100644
--- a/net/rds/tcp.c
+++ b/net/rds/tcp.c
@@ -221,7 +221,7 @@ void rds_tcp_set_callbacks(struct socket *sock, struct rds_connection *conn)
sock->sk->sk_data_ready = sock->sk->sk_user_data;
 
tc->t_sock = sock;
-   tc->conn = conn;
+   tc->t_cpath = &conn->c_path[0];
tc->t_orig_data_ready = sock->sk->sk_data_ready;
tc->t_orig_write_space = sock->sk->sk_write_space;
tc->t_orig_state_change = sock->sk->sk_state_change;
@@ -284,24 +284,29 @@ static int rds_tcp_laddr_check(struct net *net, __be32 addr)
 static int rds_tcp_conn_alloc(struct rds_connection *conn, gfp_t gfp)
 {
struct rds_tcp_connection *tc;
+   int i;
 
-   tc = kmem_cache_alloc(rds_tcp_conn_slab, gfp);
-   if (!tc)
-   return -ENOMEM;
+   for (i = 0; i < RDS_MPATH_WORKERS; i++) {
+   tc = kmem_cache_alloc(rds_tcp_conn_slab, gfp);
+   if (!tc)
+   return -ENOMEM;
 
-   mutex_init(&tc->t_conn_lock);
-   tc->t_sock = NULL;
-   tc->t_tinc = NULL;
-   tc->t_tinc_hdr_rem = sizeof(struct rds_header);
-   tc->t_tinc_data_rem = 0;
+   mutex_init(&tc->t_conn_path_lock);
+   tc->t_sock = NULL;
+   tc->t_tinc = NULL;
+   tc->t_tinc_hdr_rem = sizeof(struct rds_header);
+   tc->t_tinc_data_rem = 0;
 
-   conn->c_transport_data = tc;
+   conn->c_path[i].cp_transport_data = tc;
+   tc->t_cpath = &conn->c_path[i];
 
-   spin_lock_irq(&rds_tcp_conn_lock);
-   list_add_tail(&tc->t_tcp_node, &rds_tcp_conn_list);
-   spin_unlock_irq(&rds_tcp_conn_lock);
+   spin_lock_irq(&rds_tcp_conn_lock);
+   list_add_tail(&tc->t_tcp_node, &rds_tcp_conn_list);
+   spin_unlock_irq(&rds_tcp_conn_lock);
+   rdsdebug("rds_conn_path [%d] tc %p\n", i,
+

[PATCH net-next 0/9] RDS:TCP data structure changes for multipath support

This is the second installment of changes to enable multipath support in
RDS-TCP. This series implements the changes in rds-tcp so that the
rds_conn_path has a pointer to the rds_tcp_connection in cp_transport_data.
Struct rds_tcp_connection keeps track of the inet_sk per path in
t_sock. The ->sk_user_data in turn is a pointer to the rds_conn_path.
With this set of changes, rds_tcp has the needed plumbing to handle
multiple paths (sockets) per rds_connection.
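The pointer chain described in the cover letter (path to rds_tcp_connection, rds_tcp_connection to socket, socket back to path through ->sk_user_data) can be modelled in a few lines of userspace C. This is only a sketch: struct sock_model, struct tcp_conn, struct conn_path, attach_socket() and path_index_from_socket() are hypothetical names, and the real code wires these fields in rds_tcp_set_callbacks() and the connect/accept paths.

```c
#include <stddef.h>

/* Simplified, hypothetical models of the objects named in the cover letter. */
struct sock_model {
	void *sk_user_data;		/* set to the owning path */
};

struct conn_path;

struct tcp_conn {
	struct sock_model *t_sock;	/* one socket per path */
	struct conn_path *t_cpath;	/* back-pointer to the path */
};

struct conn_path {
	int cp_index;
	struct tcp_conn *cp_transport_data;
};

/* Wire a socket to a path, loosely mirroring the set-callbacks step. */
static void attach_socket(struct conn_path *cp, struct tcp_conn *tc,
			  struct sock_model *sk)
{
	cp->cp_transport_data = tc;
	tc->t_cpath = cp;
	tc->t_sock = sk;
	sk->sk_user_data = cp;
}

/* A data-ready style callback only receives the socket; the path (and,
 * through it, the transport state) is recovered via sk_user_data. */
static int path_index_from_socket(struct sock_model *sk)
{
	struct conn_path *cp = sk->sk_user_data;

	if (!cp)
		return -1;	/* socket not (or no longer) attached */
	return cp->cp_index;
}
```

Pointing sk_user_data at the path rather than at the whole connection is what lets a socket callback tell which of several paths it belongs to.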

Sowmini Varadhan (9):
  RDS: Rework path specific indirections
  RDS: TCP: Remove dead logic around c_passive in rds-tcp
  RDS: TCP: Make rds_tcp_connection track the rds_conn_path
  RDS: TCP: Refactor connection destruction to handle multiple paths
  RDS: TCP: make ->sk_user_data point to a rds_conn_path
  RDS: TCP: make receive path use the rds_conn_path
  RDS: TCP: Hooks to set up a single connection path
  RDS: TCP: Simplify reconnect to avoid duelling reconnect attempts
  RDS: Do not send a pong to an incoming ping with 0 src port

 net/rds/connection.c  |   39 ++
 net/rds/ib.c          |    8 ++--
 net/rds/ib.h          |    8 ++--
 net/rds/ib_cm.c       |    6 ++-
 net/rds/ib_recv.c     |    3 +-
 net/rds/ib_send.c     |    3 +-
 net/rds/loop.c        |   14 +++---
 net/rds/rds.h         |    7 +--
 net/rds/recv.c        |    4 ++
 net/rds/send.c        |   16 ++-
 net/rds/tcp.c         |  130 +++--
 net/rds/tcp.h         |   22 
 net/rds/tcp_connect.c |   38 ---
 net/rds/tcp_listen.c  |   16 +++---
 net/rds/tcp_recv.c    |   39 ---
 net/rds/tcp_send.c    |   20 
 net/rds/threads.c     |   12 +++-
 17 files changed, 211 insertions(+), 174 deletions(-)

Re: [Patch net] net_sched: fix mirrored packets checksum

2016-06-30 Thread Daniel Borkmann


On 07/01/2016 12:42 AM, Cong Wang wrote:

On Thu, Jun 30, 2016 at 12:50 PM, Daniel Borkmann  wrote:


Maybe makes sense to move skb_push_rcsum() but /also/ skb_pull_rcsum()
to the header then? Both seem similarly small at least (could be split
f.e into two patches then, first for the move, second for the actual fix).


No objection from me. Please feel free to send a patch. ;)


Shrug, I actually meant this as feedback to your patch, since you move that
helper and not as a note to myself. ;)

Thanks,
Daniel

[PATCH 3/7] net: ethernet: bgmac: move BCMA MDIO Phy code into a separate file

Move the BCMA MDIO PHY code into a separate file, as it is very tightly
coupled with the BCMA bus.  This will help with the upcoming BCMA
removal from the bgmac driver.  Ideally, this should be moved into the
PHY drivers, but it is too tightly coupled with the bgmac driver to
move it effectively without more changes to the driver.

Note: the phy_reset was intentionally removed, as the MDIO PHY subsystem
automatically resets the PHY if a reset function pointer is present.  In
addition to moving the code, this patch adds that reset function.
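The "reset only if a hook is present" behaviour mentioned in the note can be illustrated with a small userspace model. Everything here (struct mdio_bus_model, model_phy_reset(), model_bus_register()) is hypothetical and only mimics the idea of an optional reset callback on the MDIO bus being invoked at registration time; this model also propagates the hook's error, which is a simplifying assumption rather than a description of the kernel's behaviour.

```c
#include <stddef.h>

/* Hypothetical model of a bus that resets the PHY during registration
 * only when the driver supplied a reset hook. */
struct mdio_bus_model {
	int (*reset)(struct mdio_bus_model *bus);	/* optional hook */
	int reset_count;
};

/* Driver-provided hook: here it only records that it ran. */
static int model_phy_reset(struct mdio_bus_model *bus)
{
	bus->reset_count++;
	return 0;
}

/* Stand-in for the core registration path. */
static int model_bus_register(struct mdio_bus_model *bus)
{
	if (bus->reset) {
		int err = bus->reset(bus);

		if (err)
			return err;
	}
	/* ... PHY scanning would follow here ... */
	return 0;
}
```

With this split, the driver no longer calls its own reset routine explicitly; it fills in the hook and lets the bus registration decide when to run it.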

Signed-off-by: Jon Mason 
---
 drivers/net/ethernet/broadcom/Makefile  |   2 +-
 drivers/net/ethernet/broadcom/bgmac-bcma-mdio.c | 264 
 drivers/net/ethernet/broadcom/bgmac.c   | 246 +++---
 drivers/net/ethernet/broadcom/bgmac.h   |   3 +
 4 files changed, 298 insertions(+), 217 deletions(-)
 create mode 100644 drivers/net/ethernet/broadcom/bgmac-bcma-mdio.c

diff --git a/drivers/net/ethernet/broadcom/Makefile b/drivers/net/ethernet/broadcom/Makefile
index 00584d7..f559794 100644
--- a/drivers/net/ethernet/broadcom/Makefile
+++ b/drivers/net/ethernet/broadcom/Makefile
@@ -10,6 +10,6 @@ obj-$(CONFIG_CNIC) += cnic.o
 obj-$(CONFIG_BNX2X) += bnx2x/
 obj-$(CONFIG_SB1250_MAC) += sb1250-mac.o
 obj-$(CONFIG_TIGON3) += tg3.o
-obj-$(CONFIG_BGMAC) += bgmac.o
+obj-$(CONFIG_BGMAC) += bgmac.o bgmac-bcma-mdio.o
 obj-$(CONFIG_SYSTEMPORT) += bcmsysport.o
 obj-$(CONFIG_BNXT) += bnxt/
diff --git a/drivers/net/ethernet/broadcom/bgmac-bcma-mdio.c b/drivers/net/ethernet/broadcom/bgmac-bcma-mdio.c
new file mode 100644
index 0000000..1e65349
--- /dev/null
+++ b/drivers/net/ethernet/broadcom/bgmac-bcma-mdio.c
@@ -0,0 +1,264 @@
+/*
+ * Driver for (BCM4706)? GBit MAC core on BCMA bus.
+ *
+ * Copyright (C) 2012 Rafał Miłecki 
+ *
+ * Licensed under the GNU/GPL. See COPYING for details.
+ */
+
+#define pr_fmt(fmt)		KBUILD_MODNAME ": " fmt
+
+#include 
+#include 
+#include "bgmac.h"
+
+struct bcma_mdio {
+   struct bcma_device *core;
+   u8 phyaddr;
+};
+
+static bool bcma_mdio_wait_value(struct bcma_device *core, u16 reg, u32 mask,
+u32 value, int timeout)
+{
+   u32 val;
+   int i;
+
+   for (i = 0; i < timeout / 10; i++) {
+   val = bcma_read32(core, reg);
+   if ((val & mask) == value)
+   return true;
+   udelay(10);
+   }
+   dev_err(&core->dev, "Timeout waiting for reg 0x%X\n", reg);
+   return false;
+}
+
+/**
+ * PHY ops
+ **/
+
+static u16 bcma_mdio_phy_read(struct bcma_mdio *bcma_mdio, u8 phyaddr, u8 reg)
+{
+   struct bcma_device *core;
+   u16 phy_access_addr;
+   u16 phy_ctl_addr;
+   u32 tmp;
+
+   BUILD_BUG_ON(BGMAC_PA_DATA_MASK != BCMA_GMAC_CMN_PA_DATA_MASK);
+   BUILD_BUG_ON(BGMAC_PA_ADDR_MASK != BCMA_GMAC_CMN_PA_ADDR_MASK);
+   BUILD_BUG_ON(BGMAC_PA_ADDR_SHIFT != BCMA_GMAC_CMN_PA_ADDR_SHIFT);
+   BUILD_BUG_ON(BGMAC_PA_REG_MASK != BCMA_GMAC_CMN_PA_REG_MASK);
+   BUILD_BUG_ON(BGMAC_PA_REG_SHIFT != BCMA_GMAC_CMN_PA_REG_SHIFT);
+   BUILD_BUG_ON(BGMAC_PA_WRITE != BCMA_GMAC_CMN_PA_WRITE);
+   BUILD_BUG_ON(BGMAC_PA_START != BCMA_GMAC_CMN_PA_START);
+   BUILD_BUG_ON(BGMAC_PC_EPA_MASK != BCMA_GMAC_CMN_PC_EPA_MASK);
+   BUILD_BUG_ON(BGMAC_PC_MCT_MASK != BCMA_GMAC_CMN_PC_MCT_MASK);
+   BUILD_BUG_ON(BGMAC_PC_MCT_SHIFT != BCMA_GMAC_CMN_PC_MCT_SHIFT);
+   BUILD_BUG_ON(BGMAC_PC_MTE != BCMA_GMAC_CMN_PC_MTE);
+
+   if (bcma_mdio->core->id.id == BCMA_CORE_4706_MAC_GBIT) {
+   core = bcma_mdio->core->bus->drv_gmac_cmn.core;
+   phy_access_addr = BCMA_GMAC_CMN_PHY_ACCESS;
+   phy_ctl_addr = BCMA_GMAC_CMN_PHY_CTL;
+   } else {
+   core = bcma_mdio->core;
+   phy_access_addr = BGMAC_PHY_ACCESS;
+   phy_ctl_addr = BGMAC_PHY_CNTL;
+   }
+
+   tmp = bcma_read32(core, phy_ctl_addr);
+   tmp &= ~BGMAC_PC_EPA_MASK;
+   tmp |= phyaddr;
+   bcma_write32(core, phy_ctl_addr, tmp);
+
+   tmp = BGMAC_PA_START;
+   tmp |= phyaddr << BGMAC_PA_ADDR_SHIFT;
+   tmp |= reg << BGMAC_PA_REG_SHIFT;
+   bcma_write32(core, phy_access_addr, tmp);
+
+   if (!bcma_mdio_wait_value(core, phy_access_addr, BGMAC_PA_START, 0,
+ 1000)) {
+   dev_err(&core->dev, "Reading PHY %d register 0x%X failed\n",
+   phyaddr, reg);
+   return 0xffff;
+   }
+
+   return bcma_read32(core, phy_access_addr) & BGMAC_PA_DATA_MASK;
+}
+
+/* http://bcm-v4.sipsolutions.net/mac-gbit/gmac/chipphywr */
+static int bcma_mdio_phy_write(struct bcma_mdio *bcma_mdio, u8 phyaddr, u8 reg,
+  u16 value)
+{
+   struct bcma_device *core;
+   u16

[PATCH 5/7] net: ethernet: bgmac: Add platform device support

The bcma portion of the driver has been split off into a bcma-specific
driver.  This has been mirrored for the platform driver.  The last
references to the bcma core struct have been changed into generic
function calls.  These function calls are wrappers to either the original
bcma code or new platform functions that access the same areas via MMIO.
This necessitated adding function pointers for both platform and bcma to
hide which backend is being used from the generic bgmac code.
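The function-pointer indirection described above is a classic ops-table pattern. The sketch below is a userspace model under stated assumptions: struct mac_ops, struct mac, struct mmio_backend and mac_maskset() are hypothetical names, and a plain array stands in for ioremap()ed registers. The read-modify-write helper uses the same `(old & mask) | set` rule as bgmac_cmdcfg_maskset() in this series, but it is not the driver's code.

```c
#include <stdint.h>

/* Hypothetical backend interface: generic code only sees these hooks. */
struct mac_ops {
	uint32_t (*read)(void *priv, uint16_t offset);
	void (*write)(void *priv, uint16_t offset, uint32_t value);
};

struct mac {
	const struct mac_ops *ops;
	void *priv;			/* backend-specific state */
};

/* "MMIO" backend: a register array standing in for mapped I/O space
 * (valid offsets 0..252, 32-bit aligned). */
struct mmio_backend {
	uint32_t regs[64];
};

static uint32_t mmio_read(void *priv, uint16_t offset)
{
	struct mmio_backend *b = priv;

	return b->regs[offset / 4];
}

static void mmio_write(void *priv, uint16_t offset, uint32_t value)
{
	struct mmio_backend *b = priv;

	b->regs[offset / 4] = value;
}

static const struct mac_ops mmio_ops = {
	.read = mmio_read,
	.write = mmio_write,
};

/* Generic read-modify-write helper that never knows which backend it uses. */
static void mac_maskset(struct mac *mac, uint16_t offset, uint32_t mask,
			uint32_t set)
{
	uint32_t v = mac->ops->read(mac->priv, offset);

	mac->ops->write(mac->priv, offset, (v & mask) | set);
}
```

A bcma backend would supply a second `struct mac_ops` instance; the generic helper stays unchanged, which is the point of hiding the backend behind pointers.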

Signed-off-by: Jon Mason 
---
 drivers/net/ethernet/broadcom/Kconfig   |  23 +-
 drivers/net/ethernet/broadcom/Makefile  |   4 +-
 drivers/net/ethernet/broadcom/bgmac-bcma-mdio.c |   2 +
 drivers/net/ethernet/broadcom/bgmac-bcma.c  | 315 +++
 drivers/net/ethernet/broadcom/bgmac-platform.c  | 210 +++
 drivers/net/ethernet/broadcom/bgmac.c   | 329 
 drivers/net/ethernet/broadcom/bgmac.h   |  73 +-
 7 files changed, 671 insertions(+), 285 deletions(-)
 create mode 100644 drivers/net/ethernet/broadcom/bgmac-bcma.c
 create mode 100644 drivers/net/ethernet/broadcom/bgmac-platform.c

diff --git a/drivers/net/ethernet/broadcom/Kconfig b/drivers/net/ethernet/broadcom/Kconfig
index d74a92e..bd8c80c 100644
--- a/drivers/net/ethernet/broadcom/Kconfig
+++ b/drivers/net/ethernet/broadcom/Kconfig
@@ -140,10 +140,18 @@ config BNX2X_SRIOV
  allows for virtual function acceleration in virtual environments.
 
 config BGMAC
-   tristate "BCMA bus GBit core support"
+   tristate
+   help
+ This enables the integrated ethernet controller support for many
+ Broadcom (mostly iProc) SoCs. An appropriate bus interface driver
+ needs to be enabled to select this.
+
+config BGMAC_BCMA
+   tristate "Broadcom iProc GBit BCMA support"
depends on BCMA && BCMA_HOST_SOC
depends on HAS_DMA
depends on BCM47XX || ARCH_BCM_5301X || COMPILE_TEST
+   select BGMAC
select PHYLIB
select FIXED_PHY
---help---
@@ -152,6 +160,19 @@ config BGMAC
  In case of using this driver on BCM4706 it's also requires to enable
  BCMA_DRIVER_GMAC_CMN to make it work.
 
+config BGMAC_PLATFORM
+   tristate "Broadcom iProc GBit platform support"
+   depends on HAS_DMA
+   depends on ARCH_BCM_IPROC || COMPILE_TEST
+   depends on OF
+   select BGMAC
+   select PHYLIB
+   select FIXED_PHY
+   default ARCH_BCM_IPROC
+   ---help---
+ Say Y here if you want to use the Broadcom iProc Gigabit Ethernet
+ controller through the generic platform interface
+
 config SYSTEMPORT
tristate "Broadcom SYSTEMPORT internal MAC support"
depends on OF
diff --git a/drivers/net/ethernet/broadcom/Makefile b/drivers/net/ethernet/broadcom/Makefile
index f559794..79f2372 100644
--- a/drivers/net/ethernet/broadcom/Makefile
+++ b/drivers/net/ethernet/broadcom/Makefile
@@ -10,6 +10,8 @@ obj-$(CONFIG_CNIC) += cnic.o
 obj-$(CONFIG_BNX2X) += bnx2x/
 obj-$(CONFIG_SB1250_MAC) += sb1250-mac.o
 obj-$(CONFIG_TIGON3) += tg3.o
-obj-$(CONFIG_BGMAC) += bgmac.o bgmac-bcma-mdio.o
+obj-$(CONFIG_BGMAC) += bgmac.o
+obj-$(CONFIG_BGMAC_BCMA) += bgmac-bcma.o bgmac-bcma-mdio.o
+obj-$(CONFIG_BGMAC_PLATFORM) += bgmac-platform.o
 obj-$(CONFIG_SYSTEMPORT) += bcmsysport.o
 obj-$(CONFIG_BNXT) += bnxt/
diff --git a/drivers/net/ethernet/broadcom/bgmac-bcma-mdio.c b/drivers/net/ethernet/broadcom/bgmac-bcma-mdio.c
index 1e65349..7c19c8e 100644
--- a/drivers/net/ethernet/broadcom/bgmac-bcma-mdio.c
+++ b/drivers/net/ethernet/broadcom/bgmac-bcma-mdio.c
@@ -245,6 +245,7 @@ err:
kfree(bcma_mdio);
return ERR_PTR(err);
 }
+EXPORT_SYMBOL_GPL(bcma_mdio_mii_register);
 
 void bcma_mdio_mii_unregister(struct mii_bus *mii_bus)
 {
@@ -259,6 +260,7 @@ void bcma_mdio_mii_unregister(struct mii_bus *mii_bus)
mdiobus_free(mii_bus);
kfree(bcma_mdio);
 }
+EXPORT_SYMBOL_GPL(bcma_mdio_mii_unregister);
 
 MODULE_AUTHOR("Rafał Miłecki");
 MODULE_LICENSE("GPL");
diff --git a/drivers/net/ethernet/broadcom/bgmac-bcma.c b/drivers/net/ethernet/broadcom/bgmac-bcma.c
new file mode 100644
index 0000000..9a9745c4
--- /dev/null
+++ b/drivers/net/ethernet/broadcom/bgmac-bcma.c
@@ -0,0 +1,315 @@
+/*
+ * Driver for (BCM4706)? GBit MAC core on BCMA bus.
+ *
+ * Copyright (C) 2012 Rafał Miłecki 
+ *
+ * Licensed under the GNU/GPL. See COPYING for details.
+ */
+
+#define pr_fmt(fmt)		KBUILD_MODNAME ": " fmt
+
+#include 
+#include 
+#include 
+#include "bgmac.h"
+
+static inline bool bgmac_is_bcm4707_family(struct bcma_device *core)
+{
+   switch (core->bus->chipinfo.id) {
+   case BCMA_CHIP_ID_BCM4707:
+   case BCMA_CHIP_ID_BCM47094:
+   case BCMA_CHIP_ID_BCM53018:
+   return true;
+   default:
+   return false;
+   }
+}
+
+/**
+ *

[PATCH 4/7] net: ethernet: bgmac: convert to feature flags

The bgmac driver uses the bcma-provided device ID and revision, as
well as the SoC ID and package, to determine which features need to be
enabled, reset, etc. in the driver.  In anticipation of removing the
bcma requirement for this driver, these checks must be changed to not
reference that struct.  In their place, each "feature" has been given
a flag, and the flags are enabled for their respective device and SoC.
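The conversion amounts to computing a bitmask once at probe time and testing bits afterwards. In this sketch the flag names follow the patch, and the mapping mirrors the hunks below (core revision 4 and later enables the TX/RX burst-length setup and the rev4 CMDCFG_SR value; the BCM4706 core skips MIB clearing), but the bit values and the helper names flags_for_core_rev() and has_feature() are illustrative assumptions, not the driver's tables.

```c
#include <stdbool.h>
#include <stdint.h>

/* Flag names follow the patch; the bit values are assumptions. */
#define BGMAC_FEAT_TX_MASK_SETUP	(1u << 0)
#define BGMAC_FEAT_RX_MASK_SETUP	(1u << 1)
#define BGMAC_FEAT_NO_CLR_MIB		(1u << 2)
#define BGMAC_FEAT_CMDCFG_SR_REV4	(1u << 3)

/* Hypothetical probe-time translation from bus-provided IDs to flags. */
static uint32_t flags_for_core_rev(unsigned int rev, bool is_4706)
{
	uint32_t flags = 0;

	if (rev >= 4)
		flags |= BGMAC_FEAT_TX_MASK_SETUP | BGMAC_FEAT_RX_MASK_SETUP |
			 BGMAC_FEAT_CMDCFG_SR_REV4;
	if (is_4706)
		flags |= BGMAC_FEAT_NO_CLR_MIB;
	return flags;
}

/* Data-path code then tests a bit instead of a bcma-specific ID. */
static bool has_feature(uint32_t flags, uint32_t feat)
{
	return (flags & feat) != 0;
}
```

Only the probe code needs to know where the IDs come from; a platform (non-bcma) probe can set the same flags from its compatible string.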

Signed-off-by: Jon Mason 
---
 drivers/net/ethernet/broadcom/bgmac.c | 167 --
 drivers/net/ethernet/broadcom/bgmac.h |  21 -
 2 files changed, 140 insertions(+), 48 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bgmac.c b/drivers/net/ethernet/broadcom/bgmac.c
index 6c6bb18..b85e39a 100644
--- a/drivers/net/ethernet/broadcom/bgmac.c
+++ b/drivers/net/ethernet/broadcom/bgmac.c
@@ -109,7 +109,7 @@ static void bgmac_dma_tx_enable(struct bgmac *bgmac,
u32 ctl;
 
ctl = bgmac_read(bgmac, ring->mmio_base + BGMAC_DMA_TX_CTL);
-   if (bgmac->core->id.rev >= 4) {
+   if (bgmac->feature_flags & BGMAC_FEAT_TX_MASK_SETUP) {
ctl &= ~BGMAC_DMA_TX_BL_MASK;
ctl |= BGMAC_DMA_TX_BL_128 << BGMAC_DMA_TX_BL_SHIFT;
 
@@ -330,7 +330,7 @@ static void bgmac_dma_rx_enable(struct bgmac *bgmac,
u32 ctl;
 
ctl = bgmac_read(bgmac, ring->mmio_base + BGMAC_DMA_RX_CTL);
-   if (bgmac->core->id.rev >= 4) {
+   if (bgmac->feature_flags & BGMAC_FEAT_RX_MASK_SETUP) {
ctl &= ~BGMAC_DMA_RX_BL_MASK;
ctl |= BGMAC_DMA_RX_BL_128 << BGMAC_DMA_RX_BL_SHIFT;
 
@@ -768,14 +768,20 @@ static void bgmac_cmdcfg_maskset(struct bgmac *bgmac, u32 mask, u32 set,
 {
u32 cmdcfg = bgmac_read(bgmac, BGMAC_CMDCFG);
u32 new_val = (cmdcfg & mask) | set;
+   u32 cmdcfg_sr;
 
-   bgmac_set(bgmac, BGMAC_CMDCFG, BGMAC_CMDCFG_SR(bgmac->core->id.rev));
+   if (bgmac->feature_flags & BGMAC_FEAT_CMDCFG_SR_REV4)
+   cmdcfg_sr = BGMAC_CMDCFG_SR_REV4;
+   else
+   cmdcfg_sr = BGMAC_CMDCFG_SR_REV0;
+
+   bgmac_set(bgmac, BGMAC_CMDCFG, cmdcfg_sr);
udelay(2);
 
if (new_val != cmdcfg || force)
bgmac_write(bgmac, BGMAC_CMDCFG, new_val);
 
-   bgmac_mask(bgmac, BGMAC_CMDCFG, ~BGMAC_CMDCFG_SR(bgmac->core->id.rev));
+   bgmac_mask(bgmac, BGMAC_CMDCFG, ~cmdcfg_sr);
udelay(2);
 }
 
@@ -804,7 +810,7 @@ static void bgmac_chip_stats_update(struct bgmac *bgmac)
 {
int i;
 
-   if (bgmac->core->id.id != BCMA_CORE_4706_MAC_GBIT) {
+   if (!(bgmac->feature_flags & BGMAC_FEAT_NO_CLR_MIB)) {
for (i = 0; i < BGMAC_NUM_MIB_TX_REGS; i++)
bgmac->mib_tx_regs[i] =
bgmac_read(bgmac,
@@ -823,7 +829,7 @@ static void bgmac_clear_mib(struct bgmac *bgmac)
 {
int i;
 
-   if (bgmac->core->id.id == BCMA_CORE_4706_MAC_GBIT)
+   if (bgmac->feature_flags & BGMAC_FEAT_NO_CLR_MIB)
return;
 
bgmac_set(bgmac, BGMAC_DEV_CTL, BGMAC_DC_MROR);
@@ -866,9 +872,8 @@ static void bgmac_mac_speed(struct bgmac *bgmac)
 static void bgmac_miiconfig(struct bgmac *bgmac)
 {
struct bcma_device *core = bgmac->core;
-   u8 imode;
 
-   if (bgmac_is_bcm4707_family(bgmac)) {
+   if (bgmac->feature_flags & BGMAC_FEAT_FORCE_SPEED_2500) {
bcma_awrite32(core, BCMA_IOCTL,
  bcma_aread32(core, BCMA_IOCTL) | 0x40 |
  BGMAC_BCMA_IOCTL_SW_CLKEN);
@@ -876,6 +881,8 @@ static void bgmac_miiconfig(struct bgmac *bgmac)
bgmac->mac_duplex = DUPLEX_FULL;
bgmac_mac_speed(bgmac);
} else {
+   u8 imode;
+
imode = (bgmac_read(bgmac, BGMAC_DEV_STATUS) &
BGMAC_DS_MM_MASK) >> BGMAC_DS_MM_SHIFT;
if (imode == 0 || imode == 1) {
@@ -890,9 +897,7 @@ static void bgmac_miiconfig(struct bgmac *bgmac)
 static void bgmac_chip_reset(struct bgmac *bgmac)
 {
struct bcma_device *core = bgmac->core;
-   struct bcma_bus *bus = core->bus;
-   struct bcma_chipinfo *ci = &bus->chipinfo;
-   u32 flags;
+   u32 cmdcfg_sr;
u32 iost;
int i;
 
@@ -915,15 +920,12 @@ static void bgmac_chip_reset(struct bgmac *bgmac)
}
 
iost = bcma_aread32(core, BCMA_IOST);
-   if ((ci->id == BCMA_CHIP_ID_BCM5357 && ci->pkg == BCMA_PKG_ID_BCM47186) ||
-   (ci->id == BCMA_CHIP_ID_BCM4749 && ci->pkg == 10) ||
-   (ci->id == BCMA_CHIP_ID_BCM53572 && ci->pkg == BCMA_PKG_ID_BCM47188))
+   if (bgmac->feature_flags & BGMAC_FEAT_IOST_ATTACHED)
iost &= ~BGMAC_BCMA_IOST_ATTACHED;
 
/* 3GMAC: for BCM4707 & BCM47094, only do core reset at bgmac_probe() */
-   if (ci->id != BCMA_CHIP_ID_BCM4707 &&
-   ci->id != BCMA_CHIP_ID_BCM47094) {
-   flags = 0;
+   if

[PATCH 1/7] net: ethernet: bgmac: change bgmac_* prints to dev_* prints

The bgmac_* print wrappers call dev_* prints with the dev pointer from
the bcma core.  In anticipation of removing the bcma requirement for
this driver, these must be changed to not reference that struct.  So,
simply change all of the bgmac_* prints to their dev_* counterparts.  In
some cases netdev_* prints are more appropriate, so change those as
well.

Signed-off-by: Jon Mason 
---
 drivers/net/ethernet/broadcom/bgmac.c | 103 +-
 drivers/net/ethernet/broadcom/bgmac.h |  14 +
 2 files changed, 55 insertions(+), 62 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bgmac.c b/drivers/net/ethernet/broadcom/bgmac.c
index e6e74ca..37b3b68 100644
--- a/drivers/net/ethernet/broadcom/bgmac.c
+++ b/drivers/net/ethernet/broadcom/bgmac.c
@@ -50,7 +50,7 @@ static bool bgmac_wait_value(struct bcma_device *core, u16 reg, u32 mask,
return true;
udelay(10);
}
-   pr_err("Timeout waiting for reg 0x%X\n", reg);
+   dev_err(&core->dev, "Timeout waiting for reg 0x%X\n", reg);
return false;
 }
 
@@ -84,8 +84,8 @@ static void bgmac_dma_tx_reset(struct bgmac *bgmac, struct bgmac_dma_ring *ring)
udelay(10);
}
if (i)
-   bgmac_err(bgmac, "Timeout suspending DMA TX ring 0x%X (BGMAC_DMA_TX_STAT: 0x%08X)\n",
- ring->mmio_base, val);
+   dev_err(bgmac->dev, "Timeout suspending DMA TX ring 0x%X (BGMAC_DMA_TX_STAT: 0x%08X)\n",
+   ring->mmio_base, val);
 
/* Remove SUSPEND bit */
bgmac_write(bgmac, ring->mmio_base + BGMAC_DMA_TX_CTL, 0);
@@ -93,13 +93,13 @@ static void bgmac_dma_tx_reset(struct bgmac *bgmac, struct bgmac_dma_ring *ring)
  ring->mmio_base + BGMAC_DMA_TX_STATUS,
  BGMAC_DMA_TX_STAT, BGMAC_DMA_TX_STAT_DISABLED,
  1)) {
-   bgmac_warn(bgmac, "DMA TX ring 0x%X wasn't disabled on time, waiting additional 300us\n",
-  ring->mmio_base);
+   dev_warn(bgmac->dev, "DMA TX ring 0x%X wasn't disabled on time, waiting additional 300us\n",
+ring->mmio_base);
udelay(300);
val = bgmac_read(bgmac, ring->mmio_base + BGMAC_DMA_TX_STATUS);
if ((val & BGMAC_DMA_TX_STAT) != BGMAC_DMA_TX_STAT_DISABLED)
-   bgmac_err(bgmac, "Reset of DMA TX ring 0x%X failed\n",
- ring->mmio_base);
+   dev_err(bgmac->dev, "Reset of DMA TX ring 0x%X failed\n",
+   ring->mmio_base);
}
 }
 
@@ -161,7 +161,7 @@ static netdev_tx_t bgmac_dma_tx_add(struct bgmac *bgmac,
int i;
 
if (skb->len > BGMAC_DESC_CTL1_LEN) {
-   bgmac_err(bgmac, "Too long skb (%d)\n", skb->len);
+   netdev_err(bgmac->net_dev, "Too long skb (%d)\n", skb->len);
goto err_drop;
}
 
@@ -174,7 +174,7 @@ static netdev_tx_t bgmac_dma_tx_add(struct bgmac *bgmac,
 * even when ring->end overflows
 */
if (ring->end - ring->start + nr_frags + 1 >= BGMAC_TX_RING_SLOTS) {
-   bgmac_err(bgmac, "TX ring is full, queue should be stopped!\n");
+   netdev_err(bgmac->net_dev, "TX ring is full, queue should be stopped!\n");
netif_stop_queue(net_dev);
return NETDEV_TX_BUSY;
}
@@ -241,8 +241,8 @@ err_dma:
}
 
 err_dma_head:
-   bgmac_err(bgmac, "Mapping error of skb on ring 0x%X\n",
- ring->mmio_base);
+   netdev_err(bgmac->net_dev, "Mapping error of skb on ring 0x%X\n",
+  ring->mmio_base);
 
 err_drop:
dev_kfree_skb(skb);
@@ -320,8 +320,8 @@ static void bgmac_dma_rx_reset(struct bgmac *bgmac, struct bgmac_dma_ring *ring)
  ring->mmio_base + BGMAC_DMA_RX_STATUS,
  BGMAC_DMA_RX_STAT, BGMAC_DMA_RX_STAT_DISABLED,
  1))
-   bgmac_err(bgmac, "Reset of ring 0x%X RX failed\n",
- ring->mmio_base);
+   dev_err(bgmac->dev, "Reset of ring 0x%X RX failed\n",
+   ring->mmio_base);
 }
 
 static void bgmac_dma_rx_enable(struct bgmac *bgmac,
@@ -370,7 +370,7 @@ static int bgmac_dma_rx_skb_for_slot(struct bgmac *bgmac,
dma_addr = dma_map_single(dma_dev, buf + BGMAC_RX_BUF_OFFSET,
  BGMAC_RX_BUF_SIZE, DMA_FROM_DEVICE);
if (dma_mapping_error(dma_dev, dma_addr)) {
-   bgmac_err(bgmac, "DMA mapping error\n");
+   netdev_err(bgmac->net_dev, "DMA mapping error\n");
put_page(virt_to_head_page(buf));
return -ENOMEM;
}
@@ -465,16 +465,16 @@ static int bgmac_dma_rx_read(struct bgmac *bgmac, struct bgmac_dma_ring

[PATCH 7/7] ARM: dts: NSP: Add bgmac entries

Add device tree entries for the ethernet devices present on the
Broadcom Northstar Plus SoCs.

Signed-off-by: Jon Mason 
---
 arch/arm/boot/dts/bcm-nsp.dtsi   | 18 ++
 arch/arm/boot/dts/bcm958625k.dts |  8 
 2 files changed, 26 insertions(+)

diff --git a/arch/arm/boot/dts/bcm-nsp.dtsi b/arch/arm/boot/dts/bcm-nsp.dtsi
index def9e78..8f4343b 100644
--- a/arch/arm/boot/dts/bcm-nsp.dtsi
+++ b/arch/arm/boot/dts/bcm-nsp.dtsi
@@ -192,6 +192,24 @@
status = "disabled";
};
 
+   gmac0: ethernet@22000 {
+   compatible = "brcm,bgmac-nsp";
+   reg = <0x022000 0x1000>,
+ <0x110000 0x1000>;
+   reg-names = "gmac_base", "idm_base";
+   interrupts = ;
+   status = "disabled";
+   };
+
+   gmac1: ethernet@23000 {
+   compatible = "brcm,bgmac-nsp";
+   reg = <0x023000 0x1000>,
+ <0x111000 0x1000>;
+   reg-names = "gmac_base", "idm_base";
+   interrupts = ;
+   status = "disabled";
+   };
+
nand: nand@26000 {
compatible = "brcm,nand-iproc", "brcm,brcmnand-v6.1";
reg = <0x026000 0x600>,
diff --git a/arch/arm/boot/dts/bcm958625k.dts b/arch/arm/boot/dts/bcm958625k.dts
index e298450..d16ab53 100644
--- a/arch/arm/boot/dts/bcm958625k.dts
+++ b/arch/arm/boot/dts/bcm958625k.dts
@@ -56,6 +56,14 @@
status = "okay";
 };
 
+&gmac0 {
+   status = "okay";
+};
+
+&gmac1 {
+   status = "okay";
+};
+
  {
status = "okay";
 };
-- 
1.9.1

[PATCH 0/7] net: ethernet: bgmac: Add platform device support

Well, no one complained too loudly about the RFC version of this patch
series (see https://lkml.org/lkml/2016/6/28/863).  So, I'm officially
sending this out for inclusion.  All comments from the RFC were
addressed in this version.

This patch series adds support for other, non-bcma iProc SoCs to the
bgmac driver.  This series only adds NSP support, but we are interested
in adding support for the Cygnus and NS2 families (with more possible
down the road).

To support non-bcma enabled SoCs, we need to add the standard device
tree "platform device" support.  Unfortunately, this driver is very
tightly coupled with the bcma bus and much unwinding is needed.  I tried
to break this up into a number of patches to make it more obvious what
was being done to add platform device support.  I was able to verify
that the bcma code still works using a 53012K board (NS SoC), and that
the platform code works using a 58625K board (NSP SoC).

Thanks,
Jon

Jon Mason (7):
  net: ethernet: bgmac: change bgmac_* prints to dev_* prints
  net: ethernet: bgmac: add dma_dev pointer
  net: ethernet: bgmac: move BCMA MDIO Phy code into a separate file
  net: ethernet: bgmac: convert to feature flags
  net: ethernet: bgmac: Add platform device support
  dt-bindings: net: bgmac: add bindings documentation for bgmac
  ARM: dts: NSP: Add bgmac entries

 .../devicetree/bindings/net/brcm,bgmac-nsp.txt |  24 +
 arch/arm/boot/dts/bcm-nsp.dtsi |  18 +
 arch/arm/boot/dts/bcm958625k.dts   |   8 +
 drivers/net/ethernet/broadcom/Kconfig  |  23 +-
 drivers/net/ethernet/broadcom/Makefile |   2 +
 drivers/net/ethernet/broadcom/bgmac-bcma-mdio.c| 266 +
 drivers/net/ethernet/broadcom/bgmac-bcma.c | 315 ++
 drivers/net/ethernet/broadcom/bgmac-platform.c | 210 +++
 drivers/net/ethernet/broadcom/bgmac.c  | 658 +
 drivers/net/ethernet/broadcom/bgmac.h  | 112 +++-
 10 files changed, 1120 insertions(+), 516 deletions(-)
 create mode 100644 Documentation/devicetree/bindings/net/brcm,bgmac-nsp.txt
 create mode 100644 drivers/net/ethernet/broadcom/bgmac-bcma-mdio.c
 create mode 100644 drivers/net/ethernet/broadcom/bgmac-bcma.c
 create mode 100644 drivers/net/ethernet/broadcom/bgmac-platform.c

-- 
1.9.1

[PATCH 2/7] net: ethernet: bgmac: add dma_dev pointer

The DMA buffer allocation, etc., references a dma_dev device pointer from
the bcma core.  In anticipation of removing the bcma requirement for
this driver, these must be changed to not reference that struct.  Add a
dma_dev device pointer to the bgmac struct and reference that instead.

Signed-off-by: Jon Mason 
---
 drivers/net/ethernet/broadcom/bgmac.c | 17 +
 drivers/net/ethernet/broadcom/bgmac.h |  1 +
 2 files changed, 10 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bgmac.c b/drivers/net/ethernet/broadcom/bgmac.c
index 37b3b68..3614bd8 100644
--- a/drivers/net/ethernet/broadcom/bgmac.c
+++ b/drivers/net/ethernet/broadcom/bgmac.c
@@ -152,7 +152,7 @@ static netdev_tx_t bgmac_dma_tx_add(struct bgmac *bgmac,
struct bgmac_dma_ring *ring,
struct sk_buff *skb)
 {
-   struct device *dma_dev = bgmac->core->dma_dev;
+   struct device *dma_dev = bgmac->dma_dev;
struct net_device *net_dev = bgmac->net_dev;
int index = ring->end % BGMAC_TX_RING_SLOTS;
struct bgmac_slot_info *slot = &ring->slots[index];
@@ -254,7 +254,7 @@ err_drop:
 /* Free transmitted packets */
 static void bgmac_dma_tx_free(struct bgmac *bgmac, struct bgmac_dma_ring *ring)
 {
-   struct device *dma_dev = bgmac->core->dma_dev;
+   struct device *dma_dev = bgmac->dma_dev;
int empty_slot;
bool freed = false;
unsigned bytes_compl = 0, pkts_compl = 0;
@@ -351,7 +351,7 @@ static void bgmac_dma_rx_enable(struct bgmac *bgmac,
 static int bgmac_dma_rx_skb_for_slot(struct bgmac *bgmac,
 struct bgmac_slot_info *slot)
 {
-   struct device *dma_dev = bgmac->core->dma_dev;
+   struct device *dma_dev = bgmac->dma_dev;
dma_addr_t dma_addr;
struct bgmac_rx_header *rx;
void *buf;
@@ -440,7 +440,7 @@ static int bgmac_dma_rx_read(struct bgmac *bgmac, struct bgmac_dma_ring *ring,
end_slot /= sizeof(struct bgmac_dma_desc);
 
while (ring->start != end_slot) {
-   struct device *dma_dev = bgmac->core->dma_dev;
+   struct device *dma_dev = bgmac->dma_dev;
struct bgmac_slot_info *slot = &ring->slots[ring->start];
struct bgmac_rx_header *rx = slot->buf + BGMAC_RX_BUF_OFFSET;
struct sk_buff *skb;
@@ -543,7 +543,7 @@ static bool bgmac_dma_unaligned(struct bgmac *bgmac,
 static void bgmac_dma_tx_ring_free(struct bgmac *bgmac,
   struct bgmac_dma_ring *ring)
 {
-   struct device *dma_dev = bgmac->core->dma_dev;
+   struct device *dma_dev = bgmac->dma_dev;
struct bgmac_dma_desc *dma_desc = ring->cpu_base;
struct bgmac_slot_info *slot;
int i;
@@ -569,7 +569,7 @@ static void bgmac_dma_tx_ring_free(struct bgmac *bgmac,
 static void bgmac_dma_rx_ring_free(struct bgmac *bgmac,
   struct bgmac_dma_ring *ring)
 {
-   struct device *dma_dev = bgmac->core->dma_dev;
+   struct device *dma_dev = bgmac->dma_dev;
struct bgmac_slot_info *slot;
int i;
 
@@ -590,7 +590,7 @@ static void bgmac_dma_ring_desc_free(struct bgmac *bgmac,
 struct bgmac_dma_ring *ring,
 int num_slots)
 {
-   struct device *dma_dev = bgmac->core->dma_dev;
+   struct device *dma_dev = bgmac->dma_dev;
int size;
 
if (!ring->cpu_base)
@@ -628,7 +628,7 @@ static void bgmac_dma_free(struct bgmac *bgmac)
 
 static int bgmac_dma_alloc(struct bgmac *bgmac)
 {
-   struct device *dma_dev = bgmac->core->dma_dev;
+   struct device *dma_dev = bgmac->dma_dev;
struct bgmac_dma_ring *ring;
static const u16 ring_base[] = { BGMAC_DMA_BASE0, BGMAC_DMA_BASE1,
 BGMAC_DMA_BASE2, BGMAC_DMA_BASE3, };
@@ -1701,6 +1701,7 @@ static int bgmac_probe(struct bcma_device *core)
net_dev->ethtool_ops = &bgmac_ethtool_ops;
bgmac = netdev_priv(net_dev);
bgmac->dev = &core->dev;
+   bgmac->dma_dev = core->dma_dev;
bgmac->net_dev = net_dev;
bgmac->core = core;
bcma_set_drvdata(core, bgmac);
diff --git a/drivers/net/ethernet/broadcom/bgmac.h b/drivers/net/ethernet/broadcom/bgmac.h
index abb9dd8..fd20018 100644
--- a/drivers/net/ethernet/broadcom/bgmac.h
+++ b/drivers/net/ethernet/broadcom/bgmac.h
@@ -429,6 +429,7 @@ struct bgmac {
struct bcma_device *cmn; /* Reference to CMN core for BCM4706 */
 
struct device *dev;
+   struct device *dma_dev;
struct net_device *net_dev;
struct napi_struct napi;
struct mii_bus *mii_bus;
-- 
1.9.1

[PATCH 6/7] dt-bindings: net: bgmac: add bindings documentation for bgmac

Signed-off-by: Jon Mason 
---
 .../devicetree/bindings/net/brcm,bgmac-nsp.txt | 24 ++
 1 file changed, 24 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/net/brcm,bgmac-nsp.txt

diff --git a/Documentation/devicetree/bindings/net/brcm,bgmac-nsp.txt b/Documentation/devicetree/bindings/net/brcm,bgmac-nsp.txt
new file mode 100644
index 0000000..022946c
--- /dev/null
+++ b/Documentation/devicetree/bindings/net/brcm,bgmac-nsp.txt
@@ -0,0 +1,24 @@
+Broadcom GMAC Ethernet Controller Device Tree Bindings
+------------------------------------------------------
+
+Required properties:
+ - compatible: "brcm,bgmac-nsp"
+ - reg:		Address and length of the GMAC registers,
+   Address and length of the GMAC IDM registers
+ - reg-names:  Names of the registers.  Must have both "gmac_base" and
+   "idm_base"
+ - interrupts: Interrupt number
+
+Optional properties:
+- mac-address: See ethernet.txt file in the same directory
+
+Examples:
+
+gmac0: ethernet@18022000 {
+   compatible = "brcm,bgmac-nsp";
+   reg = <0x18022000 0x1000>,
+ <0x18110000 0x1000>;
+   reg-names = "gmac_base", "idm_base";
+   interrupts = ;
+   status = "disabled";
+};
-- 
1.9.1

Re: [Patch net] net_sched: fix mirrored packets checksum

On Thu, Jun 30, 2016 at 12:50 PM, Daniel Borkmann  wrote:
>
> Maybe makes sense to move skb_push_rcsum() but /also/ skb_pull_rcsum()
> to the header then? Both seem similarly small at least (could be split
> f.e into two patches then, first for the move, second for the actual fix).

No objection from me. Please feel free to send a patch. ;)

[PATCH] mwifiex: mask PCIe interrupts before removal

2016-06-30 Thread Brian Norris

The PCIe driver didn't mask the host interrupts before trying to tear
down. This causes lockups at reboot or rmmod when using MSI-X on 8997,
since the MSI handler gets confused and locks up the system.

Also tested on 8897, which does not support MSI-X (and wasn't
experiencing this same bug). No regressions seen there.

Signed-off-by: Brian Norris 
---
 drivers/net/wireless/marvell/mwifiex/pcie.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/drivers/net/wireless/marvell/mwifiex/pcie.c b/drivers/net/wireless/marvell/mwifiex/pcie.c
index 0c7937eb6b77..af98371dc2af 100644
--- a/drivers/net/wireless/marvell/mwifiex/pcie.c
+++ b/drivers/net/wireless/marvell/mwifiex/pcie.c
@@ -440,6 +440,11 @@ static int mwifiex_pcie_disable_host_int(struct mwifiex_adapter *adapter)
return 0;
 }
 
+static void mwifiex_pcie_disable_host_int_noerr(struct mwifiex_adapter *adapter)
+{
+   WARN_ON(mwifiex_pcie_disable_host_int(adapter));
+}
+
 /*
  * This function enables the host interrupt.
  *
@@ -2945,6 +2950,7 @@ static struct mwifiex_if_ops pcie_ops = {
.register_dev = mwifiex_register_dev,
.unregister_dev =   mwifiex_unregister_dev,
.enable_int =   mwifiex_pcie_enable_host_int,
+   .disable_int =  mwifiex_pcie_disable_host_int_noerr,
.process_int_status =   mwifiex_process_int_status,
.host_to_card = mwifiex_pcie_host_to_card,
.wakeup =   mwifiex_pm_wakeup_card,
-- 
2.8.0.rc3.226.g39d4020
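
The fix wires a disable_int callback into pcie_ops, so the generic code can presumably mask interrupts before tearing the device down. The _noerr wrapper adapts the int-returning helper to a void callback while still flagging failures. Below is a user-space sketch of that ops-table pattern. struct adapter, struct if_ops, teardown() and the warn_count field (standing in for WARN_ON()) are all hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical adapter state: interrupts stay unmasked until disabled. */
struct adapter {
	bool irq_masked;
	bool fail_disable;   /* simulate a failing register write */
	int warn_count;      /* stands in for WARN_ON() splats */
};

/* Int-returning helper, shaped like mwifiex_pcie_disable_host_int(). */
static int disable_host_int(struct adapter *a)
{
	if (a->fail_disable)
		return -1;
	a->irq_masked = true;
	return 0;
}

/* Void wrapper: report (but otherwise ignore) errors, like the _noerr helper. */
static void disable_host_int_noerr(struct adapter *a)
{
	if (disable_host_int(a) != 0)
		a->warn_count++;
}

/* Subset of an interface ops table; disable_int is optional. */
struct if_ops {
	void (*disable_int)(struct adapter *a);
};

/* Generic teardown: mask interrupts first if the bus driver provides a hook. */
static void teardown(const struct if_ops *ops, struct adapter *a)
{
	assert(ops != NULL && a != NULL);
	if (ops->disable_int)
		ops->disable_int(a);
	/* ... unregister the device, free IRQs, etc. ... */
}
```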

Re: ethtool needs a new maintainer

On 16-06-30 11:15 AM, John W. Linville wrote:
> On Mon, Jun 27, 2016 at 09:51:47AM -0400, John W. Linville wrote:
>> On Sun, Jun 26, 2016 at 06:11:41PM +0200, Ben Hutchings wrote:
>>> I've become steadily less enthusiastic and less responsive as a
>>> maintainer over the past year or so.  I no longer work on networking
>>> regularly, so it takes a lot more time to get into the right state of
>>> mind to think about ethtool code, while I have other demands on my time
>>> that tend to take priority.
>>>
>>> So, I would like to find a new maintainer to take over as soon as
>>> possible.  Ideally the new maintainer would have previous contributions
>>> to ethtool and an existing account on kernel.org so that they can push
>>> to the git repository and the home page.  But neither of those is
>>> essential.  Please reply if you're interested.
>>
>> I would like to take this responsibility. My previous contributions
>> to ethtool are meager, but I think my skills and interests are suited
>> to the task.  Plus, I already have a kernel.org account... :-)
> 
> Are there any other takers?  Or is this a done deal?
> 
> John
> 

+1 for having John take it on :)

.JohnF

Re: [RFC 6/7] dt-bindings: net: bgmac: add bindings documentation for bgmac

On Thu, Jun 30, 2016 at 2:06 PM, Ray Jui  wrote:
> Hi Jon,
>
> On 6/28/2016 12:34 PM, Jon Mason wrote:
>>
>> Signed-off-by: Jon Mason 
>> ---
>>  .../devicetree/bindings/net/brcm,bgmac-enet.txt | 21 +
>>  1 file changed, 21 insertions(+)
>>  create mode 100644 Documentation/devicetree/bindings/net/brcm,bgmac-enet.txt
>>
>> diff --git a/Documentation/devicetree/bindings/net/brcm,bgmac-enet.txt b/Documentation/devicetree/bindings/net/brcm,bgmac-enet.txt
>> new file mode 100644
>> index 000..efd36d5
>> --- /dev/null
>> +++ b/Documentation/devicetree/bindings/net/brcm,bgmac-enet.txt
>> @@ -0,0 +1,21 @@
>> +Broadcom GMAC Ethernet Controller Device Tree Bindings
>> +-
>> +
>> +Required properties:
>> + - compatible: "brcm,bgmac-enet"
>> + - reg:        Address and length of the GMAC registers,
>> +   Address and length of the GMAC IDM registers
>
>
> As we know, there will be additional optional register banks required for
> some of the other SoCs that the current driver does not yet support. In my
> opinion, we should consider making "reg-names" a mandatory property now and
> map the register blocks based on names.
>
> I think this will help to make our life easier in the future when new
> optional SoC specific register blocks are added, such that we can map the
> register blocks based on names instead of indices, which will change and be
> different among different SoCs and will require much more complex logic in
> the driver to deal with.

I don't have any objection to this.  I'll tweak the patches to do it by name.

>
>> + - interrupts: Interrupt number
>> +
>> +Optional properties:
>> +- mac-address: mac address to be assigned to the device
>> +
>> +Examples:
>> +
>> +gmac0: enet@18022000 {
>> +   compatible = "brcm,bgmac-enet";
>> +   reg = <0x18022000 0x1000>,
>> + <0x1811 0x1000>;
>> +   interrupts = ;
>> +   status = "disabled";
>> +};
>>
>
> Btw, I think Rob Herring should be included in the review for device tree
> binding document changes.

Thanks, I'll add him and the other DT maintainers when I send this out
as a "PATCH" shortly.

Thanks,
Jon

>
> Thanks,
>
> Ray
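
Ray's suggestion amounts to resolving register banks by name rather than by their position in "reg". In the kernel, that lookup would typically go through a helper such as platform_get_resource_byname(). The self-contained sketch below only models the idea, using a hypothetical struct reg_bank table and find_bank() helper. It shows that adding an optional SoC-specific bank does not disturb the lookup of existing ones. The "extra_base" name and the addresses in the usage example are illustrative only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical parsed form of one "reg" entry plus its "reg-names" string. */
struct reg_bank {
	const char *name;
	uint64_t base;
	uint64_t size;
};

/* Return the bank called 'name', or NULL if the node does not provide it. */
static const struct reg_bank *find_bank(const struct reg_bank *banks,
					size_t n, const char *name)
{
	size_t i;

	assert(name != NULL);
	for (i = 0; i < n; i++)
		if (strcmp(banks[i].name, name) == 0)
			return &banks[i];
	return NULL;
}
```

With index-based mapping, inserting an extra bank in front would shift "idm_base" from index 1 to index 2. A lookup by name is unaffected.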

Re: [RFC PATCH] ila: Resolver mechanism

2016-06-30 Thread Thomas Graf

On 06/30/16 at 12:41pm, Tom Herbert wrote:
> This is not yet complete; we would still need some controls
> to rate limit the number of resolution requests and a means to track
> pending requests. I'm posting this as RFC because it seems like
> this might be part of a general mechanism to perform address
> resolution in userspace and I would appreciate comments with
> regard to that.

I wouldn't mind having the rate limiting done as generic route
attribute so it could be applied to non-ILA routes as well.
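
The RFC leaves open how resolution requests would be rate limited. A token bucket is one common technique, shown here purely as an illustration; neither the RFC nor this reply specifies it. Each request consumes a token, and tokens refill at a fixed rate up to a burst limit. struct tb, tb_init() and tb_allow() are hypothetical. Time is passed in explicitly in milliseconds to keep the sketch deterministic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical token bucket; tokens are kept in thousandths so that
 * fractional refills are not lost to integer division. */
struct tb {
	uint64_t tokens_milli;   /* current tokens * 1000 */
	uint64_t burst_milli;    /* bucket capacity * 1000 */
	uint64_t rate_per_sec;   /* tokens added per second */
	uint64_t last_ms;        /* time of the last refill */
};

static void tb_init(struct tb *b, uint64_t burst, uint64_t rate_per_sec,
		    uint64_t now_ms)
{
	b->burst_milli = burst * 1000;
	b->tokens_milli = b->burst_milli;   /* start with a full bucket */
	b->rate_per_sec = rate_per_sec;
	b->last_ms = now_ms;
}

/* Return true if one more resolution request may be sent at now_ms. */
static bool tb_allow(struct tb *b, uint64_t now_ms)
{
	assert(now_ms >= b->last_ms);  /* time must not go backwards */

	/* rate tokens per second == rate milli-tokens per millisecond */
	b->tokens_milli += (now_ms - b->last_ms) * b->rate_per_sec;
	if (b->tokens_milli > b->burst_milli)
		b->tokens_milli = b->burst_milli;
	b->last_ms = now_ms;

	if (b->tokens_milli < 1000)
		return false;
	b->tokens_milli -= 1000;
	return true;
}
```

Keeping the state per route, as suggested above, would let the same limiter apply to non-ILA routes.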

> 
> diff --git a/include/uapi/linux/lwtunnel.h b/include/uapi/linux/lwtunnel.h
> index a478fe8..d880e49 100644
> --- a/include/uapi/linux/lwtunnel.h
> +++ b/include/uapi/linux/lwtunnel.h
> @@ -9,6 +9,7 @@ enum lwtunnel_encap_types {
>   LWTUNNEL_ENCAP_IP,
>   LWTUNNEL_ENCAP_ILA,
>   LWTUNNEL_ENCAP_IP6,
> + LWTUNNEL_ENCAP_ILA_NOTIFY,
>   __LWTUNNEL_ENCAP_MAX,
>  };

Neat.

> diff --git a/include/uapi/linux/rtnetlink.h b/include/uapi/linux/rtnetlink.h
> index 262f037..271215f 100644
> --- a/include/uapi/linux/rtnetlink.h
> +++ b/include/uapi/linux/rtnetlink.h
> @@ -144,6 +144,9 @@ enum {
>   RTM_GETSTATS = 94,
>  #define RTM_GETSTATS RTM_GETSTATS
>  
> + RTM_ADDR_RESOLVE = 95,
> +#define RTM_ADDR_RESOLVE RTM_ADDR_RESOLVE
> +

I realize this is currently only kernel->user but let's plan ahead.
Each RTM_ group should start aligned to 4 with types specified in
the order new, del, get, set. RTM_ADDR_RESOLVE probably maps best
to NEW in terms of behaviour. See the magic around 'kind' in
rtnetlink_rcv_msg().
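
The alignment rule follows from how rtnetlink_rcv_msg() classifies message types. It subtracts RTM_BASE (16) and uses the low two bits as the "kind" (0 = NEW, 1 = DEL, 2 = GET, 3 = SET) and the remaining bits as the group index. The sketch below reproduces only that arithmetic, and its helper names are hypothetical. It shows that the proposed value 95 would fall into the SET slot of the group starting at RTM_NEWSTATS (92), whereas 96 would start a fresh group as its NEW member.

```c
#include <assert.h>

#define RTM_BASE_VAL 16  /* value of RTM_BASE in <linux/rtnetlink.h> */

enum rtm_kind { KIND_NEW = 0, KIND_DEL = 1, KIND_GET = 2, KIND_SET = 3 };

/* Mirror of the 'kind = type & 3' step after subtracting RTM_BASE. */
static enum rtm_kind rtm_kind_of(int type)
{
	assert(type >= RTM_BASE_VAL);
	return (enum rtm_kind)((type - RTM_BASE_VAL) & 3);
}

/* Mirror of the 'type >> 2' step: which 4-slot group a type belongs to. */
static int rtm_group_of(int type)
{
	assert(type >= RTM_BASE_VAL);
	return (type - RTM_BASE_VAL) >> 2;
}
```

The kind matters beyond bookkeeping. In rtnetlink_rcv_msg(), every kind other than GET requires CAP_NET_ADMIN, and a GET with NLM_F_DUMP is routed to the dump handler.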

[PATCH net] macsec: set actual real device for xmit when !protect_frames

2016-06-30 Thread Daniel Borkmann

Avoid recursing into dev_queue_xmit() on the wrong net device when
frames are unprotected: at that point skb->dev still points to our
own macsec device and, unlike in macsec_encrypt_finish(), the dev
pointer doesn't get updated to the real underlying device.

Fixes: c09440f7dcb3 ("macsec: introduce IEEE 802.1AE driver")
Signed-off-by: Daniel Borkmann 
Acked-by: Sabrina Dubroca 
---
 drivers/net/macsec.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/macsec.c b/drivers/net/macsec.c
index 0e7eff7..8bcd78f 100644
--- a/drivers/net/macsec.c
+++ b/drivers/net/macsec.c
@@ -2640,6 +2640,7 @@ static netdev_tx_t macsec_start_xmit(struct sk_buff *skb,
u64_stats_update_begin(&secy_stats->syncp);
secy_stats->stats.OutPktsUntagged++;
u64_stats_update_end(&secy_stats->syncp);
+   skb->dev = macsec->real_dev;
len = skb->len;
ret = dev_queue_xmit(skb);
count_tx(dev, ret, len);
-- 
1.9.3
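
The bug is easiest to see in a toy model of stacked devices. If the upper device's xmit hands the packet back to the generic queueing function without retargeting skb->dev, the queueing function dispatches to the upper device again. Everything below (struct toy_dev, struct toy_skb, toy_queue_xmit() and the two upper_xmit variants) is hypothetical and only mimics the shape of dev_queue_xmit() dispatching on skb->dev. A fixed depth limit stands in for the runaway recursion.

```c
#include <assert.h>
#include <stddef.h>

struct toy_skb;

/* Hypothetical stacked device: an xmit hook plus an optional lower device. */
struct toy_dev {
	int (*xmit)(struct toy_skb *skb, struct toy_dev *dev);
	struct toy_dev *real_dev;
	int tx_count;
};

struct toy_skb {
	struct toy_dev *dev;   /* device the packet will be queued on */
	int depth;             /* nesting level of toy_queue_xmit() */
};

#define TOY_MAX_DEPTH 8

/* Like dev_queue_xmit(): hand the packet to whatever skb->dev is. */
static int toy_queue_xmit(struct toy_skb *skb)
{
	assert(skb->dev != NULL);
	if (++skb->depth > TOY_MAX_DEPTH)
		return -1;             /* runaway recursion: give up */
	return skb->dev->xmit(skb, skb->dev);
}

/* Bottom device: just counts the transmission. */
static int phys_xmit(struct toy_skb *skb, struct toy_dev *dev)
{
	(void)skb;
	dev->tx_count++;
	return 0;
}

/* Unprotected path before the fix: skb->dev still names the upper
 * device, so toy_queue_xmit() comes straight back here. */
static int upper_xmit_buggy(struct toy_skb *skb, struct toy_dev *dev)
{
	dev->tx_count++;
	return toy_queue_xmit(skb);
}

/* Same path with the fix: retarget the packet at the real device first. */
static int upper_xmit_fixed(struct toy_skb *skb, struct toy_dev *dev)
{
	dev->tx_count++;
	skb->dev = dev->real_dev;
	return toy_queue_xmit(skb);
}
```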

Re: ethtool needs a new maintainer

2016-06-30 Thread John W. Linville

On Thu, Jun 30, 2016 at 02:37:30PM -0500, Jorge Alberto Garcia wrote:
> On 30/06/2016 02:32 p.m., "Ben Hutchings" wrote:
> >
> > On Thu, 2016-06-30 at 14:27 -0500, Jorge Alberto Garcia wrote:
> > > On Thu, Jun 30, 2016 at 1:15 PM, John W. Linville
> > >  wrote:
> > > > On Mon, Jun 27, 2016 at 09:51:47AM -0400, John W. Linville wrote:
> > > > > On Sun, Jun 26, 2016 at 06:11:41PM +0200, Ben Hutchings wrote:
> > > > > > I've become steadily less enthusiastic and less responsive as a
> > > > > > maintainer over the past year or so.  I no longer work on networking
> > > > > > regularly, so it takes a lot more time to get into the right state of
> > > > > > mind to think about ethtool code, while I have other demands on my time
> > > > > > that tend to take priority.
> > > > > >
> > > > > > So, I would like to find a new maintainer to take over as soon as
> > > > > > possible.  Ideally the new maintainer would have previous contributions
> > > > > > to ethtool and an existing account on kernel.org so that they can push
> > > > > > to the git repository and the home page.  But neither of those is
> > > > > > essential.  Please reply if you're interested.
> > > > >
> > > > > I would like to take this responsibility. My previous contributions
> > > > > to ethtool are meager, but I think my skills and interests are suited
> > > > > to the task.  Plus, I already have a kernel.org account... :-)
> > > >
> > > > Are there any other takers?  Or is this a done deal?
> > > >
> > >
> > > hi guys !, any link to a bugzilla  / patchwork  ?
> >
> > There's nothing as organised as that, though it might be possible to
> > add categories for ethtool on  and
> > .
> >
> > Ben.
> >
> I would like to help, but it would be a first for me.
> Maybe in shadow mode?

Honestly, I don't expect the patch management stuff to be much of a
burden.  I could always use the help reviewing any patches submitted,
of course!

John
-- 
John W. Linville            Someday the world will need a hero, and you
linvi...@tuxdriver.com      might be all we have.  Be ready.

Re: It's back! (Re: [REGRESSION] NFS is creating a hidden port (left over from xs_bind() ))

2016-06-30 Thread Steven Rostedt

On Thu, 30 Jun 2016 16:07:26 -0400
Steven Rostedt  wrote:

> I can reproduce this by having the client unmount and remount the
> directory.

It gets even more interesting. When I unmount the directory, the hidden
port does not go away. It is still there. But if I mount it again, it
goes away (until it times out again).

Even more info:

When I first mount it, it creates 3 sockets, one of which is immediately
closed:

tcp        0      0 192.168.23.9:892        192.168.23.22:44672     TIME_WAIT   -
tcp        0      0 192.168.23.9:2049       192.168.23.22:815       ESTABLISHED -
tcp        0      0 192.168.23.9:754        192.168.23.22:44672     ESTABLISHED -

(192.168.23.22 is the machine remotely mounting a directory from the
server 192.168.23.9)

The trace of port 892 is this:

   kworker/u32:1-13473 [000]   4093.915114: xs_setup_tcp: RPC:       set up xprt to 192.168.23.22 (port 44672) via tcp
   kworker/u32:1-13473 [000]   4093.915122: xprt_create_transport: RPC:       created transport 8803b1c38000 with 65536 slots
kworker/0:1H-129   [000]   4093.915152: xprt_alloc_slot: RPC:    47 reserved req 88040b27ca00 xid c50ccaff
kworker/0:1H-129   [000]   4093.915157: xprt_connect: RPC:    47 xprt_connect xprt 8803b1c38000 is not connected
kworker/0:1H-129   [000]   4093.915159: xs_connect: RPC:       xs_connect scheduled xprt 8803b1c38000
kworker/0:1H-129   [000] ..s.  4093.915170: inet_csk_get_port: snum 892
kworker/0:1H-129   [000] ..s.  4093.915177: <stack trace>
 => sched_clock
 => inet_addr_type_table
 => security_capable
 => inet_bind
 => xs_bind
 => release_sock
 => sock_setsockopt
 => __sock_create
 => xs_create_sock.isra.19
 => xs_tcp_setup_socket
 => process_one_work
 => worker_thread
 => worker_thread
 => kthread
 => ret_from_fork
 => kthread
kworker/0:1H-129   [000] ..s.  4093.915178: inet_bind_hash: add 892 8803bb9b5cc0
kworker/0:1H-129   [000] ..s.  4093.915184: <stack trace>
 => inet_csk_get_port
 => sched_clock
 => inet_addr_type_table
 => security_capable
 => inet_bind
 => xs_bind
 => release_sock
 => sock_setsockopt
 => __sock_create
 => xs_create_sock.isra.19
 => xs_tcp_setup_socket
 => process_one_work
 => worker_thread
 => worker_thread
 => kthread
 => ret_from_fork
 => kthread
kworker/0:1H-129   [000]   4093.915185: xs_bind: RPC:       xs_bind 4.136.255.255:892: ok (0)
kworker/0:1H-129   [000]   4093.915186: xs_tcp_setup_socket: RPC:       worker connecting xprt 8803b1c38000 via tcp to 192.168.23.22 (port 44672)
kworker/0:1H-129   [000]   4093.915221: xs_tcp_setup_socket: RPC:       8803b1c38000 connect status 115 connected 0 sock state 2
          <idle>-0     [003] ..s.  4093.915434: xs_tcp_state_change: RPC:       xs_tcp_state_change client 8803b1c38000...
          <idle>-0     [003] ..s.  4093.915435: xs_tcp_state_change: RPC:       state 1 conn 0 dead 0 zapped 1 sk_shutdown 0
kworker/3:1H-145   [003]   4093.915558: xprt_connect_status: RPC:    47 xprt_connect_status: retrying
kworker/3:1H-145   [003]   4093.915560: xprt_prepare_transmit: RPC:    47 xprt_prepare_transmit
kworker/3:1H-145   [003]   4093.915562: xprt_transmit: RPC:    47 xprt_transmit(72)
kworker/3:1H-145   [003]   4093.915588: xs_tcp_send_request: RPC:       xs_tcp_send_request(72) = 0
kworker/3:1H-145   [003]   4093.915589: xprt_transmit: RPC:    47 xmit complete
          <idle>-0     [003] ..s.  4093.915969: xs_tcp_data_ready: RPC:       xs_tcp_data_ready...
kworker/3:1H-145   [003]   4093.916081: xs_tcp_data_recv: RPC:       xs_tcp_data_recv started
kworker/3:1H-145   [003]   4093.916083: xs_tcp_data_recv: RPC:       reading TCP record fragment of length 24
kworker/3:1H-145   [003]   4093.916084: xs_tcp_data_recv: RPC:       reading XID (4 bytes)
kworker/3:1H-145   [003]   4093.916085: xs_tcp_data_recv: RPC:       reading request with XID c50ccaff
kworker/3:1H-145   [003]   4093.916086: xs_tcp_data_recv: RPC:       reading CALL/REPLY flag (4 bytes)
kworker/3:1H-145   [003]   4093.916087: xs_tcp_data_recv: RPC:       read reply XID c50ccaff
kworker/3:1H-145   [003] ..s.  4093.916088: xs_tcp_data_recv: RPC:       XID c50ccaff read 16 bytes
kworker/3:1H-145   [003] ..s.  4093.916089: xs_tcp_data_recv: RPC:       xprt = 8803b1c38000, tcp_copied = 24, tcp_offset = 24, tcp_reclen = 24
kworker/3:1H-145   [003] ..s.  4093.916090: xprt_complete_rqst: RPC:    47 xid c50ccaff complete (24 bytes received)
kworker/3:1H-145   [003]   4093.916091: xs_tcp_data_recv: RPC:       xs_tcp_data_recv done
kworker/3:1H-145   [003]   4093.916098: xprt_release: RPC:    47 release request 88040b27ca00
   kworker/u32:1-13473 [002]   4093.976056: xprt_destroy: RPC:       destroying transport 8803b1c38000
   kworker/u32:1-13473 [002]

Re: [RFC 5/7] net: ethernet: bgmac: Add platform device support

On Thu, Jun 30, 2016 at 1:58 PM, Ray Jui  wrote:
> Hi Jon,
>
>
> On 6/28/2016 12:34 PM, Jon Mason wrote:
>>
>> The bcma portion of the driver has been split off into a bcma specific
>> driver.  This has been mirrored for the platform driver.  The last
>> references to the bcma core struct have been changed into a generic
>> function call.  These function calls are wrappers to either the original
>> bcma code or new platform functions that access the same areas via MMIO.
>> This necessitated adding function pointers for both platform and bcma to
>> hide which backend is being used from the generic bgmac code.
>>
>> Signed-off-by: Jon Mason 
>> ---
>>  drivers/net/ethernet/broadcom/Kconfig  |  23 +-
>>  drivers/net/ethernet/broadcom/Makefile |   4 +-
>>  drivers/net/ethernet/broadcom/bgmac-bcma.c     | 315
>>  drivers/net/ethernet/broadcom/bgmac-platform.c | 208
>>  drivers/net/ethernet/broadcom/bgmac.c          | 327 -
>>  drivers/net/ethernet/broadcom/bgmac.h  |  73 +-
>>  6 files changed, 666 insertions(+), 284 deletions(-)
>>  create mode 100644 drivers/net/ethernet/broadcom/bgmac-bcma.c
>>  create mode 100644 drivers/net/ethernet/broadcom/bgmac-platform.c
>>
>> diff --git a/drivers/net/ethernet/broadcom/Kconfig b/drivers/net/ethernet/broadcom/Kconfig
>> index d74a92e..bd8c80c 100644
>> --- a/drivers/net/ethernet/broadcom/Kconfig
>> +++ b/drivers/net/ethernet/broadcom/Kconfig
>> @@ -140,10 +140,18 @@ config BNX2X_SRIOV
>>   allows for virtual function acceleration in virtual environments.
>>
>>  config BGMAC
>> -   tristate "BCMA bus GBit core support"
>> +   tristate
>> +   help
>> + This enables the integrated ethernet controller support for many
>> + Broadcom (mostly iProc) SoCs. An appropriate bus interface driver
>> + needs to be enabled to select this.
>> +
>> +config BGMAC_BCMA
>> +   tristate "Broadcom iProc GBit BCMA support"
>> depends on BCMA && BCMA_HOST_SOC
>> depends on HAS_DMA
>> depends on BCM47XX || ARCH_BCM_5301X || COMPILE_TEST
>> +   select BGMAC
>> select PHYLIB
>> select FIXED_PHY
>> ---help---
>> @@ -152,6 +160,19 @@ config BGMAC
>>   In case of using this driver on BCM4706 it's also requires to enable
>>   BCMA_DRIVER_GMAC_CMN to make it work.
>>
>> +config BGMAC_PLATFORM
>> +   tristate "Broadcom iProc GBit platform support"
>> +   depends on HAS_DMA
>> +   depends on ARCH_BCM_IPROC || COMPILE_TEST
>> +   depends on OF
>> +   select BGMAC
>> +   select PHYLIB
>> +   select FIXED_PHY
>> +   default ARCH_BCM_IPROC
>> +   ---help---
>> + Say Y here if you want to use the Broadcom iProc Gigabit Ethernet
>> + controller through the generic platform interface
>> +
>>  config SYSTEMPORT
>> tristate "Broadcom SYSTEMPORT internal MAC support"
>> depends on OF
>> diff --git a/drivers/net/ethernet/broadcom/Makefile b/drivers/net/ethernet/broadcom/Makefile
>> index f559794..79f2372 100644
>> --- a/drivers/net/ethernet/broadcom/Makefile
>> +++ b/drivers/net/ethernet/broadcom/Makefile
>> @@ -10,6 +10,8 @@ obj-$(CONFIG_CNIC) += cnic.o
>>  obj-$(CONFIG_BNX2X) += bnx2x/
>>  obj-$(CONFIG_SB1250_MAC) += sb1250-mac.o
>>  obj-$(CONFIG_TIGON3) += tg3.o
>> -obj-$(CONFIG_BGMAC) += bgmac.o bgmac-bcma-mdio.o
>> +obj-$(CONFIG_BGMAC) += bgmac.o
>> +obj-$(CONFIG_BGMAC_BCMA) += bgmac-bcma.o bgmac-bcma-mdio.o
>> +obj-$(CONFIG_BGMAC_PLATFORM) += bgmac-platform.o
>>  obj-$(CONFIG_SYSTEMPORT) += bcmsysport.o
>>  obj-$(CONFIG_BNXT) += bnxt/
>> diff --git a/drivers/net/ethernet/broadcom/bgmac-bcma.c b/drivers/net/ethernet/broadcom/bgmac-bcma.c
>> new file mode 100644
>> index 000..9a9745c4
>> --- /dev/null
>> +++ b/drivers/net/ethernet/broadcom/bgmac-bcma.c
>> @@ -0,0 +1,315 @@
>> +/*
>> + * Driver for (BCM4706)? GBit MAC core on BCMA bus.
>> + *
>> + * Copyright (C) 2012 Rafał Miłecki 
>> + *
>> + * Licensed under the GNU/GPL. See COPYING for details.
>> + */
>> +
>> +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
>> +
>> +#include 
>> +#include 
>> +#include 
>> +#include "bgmac.h"
>> +
>> +static inline bool bgmac_is_bcm4707_family(struct bcma_device *core)
>> +{
>> +   switch (core->bus->chipinfo.id) {
>> +   case BCMA_CHIP_ID_BCM4707:
>> +   case BCMA_CHIP_ID_BCM47094:
>> +   case BCMA_CHIP_ID_BCM53018:
>> +   return true;
>> +   default:
>> +   return false;
>> +   }
>> +}
>> +
>> +/**
>> + * BCMA bus ops
>> + **/
>> +
>> +static u32 bcma_bgmac_read(struct bgmac *bgmac, u16 offset)
>> +{
>> +   return bcma_read32(bgmac->bcma.core, offset);
>> +}
>> +
>> +static void bcma_bgmac_write(struct
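
The commit message describes hiding the bus backend behind function pointers, so the common bgmac code only calls through read/write-style hooks. The bcma_bgmac_read() wrapper above is one such backend. Below is a minimal user-space sketch of the pattern. The names (struct mac, plat_read(), bus_read(), mac_set_bits()) and the fake register array are hypothetical, and the real driver's struct members may differ.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical MAC instance: generic code only sees these hooks. */
struct mac {
	uint32_t (*read)(struct mac *m, uint16_t offset);
	void (*write)(struct mac *m, uint16_t offset, uint32_t value);
	uint32_t regs[64];   /* fake register file shared by both backends */
	int plat_calls;      /* accesses made through the platform backend */
	int bus_calls;       /* accesses made through the bus backend */
};

static uint32_t *reg_slot(struct mac *m, uint16_t offset)
{
	assert(offset % 4 == 0 && offset / 4 < 64);
	return &m->regs[offset / 4];
}

/* Platform-style backend: direct, MMIO-like access. */
static uint32_t plat_read(struct mac *m, uint16_t offset)
{
	m->plat_calls++;
	return *reg_slot(m, offset);
}

static void plat_write(struct mac *m, uint16_t offset, uint32_t value)
{
	m->plat_calls++;
	*reg_slot(m, offset) = value;
}

/* Bus-style backend: in a driver this would call the bus core accessors. */
static uint32_t bus_read(struct mac *m, uint16_t offset)
{
	m->bus_calls++;
	return *reg_slot(m, offset);
}

static void bus_write(struct mac *m, uint16_t offset, uint32_t value)
{
	m->bus_calls++;
	*reg_slot(m, offset) = value;
}

/* Generic code: read-modify-write without knowing which backend is used. */
static void mac_set_bits(struct mac *m, uint16_t offset, uint32_t bits)
{
	m->write(m, offset, m->read(m, offset) | bits);
}
```

The probe routine of each bus driver fills in the hooks once. After that, code like mac_set_bits() runs unchanged on either backend.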

[PATCH nf-next 3/3] netfilter: replace list_head with single linked list

The netfilter hook list never uses the prev pointer, and so can be
trimmed to be a smaller singly-linked list.

In addition to providing a more lightweight structure for hook traversal,
this shrinks struct net to 5568 bytes (down from 6400) and struct
net_device to 2176 bytes (down from 2240).

Signed-off-by: Aaron Conole 
Signed-off-by: Florian Westphal 
---
 include/linux/netdevice.h |   2 +-
 include/linux/netfilter.h |  18 +++---
 include/linux/netfilter_ingress.h |  14 +++--
 include/net/netfilter/nf_queue.h  |   9 ++-
 include/net/netns/netfilter.h |   2 +-
 net/bridge/br_netfilter_hooks.c   |  21 +++
 net/netfilter/core.c  | 126 --
 net/netfilter/nf_internals.h  |  10 +--
 net/netfilter/nf_queue.c  |  15 +++--
 net/netfilter/nfnetlink_queue.c   |   5 +-
 10 files changed, 129 insertions(+), 93 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index e84d9d2..8235f67 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1747,7 +1747,7 @@ struct net_device {
 #endif
struct netdev_queue __rcu *ingress_queue;
 #ifdef CONFIG_NETFILTER_INGRESS
-   struct list_head        nf_hooks_ingress;
+   struct nf_hook_entry __rcu *nf_hooks_ingress;
 #endif
 
unsigned char   broadcast[MAX_ADDR_LEN];
diff --git a/include/linux/netfilter.h b/include/linux/netfilter.h
index ad444f0..3390a84 100644
--- a/include/linux/netfilter.h
+++ b/include/linux/netfilter.h
@@ -55,12 +55,12 @@ struct nf_hook_state {
struct net_device *out;
struct sock *sk;
struct net *net;
-   struct list_head *hook_list;
+   struct nf_hook_entry *hook_list;
int (*okfn)(struct net *, struct sock *, struct sk_buff *);
 };
 
 static inline void nf_hook_state_init(struct nf_hook_state *p,
- struct list_head *hook_list,
+ struct nf_hook_entry *hook_list,
  unsigned int hook,
  int thresh, u_int8_t pf,
  struct net_device *indev,
@@ -97,6 +97,12 @@ struct nf_hook_ops {
int priority;
 };
 
+struct nf_hook_entry {
+   struct nf_hook_entry __rcu  *next;
+   struct nf_hook_ops  ops;
+   const struct nf_hook_ops*orig_ops;
+};
+
 struct nf_sockopt_ops {
struct list_head list;
 
@@ -161,8 +167,6 @@ static inline int nf_hook_thresh(u_int8_t pf, unsigned int hook,
 int (*okfn)(struct net *, struct sock *, struct sk_buff *),
 int thresh)
 {
-   struct list_head *hook_list;
-
 #ifdef HAVE_JUMP_LABEL
if (__builtin_constant_p(pf) &&
__builtin_constant_p(hook) &&
@@ -170,14 +174,14 @@ static inline int nf_hook_thresh(u_int8_t pf, unsigned int hook,
return 1;
 #endif
 
-   hook_list = &net->nf.hooks[pf][hook];
-
-   if (!list_empty(hook_list)) {
+   if (rcu_access_pointer(net->nf.hooks[pf][hook])) {
+   struct nf_hook_entry *hook_list;
struct nf_hook_state state;
int ret;
 
/* We may already have this, but read-locks nest anyway */
rcu_read_lock();
+   hook_list = rcu_dereference(net->nf.hooks[pf][hook]);
nf_hook_state_init(&state, hook_list, hook, thresh,
   pf, indev, outdev, sk, net, okfn);
 
diff --git a/include/linux/netfilter_ingress.h b/include/linux/netfilter_ingress.h
index 6965ba0..e3e3f6d 100644
--- a/include/linux/netfilter_ingress.h
+++ b/include/linux/netfilter_ingress.h
@@ -11,23 +11,27 @@ static inline bool nf_hook_ingress_active(const struct sk_buff *skb)
if (!static_key_false(&nf_hooks_needed[NFPROTO_NETDEV][NF_NETDEV_INGRESS]))
return false;
 #endif
-   return !list_empty(&skb->dev->nf_hooks_ingress);
+   return rcu_access_pointer(skb->dev->nf_hooks_ingress) != NULL;
 }
 
 /* caller must hold rcu_read_lock */
 static inline int nf_hook_ingress(struct sk_buff *skb)
 {
+   struct nf_hook_entry *e = rcu_dereference(skb->dev->nf_hooks_ingress);
struct nf_hook_state state;
 
-   nf_hook_state_init(&state, &skb->dev->nf_hooks_ingress,
-  NF_NETDEV_INGRESS, INT_MIN, NFPROTO_NETDEV,
-  skb->dev, NULL, NULL, dev_net(skb->dev), NULL);
+   if (unlikely(!e))
+   return 0;
+
+   nf_hook_state_init(&state, e, NF_NETDEV_INGRESS, INT_MIN,
+  NFPROTO_NETDEV, skb->dev, NULL, NULL,
+  dev_net(skb->dev), NULL);
return nf_hook_slow(skb, &state);
 }
 
 static inline void nf_hook_ingress_init(struct net_device *dev)
 {
-   INIT_LIST_HEAD(&dev->nf_hooks_ingress);
+   RCU_INIT_POINTER(dev->nf_hooks_ingress,
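
The struct net figure in the commit message lines up with the per-netns hook table alone. Assume the table is hooks[NFPROTO_NUMPROTO][NF_MAX_HOOKS] with 13 x 8 slots, and that pointers are 8 bytes. Both are assumptions about a 64-bit build of that era, not stated in the patch. Replacing a two-pointer list_head with a single pointer per slot then saves 104 x 8 = 832 bytes, which is exactly 6400 - 5568. The arithmetic as a compilable check:

```c
#include <assert.h>
#include <stddef.h>

#define NPROTO 13   /* assumed value of NFPROTO_NUMPROTO */
#define NHOOKS 8    /* assumed value of NF_MAX_HOOKS */

/* Same shape as struct list_head: two pointers per slot. */
struct dlist {
	struct dlist *next, *prev;
};

struct sentry;      /* singly linked entry; only its pointer is stored */

struct old_table { struct dlist hooks[NPROTO][NHOOKS]; };
struct new_table { struct sentry *hooks[NPROTO][NHOOKS]; };

/* Bytes saved in the per-netns hook table by the conversion. */
static size_t table_saving(void)
{
	/* the estimate relies on list_head having no padding */
	assert(sizeof(struct dlist) == 2 * sizeof(void *));
	return sizeof(struct old_table) - sizeof(struct new_table);
}
```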

[PATCH nf-next 1/3] netfilter: bridge: add and use br_nf_hook_thresh

From: Florian Westphal 

This replaces the last uses of NF_HOOK_THRESH().
Followup patch will remove it and rename nf_hook_thresh.

The reason is that inet (non-bridge) netfilter no longer invokes the
hooks from hooks, so we no longer need the thresh value to skip hooks
with a lower priority.

The bridge netfilter however may need to do this. br_nf_hook_thresh is a
wrapper that is supposed to do this, i.e. only call hooks with a
priority that exceeds NF_BR_PRI_BRNF.

It's used only in the recursion cases of br_netfilter.

Signed-off-by: Florian Westphal 
Signed-off-by: Aaron Conole 
---
 include/net/netfilter/br_netfilter.h |  6 
 net/bridge/br_netfilter_hooks.c  | 57 ++--
 net/bridge/br_netfilter_ipv6.c   | 12 
 3 files changed, 59 insertions(+), 16 deletions(-)

diff --git a/include/net/netfilter/br_netfilter.h b/include/net/netfilter/br_netfilter.h
index e8d1448..0b0c35c 100644
--- a/include/net/netfilter/br_netfilter.h
+++ b/include/net/netfilter/br_netfilter.h
@@ -15,6 +15,12 @@ static inline struct nf_bridge_info *nf_bridge_alloc(struct sk_buff *skb)
 
 void nf_bridge_update_protocol(struct sk_buff *skb);
 
+int br_nf_hook_thresh(unsigned int hook, struct net *net, struct sock *sk,
+ struct sk_buff *skb, struct net_device *indev,
+ struct net_device *outdev,
+ int (*okfn)(struct net *, struct sock *,
+ struct sk_buff *));
+
 static inline struct nf_bridge_info *
 nf_bridge_info_get(const struct sk_buff *skb)
 {
diff --git a/net/bridge/br_netfilter_hooks.c b/net/bridge/br_netfilter_hooks.c
index 2d25979..19f230c 100644
--- a/net/bridge/br_netfilter_hooks.c
+++ b/net/bridge/br_netfilter_hooks.c
@@ -30,6 +30,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 #include 
@@ -395,11 +396,10 @@ bridged_dnat:
skb->dev = nf_bridge->physindev;
nf_bridge_update_protocol(skb);
nf_bridge_push_encap_header(skb);
-   NF_HOOK_THRESH(NFPROTO_BRIDGE,
-  NF_BR_PRE_ROUTING,
-  net, sk, skb, skb->dev, NULL,
-  br_nf_pre_routing_finish_bridge,
-  1);
+   br_nf_hook_thresh(NF_BR_PRE_ROUTING,
+ net, sk, skb, skb->dev,
+ NULL,
+ br_nf_pre_routing_finish);
return 0;
}
ether_addr_copy(eth_hdr(skb)->h_dest, dev->dev_addr);
@@ -417,10 +417,8 @@ bridged_dnat:
skb->dev = nf_bridge->physindev;
nf_bridge_update_protocol(skb);
nf_bridge_push_encap_header(skb);
-   NF_HOOK_THRESH(NFPROTO_BRIDGE, NF_BR_PRE_ROUTING, net, sk, skb,
-  skb->dev, NULL,
-  br_handle_frame_finish, 1);
-
+   br_nf_hook_thresh(NF_BR_PRE_ROUTING, net, sk, skb, skb->dev, NULL,
+ br_handle_frame_finish);
return 0;
 }
 
@@ -992,6 +990,47 @@ static struct notifier_block brnf_notifier __read_mostly = {
.notifier_call = brnf_device_event,
 };
 
+/* recursively invokes nf_hook_slow (again), skipping already-called
+ * hooks (< NF_BR_PRI_BRNF).
+ *
+ * Called with rcu read lock held.
+ */
+int br_nf_hook_thresh(unsigned int hook, struct net *net,
+ struct sock *sk, struct sk_buff *skb,
+ struct net_device *indev,
+ struct net_device *outdev,
+ int (*okfn)(struct net *, struct sock *,
+ struct sk_buff *))
+{
+   struct nf_hook_ops *elem;
+   struct nf_hook_state state;
+   struct list_head *head;
+   int ret;
+
+   head = &net->nf.hooks[NFPROTO_BRIDGE][hook];
+
+   list_for_each_entry_rcu(elem, head, list) {
+   struct nf_hook_ops *next;
+
+   next = list_entry_rcu(list_next_rcu(&elem->list),
+ struct nf_hook_ops, list);
+   if (next->priority <= NF_BR_PRI_BRNF)
+   continue;
+   }
+
+   if (&elem->list == head)
+   return okfn(net, sk, skb);
+
+   nf_hook_state_init(&state, head, hook, NF_BR_PRI_BRNF + 1,
+  NFPROTO_BRIDGE, indev, outdev, sk, net, okfn);
+
+   ret = nf_hook_slow(skb, &state);
+   if (ret == 1)
+   ret = okfn(net, sk, skb);
+
+   return ret;
+}
+
 #ifdef CONFIG_SYSCTL
 static
 int brnf_sysctl_call_tables(struct ctl_table *ctl, int write,
diff --git a/net/bridge/br_netfilter_ipv6.c b/net/bridge/br_netfilter_ipv6.c
index
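
br_nf_hook_thresh() re-enters the bridge hook chain but must skip the hooks that have already run, i.e. those with priority up to and including NF_BR_PRI_BRNF. On the singly linked list introduced in patch 3/3, that reduces to walking forward to the first entry whose priority exceeds the threshold and starting traversal there. The sketch below models this with a hypothetical struct hook list and helpers. BRNF_PRI is assumed to equal NF_BR_PRI_BRNF (0 in the bridge netfilter UAPI header).

```c
#include <assert.h>
#include <stddef.h>

#define BRNF_PRI 0   /* assumed value of NF_BR_PRI_BRNF */

/* Hypothetical hook entry on a priority-sorted singly linked list. */
struct hook {
	struct hook *next;
	int priority;
	int id;
};

/* First hook whose priority is strictly above thresh, or NULL if every
 * hook has already run (the caller then invokes okfn directly). */
static const struct hook *first_above(const struct hook *head, int thresh)
{
	while (head && head->priority <= thresh)
		head = head->next;
	return head;
}

/* "Run" the remaining hooks by recording their ids; returns the count. */
static size_t run_from(const struct hook *h, int *ids, size_t max)
{
	size_t n = 0;

	for (; h && n < max; h = h->next)
		ids[n++] = h->id;
	return n;
}
```

A NULL result from first_above() corresponds to the `return okfn(net, sk, skb);` shortcut in the patch.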

[PATCH nf-next 0/3] Compact netfilter hooks list

This series makes a simple change to shrink the netfilter hook list
from a doubly linked list to a singly linked list.  Since the hooks
are always traversed in order, there is no need to maintain a previous
pointer.

This series is being submitted for early feedback. It was developed
jointly with Florian Westphal.
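
Because the hooks are only ever walked from the head in priority order, registration can keep the list sorted with a pointer-to-pointer walk, and no prev link is needed. Below is a minimal sketch with a hypothetical struct entry and helpers. It omits the RCU publication and locking that the kernel version needs.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical singly linked hook entry; lower priority runs first. */
struct entry {
	struct entry *next;
	int priority;
};

/* Insert e before the first entry with a higher priority, so entries of
 * equal priority keep registration order. pp always points at the link
 * that will be rewritten, which is why no prev pointer is required. */
static void entry_insert(struct entry **head, struct entry *e)
{
	struct entry **pp = head;

	assert(e != NULL);
	while (*pp && (*pp)->priority <= e->priority)
		pp = &(*pp)->next;
	e->next = *pp;
	*pp = e;
}

/* Unlink e with a single forward walk. Returns 0 if found, -1 if not. */
static int entry_remove(struct entry **head, struct entry *e)
{
	struct entry **pp;

	for (pp = head; *pp; pp = &(*pp)->next) {
		if (*pp == e) {
			*pp = e->next;
			return 0;
		}
	}
	return -1;
}
```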

Aaron Conole (1):
  netfilter: replace list_head with single linked list

Florian Westphal (2):
  netfilter: bridge: add and use br_nf_hook_thresh
  netfilter: call nf_hook_state_init with rcu_read_lock held

 include/linux/netdevice.h  |   2 +-
 include/linux/netfilter.h  |  26 --
 include/linux/netfilter_ingress.h  |  15 ++--
 include/net/netfilter/br_netfilter.h   |   6 ++
 include/net/netfilter/nf_queue.h   |   9 +-
 include/net/netns/netfilter.h  |   2 +-
 net/bridge/br_netfilter_hooks.c|  50 +--
 net/bridge/br_netfilter_ipv6.c |  12 ++-
 net/bridge/netfilter/ebt_redirect.c|   2 +-
 net/bridge/netfilter/ebtables.c|   2 +-
 net/ipv4/netfilter/nf_conntrack_l3proto_ipv4.c |   2 +-
 net/ipv4/netfilter/nf_conntrack_proto_icmp.c   |   2 +-
 net/ipv6/netfilter/nf_conntrack_l3proto_ipv6.c |   2 +-
 net/ipv6/netfilter/nf_conntrack_proto_icmpv6.c |   2 +-
 net/netfilter/core.c   | 120 +++--
 net/netfilter/nf_conntrack_core.c  |   2 +-
 net/netfilter/nf_conntrack_h323_main.c |   2 +-
 net/netfilter/nf_conntrack_helper.c|   2 +-
 net/netfilter/nf_internals.h   |  10 +--
 net/netfilter/nf_queue.c   |  15 ++--
 net/netfilter/nfnetlink_cthelper.c |   2 +-
 net/netfilter/nfnetlink_log.c  |   8 +-
 net/netfilter/nfnetlink_queue.c|   7 +-
 net/netfilter/xt_helper.c  |   2 +-
 24 files changed, 193 insertions(+), 111 deletions(-)

-- 
2.5.5

[PATCH nf-next 2/3] netfilter: call nf_hook_state_init with rcu_read_lock held

From: Florian Westphal 

This makes things simpler because we can store the head of the list
in the nf_state structure without worrying about concurrent add/delete
of hook elements from the list.

Signed-off-by: Florian Westphal 
Signed-off-by: Aaron Conole 
---
 include/linux/netfilter.h  | 8 +++-
 include/linux/netfilter_ingress.h  | 1 +
 net/bridge/netfilter/ebt_redirect.c| 2 +-
 net/bridge/netfilter/ebtables.c| 2 +-
 net/ipv4/netfilter/nf_conntrack_l3proto_ipv4.c | 2 +-
 net/ipv4/netfilter/nf_conntrack_proto_icmp.c   | 2 +-
 net/ipv6/netfilter/nf_conntrack_l3proto_ipv6.c | 2 +-
 net/ipv6/netfilter/nf_conntrack_proto_icmpv6.c | 2 +-
 net/netfilter/core.c   | 5 +
 net/netfilter/nf_conntrack_core.c  | 2 +-
 net/netfilter/nf_conntrack_h323_main.c | 2 +-
 net/netfilter/nf_conntrack_helper.c| 2 +-
 net/netfilter/nfnetlink_cthelper.c | 2 +-
 net/netfilter/nfnetlink_log.c  | 8 ++--
 net/netfilter/nfnetlink_queue.c| 2 +-
 net/netfilter/xt_helper.c  | 2 +-
 16 files changed, 27 insertions(+), 19 deletions(-)

diff --git a/include/linux/netfilter.h b/include/linux/netfilter.h
index 9230f9a..ad444f0 100644
--- a/include/linux/netfilter.h
+++ b/include/linux/netfilter.h
@@ -174,10 +174,16 @@ static inline int nf_hook_thresh(u_int8_t pf, unsigned int hook,
 
if (!list_empty(hook_list)) {
struct nf_hook_state state;
+   int ret;
 
+   /* We may already have this, but read-locks nest anyway */
+   rcu_read_lock();
nf_hook_state_init(&state, hook_list, hook, thresh,
   pf, indev, outdev, sk, net, okfn);
-   return nf_hook_slow(skb, &state);
+
+   ret = nf_hook_slow(skb, &state);
+   rcu_read_unlock();
+   return ret;
}
return 1;
 }
diff --git a/include/linux/netfilter_ingress.h b/include/linux/netfilter_ingress.h
index 5fcd375..6965ba0 100644
--- a/include/linux/netfilter_ingress.h
+++ b/include/linux/netfilter_ingress.h
@@ -14,6 +14,7 @@ static inline bool nf_hook_ingress_active(const struct sk_buff *skb)
return !list_empty(&skb->dev->nf_hooks_ingress);
 }
 
+/* caller must hold rcu_read_lock */
 static inline int nf_hook_ingress(struct sk_buff *skb)
 {
struct nf_hook_state state;
diff --git a/net/bridge/netfilter/ebt_redirect.c b/net/bridge/netfilter/ebt_redirect.c
index 20396499..2e7c4f9 100644
--- a/net/bridge/netfilter/ebt_redirect.c
+++ b/net/bridge/netfilter/ebt_redirect.c
@@ -24,7 +24,7 @@ ebt_redirect_tg(struct sk_buff *skb, const struct xt_action_param *par)
return EBT_DROP;
 
if (par->hooknum != NF_BR_BROUTING)
-   /* rcu_read_lock()ed by nf_hook_slow */
+   /* rcu_read_lock()ed by nf_hook_thresh */
ether_addr_copy(eth_hdr(skb)->h_dest,
br_port_get_rcu(par->in)->br->dev->dev_addr);
else
diff --git a/net/bridge/netfilter/ebtables.c b/net/bridge/netfilter/ebtables.c
index 5a61f35..6faa2c3 100644
--- a/net/bridge/netfilter/ebtables.c
+++ b/net/bridge/netfilter/ebtables.c
@@ -148,7 +148,7 @@ ebt_basic_match(const struct ebt_entry *e, const struct sk_buff *skb,
return 1;
if (FWINV2(ebt_dev_check(e->out, out), EBT_IOUT))
return 1;
-   /* rcu_read_lock()ed by nf_hook_slow */
+   /* rcu_read_lock()ed by nf_hook_thresh */
if (in && (p = br_port_get_rcu(in)) != NULL &&
FWINV2(ebt_dev_check(e->logical_in, p->br->dev), EBT_ILOGICALIN))
return 1;
diff --git a/net/ipv4/netfilter/nf_conntrack_l3proto_ipv4.c b/net/ipv4/netfilter/nf_conntrack_l3proto_ipv4.c
index ae1a71a..eab0239 100644
--- a/net/ipv4/netfilter/nf_conntrack_l3proto_ipv4.c
+++ b/net/ipv4/netfilter/nf_conntrack_l3proto_ipv4.c
@@ -110,7 +110,7 @@ static unsigned int ipv4_helper(void *priv,
if (!help)
return NF_ACCEPT;
 
-   /* rcu_read_lock()ed by nf_hook_slow */
+   /* rcu_read_lock()ed by nf_hook_thresh */
helper = rcu_dereference(help->helper);
if (!helper)
return NF_ACCEPT;
diff --git a/net/ipv4/netfilter/nf_conntrack_proto_icmp.c b/net/ipv4/netfilter/nf_conntrack_proto_icmp.c
index c567e1b..2c08d6a 100644
--- a/net/ipv4/netfilter/nf_conntrack_proto_icmp.c
+++ b/net/ipv4/netfilter/nf_conntrack_proto_icmp.c
@@ -149,7 +149,7 @@ icmp_error_message(struct net *net, struct nf_conn *tmpl, struct sk_buff *skb,
return -NF_ACCEPT;
}
 
-   /* rcu_read_lock()ed by nf_hook_slow */
+   /* rcu_read_lock()ed by nf_hook_thresh */
innerproto = __nf_ct_l4proto_find(PF_INET, origtuple.dst.protonum);
 
/* Ordinarily, we'd expect the inverted tupleproto, but it's
diff

[PATCH net v2] net: bcmsysport: Device stats are unsigned long

On 64-bit kernels, device stats are 64 bits wide, not 32 bits.

Fixes: 80105befdb4b ("net: systemport: add Broadcom SYSTEMPORT Ethernet MAC driver")
Signed-off-by: Florian Fainelli 
---
Changes in v2:

- use a plain cast to unsigned long

 drivers/net/ethernet/broadcom/bcmsysport.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/broadcom/bcmsysport.c b/drivers/net/ethernet/broadcom/bcmsysport.c
index 543bf38105c9..bfa26a2590c9 100644
--- a/drivers/net/ethernet/broadcom/bcmsysport.c
+++ b/drivers/net/ethernet/broadcom/bcmsysport.c
@@ -392,7 +392,7 @@ static void bcm_sysport_get_stats(struct net_device *dev,
else
p = (char *)priv;
p += s->stat_offset;
-   data[i] = *(u32 *)p;
+   data[i] = *(unsigned long *)p;
}
 }
 
-- 
2.7.4
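The truncation this patch fixes can be reproduced in userspace. The sketch below is illustrative only: `struct sw_stats` and the two reader helpers are hypothetical, and `memcpy()` stands in for the pointer casts in the driver (it reads the same bytes without strict-aliasing problems). Reading a 64-bit counter through a 32-bit view keeps only four of its eight bytes, so any value above 2^32 - 1 is reported wrongly.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stats block holding one counter that is 64 bits wide,
 * as unsigned long is on a 64-bit kernel. */
struct sw_stats {
	uint64_t rx_bytes;
};

/* What the old code effectively did: read only 4 bytes of the counter. */
static uint64_t read_as_u32(const void *p)
{
	uint32_t v;

	memcpy(&v, p, sizeof(v));
	return v;
}

/* What the fix does on a 64-bit kernel: read the whole 8-byte counter. */
static uint64_t read_as_u64(const void *p)
{
	uint64_t v;

	memcpy(&v, p, sizeof(v));
	return v;
}
```

With `rx_bytes = 0x100000005`, `read_as_u64()` returns the full value, while `read_as_u32()` returns only one half of it (5 on little-endian, 1 on big-endian).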

Re: [PATCH net-next 19/19] rxrpc: Use RCU to access a peer's service connection tree

2016-06-30 Thread Peter Zijlstra

On Thu, Jun 30, 2016 at 05:36:51PM +0100, David Howells wrote:
> David Howells  wrote:
> 
> > > You want rb_link_node_rcu() here.
> > 
> > Should there be an rb_replace_node_rcu() also?
> 
> Or I could make rb_replace_node() RCU friendly.  What do you think of the
> attached changes (split into appropriate patches)?  It's a case of changing
> the order in which pointers are set in the rbtree code and inserting a
> barrier.

> diff --git a/lib/rbtree.c b/lib/rbtree.c
> index 1356454e36de..2b1a190c737c 100644
> --- a/lib/rbtree.c
> +++ b/lib/rbtree.c
> @@ -539,15 +539,17 @@ void rb_replace_node(struct rb_node *victim, struct rb_node *new,
>  {
>   struct rb_node *parent = rb_parent(victim);
>  
> + /* Copy the pointers/colour from the victim to the replacement */
> + *new = *victim;
> +
>   /* Set the surrounding nodes to point to the replacement */
> - __rb_change_child(victim, new, parent, root);
>   if (victim->rb_left)
>   rb_set_parent(victim->rb_left, new);
>   if (victim->rb_right)
>   rb_set_parent(victim->rb_right, new);
>  
> - /* Copy the pointers/colour from the victim to the replacement */
> - *new = *victim;
> + /* Set the onward pointer last with an RCU barrier */
> + __rb_change_child_rcu(victim, new, parent, root);
>  }
>  EXPORT_SYMBOL(rb_replace_node);

So back when I did this work there was resistance to making the regular
RB-tree primitives more expensive for the rare RCU user. And I suspect
that this is still so.

Now, rb_replace_node() isn't a widely used primitive, so it might go
unnoticed, but since we already have rb_link_node_rcu() adding
rb_replace_node_rcu() is the consistent thing to do.
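The ordering under discussion can be sketched in userspace with C11 atomics. This is an analogue under stated assumptions, not the kernel code: the tree is a plain binary search tree rather than an rbtree, `memory_order_release` stands in for `rcu_assign_pointer()`, `memory_order_acquire` stands in for `rcu_dereference()`, and `struct node`, `replace_node_published_last()` and `lookup()` are hypothetical names. The point is that the replacement node is fully initialized before the single pointer that makes it reachable is stored, so a lockless reader sees either the old node or a complete new one. Freeing the victim after a grace period is not modelled.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical BST node; child links are atomic so a lockless reader can
 * follow them while a writer replaces a node. */
struct node {
	int key;
	_Atomic(struct node *) left;
	_Atomic(struct node *) right;
	struct node *parent;	/* used by the writer only */
};

/* Writer: replace @victim with @new (same key). @new is filled in first;
 * the one pointer that makes it reachable is stored last, with release
 * ordering, mirroring the ordering proposed for rb_replace_node_rcu(). */
static void replace_node_published_last(_Atomic(struct node *) *root,
					struct node *victim, struct node *new)
{
	struct node *l = atomic_load_explicit(&victim->left, memory_order_relaxed);
	struct node *r = atomic_load_explicit(&victim->right, memory_order_relaxed);
	struct node *parent = victim->parent;
	_Atomic(struct node *) *slot;

	/* 1. Copy the links into the still-private new node. */
	atomic_store_explicit(&new->left, l, memory_order_relaxed);
	atomic_store_explicit(&new->right, r, memory_order_relaxed);
	new->parent = parent;

	/* 2. Re-parent the children (readers never follow parent links). */
	if (l)
		l->parent = new;
	if (r)
		r->parent = new;

	/* 3. Publish: swing the incoming pointer last. */
	if (!parent)
		slot = root;
	else if (atomic_load_explicit(&parent->left, memory_order_relaxed) == victim)
		slot = &parent->left;
	else
		slot = &parent->right;
	atomic_store_explicit(slot, new, memory_order_release);
}

/* Reader: every link is loaded with acquire ordering. */
static struct node *lookup(_Atomic(struct node *) *root, int key)
{
	struct node *p = atomic_load_explicit(root, memory_order_acquire);

	while (p) {
		if (key < p->key)
			p = atomic_load_explicit(&p->left, memory_order_acquire);
		else if (key > p->key)
			p = atomic_load_explicit(&p->right, memory_order_acquire);
		else
			return p;
	}
	return NULL;
}
```

Keeping this ordering in a separate `_rcu` variant, as suggested above, leaves the ordinary non-RCU path free of the extra barrier.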


> diff --git a/net/rxrpc/conn_service.c b/net/rxrpc/conn_service.c
> index dc64211c5ee8..298ec300cfcc 100644
> --- a/net/rxrpc/conn_service.c
> +++ b/net/rxrpc/conn_service.c
> @@ -41,14 +41,14 @@ struct rxrpc_connection *rxrpc_find_service_conn_rcu(struct rxrpc_peer *peer,
>*/
>   read_seqbegin_or_lock(>service_conn_lock, );
>  
> - p = peer->service_conns.rb_node;
> + p = rcu_dereference(peer->service_conns.rb_node);
>   while (p) {
>   conn = rb_entry(p, struct rxrpc_connection, 
> service_node);
>  
>   if (conn->proto.index_key < k.index_key)
> - p = p->rb_left;
> + p = rcu_dereference(p->rb_left);
>   else if (conn->proto.index_key > k.index_key)
> - p = p->rb_right;
> + p = rcu_dereference(p->rb_right);
>   else
>   goto done;
>   conn = NULL;
> @@ -90,7 +90,7 @@ rxrpc_publish_service_conn(struct rxrpc_peer *peer,
>   goto found_extant_conn;
>   }
>  
> - rb_link_node(&conn->service_node, parent, pp);
> + rb_link_node_rcu(&conn->service_node, parent, pp);
>   rb_insert_color(&conn->service_node, &peer->service_conns);
>  conn_published:
>   set_bit(RXRPC_CONN_IN_SERVICE_CONNS, &conn->flags);

Yep, that's about right.

Re: It's back! (Re: [REGRESSION] NFS is creating a hidden port (left over from xs_bind() ))

2016-06-30 Thread Steven Rostedt

On Thu, 30 Jun 2016 18:30:42 +
Trond Myklebust  wrote:

> Wait. So the NFS mount is still active, it’s just that the socket
> disconnected due to no traffic? That should be OK. Granted that the
> port can’t be reused by another process, but you really don’t want
> that: what if there are no other ports available and you start
> writing to a file on the NFS partition?

What would cause the port to be connected to a socket again? I copied a
large file to the nfs mount, and the hidden port is still there?

Remember, this wasn't always the case, the hidden port is a recent
issue.

I ran wireshark on this, and it appears that two ports are created for
NFS. One of them is closed by the client (it sends a FIN/ACK), and this
port is what lies around, never to be used again; the client uses the
other port for all connections after that.

When I unmount the NFS directory, the port is finally freed (but has no
socket attached to it). What is the purpose of keeping this port around?

I can reproduce this by having the client unmount and remount the
directory.

-- Steve

Re: [PATCH net] net: bcmsysport: Device stats are unsigned long

On 06/30/2016 11:33 AM, Andrew Lunn wrote:
> On Thu, Jun 30, 2016 at 10:56:29AM -0700, Florian Fainelli wrote:
>> On 64bits kernels, device stats are 64bits wide, not 32bits.
>>
>> Fixes: 80105befdb4b ("net: systemport: add Broadcom SYSTEMPORT Ethernet MAC driver")
>> Signed-off-by: Florian Fainelli 
>> ---
>>  drivers/net/ethernet/broadcom/bcmsysport.c | 6 +-
>>  1 file changed, 5 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/net/ethernet/broadcom/bcmsysport.c b/drivers/net/ethernet/broadcom/bcmsysport.c
>> index 543bf38105c9..21f21e23e695 100644
>> --- a/drivers/net/ethernet/broadcom/bcmsysport.c
>> +++ b/drivers/net/ethernet/broadcom/bcmsysport.c
>> @@ -392,7 +392,11 @@ static void bcm_sysport_get_stats(struct net_device *dev,
>>  else
>>  p = (char *)priv;
>>  p += s->stat_offset;
>> -data[i] = *(u32 *)p;
> 
> Hi Florian
> 
> Could you not just change this cast from u32 to unsigned long and be
> done?

Seems like this would work yes, even with our mixture of u32 stats read
from HW and the software netdev stats, thanks!
-- 
Florian

Re: [Patch net] net_sched: fix mirrored packets checksum

2016-06-30 Thread Daniel Borkmann


Hi Cong,

On 06/30/2016 07:15 PM, Cong Wang wrote:

Similar to commit 9b368814b336 ("net: fix bridge multicast packet checksum validation")
we need to fixup the checksum for CHECKSUM_COMPLETE when
pushing skb on RX path. Otherwise we get similar splats.

Cc: Jamal Hadi Salim 
Cc: Tom Herbert 
Signed-off-by: Cong Wang 
---
  include/linux/skbuff.h | 19 +++
  net/core/skbuff.c  | 18 --
  net/sched/act_mirred.c |  2 +-
  3 files changed, 20 insertions(+), 19 deletions(-)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index ee38a41..61ab566 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -2870,6 +2870,25 @@ static inline void skb_postpush_rcsum(struct sk_buff *skb,
  }

  /**
+ * skb_push_rcsum - push skb and update receive checksum
+ * @skb: buffer to update
+ * @len: length of data pulled
+ *
+ * This function performs an skb_push on the packet and updates
+ * the CHECKSUM_COMPLETE checksum.  It should be used on
+ * receive path processing instead of skb_push unless you know
+ * that the checksum difference is zero (e.g., a valid IP header)
+ * or you are setting ip_summed to CHECKSUM_NONE.
+ */
+static inline unsigned char *skb_push_rcsum(struct sk_buff *skb,
+   unsigned int len)
+{
+   skb_push(skb, len);
+   skb_postpush_rcsum(skb, skb->data, len);
+   return skb->data;
+}
+
+/**
   *pskb_trim_rcsum - trim received skb and update checksum
   *@skb: buffer to trim
   *@len: new length
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index f2b77e5..eb12d21 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -3016,24 +3016,6 @@ int skb_append_pagefrags(struct sk_buff *skb, struct page *page,
  EXPORT_SYMBOL_GPL(skb_append_pagefrags);

  /**
- * skb_push_rcsum - push skb and update receive checksum
- * @skb: buffer to update
- * @len: length of data pulled
- *
- * This function performs an skb_push on the packet and updates
- * the CHECKSUM_COMPLETE checksum.  It should be used on
- * receive path processing instead of skb_push unless you know
- * that the checksum difference is zero (e.g., a valid IP header)
- * or you are setting ip_summed to CHECKSUM_NONE.
- */
-static unsigned char *skb_push_rcsum(struct sk_buff *skb, unsigned len)
-{
-   skb_push(skb, len);
-   skb_postpush_rcsum(skb, skb->data, len);
-   return skb->data;
-}
-
-/**
   *skb_pull_rcsum - pull skb and update receive checksum
   *@skb: buffer to update
   *@len: length of data pulled


Fix looks good to me, just a minor comment.

Maybe it makes sense to move not only skb_push_rcsum() but /also/
skb_pull_rcsum() to the header then? Both are similarly small (this
could be split, e.g., into two patches: first the move, then the actual fix).

Thanks,
Daniel
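The arithmetic behind skb_push_rcsum() can be sketched in userspace. With CHECKSUM_COMPLETE, the stored sum covers the bytes the skb currently spans from its data pointer onward, so after pushing a header back on, that header's one's-complement sum has to be added in; this is what skb_postpush_rcsum() does. In the sketch, `struct pkt`, `ocsum()` (a simplified stand-in for csum_partial() that folds to 16 bits) and `ocsum_add()` (a stand-in for csum_add()) are hypothetical. The pushed length is assumed to be even so that the old data keeps its 16-bit word alignment; an odd push would require byte-swapping the old sum.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 16-bit one's-complement sum of a buffer (big-endian word order),
 * folded to 16 bits. */
static uint16_t ocsum(const uint8_t *p, size_t len)
{
	uint32_t sum = 0;
	size_t i;

	for (i = 0; i + 1 < len; i += 2)
		sum += (uint32_t)(p[i] << 8 | p[i + 1]);
	if (len & 1)
		sum += (uint32_t)p[len - 1] << 8;	/* zero-pad odd tail */
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)sum;
}

/* One's-complement addition of two folded sums (end-around carry). */
static uint16_t ocsum_add(uint16_t a, uint16_t b)
{
	uint32_t s = (uint32_t)a + b;

	return (uint16_t)((s & 0xffff) + (s >> 16));
}

/* Hypothetical packet: valid bytes are buf[data .. data + len). */
struct pkt {
	uint8_t buf[64];
	size_t data;
	size_t len;
	uint16_t csum;	/* sum over the valid bytes */
};

/* Push @n bytes back onto the front and keep csum in step. */
static uint8_t *push_rcsum(struct pkt *p, size_t n)
{
	assert(n <= p->data && n % 2 == 0);
	p->data -= n;
	p->len += n;
	p->csum = ocsum_add(p->csum, ocsum(p->buf + p->data, n));
	return p->buf + p->data;
}
```

After `push_rcsum()`, `csum` equals a fresh `ocsum()` over the enlarged packet, which is the invariant a plain skb_push() on the RX path would break.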

Re: [PATCH] net: stmmac: Fix null-function call in ISR on stmmac1000

2016-06-30 Thread Matt Corallo

Damn mail clients and their helpful corruption of patches...
Resent w/o the extra \n in the diff header.

On 06/29/16 07:58, David Miller wrote:
> From: Matt Corallo 
> Date: Sat, 25 Jun 2016 19:35:03 +
> 
>> At least on Meson GXBB, the CORE_IRQ_MTL_RX_OVERFLOW interrupt is thrown
>> with the stmmac1000 driver, which does not support set_rx_tail_ptr. With
>> this patch and the clock fixes, 1G ethernet works on ODROID-C2.
>>
>> Signed-off-by: Matt Corallo 
> 
> This patch does not apply without rejects to any of my trees.
> 

[PATCH] net: stmmac: Fix null-function call in ISR on stmmac1000

2016-06-30 Thread Matt Corallo

(resent due to overhelpful mail client corrupting patch)

At least on Meson GXBB, the CORE_IRQ_MTL_RX_OVERFLOW interrupt is thrown
with the stmmac1000 driver, which does not support set_rx_tail_ptr. With
this patch and the clock fixes, 1G ethernet works on ODROID-C2.

Signed-off-by: Matt Corallo 
---
 drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
index a473c18..e407126 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
@@ -2804,7 +2804,7 @@ static irqreturn_t stmmac_interrupt(int irq, void *dev_id)
priv->tx_path_in_lpi_mode = true;
if (status & CORE_IRQ_TX_PATH_EXIT_LPI_MODE)
priv->tx_path_in_lpi_mode = false;
-   if (status & CORE_IRQ_MTL_RX_OVERFLOW)
+   if (status & CORE_IRQ_MTL_RX_OVERFLOW && priv->hw->dma->set_rx_tail_ptr)
priv->hw->dma->set_rx_tail_ptr(priv->ioaddr,
priv->rx_tail_addr,
STMMAC_CHAN0);
-- 
2.1.4
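The fix follows a common pattern for ops tables in which some hardware variants leave a callback unset: test the pointer before calling it. A minimal userspace sketch, where `struct dma_ops`, `IRQ_RX_OVERFLOW`, `set_tail()` and `handle_irq()` are hypothetical stand-ins for the stmmac names. Note that `status & X && ptr` already parses as `(status & X) && ptr`, because `&` binds more tightly than `&&`; the extra parentheses below are only for readability.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-core DMA ops table: some cores leave callbacks NULL. */
struct dma_ops {
	void (*set_rx_tail_ptr)(int *tail, int addr);
};

#define IRQ_RX_OVERFLOW 0x1u	/* hypothetical status bit */

static void set_tail(int *tail, int addr)
{
	*tail = addr;
}

/* Returns 1 if the overflow was handled, 0 otherwise. */
static int handle_irq(const struct dma_ops *ops, unsigned int status,
		      int *tail, int addr)
{
	/* Guard the optional callback: calling through NULL would crash. */
	if ((status & IRQ_RX_OVERFLOW) && ops->set_rx_tail_ptr) {
		ops->set_rx_tail_ptr(tail, addr);
		return 1;
	}
	return 0;
}
```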

[RFC PATCH] ila: Resolver mechanism

2016-06-30 Thread Tom Herbert

This is the first cut at an ILA resolver using LWT to implement
the hook to a userspace resolver.

The idea is that an ILA resolver route is set for the SIR prefix,
something like:

ip route add ::/64 encap ila-resolve \
 via 2401:db00:20:911a::27:0 dev eth0

When a packet hits the route, it is forwarded to the destination
using the via path, and an rtnl message is also generated with
group RTNLGRP_ILA_NOTIFY and type RTM_ADDR_RESOLVE. A userspace
daemon can listen for such messages and run an ILA resolution
protocol to determine the ILA mapping. If the mapping is resolved,
a /128 ILA encap route is set so that the host can perform ILA
translation and send directly to the destination.

This is not yet complete: we would still need some controls to
rate-limit the number of resolution requests and a means to track
pending requests. I'm posting this as an RFC because it seems like
this might be part of a general mechanism to perform address
resolution in userspace, and I would appreciate comments with
regard to that.
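To make the proposed message concrete, here is a userspace sketch that builds one notification (mimicking the layout ila_fill_notify() produces) and decodes it the way a resolver daemon might. It assumes the payload layout of `struct ila_notify` from the patch and the value 95 that the RFC proposes for RTM_ADDR_RESOLVE; neither is part of the mainline UAPI, so both are defined locally, and `build_notify()` and `parse_notify()` are hypothetical helpers. Only the standard NLMSG_* macros from `<linux/netlink.h>` are used; opening the NETLINK_ROUTE socket and joining RTNLGRP_ILA_NOTIFY are left out.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <linux/netlink.h>

/* Payload layout copied from struct ila_notify in the RFC patch. */
struct ila_notify {
	int type;
	struct in6_addr addr;
};

#define ILA_NOTIFY_SIR_DEST 1
/* Value proposed by the RFC for RTM_ADDR_RESOLVE; not in mainline headers. */
#define RTM_ADDR_RESOLVE_RFC 95

/* Build one notification. Returns its length, or -1 if @buf is too small. */
static int build_notify(void *buf, size_t buflen, const struct in6_addr *addr)
{
	struct nlmsghdr *nlh = buf;
	struct ila_notify *n;

	if (buflen < NLMSG_SPACE(sizeof(*n)))
		return -1;
	memset(buf, 0, NLMSG_SPACE(sizeof(*n)));
	nlh->nlmsg_len = NLMSG_LENGTH(sizeof(*n));
	nlh->nlmsg_type = RTM_ADDR_RESOLVE_RFC;
	n = NLMSG_DATA(nlh);
	n->type = ILA_NOTIFY_SIR_DEST;
	n->addr = *addr;
	return (int)nlh->nlmsg_len;
}

/* Walk a buffer of netlink messages and extract the first SIR destination.
 * Returns 0 on success, -1 if none is found or a payload is truncated. */
static int parse_notify(const void *buf, int len, struct in6_addr *out)
{
	const struct nlmsghdr *nlh;

	for (nlh = buf; NLMSG_OK(nlh, len); nlh = NLMSG_NEXT(nlh, len)) {
		const struct ila_notify *n;

		if (nlh->nlmsg_type != RTM_ADDR_RESOLVE_RFC)
			continue;
		if (nlh->nlmsg_len < NLMSG_LENGTH(sizeof(*n)))
			return -1;
		n = NLMSG_DATA(nlh);
		if (n->type != ILA_NOTIFY_SIR_DEST)
			continue;
		*out = n->addr;
		return 0;
	}
	return -1;
}
```

Because the payload is a fixed struct rather than netlink attributes, any later change to it would break existing listeners; that trade-off seems worth weighing if this becomes a general userspace resolution mechanism.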

---
 include/uapi/linux/lwtunnel.h  |   1 +
 include/uapi/linux/rtnetlink.h |   5 ++
 net/ipv6/ila/Makefile  |   2 +-
 net/ipv6/ila/ila.h |   2 +
 net/ipv6/ila/ila_common.c  |   7 ++
 net/ipv6/ila/ila_resolver.c| 145 +
 6 files changed, 161 insertions(+), 1 deletion(-)
 create mode 100644 net/ipv6/ila/ila_resolver.c

diff --git a/include/uapi/linux/lwtunnel.h b/include/uapi/linux/lwtunnel.h
index a478fe8..d880e49 100644
--- a/include/uapi/linux/lwtunnel.h
+++ b/include/uapi/linux/lwtunnel.h
@@ -9,6 +9,7 @@ enum lwtunnel_encap_types {
LWTUNNEL_ENCAP_IP,
LWTUNNEL_ENCAP_ILA,
LWTUNNEL_ENCAP_IP6,
+   LWTUNNEL_ENCAP_ILA_NOTIFY,
__LWTUNNEL_ENCAP_MAX,
 };
 
diff --git a/include/uapi/linux/rtnetlink.h b/include/uapi/linux/rtnetlink.h
index 262f037..271215f 100644
--- a/include/uapi/linux/rtnetlink.h
+++ b/include/uapi/linux/rtnetlink.h
@@ -144,6 +144,9 @@ enum {
RTM_GETSTATS = 94,
 #define RTM_GETSTATS RTM_GETSTATS
 
+   RTM_ADDR_RESOLVE = 95,
+#define RTM_ADDR_RESOLVE RTM_ADDR_RESOLVE
+
__RTM_MAX,
 #define RTM_MAX	(((__RTM_MAX + 3) & ~3) - 1)
 };
@@ -656,6 +659,8 @@ enum rtnetlink_groups {
 #define RTNLGRP_MPLS_ROUTE RTNLGRP_MPLS_ROUTE
RTNLGRP_NSID,
 #define RTNLGRP_NSID   RTNLGRP_NSID
+   RTNLGRP_ILA_NOTIFY,
+#define RTNLGRP_ILA_NOTIFY RTNLGRP_ILA_NOTIFY
__RTNLGRP_MAX
 };
 #define RTNLGRP_MAX	(__RTNLGRP_MAX - 1)
diff --git a/net/ipv6/ila/Makefile b/net/ipv6/ila/Makefile
index 4b32e59..f2aadc3 100644
--- a/net/ipv6/ila/Makefile
+++ b/net/ipv6/ila/Makefile
@@ -4,4 +4,4 @@
 
 obj-$(CONFIG_IPV6_ILA) += ila.o
 
-ila-objs := ila_common.o ila_lwt.o ila_xlat.o
+ila-objs := ila_common.o ila_lwt.o ila_xlat.o ila_resolver.o
diff --git a/net/ipv6/ila/ila.h b/net/ipv6/ila/ila.h
index e0170f6..382d360 100644
--- a/net/ipv6/ila/ila.h
+++ b/net/ipv6/ila/ila.h
@@ -118,5 +118,7 @@ int ila_lwt_init(void);
 void ila_lwt_fini(void);
 int ila_xlat_init(void);
 void ila_xlat_fini(void);
+int ila_rslv_init(void);
+void ila_rslv_fini(void);
 
 #endif /* __ILA_H */
diff --git a/net/ipv6/ila/ila_common.c b/net/ipv6/ila/ila_common.c
index ec9efbc..0a09557 100644
--- a/net/ipv6/ila/ila_common.c
+++ b/net/ipv6/ila/ila_common.c
@@ -157,7 +157,13 @@ static int __init ila_init(void)
if (ret)
goto fail_xlat;
 
+   ret = ila_rslv_init();
+   if (ret)
+   goto fail_rslv;
+
return 0;
+fail_rslv:
+   ila_xlat_fini();
 fail_xlat:
ila_lwt_fini();
 fail_lwt:
@@ -168,6 +174,7 @@ static void __exit ila_fini(void)
 {
ila_xlat_fini();
ila_lwt_fini();
+   ila_rslv_fini();
 }
 
 module_init(ila_init);
diff --git a/net/ipv6/ila/ila_resolver.c b/net/ipv6/ila/ila_resolver.c
new file mode 100644
index 000..22bb2bd
--- /dev/null
+++ b/net/ipv6/ila/ila_resolver.c
@@ -0,0 +1,145 @@
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include "ila.h"
+
+struct ila_notify {
+   int type;
+   struct in6_addr addr;
+};
+
+#define ILA_NOTIFY_SIR_DEST 1
+
+static int ila_fill_notify(struct sk_buff *skb, struct in6_addr *addr,
+  u32 pid, u32 seq, int event, int flags)
+{
+   struct ila_notify *nila;
+   struct nlmsghdr *nlh;
+
+   nlh = nlmsg_put(skb, pid, seq, event, sizeof(*nila), flags);
+   if (nlh == NULL)
+   return -EMSGSIZE;
+
+   nila = nlmsg_data(nlh);
+   nila->type = ILA_NOTIFY_SIR_DEST;
+   nila->addr = *addr;
+
+   nlmsg_end(skb, nlh);
+
+   return 0;
+}
+
+void ila_rslv_notify(struct net *net, struct sk_buff *skb)
+{
+   struct ipv6hdr *ip6h = ipv6_hdr(skb);
+   struct sk_buff *nlskb;
+   int err = 0;
+
+   /* Send ILA notification to user */
+   nlskb =

Re: ethtool needs a new maintainer

2016-06-30 Thread Ben Hutchings

On Thu, 2016-06-30 at 14:27 -0500, Jorge Alberto Garcia wrote:
> On Thu, Jun 30, 2016 at 1:15 PM, John W. Linville
>  wrote:
> > On Mon, Jun 27, 2016 at 09:51:47AM -0400, John W. Linville wrote:
> > > On Sun, Jun 26, 2016 at 06:11:41PM +0200, Ben Hutchings wrote:
> > > > I've become steadily less enthusiastic and less responsive as a
> > > > maintainer over the past year or so.  I no longer work on networking
> > > > regularly, so it takes a lot more time to get into the right state of
> > > > mind to think about ethtool code, while I have other demands on my time
> > > > that tend to take priority.
> > > > 
> > > > So, I would like to find a new maintainer to take over as soon as
> > > > possible.  Ideally the new maintainer would have previous contributions
> > > > to ethtool and an existing account on kernel.org so that they can push
> > > > to the git repository and the home page.  But neither of those is
> > > > essential.  Please reply if you're interested.
> > > 
> > > I would like to take this responsibility. My previous contributions
> > > to ethtool are meager, but I think my skills and interests are suited
> > > to the task.  Plus, I already have a kernel.org account... :-)
> > 
> > Are there any other takers?  Or is this a done deal?
> > 
> 
> hi guys !, any link to a bugzilla  / patchwork  ?

There's nothing as organised as that, though it might be possible to
add categories for ethtool on  and
.

Ben.

-- 

Ben Hutchings
To err is human; to really foul things up requires a computer.



Re: ethtool needs a new maintainer

2016-06-30 Thread Jorge Alberto Garcia

On Thu, Jun 30, 2016 at 1:15 PM, John W. Linville
 wrote:
> On Mon, Jun 27, 2016 at 09:51:47AM -0400, John W. Linville wrote:
>> On Sun, Jun 26, 2016 at 06:11:41PM +0200, Ben Hutchings wrote:
>> > I've become steadily less enthusiastic and less responsive as a
>> > maintainer over the past year or so.  I no longer work on networking
>> > regularly, so it takes a lot more time to get into the right state of
>> > mind to think about ethtool code, while I have other demands on my time
>> > that tend to take priority.
>> >
>> > So, I would like to find a new maintainer to take over as soon as
>> > possible.  Ideally the new maintainer would have previous contributions
>> > to ethtool and an existing account on kernel.org so that they can push
>> > to the git repository and the home page.  But neither of those is
>> > essential.  Please reply if you're interested.
>>
>> I would like to take this responsibility. My previous contributions
>> to ethtool are meager, but I think my skills and interests are suited
>> to the task.  Plus, I already have a kernel.org account... :-)
>
> Are there any other takers?  Or is this a done deal?
>

hi guys !, any link to a bugzilla  / patchwork  ?

> John
> --
> John W. LinvilleSomeday the world will need a hero, and you
> linvi...@tuxdriver.com  might be all we have.  Be ready.

[PATCH 2/4] net: ethernet: ti: cpsw: add multi queue support

The cpsw h/w supports up to 8 tx and 8 rx channels. This patch adds
multi-queue support to the driver. The ability to configure the h/w
shaper will be added in a separate patch. The default shaper mode, as
before, is priority mode.

The poll function handles all unprocessed channels, beginning from the
highest-priority channel, until all of them are free.

Per-channel statistics can be read with:
ethtool -S ethX

Signed-off-by: Ivan Khoronzhuk 
---
 drivers/net/ethernet/ti/cpsw.c  | 334 +---
 drivers/net/ethernet/ti/davinci_cpdma.c |  12 ++
 drivers/net/ethernet/ti/davinci_cpdma.h |   2 +
 3 files changed, 237 insertions(+), 111 deletions(-)

diff --git a/drivers/net/ethernet/ti/cpsw.c b/drivers/net/ethernet/ti/cpsw.c
index a713336..14d53eb 100644
--- a/drivers/net/ethernet/ti/cpsw.c
+++ b/drivers/net/ethernet/ti/cpsw.c
@@ -140,6 +140,8 @@ do {	\
 #define CPSW_CMINTMAX_INTVL	(1000 / CPSW_CMINTMIN_CNT)
 #define CPSW_CMINTMIN_INTVL	((1000 / CPSW_CMINTMAX_CNT) + 1)
 
+#define CPSW_MAX_QUEUES		8
+
 #define cpsw_slave_index(priv) \
((priv->data.dual_emac) ? priv->emac_port : \
priv->data.active_slave)
@@ -383,7 +385,8 @@ struct cpsw_priv {
u8  mac_addr[ETH_ALEN];
struct cpsw_slave   *slaves;
struct cpdma_ctlr   *dma;
-   struct cpdma_chan   *txch, *rxch;
+   struct cpdma_chan   *txch[CPSW_MAX_QUEUES];
+   struct cpdma_chan   *rxch[CPSW_MAX_QUEUES];
struct cpsw_ale *ale;
bool			rx_pause;
bool			tx_pause;
@@ -395,6 +398,7 @@ struct cpsw_priv {
u32 num_irqs;
struct cpts *cpts;
u32 emac_port;
+   int rx_ch_num, tx_ch_num;
 };
 
 struct cpsw_stats {
@@ -455,35 +459,26 @@ static const struct cpsw_stats cpsw_gstrings_stats[] = {
{ "Rx Start of Frame Overruns", CPSW_STAT(rxsofoverruns) },
{ "Rx Middle of Frame Overruns", CPSW_STAT(rxmofoverruns) },
{ "Rx DMA Overruns", CPSW_STAT(rxdmaoverruns) },
-   { "Rx DMA chan: head_enqueue", CPDMA_RX_STAT(head_enqueue) },
-   { "Rx DMA chan: tail_enqueue", CPDMA_RX_STAT(tail_enqueue) },
-   { "Rx DMA chan: pad_enqueue", CPDMA_RX_STAT(pad_enqueue) },
-   { "Rx DMA chan: misqueued", CPDMA_RX_STAT(misqueued) },
-   { "Rx DMA chan: desc_alloc_fail", CPDMA_RX_STAT(desc_alloc_fail) },
-   { "Rx DMA chan: pad_alloc_fail", CPDMA_RX_STAT(pad_alloc_fail) },
-   { "Rx DMA chan: runt_receive_buf", CPDMA_RX_STAT(runt_receive_buff) },
-   { "Rx DMA chan: runt_transmit_buf", CPDMA_RX_STAT(runt_transmit_buff) },
-   { "Rx DMA chan: empty_dequeue", CPDMA_RX_STAT(empty_dequeue) },
-   { "Rx DMA chan: busy_dequeue", CPDMA_RX_STAT(busy_dequeue) },
-   { "Rx DMA chan: good_dequeue", CPDMA_RX_STAT(good_dequeue) },
-   { "Rx DMA chan: requeue", CPDMA_RX_STAT(requeue) },
-   { "Rx DMA chan: teardown_dequeue", CPDMA_RX_STAT(teardown_dequeue) },
-   { "Tx DMA chan: head_enqueue", CPDMA_TX_STAT(head_enqueue) },
-   { "Tx DMA chan: tail_enqueue", CPDMA_TX_STAT(tail_enqueue) },
-   { "Tx DMA chan: pad_enqueue", CPDMA_TX_STAT(pad_enqueue) },
-   { "Tx DMA chan: misqueued", CPDMA_TX_STAT(misqueued) },
-   { "Tx DMA chan: desc_alloc_fail", CPDMA_TX_STAT(desc_alloc_fail) },
-   { "Tx DMA chan: pad_alloc_fail", CPDMA_TX_STAT(pad_alloc_fail) },
-   { "Tx DMA chan: runt_receive_buf", CPDMA_TX_STAT(runt_receive_buff) },
-   { "Tx DMA chan: runt_transmit_buf", CPDMA_TX_STAT(runt_transmit_buff) },
-   { "Tx DMA chan: empty_dequeue", CPDMA_TX_STAT(empty_dequeue) },
-   { "Tx DMA chan: busy_dequeue", CPDMA_TX_STAT(busy_dequeue) },
-   { "Tx DMA chan: good_dequeue", CPDMA_TX_STAT(good_dequeue) },
-   { "Tx DMA chan: requeue", CPDMA_TX_STAT(requeue) },
-   { "Tx DMA chan: teardown_dequeue", CPDMA_TX_STAT(teardown_dequeue) },
 };
 
-#define CPSW_STATS_LEN ARRAY_SIZE(cpsw_gstrings_stats)
+static const struct cpsw_stats cpsw_gstrings_ch_stats[] = {
+   { "head_enqueue", CPDMA_RX_STAT(head_enqueue) },
+   { "tail_enqueue", CPDMA_RX_STAT(tail_enqueue) },
+   { "pad_enqueue", CPDMA_RX_STAT(pad_enqueue) },
+   { "misqueued", CPDMA_RX_STAT(misqueued) },
+   { "desc_alloc_fail", CPDMA_RX_STAT(desc_alloc_fail) },
+   { "pad_alloc_fail", CPDMA_RX_STAT(pad_alloc_fail) },
+   { "runt_receive_buf", CPDMA_RX_STAT(runt_receive_buff) },
+   { "runt_transmit_buf", CPDMA_RX_STAT(runt_transmit_buff) },
+   { "empty_dequeue", CPDMA_RX_STAT(empty_dequeue) },
+   { "busy_dequeue", CPDMA_RX_STAT(busy_dequeue) },
+   { "good_dequeue", CPDMA_RX_STAT(good_dequeue) },
+   { "requeue", CPDMA_RX_STAT(requeue) },
+   { "teardown_dequeue",
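The polling order described in the changelog can be modelled in a few lines. This is a hypothetical sketch, not the driver's poll routine: `struct chans` and `poll_channels()` are invented, and it assumes that the highest-numbered channel has the highest priority. Channels are drained one after another, starting from the top, until either every channel is empty or the NAPI-style budget is used up.

```c
#include <assert.h>

#define MAX_QUEUES 8
#define LOG_LEN 64

/* Hypothetical per-channel backlog. */
struct chans {
	int pending[MAX_QUEUES];	/* packets waiting on each channel */
	int order[LOG_LEN];		/* channel each processed packet came from */
	int n;				/* entries used in order[] */
};

/* Process up to @budget packets, highest channel first, draining each
 * channel before moving to the next. Returns the number processed. */
static int poll_channels(struct chans *c, int budget)
{
	int done = 0;

	for (int ch = MAX_QUEUES - 1; ch >= 0 && done < budget; ch--) {
		while (c->pending[ch] > 0 && done < budget) {
			c->pending[ch]--;
			if (c->n < LOG_LEN)
				c->order[c->n++] = ch;
			done++;
		}
	}
	return done;
}
```

When the budget runs out, lower-priority channels simply wait for the next poll, which is the behaviour a strict priority scheme implies.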

[PATCH 3/4] net: ethernet: ti: davinci_cpdma: move cpdma channel struct macros to internals

It is better to move the macros that work with channel internals into
the C file. Currently drivers don't need to know whether a channel is
rx or tx, except when creating it. So change the "channel create"
function to take the direction, and keep all channel struct macros for
internal use only.

Signed-off-by: Ivan Khoronzhuk 
---
 drivers/net/ethernet/ti/cpsw.c  |  6 ++
 drivers/net/ethernet/ti/davinci_cpdma.c | 13 +++--
 drivers/net/ethernet/ti/davinci_cpdma.h |  9 +
 drivers/net/ethernet/ti/davinci_emac.c  |  8 
 4 files changed, 18 insertions(+), 18 deletions(-)

diff --git a/drivers/net/ethernet/ti/cpsw.c b/drivers/net/ethernet/ti/cpsw.c
index 14d53eb..595ed56 100644
--- a/drivers/net/ethernet/ti/cpsw.c
+++ b/drivers/net/ethernet/ti/cpsw.c
@@ -2575,10 +2575,8 @@ static int cpsw_probe(struct platform_device *pdev)
goto clean_runtime_disable_ret;
}
 
-   priv->txch[0] = cpdma_chan_create(priv->dma, tx_chan_num(0),
- cpsw_tx_handler);
-   priv->rxch[0] = cpdma_chan_create(priv->dma, rx_chan_num(0),
- cpsw_rx_handler);
+   priv->txch[0] = cpdma_chan_create(priv->dma, 0, cpsw_tx_handler, 0);
+   priv->rxch[0] = cpdma_chan_create(priv->dma, 0, cpsw_rx_handler, 1);
 
if (WARN_ON(!priv->rxch[0] || !priv->txch[0])) {
dev_err(priv->dev, "error initializing dma channels\n");
diff --git a/drivers/net/ethernet/ti/davinci_cpdma.c b/drivers/net/ethernet/ti/davinci_cpdma.c
index a4b299d..d6c4967 100644
--- a/drivers/net/ethernet/ti/davinci_cpdma.c
+++ b/drivers/net/ethernet/ti/davinci_cpdma.c
@@ -126,6 +126,13 @@ struct cpdma_chan {
int int_set, int_clear, td;
 };
 
+#define tx_chan_num(chan)  (chan)
+#define rx_chan_num(chan)  ((chan) + CPDMA_MAX_CHANNELS)
+#define is_rx_chan(chan)   ((chan)->chan_num >= CPDMA_MAX_CHANNELS)
+#define is_tx_chan(chan)   (!is_rx_chan(chan))
+#define __chan_linear(chan_num)((chan_num) & (CPDMA_MAX_CHANNELS - 1))
+#define chan_linear(chan)  __chan_linear((chan)->chan_num)
+
 /* The following make access to common cpdma_ctlr params more readable */
 #define dmaregs	params.dmaregs
 #define num_chan   params.num_chan
@@ -520,12 +527,14 @@ static void cpdma_chan_split_pool(struct cpdma_ctlr *ctlr)
 }
 
 struct cpdma_chan *cpdma_chan_create(struct cpdma_ctlr *ctlr, int chan_num,
-cpdma_handler_fn handler)
+cpdma_handler_fn handler, int rx_type)
 {
+   int offset = chan_num * 4;
struct cpdma_chan *chan;
-   int offset = (chan_num % CPDMA_MAX_CHANNELS) * 4;
unsigned long flags;
 
+   chan_num = rx_type ? rx_chan_num(chan_num) : tx_chan_num(chan_num);
+
if (__chan_linear(chan_num) >= ctlr->num_chan)
return NULL;
 
diff --git a/drivers/net/ethernet/ti/davinci_cpdma.h b/drivers/net/ethernet/ti/davinci_cpdma.h
index 3ce91a1..52db03a 100644
--- a/drivers/net/ethernet/ti/davinci_cpdma.h
+++ b/drivers/net/ethernet/ti/davinci_cpdma.h
@@ -17,13 +17,6 @@
 
 #define CPDMA_MAX_CHANNELS BITS_PER_LONG
 
-#define tx_chan_num(chan)  (chan)
-#define rx_chan_num(chan)  ((chan) + CPDMA_MAX_CHANNELS)
-#define is_rx_chan(chan)   ((chan)->chan_num >= CPDMA_MAX_CHANNELS)
-#define is_tx_chan(chan)   (!is_rx_chan(chan))
-#define __chan_linear(chan_num)((chan_num) & (CPDMA_MAX_CHANNELS - 1))
-#define chan_linear(chan)  __chan_linear((chan)->chan_num)
-
 #define CPDMA_RX_SOURCE_PORT(__status__)   ((__status__ >> 16) & 0x7)
 
 #define CPDMA_EOI_RX_THRESH	0x0
@@ -80,7 +73,7 @@ int cpdma_ctlr_stop(struct cpdma_ctlr *ctlr);
 int cpdma_ctlr_dump(struct cpdma_ctlr *ctlr);
 
 struct cpdma_chan *cpdma_chan_create(struct cpdma_ctlr *ctlr, int chan_num,
-cpdma_handler_fn handler);
+cpdma_handler_fn handler, int rx_type);
 int cpdma_chan_get_rx_buf_num(struct cpdma_chan *chan);
 int cpdma_chan_destroy(struct cpdma_chan *chan);
 int cpdma_chan_start(struct cpdma_chan *chan);
diff --git a/drivers/net/ethernet/ti/davinci_emac.c b/drivers/net/ethernet/ti/davinci_emac.c
index f56d66e..1df0c89 100644
--- a/drivers/net/ethernet/ti/davinci_emac.c
+++ b/drivers/net/ethernet/ti/davinci_emac.c
@@ -2008,10 +2008,10 @@ static int davinci_emac_probe(struct platform_device *pdev)
goto no_pdata;
}
 
-   priv->txchan = cpdma_chan_create(priv->dma, tx_chan_num(EMAC_DEF_TX_CH),
-  emac_tx_handler);
-   priv->rxchan = cpdma_chan_create(priv->dma, rx_chan_num(EMAC_DEF_RX_CH),
-  emac_rx_handler);
+   priv->txchan = cpdma_chan_create(priv->dma, EMAC_DEF_TX_CH,
+emac_tx_handler, 0);
+   priv->rxchan = cpdma_chan_create(priv->dma,
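The channel numbering that this patch hides inside davinci_cpdma.c can be checked in isolation. The sketch mirrors the moved macros (tx channels occupy `[0, CPDMA_MAX_CHANNELS)`, rx channels the next `CPDMA_MAX_CHANNELS` numbers); the function names are hypothetical, and `MAX_CHANNELS` is computed as the number of bits in a `long` to match `CPDMA_MAX_CHANNELS == BITS_PER_LONG`. It also shows why the new `cpdma_chan_create()` can compute `offset` from the caller's linear index before encoding the direction.

```c
#include <assert.h>
#include <limits.h>

/* Mirrors CPDMA_MAX_CHANNELS == BITS_PER_LONG (a power of two). */
#define MAX_CHANNELS ((int)(sizeof(long) * CHAR_BIT))

/* Like rx_chan_num()/tx_chan_num(): map a per-direction index to the
 * internal channel number. */
static int chan_encode(int linear, int rx_type)
{
	return rx_type ? linear + MAX_CHANNELS : linear;
}

/* Like is_rx_chan(), on a raw channel number. */
static int chan_is_rx(int chan_num)
{
	return chan_num >= MAX_CHANNELS;
}

/* Like __chan_linear(): strip the direction; relies on MAX_CHANNELS
 * being a power of two. */
static int chan_linear(int chan_num)
{
	return chan_num & (MAX_CHANNELS - 1);
}
```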

[PATCH 4/4] net: ethernet: ti: cpsw: add ethtool channels support

These ops allow controlling the number of channels the driver is
allowed to work with. The maximum number of channels is 8 for rx and
8 for tx. After this patch the following commands are possible:

$ ethtool -l eth0
$ ethtool -L eth0 rx 6 tx 6

Signed-off-by: Ivan Khoronzhuk 
---
 drivers/net/ethernet/ti/cpsw.c | 188 +
 1 file changed, 188 insertions(+)

diff --git a/drivers/net/ethernet/ti/cpsw.c b/drivers/net/ethernet/ti/cpsw.c
index 595ed56..729b8be 100644
--- a/drivers/net/ethernet/ti/cpsw.c
+++ b/drivers/net/ethernet/ti/cpsw.c
@@ -740,6 +740,11 @@ static void cpsw_rx_handler(void *token, int len, int status)
}
 
 requeue:
+   if (netif_dormant(ndev)) {
+   dev_kfree_skb_any(new_skb);
+   return;
+   }
+
ch = priv->rxch[skb_get_queue_mapping(new_skb)];
ret = cpdma_chan_submit(ch, new_skb, new_skb->data,
skb_tailroom(new_skb), 0);
@@ -2077,6 +2082,187 @@ static void cpsw_ethtool_op_complete(struct net_device *ndev)
cpsw_err(priv, drv, "ethtool complete failed %d\n", ret);
 }
 
+static void cpsw_get_channels(struct net_device *dev,
+ struct ethtool_channels *ch)
+{
+   struct cpsw_priv *priv = netdev_priv(dev);
+
+   ch->max_combined = 0;
+   ch->max_rx = CPSW_MAX_QUEUES;
+   ch->max_tx = CPSW_MAX_QUEUES;
+   ch->max_other = 0;
+   ch->other_count = 0;
+   ch->rx_count = priv->rx_ch_num;
+   ch->tx_count = priv->tx_ch_num;
+   ch->combined_count = 0;
+}
+
+static int cpsw_check_ch_settings(struct cpsw_priv *priv,
+ struct ethtool_channels *ch)
+{
+   if (ch->combined_count)
+   return -EINVAL;
+
+   /* verify we have at least one channel in each direction */
+   if (!ch->rx_count || !ch->tx_count)
+   return -EINVAL;
+
+   if (ch->rx_count > priv->data.channels ||
+   ch->tx_count > priv->data.channels)
+   return -EINVAL;
+
+   return 0;
+}
+
+static void cpsw_sync_dual_ch_list(struct net_device *sdev,
+  struct net_device *ddev)
+{
+   struct cpsw_priv *priv_s, *priv_d;
+   int i;
+
+   priv_s = netdev_priv(sdev);
+   priv_d = netdev_priv(ddev);
+
+   priv_d->rx_ch_num = priv_s->rx_ch_num;
+   priv_d->tx_ch_num = priv_s->tx_ch_num;
+
+   for (i = 0; i < priv_d->tx_ch_num; i++)
+   priv_d->txch[i] = priv_s->txch[i];
+   for (i = 0; i < priv_d->rx_ch_num; i++)
+   priv_d->rxch[i] = priv_s->rxch[i];
+}
+
+static int cpsw_update_channels_res(struct cpsw_priv *priv, int ch_num, int rx)
+{
+   int (*poll)(struct napi_struct *, int);
+   void (*handler)(void *, int, int);
+   struct cpdma_chan **chan;
+   int *ch;
+   int ret;
+
+   if (rx) {
+   ch = &priv->rx_ch_num;
+   chan = priv->rxch;
+   handler = cpsw_rx_handler;
+   poll = cpsw_rx_poll;
+   } else {
+   ch = &priv->tx_ch_num;
+   chan = priv->txch;
+   handler = cpsw_tx_handler;
+   poll = cpsw_tx_poll;
+   }
+
+   while (*ch < ch_num) {
+   chan[*ch] = cpdma_chan_create(priv->dma, *ch, handler, rx);
+
+   if (IS_ERR(chan[*ch]))
+   return PTR_ERR(chan[*ch]);
+
+   if (!chan[*ch])
+   return -EINVAL;
+
+   dev_info(priv->dev, "created new %d %s channel\n", *ch,
+(rx ? "rx" : "tx"));
+   (*ch)++;
+   }
+
+   while (*ch > ch_num) {
+   int tch = *ch - 1;
+
+   ret = cpdma_chan_destroy(chan[tch]);
+   if (ret)
+   return ret;
+
+   dev_info(priv->dev, "destroyed %d %s channel\n", tch,
+(rx ? "rx" : "tx"));
+   (*ch)--;
+   }
+
+   return 0;
+}
+
+static int cpsw_update_channels(struct net_device *dev,
+   struct ethtool_channels *ch)
+{
+   struct cpsw_priv *priv;
+   int ret;
+
+   priv = netdev_priv(dev);
+
+   ret = cpsw_update_channels_res(priv, ch->rx_count, 1);
+   if (ret)
+   return ret;
+
+   ret = cpsw_update_channels_res(priv, ch->tx_count, 0);
+   if (ret)
+   return ret;
+
+   if (priv->data.dual_emac) {
+   int i;
+   /* mirror channels for another SL */
+   for (i = 0; i < priv->data.slaves; i++) {
+   if (priv->slaves[i].ndev == dev)
+   continue;
+
+   cpsw_sync_dual_ch_list(dev, priv->slaves[i].ndev);
+   }
+   }
+
+   return 0;
+}
+
+static int cpsw_set_channels(struct net_device *ndev,
+struct ethtool_channels *chs)
+{
+   struct
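The validation and resize logic in this patch can be modelled without hardware. In the sketch below, `struct ch_req` mirrors only the `rx_count`, `tx_count` and `combined_count` fields of `struct ethtool_channels`, and `update_channels()` replaces cpdma_chan_create()/cpdma_chan_destroy() with flags in an array; both are hypothetical. As in cpsw_update_channels_res(), channels are added from the current count upwards and removed from the highest index downwards.

```c
#include <assert.h>
#include <errno.h>

#define MAX_QUEUES 8

/* Hypothetical subset of struct ethtool_channels. */
struct ch_req {
	unsigned int rx_count, tx_count, combined_count;
};

/* Same checks as cpsw_check_ch_settings(), with the limit passed in. */
static int check_ch_settings(const struct ch_req *r, unsigned int max)
{
	if (r->combined_count)
		return -EINVAL;		/* only separate rx/tx channels */
	if (!r->rx_count || !r->tx_count)
		return -EINVAL;		/* need at least one channel each way */
	if (r->rx_count > max || r->tx_count > max)
		return -EINVAL;
	return 0;
}

/* Grow or shrink the active set one channel at a time. */
static void update_channels(int active[MAX_QUEUES], int *cur, int want)
{
	while (*cur < want)
		active[(*cur)++] = 1;	/* "create" channel number *cur */
	while (*cur > want)
		active[--(*cur)] = 0;	/* "destroy" the highest channel */
}
```

For example, `ethtool -L eth0 rx 6 tx 6` passes the checks, while a request for 0 rx channels or more than the limit is rejected with -EINVAL before any channel is touched.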

[PATCH 1/4] net: ethernet: ti: davinci_cpdma: split descs num between all channels

Currently the tx channels all use the same pool of descriptors, so one
channel can block another by emptying the pool. But it is the shaper
that should decide which channel is allowed to send packets. To avoid
such an impact of one channel on another, give every channel its own
piece of the pool.

Signed-off-by: Ivan Khoronzhuk 
---
 drivers/net/ethernet/ti/cpsw.c  | 59 +
 drivers/net/ethernet/ti/davinci_cpdma.c | 54 --
 drivers/net/ethernet/ti/davinci_cpdma.h |  2 +-
 3 files changed, 89 insertions(+), 26 deletions(-)

diff --git a/drivers/net/ethernet/ti/cpsw.c b/drivers/net/ethernet/ti/cpsw.c
index 1a93a1f..a713336 100644
--- a/drivers/net/ethernet/ti/cpsw.c
+++ b/drivers/net/ethernet/ti/cpsw.c
@@ -1230,6 +1230,39 @@ static void cpsw_init_host_port(struct cpsw_priv *priv)
}
 }
 
+static int cpsw_fill_rx_channels(struct net_device *ndev)
+{
+   struct cpsw_priv *priv = netdev_priv(ndev);
+   struct sk_buff *skb;
+   int ch_buf_num;
+   int i, ret;
+
+   ch_buf_num = cpdma_chan_get_rx_buf_num(priv->rxch);
+   for (i = 0; i < ch_buf_num; i++) {
+   skb = __netdev_alloc_skb_ip_align(ndev,
+ priv->rx_packet_max,
+ GFP_KERNEL);
+   if (!skb) {
+   dev_err(priv->dev, "cannot allocate skb\n");
+   return -ENOMEM;
+   }
+
+   ret = cpdma_chan_submit(priv->rxch, skb, skb->data,
+   skb_tailroom(skb), 0);
+   if (ret < 0) {
+   dev_err(priv->dev,
+   "cannot submit skb to rx channel, error %d\n",
+   ret);
+   kfree_skb(skb);
+   return ret;
+   }
+   }
+
+   cpsw_info(priv, ifup, "submitted %d rx descriptors\n", ch_buf_num);
+
+   return ch_buf_num;
+}
+
 static void cpsw_slave_stop(struct cpsw_slave *slave, struct cpsw_priv *priv)
 {
u32 slave_port;
@@ -1249,7 +1282,7 @@ static void cpsw_slave_stop(struct cpsw_slave *slave, struct cpsw_priv *priv)
 static int cpsw_ndo_open(struct net_device *ndev)
 {
struct cpsw_priv *priv = netdev_priv(ndev);
-   int i, ret;
+   int ret;
u32 reg;
 
ret = pm_runtime_get_sync(&priv->pdev->dev);
@@ -1282,7 +1315,6 @@ static int cpsw_ndo_open(struct net_device *ndev)
 
if (!cpsw_common_res_usage_state(priv)) {
struct cpsw_priv *priv_sl0 = cpsw_get_slave_priv(priv, 0);
-   int buf_num;
 
/* setup tx dma to fixed prio and zero offset */
cpdma_control_set(priv->dma, CPDMA_TX_PRIO_FIXED, 1);
@@ -1310,26 +1342,9 @@ static int cpsw_ndo_open(struct net_device *ndev)
enable_irq(priv->irqs_table[0]);
}
 
-   buf_num = cpdma_chan_get_rx_buf_num(priv->dma);
-   for (i = 0; i < buf_num; i++) {
-   struct sk_buff *skb;
-
-   ret = -ENOMEM;
-   skb = __netdev_alloc_skb_ip_align(priv->ndev,
-   priv->rx_packet_max, GFP_KERNEL);
-   if (!skb)
-   goto err_cleanup;
-   ret = cpdma_chan_submit(priv->rxch, skb, skb->data,
-   skb_tailroom(skb), 0);
-   if (ret < 0) {
-   kfree_skb(skb);
-   goto err_cleanup;
-   }
-   }
-   /* continue even if we didn't manage to submit all
-* receive descs
-*/
-   cpsw_info(priv, ifup, "submitted %d rx descriptors\n", i);
+   ret = cpsw_fill_rx_channels(ndev);
+   if (ret < 0)
+   goto err_cleanup;
 
if (cpts_register(&priv->pdev->dev, priv->cpts,
  priv->data.cpts_clock_mult,
diff --git a/drivers/net/ethernet/ti/davinci_cpdma.c b/drivers/net/ethernet/ti/davinci_cpdma.c
index 1c653ca..2f4b571 100644
--- a/drivers/net/ethernet/ti/davinci_cpdma.c
+++ b/drivers/net/ethernet/ti/davinci_cpdma.c
@@ -106,6 +106,7 @@ struct cpdma_ctlr {
struct cpdma_desc_pool  *pool;
spinlock_t  lock;
struct cpdma_chan   *channels[2 * CPDMA_MAX_CHANNELS];
+   int chan_num;
 };
 
 struct cpdma_chan {
@@ -262,6 +263,7 @@ struct cpdma_ctlr *cpdma_ctlr_create(struct cpdma_params *params)
ctlr->state = CPDMA_STATE_IDLE;
ctlr->params = *params;
ctlr->dev = params->dev;
+   ctlr->chan_num = 0;
spin_lock_init(&ctlr->lock);
 
ctlr->pool = cpdma_desc_pool_create(ctlr->dev,
@@ -479,6 +481,32 @@ void cpdma_ctlr_eoi(struct
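
The idea in the commit message, giving each channel its own share of the descriptor pool so that one channel exhausting its share cannot block the others, can be sketched in userspace as below. The sketch assumes an even split of the pool across channels; `struct chan_quota`, `split_pool()` and `take_desc()` are hypothetical, and this excerpt does not show how davinci_cpdma.c actually divides the pool.

```c
#include <assert.h>

/* Hypothetical per-channel accounting for a shared descriptor pool. */
struct chan_quota {
	int desc_num;	/* descriptors this channel may hold */
	int in_use;	/* descriptors currently submitted */
};

/* Give each of chan_num channels an equal share of pool_size descriptors. */
static void split_pool(struct chan_quota *ch, int chan_num, int pool_size)
{
	int i;

	assert(chan_num > 0);
	for (i = 0; i < chan_num; i++) {
		ch[i].desc_num = pool_size / chan_num;
		ch[i].in_use = 0;
	}
}

/*
 * Take one descriptor for a channel. This fails only when that channel's
 * own share is used up, so a busy channel cannot starve the others.
 */
static int take_desc(struct chan_quota *c)
{
	if (c->in_use >= c->desc_num)
		return -1;
	c->in_use++;
	return 0;
}
```

With a single shared counter instead, the loop in the usage below would leave channel 1 with nothing once channel 0 had consumed the whole pool.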

[PATCH 0/4] net: ethernet: ti: cpsw: add multi-queue support

This series is intended to allow the cpsw driver to use its h/w shaper
to send/receive data with up to 8 tx and rx queues. This series
doesn't contain the interface to configure the h/w shaper itself; it
contains only the multi-queue support part and the ability to configure
the number of tx/rx queues with ethtool. The default shaper mode is
priority mode. The h/w shaper configuration will be added in a separate
patch series. This series doesn't affect net throughput.

Tested on:
am572x-idk, 1Gbps link
am335-boneblack, 100Mbps link.

Based on:
git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git

A simple example for splitting traffic on queues:

#check how many queues are supported and active:
$ ethtool -l eth0

#increase the number of active rx and tx queues,
#by default 1 rx and 1 tx queue
#any combination of 0 < rx <= 8 and 0 < tx <= 8 can be set
$ ethtool -L eth0 rx 8 tx 8

#set multi-queue-aware queuing discipline
$ tc qdisc add dev eth0 root handle 1: multiq

#send packets with dst ip 172.22.39.12 to queue #5, which can be
#prioritized or throughput-limited by the h/w shaper.
$ tc filter add dev eth0 parent 1: protocol ip prio 1 u32 \
match ip dst 172.22.39.12 \
action skbedit queue_mapping 5

#get statistics for active channels:
$ ethtool -S eth0

Ivan Khoronzhuk (4):
  net: ethernet: ti: davinci_cpdma: split descs num between all channels
  net: ethernet: ti: cpsw: add multi queue support
  net: ethernet: ti: davinci_cpdma: move cpdma channel struct macroses
to internals
  net: ethernet: ti: cpsw: add ethtool channels support

 drivers/net/ethernet/ti/cpsw.c  | 533 +---
 drivers/net/ethernet/ti/davinci_cpdma.c |  79 -
 drivers/net/ethernet/ti/davinci_cpdma.h |  13 +-
 drivers/net/ethernet/ti/davinci_emac.c  |   8 +-
 4 files changed, 505 insertions(+), 128 deletions(-)

-- 
1.9.1

Re: [PATCH net] net: bcmsysport: Device stats are unsigned long

2016-06-30 Thread Andrew Lunn

On Thu, Jun 30, 2016 at 10:56:29AM -0700, Florian Fainelli wrote:
> On 64-bit kernels, device stats are 64 bits wide, not 32 bits.
> 
> Fixes: 80105befdb4b ("net: systemport: add Broadcom SYSTEMPORT Ethernet MAC driver")
> Signed-off-by: Florian Fainelli 
> ---
>  drivers/net/ethernet/broadcom/bcmsysport.c | 6 +-
>  1 file changed, 5 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/net/ethernet/broadcom/bcmsysport.c b/drivers/net/ethernet/broadcom/bcmsysport.c
> index 543bf38105c9..21f21e23e695 100644
> --- a/drivers/net/ethernet/broadcom/bcmsysport.c
> +++ b/drivers/net/ethernet/broadcom/bcmsysport.c
> @@ -392,7 +392,11 @@ static void bcm_sysport_get_stats(struct net_device *dev,
>   else
>   p = (char *)priv;
>   p += s->stat_offset;
> - data[i] = *(u32 *)p;

Hi Florian

Could you not just change this cast from u32 to unsigned long and be
done?

struct net_device_stats {
    unsigned long   rx_packets;
    unsigned long   tx_packets;

Andrew

Re: ethtool needs a new maintainer

2016-06-30 Thread John W. Linville

On Mon, Jun 27, 2016 at 09:51:47AM -0400, John W. Linville wrote:
> On Sun, Jun 26, 2016 at 06:11:41PM +0200, Ben Hutchings wrote:
> > I've become steadily less enthusiastic and less responsive as a
> > maintainer over the past year or so.  I no longer work on networking
> > regularly, so it takes a lot more time to get into the right state of
> > mind to think about ethtool code, while I have other demands on my time
> > that tend to take priority.
> > 
> > So, I would like to find a new maintainer to take over as soon as
> > possible.  Ideally the new maintainer would have previous contributions
> > to ethtool and an existing account on kernel.org so that they can push
> > to the git repository and the home page.  But neither of those is
> > essential.  Please reply if you're interested.
> 
> I would like to take this responsibility. My previous contributions
> to ethtool are meager, but I think my skills and interests are suited
> to the task.  Plus, I already have a kernel.org account... :-)

Are there any other takers?  Or is this a done deal?

John
-- 
John W. Linville        Someday the world will need a hero, and you
linvi...@tuxdriver.com  might be all we have.  Be ready.

Re: [RFC 6/7] dt-bindings: net: bgmac: add bindings documentation for bgmac

2016-06-30 Thread Ray Jui


Hi Jon,

On 6/28/2016 12:34 PM, Jon Mason wrote:

Signed-off-by: Jon Mason 
---
 .../devicetree/bindings/net/brcm,bgmac-enet.txt | 21 +
 1 file changed, 21 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/net/brcm,bgmac-enet.txt

diff --git a/Documentation/devicetree/bindings/net/brcm,bgmac-enet.txt b/Documentation/devicetree/bindings/net/brcm,bgmac-enet.txt
new file mode 100644
index 000..efd36d5
--- /dev/null
+++ b/Documentation/devicetree/bindings/net/brcm,bgmac-enet.txt
@@ -0,0 +1,21 @@
+Broadcom GMAC Ethernet Controller Device Tree Bindings
+-
+
+Required properties:
+ - compatible: "brcm,bgmac-enet"
+ - reg:Address and length of the GMAC registers,
+   Address and length of the GMAC IDM registers


As we know, additional optional register banks will be required for
some of the other SoCs that the current driver does not yet support. In
my opinion, we should consider making "reg-names" a mandatory property
now and map the register blocks based on names.


I think this will make our lives easier in the future when new
optional SoC-specific register blocks are added, since we can map
the register blocks by name instead of by index. Indices will change
and differ among SoCs, and handling them would require much more
complex logic in the driver.



+ - interrupts: Interrupt number
+
+Optional properties:
+- mac-address: mac address to be assigned to the device
+
+Examples:
+
+gmac0: enet@18022000 {
+   compatible = "brcm,bgmac-enet";
+   reg = <0x18022000 0x1000>,
+ <0x1811 0x1000>;
+   interrupts = ;
+   status = "disabled";
+};



Btw, I think Rob Herring should be included in the review for device 
tree binding document changes.


Thanks,

Ray
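
Ray's suggestion amounts to looking register blocks up by name rather than by position in `reg`, so an optional block present only on some SoCs cannot shift the indices of the others. In the kernel, platform_get_resource_byname() provides this once `reg-names` is populated; the sketch below only models the lookup in userspace, and the `struct reg_entry` type, block names and addresses are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of one "reg" entry paired with its "reg-names" string. */
struct reg_entry {
	const char *name;
	unsigned long base;
	unsigned long size;
};

/* Find a register block by name; returns NULL if this SoC lacks it. */
static const struct reg_entry *find_reg(const struct reg_entry *regs,
					size_t n, const char *name)
{
	size_t i;

	assert(name != NULL);
	for (i = 0; i < n; i++)
		if (strcmp(regs[i].name, name) == 0)
			return &regs[i];
	return NULL;
}
```

Because the lookup key is the name, inserting an extra block in the middle of one SoC's list leaves every other lookup unchanged, which is the property index-based mapping lacks.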

Re: [RFC 5/7] net: ethernet: bgmac: Add platform device support

2016-06-30 Thread Ray Jui


Hi Jon,

On 6/28/2016 12:34 PM, Jon Mason wrote:

The bcma portion of the driver has been split off into a bcma specific
driver.  This has been mirrored for the platform driver.  The last
references to the bcma core struct have been changed into a generic
function call.  These function calls are wrappers to either the original
bcma code or new platform functions that access the same areas via MMIO.
This necessitated adding function pointers for both platform and bcma to
hide which backend is being used from the generic bgmac code.

Signed-off-by: Jon Mason 
---
 drivers/net/ethernet/broadcom/Kconfig  |  23 +-
 drivers/net/ethernet/broadcom/Makefile |   4 +-
 drivers/net/ethernet/broadcom/bgmac-bcma.c | 315 
 drivers/net/ethernet/broadcom/bgmac-platform.c | 208 
 drivers/net/ethernet/broadcom/bgmac.c  | 327 -
 drivers/net/ethernet/broadcom/bgmac.h  |  73 +-
 6 files changed, 666 insertions(+), 284 deletions(-)
 create mode 100644 drivers/net/ethernet/broadcom/bgmac-bcma.c
 create mode 100644 drivers/net/ethernet/broadcom/bgmac-platform.c

diff --git a/drivers/net/ethernet/broadcom/Kconfig b/drivers/net/ethernet/broadcom/Kconfig
index d74a92e..bd8c80c 100644
--- a/drivers/net/ethernet/broadcom/Kconfig
+++ b/drivers/net/ethernet/broadcom/Kconfig
@@ -140,10 +140,18 @@ config BNX2X_SRIOV
  allows for virtual function acceleration in virtual environments.

 config BGMAC
-   tristate "BCMA bus GBit core support"
+   tristate
+   help
+ This enables the integrated ethernet controller support for many
+ Broadcom (mostly iProc) SoCs. An appropriate bus interface driver
+ needs to be enabled to select this.
+
+config BGMAC_BCMA
+   tristate "Broadcom iProc GBit BCMA support"
depends on BCMA && BCMA_HOST_SOC
depends on HAS_DMA
depends on BCM47XX || ARCH_BCM_5301X || COMPILE_TEST
+   select BGMAC
select PHYLIB
select FIXED_PHY
---help---
@@ -152,6 +160,19 @@ config BGMAC
  In case of using this driver on BCM4706 it's also requires to enable
  BCMA_DRIVER_GMAC_CMN to make it work.

+config BGMAC_PLATFORM
+   tristate "Broadcom iProc GBit platform support"
+   depends on HAS_DMA
+   depends on ARCH_BCM_IPROC || COMPILE_TEST
+   depends on OF
+   select BGMAC
+   select PHYLIB
+   select FIXED_PHY
+   default ARCH_BCM_IPROC
+   ---help---
+ Say Y here if you want to use the Broadcom iProc Gigabit Ethernet
+ controller through the generic platform interface
+
 config SYSTEMPORT
tristate "Broadcom SYSTEMPORT internal MAC support"
depends on OF
diff --git a/drivers/net/ethernet/broadcom/Makefile b/drivers/net/ethernet/broadcom/Makefile
index f559794..79f2372 100644
--- a/drivers/net/ethernet/broadcom/Makefile
+++ b/drivers/net/ethernet/broadcom/Makefile
@@ -10,6 +10,8 @@ obj-$(CONFIG_CNIC) += cnic.o
 obj-$(CONFIG_BNX2X) += bnx2x/
 obj-$(CONFIG_SB1250_MAC) += sb1250-mac.o
 obj-$(CONFIG_TIGON3) += tg3.o
-obj-$(CONFIG_BGMAC) += bgmac.o bgmac-bcma-mdio.o
+obj-$(CONFIG_BGMAC) += bgmac.o
+obj-$(CONFIG_BGMAC_BCMA) += bgmac-bcma.o bgmac-bcma-mdio.o
+obj-$(CONFIG_BGMAC_PLATFORM) += bgmac-platform.o
 obj-$(CONFIG_SYSTEMPORT) += bcmsysport.o
 obj-$(CONFIG_BNXT) += bnxt/
diff --git a/drivers/net/ethernet/broadcom/bgmac-bcma.c b/drivers/net/ethernet/broadcom/bgmac-bcma.c
new file mode 100644
index 000..9a9745c4
--- /dev/null
+++ b/drivers/net/ethernet/broadcom/bgmac-bcma.c
@@ -0,0 +1,315 @@
+/*
+ * Driver for (BCM4706)? GBit MAC core on BCMA bus.
+ *
+ * Copyright (C) 2012 Rafał Miłecki 
+ *
+ * Licensed under the GNU/GPL. See COPYING for details.
+ */
+
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
+#include 
+#include 
+#include 
+#include "bgmac.h"
+
+static inline bool bgmac_is_bcm4707_family(struct bcma_device *core)
+{
+   switch (core->bus->chipinfo.id) {
+   case BCMA_CHIP_ID_BCM4707:
+   case BCMA_CHIP_ID_BCM47094:
+   case BCMA_CHIP_ID_BCM53018:
+   return true;
+   default:
+   return false;
+   }
+}
+
+/**
+ * BCMA bus ops
+ **/
+
+static u32 bcma_bgmac_read(struct bgmac *bgmac, u16 offset)
+{
+   return bcma_read32(bgmac->bcma.core, offset);
+}
+
+static void bcma_bgmac_write(struct bgmac *bgmac, u16 offset, u32 value)
+{
+   bcma_write32(bgmac->bcma.core, offset, value);
+}
+
+static u32 bcma_bgmac_idm_read(struct bgmac *bgmac, u16 offset)
+{
+   return bcma_aread32(bgmac->bcma.core, offset);
+}
+
+static void bcma_bgmac_idm_write(struct bgmac *bgmac, u16 offset, u32 value)
+{
+   bcma_awrite32(bgmac->bcma.core, offset, value);
+}
+
+static bool bcma_bgmac_clk_enabled(struct bgmac *bgmac)
+{
+   return
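
The commit message describes hiding the BCMA or platform backend behind function pointers, which is what the bcma_bgmac_read()/bcma_bgmac_write() accessors above provide for BCMA. A minimal userspace sketch of that indirection follows, with a fake register file playing the MMIO backend. `struct demo_bgmac`, `struct demo_bgmac_ops` and the demo_* functions are hypothetical and do not reflect the actual layout in bgmac.h.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_NREGS 64

struct demo_bgmac;

/* Hypothetical accessor table: generic code calls through these pointers
 * and never learns which backend sits behind them. */
struct demo_bgmac_ops {
	uint32_t (*read)(struct demo_bgmac *b, uint16_t offset);
	void (*write)(struct demo_bgmac *b, uint16_t offset, uint32_t value);
};

struct demo_bgmac {
	const struct demo_bgmac_ops *ops;
	uint32_t regs[DEMO_NREGS];	/* fake register file standing in for MMIO */
};

/* "Platform" backend: plain memory accesses into the fake register file. */
static uint32_t mmio_read(struct demo_bgmac *b, uint16_t offset)
{
	assert(offset / 4 < DEMO_NREGS);
	return b->regs[offset / 4];
}

static void mmio_write(struct demo_bgmac *b, uint16_t offset, uint32_t value)
{
	assert(offset / 4 < DEMO_NREGS);
	b->regs[offset / 4] = value;
}

static const struct demo_bgmac_ops mmio_ops = {
	.read = mmio_read,
	.write = mmio_write,
};

/* Generic code: a read-modify-write that works with any backend. */
static void demo_set_bits(struct demo_bgmac *b, uint16_t offset, uint32_t bits)
{
	b->ops->write(b, offset, b->ops->read(b, offset) | bits);
}
```

A BCMA backend would supply a second ops table whose callbacks wrap bcma_read32()/bcma_write32(), and demo_set_bits() would not change.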

[PATCH net] net: bcmsysport: Device stats are unsigned long

On 64-bit kernels, device stats are 64 bits wide, not 32 bits.

Fixes: 80105befdb4b ("net: systemport: add Broadcom SYSTEMPORT Ethernet MAC driver")
Signed-off-by: Florian Fainelli 
---
 drivers/net/ethernet/broadcom/bcmsysport.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/broadcom/bcmsysport.c b/drivers/net/ethernet/broadcom/bcmsysport.c
index 543bf38105c9..21f21e23e695 100644
--- a/drivers/net/ethernet/broadcom/bcmsysport.c
+++ b/drivers/net/ethernet/broadcom/bcmsysport.c
@@ -392,7 +392,11 @@ static void bcm_sysport_get_stats(struct net_device *dev,
else
p = (char *)priv;
p += s->stat_offset;
-   data[i] = *(u32 *)p;
+   if (sizeof(unsigned long) != sizeof(u32) &&
+   s->stat_sizeof == sizeof(unsigned long))
+   data[i] = *(unsigned long *)p;
+   else
+   data[i] = *(u32 *)p;
}
 }
 
-- 
2.7.4
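
The fix reads each counter with the width recorded for it rather than through one fixed cast, which is what Andrew's question turns on: a single `unsigned long` cast would over-read any 32-bit field on a 64-bit kernel. The branch on `s->stat_sizeof` suggests the stats table mixes both widths. A minimal sketch of a width-aware read, assuming a hypothetical `struct demo_stats` layout and `struct demo_stat_desc` descriptor:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stats block mixing an unsigned long counter (as in
 * struct net_device_stats) with two 32-bit hardware counters. */
struct demo_stats {
	unsigned long rx_packets;
	uint32_t mib_rx_ok;
	uint32_t mib_rx_err;
};

struct demo_stat_desc {
	size_t offset;
	size_t size;	/* plays the role of s->stat_sizeof */
};

/* Read one counter using its recorded width instead of a fixed cast. */
static uint64_t read_stat(const void *base, const struct demo_stat_desc *d)
{
	const char *p = (const char *)base + d->offset;

	assert(d->size == sizeof(uint32_t) || d->size == sizeof(unsigned long));

	if (d->size == sizeof(unsigned long)) {
		unsigned long v;

		memcpy(&v, p, sizeof(v));
		return v;
	} else {
		uint32_t v;

		memcpy(&v, p, sizeof(v));
		return v;
	}
}
```

On a 64-bit little-endian host, reading `mib_rx_ok` through an `unsigned long *` would pull `mib_rx_err` into the upper 32 bits of the result; copying exactly the recorded width avoids that.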

[PATCH] wcn36xx: Implement print_reg indication

2016-06-30 Thread Bjorn Andersson

Some firmware versions send a "print register indication"; handle this
by printing out its content.

Cc: Nicolas Dechesne 
Signed-off-by: Bjorn Andersson 
---
 drivers/net/wireless/ath/wcn36xx/hal.h | 16 
 drivers/net/wireless/ath/wcn36xx/smd.c | 30 ++
 2 files changed, 46 insertions(+)

diff --git a/drivers/net/wireless/ath/wcn36xx/hal.h b/drivers/net/wireless/ath/wcn36xx/hal.h
index 4f87ef1e1eb8..b765c647319d 100644
--- a/drivers/net/wireless/ath/wcn36xx/hal.h
+++ b/drivers/net/wireless/ath/wcn36xx/hal.h
@@ -350,6 +350,8 @@ enum wcn36xx_hal_host_msg_type {
 
WCN36XX_HAL_AVOID_FREQ_RANGE_IND = 233,
 
+   WCN36XX_HAL_PRINT_REG_INFO_IND = 259,
+
WCN36XX_HAL_MSG_MAX = WCN36XX_HAL_MSG_TYPE_MAX_ENUM_SIZE
 };
 
@@ -4703,4 +4705,18 @@ struct stats_class_b_ind {
u32 rx_time_total;
 };
 
+/* WCN36XX_HAL_PRINT_REG_INFO_IND */
+struct wcn36xx_hal_print_reg_info_ind {
+   struct wcn36xx_hal_msg_header header;
+
+   u32 count;
+   u32 scenario;
+   u32 reason;
+
+   struct {
+   u32 addr;
+   u32 value;
+   } regs[];
+} __packed;
+
 #endif /* _HAL_H_ */
diff --git a/drivers/net/wireless/ath/wcn36xx/smd.c b/drivers/net/wireless/ath/wcn36xx/smd.c
index 87a62eb6228c..28d6ca0ca819 100644
--- a/drivers/net/wireless/ath/wcn36xx/smd.c
+++ b/drivers/net/wireless/ath/wcn36xx/smd.c
@@ -2109,6 +2109,30 @@ static int wcn36xx_smd_delete_sta_context_ind(struct wcn36xx *wcn,
return -ENOENT;
 }
 
+static int wcn36xx_smd_print_reg_info_ind(struct wcn36xx *wcn,
+ void *buf,
+ size_t len)
+{
+   struct wcn36xx_hal_print_reg_info_ind *rsp = buf;
+   int i;
+
+   if (len < sizeof(*rsp)) {
+   wcn36xx_warn("Corrupted print reg info indication\n");
+   return -EIO;
+   }
+
+   wcn36xx_dbg(WCN36XX_DBG_HAL,
+   "reginfo indication, scenario: 0x%x reason: 0x%x\n",
+   rsp->scenario, rsp->reason);
+
+   for (i = 0; i < rsp->count; i++) {
+   wcn36xx_dbg(WCN36XX_DBG_HAL, "\t0x%x: 0x%x\n",
+   rsp->regs[i].addr, rsp->regs[i].value);
+   }
+
+   return 0;
+}
+
 int wcn36xx_smd_update_cfg(struct wcn36xx *wcn, u32 cfg_id, u32 value)
 {
struct wcn36xx_hal_update_cfg_req_msg msg_body, *body;
@@ -2238,6 +2262,7 @@ int wcn36xx_smd_rsp_process(struct qcom_smd_channel *channel,
case WCN36XX_HAL_OTA_TX_COMPL_IND:
case WCN36XX_HAL_MISSED_BEACON_IND:
case WCN36XX_HAL_DELETE_STA_CONTEXT_IND:
+   case WCN36XX_HAL_PRINT_REG_INFO_IND:
msg_ind = kmalloc(sizeof(*msg_ind) + len, GFP_ATOMIC);
if (!msg_ind) {
/*
@@ -2300,6 +2325,11 @@ static void wcn36xx_ind_smd_work(struct work_struct *work)
   hal_ind_msg->msg,
   hal_ind_msg->msg_len);
break;
+   case WCN36XX_HAL_PRINT_REG_INFO_IND:
+   wcn36xx_smd_print_reg_info_ind(wcn,
+  hal_ind_msg->msg,
+  hal_ind_msg->msg_len);
+   break;
default:
wcn36xx_err("SMD_EVENT (%d) not supported\n",
  msg_header->msg_type);
-- 
2.5.0
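
As written in this excerpt, wcn36xx_smd_print_reg_info_ind() checks that the message covers the fixed fields but then trusts `rsp->count` when walking `regs[]`. A sketch of a parser that also bounds `count` by the received length, written as userspace C; `struct demo_reg_info` is hypothetical and omits the HAL message header, and `demo_reg_info_count()` is an illustrative helper, not part of the driver.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical indication body: fixed fields plus a flexible array. */
struct demo_reg_info {
	uint32_t count;
	uint32_t scenario;
	uint32_t reason;
	struct {
		uint32_t addr;
		uint32_t value;
	} regs[];
} __attribute__((packed));

/*
 * Return the advertised number of register pairs, or -1 if the buffer is
 * too short for the fixed fields or for that many pairs.
 */
static long demo_reg_info_count(const void *buf, size_t len)
{
	const struct demo_reg_info *ind = buf;
	size_t avail;

	assert(buf != NULL);
	if (len < sizeof(*ind))
		return -1;

	/* Pairs that actually fit after the fixed fields. */
	avail = (len - sizeof(*ind)) / sizeof(ind->regs[0]);
	if (ind->count > avail)
		return -1;

	return (long)ind->count;
}
```

A caller would loop only up to the returned value, so a corrupted `count` from firmware cannot walk past the end of the message buffer.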

[PATCH net-next v3 4/4] cgroup: bpf: Add an example to do cgroup checking in BPF

test_cgrp2_array_pin.c:
A userland program that creates a bpf_map (BPF_MAP_TYPE_CGROUP_ARRAY),
populates/updates it with a cgroup2-backed fd and pins it to a
file in bpf-fs.  The pinned file can be loaded by tc and then used
by the bpf prog later.  This program can also update an existing pinned
array, which could be useful for debugging/testing purposes.

test_cgrp2_tc_kern.c:
A bpf prog to be loaded by tc.  It demonstrates
the usage of bpf_skb_in_cgroup.

test_cgrp2_tc.sh:
A script that glues the test_cgrp2_array_pin.c and
test_cgrp2_tc_kern.c together.  The idea is:
1. Load the test_cgrp2_tc_kern.o by tc
2. Use test_cgrp2_array_pin.c to populate a BPF_MAP_TYPE_CGROUP_ARRAY
   with a cgroup fd
3. Do a 'ping -6 ff02::1%ve' to ensure the packet has been
   dropped because of a match on the cgroup

Most of the lines in test_cgrp2_tc.sh are boilerplate
to set up the cgroup/bpf-fs/net-devices/netns, etc.  It is
not bulletproof against errors, but it should work well enough and
give enough debug info if things go wrong.

Signed-off-by: Martin KaFai Lau 
Cc: Alexei Starovoitov 
Cc: Daniel Borkmann 
Cc: Tejun Heo 
Acked-by: Alexei Starovoitov 
---
 samples/bpf/Makefile   |   3 +
 samples/bpf/bpf_helpers.h  |   2 +
 samples/bpf/test_cgrp2_array_pin.c | 109 ++
 samples/bpf/test_cgrp2_tc.sh   | 184 +
 samples/bpf/test_cgrp2_tc_kern.c   |  69 ++
 5 files changed, 367 insertions(+)
 create mode 100644 samples/bpf/test_cgrp2_array_pin.c
 create mode 100755 samples/bpf/test_cgrp2_tc.sh
 create mode 100644 samples/bpf/test_cgrp2_tc_kern.c

diff --git a/samples/bpf/Makefile b/samples/bpf/Makefile
index 0bf2478..a98b780 100644
--- a/samples/bpf/Makefile
+++ b/samples/bpf/Makefile
@@ -20,6 +20,7 @@ hostprogs-y += offwaketime
 hostprogs-y += spintest
 hostprogs-y += map_perf_test
 hostprogs-y += test_overhead
+hostprogs-y += test_cgrp2_array_pin
 
 test_verifier-objs := test_verifier.o libbpf.o
 test_maps-objs := test_maps.o libbpf.o
@@ -40,6 +41,7 @@ offwaketime-objs := bpf_load.o libbpf.o offwaketime_user.o
 spintest-objs := bpf_load.o libbpf.o spintest_user.o
 map_perf_test-objs := bpf_load.o libbpf.o map_perf_test_user.o
 test_overhead-objs := bpf_load.o libbpf.o test_overhead_user.o
+test_cgrp2_array_pin-objs := libbpf.o test_cgrp2_array_pin.o
 
 # Tell kbuild to always build the programs
 always := $(hostprogs-y)
@@ -61,6 +63,7 @@ always += map_perf_test_kern.o
 always += test_overhead_tp_kern.o
 always += test_overhead_kprobe_kern.o
 always += parse_varlen.o parse_simple.o parse_ldabs.o
+always += test_cgrp2_tc_kern.o
 
 HOSTCFLAGS += -I$(objtree)/usr/include
 
diff --git a/samples/bpf/bpf_helpers.h b/samples/bpf/bpf_helpers.h
index 7904a2a..84e3fd9 100644
--- a/samples/bpf/bpf_helpers.h
+++ b/samples/bpf/bpf_helpers.h
@@ -70,6 +70,8 @@ static int (*bpf_l3_csum_replace)(void *ctx, int off, int from, int to, int flag
(void *) BPF_FUNC_l3_csum_replace;
 static int (*bpf_l4_csum_replace)(void *ctx, int off, int from, int to, int flags) =
(void *) BPF_FUNC_l4_csum_replace;
+static int (*bpf_skb_in_cgroup)(void *ctx, void *map, int index) =
+   (void *) BPF_FUNC_skb_in_cgroup;
 
 #if defined(__x86_64__)
 
diff --git a/samples/bpf/test_cgrp2_array_pin.c b/samples/bpf/test_cgrp2_array_pin.c
new file mode 100644
index 000..70e86f7
--- /dev/null
+++ b/samples/bpf/test_cgrp2_array_pin.c
@@ -0,0 +1,109 @@
+/* Copyright (c) 2016 Facebook
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of version 2 of the GNU General Public
+ * License as published by the Free Software Foundation.
+ */
+#include 
+#include 
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "libbpf.h"
+
+static void usage(void)
+{
+   printf("Usage: test_cgrp2_array_pin [...]\n");
+   printf("   -F  File to pin a BPF cgroup array\n");
+   printf("   -U  Update an already pinned BPF cgroup array\n");
+   printf("   -v   Full path of the cgroup2\n");
+   printf("   -h  Display this help\n");
+}
+
+int main(int argc, char **argv)
+{
+   const char *pinned_file = NULL, *cg2 = NULL;
+   int create_array = 1;
+   int array_key = 0;
+   int array_fd = -1;
+   int cg2_fd = -1;
+   int ret = -1;
+   int opt;
+
+   while ((opt = getopt(argc, argv, "F:U:v:")) != -1) {
+   switch (opt) {
+   /* General args */
+   case 'F':
+   pinned_file = optarg;
+   break;
+   case 'U':
+   pinned_file = optarg;
+   create_array = 0;
+   break;
+   case 'v':
+   cg2 = optarg;
+   break;
+   default:
+

[PATCH net-next v3 3/4] cgroup: bpf: Add bpf_skb_in_cgroup_proto

Adds a bpf helper, bpf_skb_in_cgroup, to decide if a skb->sk
belongs to a descendant of a cgroup2.  It is similar to the
feature added in netfilter:
commit c38c4597e4bf ("netfilter: implement xt_cgroup cgroup2 path match")

The user is expected to populate a BPF_MAP_TYPE_CGROUP_ARRAY
which will be used by the bpf_skb_in_cgroup.

The modifications to the bpf verifier ensure that BPF_MAP_TYPE_CGROUP_ARRAY
and bpf_skb_in_cgroup() are always used together.

Signed-off-by: Martin KaFai Lau 
Cc: Alexei Starovoitov 
Cc: Daniel Borkmann 
Cc: Tejun Heo 
Acked-by: Alexei Starovoitov 
---
 include/uapi/linux/bpf.h | 12 
 kernel/bpf/verifier.c|  8 +++-
 net/core/filter.c| 38 ++
 3 files changed, 57 insertions(+), 1 deletion(-)

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index ef4e386..bad309f 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -314,6 +314,18 @@ enum bpf_func_id {
 */
BPF_FUNC_skb_get_tunnel_opt,
BPF_FUNC_skb_set_tunnel_opt,
+
+   /**
+* bpf_skb_in_cgroup(skb, map, index) - Check cgroup2 membership of skb
+* @skb: pointer to skb
+* @map: pointer to bpf_map in BPF_MAP_TYPE_CGROUP_ARRAY type
+* @index: index of the cgroup in the bpf_map
+* Return:
+*   == 0 skb failed the cgroup2 descendant test
+*   == 1 skb succeeded the cgroup2 descendant test
+*< 0 error
+*/
+   BPF_FUNC_skb_in_cgroup,
__BPF_FUNC_MAX_ID,
 };
 
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 0f6db58..68753e0 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -1063,7 +1063,9 @@ static int check_map_func_compatibility(struct bpf_map *map, int func_id)
goto error;
break;
case BPF_MAP_TYPE_CGROUP_ARRAY:
-   goto error;
+   if (func_id != BPF_FUNC_skb_in_cgroup)
+   goto error;
+   break;
default:
break;
}
@@ -1083,6 +1085,10 @@ static int check_map_func_compatibility(struct bpf_map *map, int func_id)
if (map->map_type != BPF_MAP_TYPE_STACK_TRACE)
goto error;
break;
+   case BPF_FUNC_skb_in_cgroup:
+   if (map->map_type != BPF_MAP_TYPE_CGROUP_ARRAY)
+   goto error;
+   break;
default:
break;
}
diff --git a/net/core/filter.c b/net/core/filter.c
index df6860c..8134c98 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -2024,6 +2024,40 @@ bpf_get_skb_set_tunnel_proto(enum bpf_func_id which)
}
 }
 
+#ifdef CONFIG_SOCK_CGROUP_DATA
+static u64 bpf_skb_in_cgroup(u64 r1, u64 r2, u64 r3, u64 r4, u64 r5)
+{
+   struct sk_buff *skb = (struct sk_buff *)(long)r1;
+   struct bpf_map *map = (struct bpf_map *)(long)r2;
+   struct bpf_array *array = container_of(map, struct bpf_array, map);
+   struct cgroup *cgrp;
+   struct sock *sk;
+   u32 i = (u32)r3;
+
+   sk = skb->sk;
+   if (!sk || !sk_fullsock(sk))
+   return -ENOENT;
+
+   if (unlikely(i >= array->map.max_entries))
+   return -E2BIG;
+
+   cgrp = READ_ONCE(array->ptrs[i]);
+   if (unlikely(!cgrp))
+   return -EAGAIN;
+
+   return cgroup_is_descendant(sock_cgroup_ptr(&sk->sk_cgrp_data), cgrp);
+}
+
+static const struct bpf_func_proto bpf_skb_in_cgroup_proto = {
+   .func   = bpf_skb_in_cgroup,
+   .gpl_only   = false,
+   .ret_type   = RET_INTEGER,
+   .arg1_type  = ARG_PTR_TO_CTX,
+   .arg2_type  = ARG_CONST_MAP_PTR,
+   .arg3_type  = ARG_ANYTHING,
+};
+#endif
+
 static const struct bpf_func_proto *
 sk_filter_func_proto(enum bpf_func_id func_id)
 {
@@ -2086,6 +2120,10 @@ tc_cls_act_func_proto(enum bpf_func_id func_id)
return &bpf_get_route_realm_proto;
case BPF_FUNC_perf_event_output:
return bpf_get_event_output_proto();
+#ifdef CONFIG_SOCK_CGROUP_DATA
+   case BPF_FUNC_skb_in_cgroup:
+   return &bpf_skb_in_cgroup_proto;
+#endif
default:
return sk_filter_func_proto(func_id);
}
-- 
2.5.1
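
With patches 2/4 and 3/4 applied, check_map_func_compatibility() enforces the pairing from both directions: a BPF_MAP_TYPE_CGROUP_ARRAY may only be passed to bpf_skb_in_cgroup(), and bpf_skb_in_cgroup() only accepts that map type. A reduced userspace model of the two switch statements, using hypothetical `demo_*` enums in place of the real map types and helper ids:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical subset of map types and helper ids. */
enum demo_map_type { DEMO_MAP_ARRAY, DEMO_MAP_CGROUP_ARRAY };
enum demo_func_id { DEMO_FUNC_map_lookup_elem, DEMO_FUNC_skb_in_cgroup };

static bool demo_map_func_ok(enum demo_map_type map, enum demo_func_id func)
{
	/* First switch: restrict which helpers a special map may reach. */
	switch (map) {
	case DEMO_MAP_CGROUP_ARRAY:
		if (func != DEMO_FUNC_skb_in_cgroup)
			return false;
		break;
	default:
		break;
	}

	/* Second switch: restrict which map a special helper accepts. */
	switch (func) {
	case DEMO_FUNC_skb_in_cgroup:
		if (map != DEMO_MAP_CGROUP_ARRAY)
			return false;
		break;
	default:
		break;
	}

	return true;
}
```

Checking both directions is what makes the pairing exclusive; either switch alone would leave one misuse undetected.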

[PATCH net-next v3 2/4] cgroup: bpf: Add BPF_MAP_TYPE_CGROUP_ARRAY

Add a BPF_MAP_TYPE_CGROUP_ARRAY and its bpf_map_ops implementation.
To update an element, the caller is expected to obtain a cgroup2-backed
fd by open(cgroup2_dir) and then update the array with that fd.

Signed-off-by: Martin KaFai Lau 
Cc: Alexei Starovoitov 
Cc: Daniel Borkmann 
Cc: Tejun Heo 
Acked-by: Alexei Starovoitov 
---
 include/uapi/linux/bpf.h |  1 +
 kernel/bpf/arraymap.c| 43 +++
 kernel/bpf/syscall.c |  3 ++-
 kernel/bpf/verifier.c|  2 ++
 4 files changed, 48 insertions(+), 1 deletion(-)

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 406459b..ef4e386 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -84,6 +84,7 @@ enum bpf_map_type {
BPF_MAP_TYPE_PERCPU_HASH,
BPF_MAP_TYPE_PERCPU_ARRAY,
BPF_MAP_TYPE_STACK_TRACE,
+   BPF_MAP_TYPE_CGROUP_ARRAY,
 };
 
 enum bpf_prog_type {
diff --git a/kernel/bpf/arraymap.c b/kernel/bpf/arraymap.c
index 5af3073..588d66e 100644
--- a/kernel/bpf/arraymap.c
+++ b/kernel/bpf/arraymap.c
@@ -539,3 +539,46 @@ static int __init register_perf_event_array_map(void)
return 0;
 }
 late_initcall(register_perf_event_array_map);
+
+#ifdef CONFIG_SOCK_CGROUP_DATA
+static void *cgroup_fd_array_get_ptr(struct bpf_map *map,
+struct file *map_file /* not used */,
+int fd)
+{
+   return cgroup_get_from_fd(fd);
+}
+
+static void cgroup_fd_array_put_ptr(void *ptr)
+{
+   /* cgroup_put frees cgrp after an RCU grace period */
+   cgroup_put(ptr);
+}
+
+static void cgroup_fd_array_free(struct bpf_map *map)
+{
+   bpf_fd_array_map_clear(map);
+   fd_array_map_free(map);
+}
+
+static const struct bpf_map_ops cgroup_array_ops = {
+   .map_alloc = fd_array_map_alloc,
+   .map_free = cgroup_fd_array_free,
+   .map_get_next_key = array_map_get_next_key,
+   .map_lookup_elem = fd_array_map_lookup_elem,
+   .map_delete_elem = fd_array_map_delete_elem,
+   .map_fd_get_ptr = cgroup_fd_array_get_ptr,
+   .map_fd_put_ptr = cgroup_fd_array_put_ptr,
+};
+
+static struct bpf_map_type_list cgroup_array_type __read_mostly = {
+   .ops = &cgroup_array_ops,
+   .type = BPF_MAP_TYPE_CGROUP_ARRAY,
+};
+
+static int __init register_cgroup_array_map(void)
+{
+   bpf_register_map_type(&cgroup_array_type);
+   return 0;
+}
+late_initcall(register_cgroup_array_map);
+#endif
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index c23a4e93..cac13f1 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -393,7 +393,8 @@ static int map_update_elem(union bpf_attr *attr)
} else if (map->map_type == BPF_MAP_TYPE_PERCPU_ARRAY) {
err = bpf_percpu_array_update(map, key, value, attr->flags);
} else if (map->map_type == BPF_MAP_TYPE_PERF_EVENT_ARRAY ||
-  map->map_type == BPF_MAP_TYPE_PROG_ARRAY) {
+  map->map_type == BPF_MAP_TYPE_PROG_ARRAY ||
+  map->map_type == BPF_MAP_TYPE_CGROUP_ARRAY) {
rcu_read_lock();
err = bpf_fd_array_map_update_elem(map, f.file, key, value,
   attr->flags);
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 668e079..0f6db58 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -1062,6 +1062,8 @@ static int check_map_func_compatibility(struct bpf_map *map, int func_id)
if (func_id != BPF_FUNC_get_stackid)
goto error;
break;
+   case BPF_MAP_TYPE_CGROUP_ARRAY:
+   goto error;
default:
break;
}
-- 
2.5.1

[Patch net] net_sched: fix mirrored packets checksum

Similar to commit 9b368814b336 ("net: fix bridge multicast packet checksum validation"),
we need to fix up the checksum for CHECKSUM_COMPLETE when
pushing an skb on the RX path. Otherwise we get similar splats.

Cc: Jamal Hadi Salim 
Cc: Tom Herbert 
Signed-off-by: Cong Wang 
---
 include/linux/skbuff.h | 19 +++
 net/core/skbuff.c  | 18 --
 net/sched/act_mirred.c |  2 +-
 3 files changed, 20 insertions(+), 19 deletions(-)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index ee38a41..61ab566 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -2870,6 +2870,25 @@ static inline void skb_postpush_rcsum(struct sk_buff *skb,
 }
 
 /**
+ * skb_push_rcsum - push skb and update receive checksum
+ * @skb: buffer to update
+ * @len: length of data pushed
+ *
+ * This function performs an skb_push on the packet and updates
+ * the CHECKSUM_COMPLETE checksum.  It should be used on
+ * receive path processing instead of skb_push unless you know
+ * that the checksum difference is zero (e.g., a valid IP header)
+ * or you are setting ip_summed to CHECKSUM_NONE.
+ */
+static inline unsigned char *skb_push_rcsum(struct sk_buff *skb,
+   unsigned int len)
+{
+   skb_push(skb, len);
+   skb_postpush_rcsum(skb, skb->data, len);
+   return skb->data;
+}
+
+/**
  * pskb_trim_rcsum - trim received skb and update checksum
  * @skb: buffer to trim
  * @len: new length
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index f2b77e5..eb12d21 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -3016,24 +3016,6 @@ int skb_append_pagefrags(struct sk_buff *skb, struct page *page,
 EXPORT_SYMBOL_GPL(skb_append_pagefrags);
 
 /**
- * skb_push_rcsum - push skb and update receive checksum
- * @skb: buffer to update
- * @len: length of data pulled
- *
- * This function performs an skb_push on the packet and updates
- * the CHECKSUM_COMPLETE checksum.  It should be used on
- * receive path processing instead of skb_push unless you know
- * that the checksum difference is zero (e.g., a valid IP header)
- * or you are setting ip_summed to CHECKSUM_NONE.
- */
-static unsigned char *skb_push_rcsum(struct sk_buff *skb, unsigned len)
-{
-   skb_push(skb, len);
-   skb_postpush_rcsum(skb, skb->data, len);
-   return skb->data;
-}
-
-/**
  * skb_pull_rcsum - pull skb and update receive checksum
  * @skb: buffer to update
  * @len: length of data pulled
diff --git a/net/sched/act_mirred.c b/net/sched/act_mirred.c
index 128942b..1f5bd6c 100644
--- a/net/sched/act_mirred.c
+++ b/net/sched/act_mirred.c
@@ -181,7 +181,7 @@ static int tcf_mirred(struct sk_buff *skb, const struct tc_action *a,
 
if (!(at & AT_EGRESS)) {
if (m->tcfm_ok_push)
-   skb_push(skb2, skb->mac_len);
+   skb_push_rcsum(skb2, skb->mac_len);
}
 
/* mirror is always swallowed */
-- 
2.1.0
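The skb_push_rcsum() comment says that a push on the receive path must also update a CHECKSUM_COMPLETE sum. Below is a rough userland illustration, not kernel code. The struct, the helper names and the even-length restriction are all assumptions of this sketch. The bytes exposed by the push are added into the running ones' complement sum, so the stored value matches a checksum recomputed over the whole new range. A plain skb_push(), as mirred used before this patch, skips that step and leaves the sum stale.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fold a 32-bit accumulator into a 16-bit ones' complement sum. */
static uint16_t csum_fold16(uint32_t sum)
{
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)sum;
}

/* Ones' complement sum of big-endian 16-bit words (even lengths only). */
static uint32_t csum_words(const uint8_t *buf, size_t len, uint32_t sum)
{
	for (size_t i = 0; i + 1 < len; i += 2)
		sum += ((uint32_t)buf[i] << 8) | buf[i + 1];
	return sum;
}

/* Toy packet: 'head' plays the role of skb->data and 'csum' is a
 * CHECKSUM_COMPLETE-style sum covering buf[head .. head + len). */
struct pkt_sketch {
	uint8_t  buf[128];
	size_t   head;
	size_t   len;
	uint32_t csum;
};

/* Expose n bytes in front of the data and fold them into csum, which is
 * what skb_push() followed by skb_postpush_rcsum() achieves. */
static uint8_t *push_rcsum_sketch(struct pkt_sketch *p, size_t n)
{
	assert(n <= p->head && n % 2 == 0); /* keep 16-bit word alignment */
	p->head -= n;
	p->len += n;
	p->csum = csum_words(p->buf + p->head, n, p->csum);
	return p->buf + p->head;
}
```

A 14-byte push (an Ethernet mac_len) keeps the word pairing intact, so the incremental sum equals a full recomputation.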

[PATCH net-next v3 1/4] cgroup: Add cgroup_get_from_fd

Add a helper function to get a cgroup2 from a fd.  It will be
stored in a bpf array (BPF_MAP_TYPE_CGROUP_ARRAY) which will
be introduced in the later patch.

Signed-off-by: Martin KaFai Lau 
Cc: Alexei Starovoitov 
Cc: Daniel Borkmann 
Cc: Tejun Heo 
Acked-by: Tejun Heo 
---
 include/linux/cgroup.h |  1 +
 kernel/cgroup.c        | 35 +++
 2 files changed, 36 insertions(+)

diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
index a20320c..984f73b 100644
--- a/include/linux/cgroup.h
+++ b/include/linux/cgroup.h
@@ -87,6 +87,7 @@ struct cgroup_subsys_state *css_tryget_online_from_dir(struct dentry *dentry,
   struct cgroup_subsys *ss);
 
 struct cgroup *cgroup_get_from_path(const char *path);
+struct cgroup *cgroup_get_from_fd(int fd);
 
 int cgroup_attach_task_all(struct task_struct *from, struct task_struct *);
 int cgroup_transfer_tasks(struct cgroup *to, struct cgroup *from);
diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index 86cb5c6..14617968 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -62,6 +62,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 /*
@@ -6205,6 +6206,40 @@ struct cgroup *cgroup_get_from_path(const char *path)
 }
 EXPORT_SYMBOL_GPL(cgroup_get_from_path);
 
+/**
+ * cgroup_get_from_fd - get a cgroup pointer from a fd
+ * @fd: fd obtained by open(cgroup2_dir)
+ *
+ * Find the cgroup from a fd which should be obtained
+ * by opening a cgroup directory.  Returns a pointer to the
+ * cgroup on success. ERR_PTR is returned if the cgroup
+ * cannot be found.
+ */
+struct cgroup *cgroup_get_from_fd(int fd)
+{
+   struct cgroup_subsys_state *css;
+   struct cgroup *cgrp;
+   struct file *f;
+
+   f = fget_raw(fd);
+   if (!f)
+   return ERR_PTR(-EBADF);
+
+   css = css_tryget_online_from_dir(f->f_path.dentry, NULL);
+   fput(f);
+   if (IS_ERR(css))
+   return ERR_CAST(css);
+
+   cgrp = css->cgroup;
+   if (!cgroup_on_dfl(cgrp)) {
+   cgroup_put(cgrp);
+   return ERR_PTR(-EBADF);
+   }
+
+   return cgrp;
+}
+EXPORT_SYMBOL_GPL(cgroup_get_from_fd);
+
 /*
  * sock->sk_cgrp_data handling.  For more info, see sock_cgroup_data
  * definition in cgroup-defs.h.
-- 
2.5.1
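cgroup_get_from_fd() expects an fd obtained by opening a cgroup2 directory. It returns ERR_PTR(-EBADF) when the fd cannot be resolved or when the cgroup is not on the default (v2) hierarchy, and passes through the error from css_tryget_online_from_dir() otherwise. A userland program that will hand such an fd to the kernel could open the directory and pre-check the filesystem type first, as in this sketch. The helper name and its return convention are hypothetical. CGROUP2_SUPER_MAGIC is the value from <linux/magic.h>, defined here only as a fallback. The kernel itself does not look at statfs; it checks cgroup_on_dfl().

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <sys/vfs.h>
#include <unistd.h>

#ifndef CGROUP2_SUPER_MAGIC
#define CGROUP2_SUPER_MAGIC 0x63677270 /* "cgrp", as in <linux/magic.h> */
#endif

/* Hypothetical helper: returns a directory fd if 'path' is on a cgroup2
 * mount, -2 if it opens but is not cgroup2, -1 if it cannot be opened. */
static int open_cgroup2_dir(const char *path)
{
	struct statfs st;
	int fd;

	assert(path != NULL);
	fd = open(path, O_RDONLY | O_DIRECTORY | O_CLOEXEC);
	if (fd < 0)
		return -1;
	if (fstatfs(fd, &st) != 0 || st.f_type != CGROUP2_SUPER_MAGIC) {
		close(fd);
		return -2;
	}
	return fd; /* caller owns the fd and must close() it */
}
```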

[PATCH net-next v3 0/4] cgroup: bpf: cgroup2 membership test on skb

v3:
- Remove WARN_ON_ONCE(!rcu_read_lock_held())
- Stop BPF_MAP_TYPE_CGROUP_ARRAY usage in patch 2/4
- Avoid mounting bpf fs manually in patch 4/4

- Thanks for Daniel's review and the above suggestions

- Check CONFIG_SOCK_CGROUP_DATA instead of CONFIG_CGROUPS.  Thanks to
  the kbuild bot's report.
  Patch 2/4 only needs CONFIG_CGROUPS while patch 3/4 needs
  CONFIG_SOCK_CGROUP_DATA.  Since a single bpf cgrp2 array alone is
  not useful for now, CONFIG_SOCK_CGROUP_DATA is also used in
  patch 2/4.  We can fine tune it later if we find other use cases
  for the cgrp2 array.
- Return EAGAIN instead of ENOENT if the cgrp2 array entry is
  NULL.  This distinguishes two cases: 1) userland has not
  populated this array entry yet, or 2) no cgrp2 can be found for the skb.

- Belated thanks to Alexei and Tejun for reviewing v1 and giving advice on
  this work.

v2:
- Fix two return cases in cgroup_get_from_fd()
- Fix compilation errors when CONFIG_CGROUPS is not used:
  - arraymap.c: avoid registering BPF_MAP_TYPE_CGROUP_ARRAY
  - filter.c: tc_cls_act_func_proto() returns NULL on BPF_FUNC_skb_in_cgroup
- Add comments to BPF_FUNC_skb_in_cgroup and cgroup_get_from_fd()

v1 cover letter:
This series implements a BPF way to
check the cgroup2 membership of an skb (sk_buff).

It is similar to the feature added in netfilter:
c38c4597e4bf ("netfilter: implement xt_cgroup cgroup2 path match")

The current target is the tc-like usage.
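The v3 notes keep "array slot not populated yet" (EAGAIN) separate from "no cgroup2 found for the skb" (ENOENT), and both are separate from the membership answer itself. The model below is a userland illustration of that return convention. The types, the names, the -EINVAL bounds check and the order of the checks are all choices of this sketch and may differ from the kernel's. Membership is decided by walking up from the packet's cgroup, in the spirit of cgroup_is_descendant().

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for a cgroup2 node: only the parent link matters. */
struct cg_sketch {
	const struct cg_sketch *parent;
};

/* 1 if cg is target or lives somewhere below it in the hierarchy. */
static int cg_is_descendant(const struct cg_sketch *cg,
			    const struct cg_sketch *target)
{
	for (; cg; cg = cg->parent)
		if (cg == target)
			return 1;
	return 0;
}

/*
 *  -EINVAL : index outside the array (choice of this sketch)
 *  -EAGAIN : userland has not populated the slot yet
 *  -ENOENT : no cgroup2 could be found for the packet (skb_cg == NULL)
 *   1 / 0  : packet's cgroup is / is not under the slot's cgroup
 */
static int skb_in_cgroup_sketch(const struct cg_sketch *skb_cg,
				const struct cg_sketch *const *array,
				size_t nr_entries, size_t idx)
{
	const struct cg_sketch *target;

	if (idx >= nr_entries)
		return -EINVAL;
	target = array[idx];
	if (!target)
		return -EAGAIN;
	if (!skb_cg)
		return -ENOENT;
	return cg_is_descendant(skb_cg, target);
}
```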

Re: [PATCH v3] wlcore: Add support for get_expected_throughput opcode

2016-06-30 Thread kbuild test robot

Hi,

[auto build test ERROR on wireless-drivers-next/master]
[also build test ERROR on v4.7-rc5 next-20160630]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Maxim-Altshul/wlcore-Add-support-for-get_expected_throughput-opcode/20160630-234034
base:   https://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers-next.git master
config: alpha-allmodconfig (attached as .config)
compiler: alpha-linux-gnu-gcc (Debian 5.3.1-8) 5.3.1 20160205
reproduce:
wget https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
make.cross ARCH=alpha 

All error/warnings (new ones prefixed by >>):

   drivers/net/wireless/ti/wlcore/main.c: In function 'wlcore_op_get_expected_throughput':
>> drivers/net/wireless/ti/wlcore/main.c:5678:28: error: 'struct wl1271_station' has no member named 'wl'
 struct wl1271 *wl = wl_sta->wl;
   ^
>> drivers/net/wireless/ti/wlcore/main.c:5682:26: error: 'struct wl1271_link' has no member named 'fw_rate_mbps'
  return (wl->links[hlid].fw_rate_mbps * 1000);
 ^
>> drivers/net/wireless/ti/wlcore/main.c:5683:1: warning: control reaches end of non-void function [-Wreturn-type]
}
^

vim +5678 drivers/net/wireless/ti/wlcore/main.c

  5672  mutex_unlock(&wl->mutex);
  5673  }
  5674  
  5675  static u32 wlcore_op_get_expected_throughput(struct ieee80211_sta *sta)
  5676  {
  5677  struct wl1271_station *wl_sta = (struct wl1271_station *)sta->drv_priv;
> 5678  struct wl1271 *wl = wl_sta->wl;
  5679  u8 hlid = wl_sta->hlid;
  5680  
  5681  /* return in units of Kbps */
> 5682  return (wl->links[hlid].fw_rate_mbps * 1000);
> 5683  }
  5684  
  5685  static bool wl1271_tx_frames_pending(struct ieee80211_hw *hw)
  5686  {

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all   Intel Corporation
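Both errors say that the patch dereferences members that do not exist in the tested tree: a `wl` back-pointer in struct wl1271_station and `fw_rate_mbps` in struct wl1271_link. The -Wreturn-type warning most likely follows from the failed return statement. The mock below only shows the shape the new callback assumes and the Mbps-to-Kbps conversion it performs. The structures are hypothetical minimal stand-ins, not the driver's real definitions, and the links[] size is arbitrary.

```c
#include <assert.h>
#include <stdint.h>

typedef uint8_t u8;
typedef uint32_t u32;

/* Mock: only the members the callback dereferences. */
struct wl1271_link {
	u32 fw_rate_mbps;        /* member the build could not find */
};

struct wl1271 {
	struct wl1271_link links[16];
};

struct wl1271_station {
	u8 hlid;
	struct wl1271 *wl;       /* back-pointer the build could not find */
};

/* Same arithmetic as the patch: firmware rate in Mbps -> Kbps. */
static u32 expected_throughput_kbps_mock(const struct wl1271_station *wl_sta)
{
	const struct wl1271 *wl = wl_sta->wl;

	return wl->links[wl_sta->hlid].fw_rate_mbps * 1000;
}
```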



Re: [Patch net] mlx4: set csum_complete_sw bit when fixing complete csum

On Wed, Jun 29, 2016 at 7:23 AM, Tariq Toukan  wrote:
> Hi Cong,
>
>> See below. Does commit f8c6455bb04b944edb69e rely on any firmware
>> change to get an expected checksum?
>>
>> $ lspci -nn | grep -i mellanox
>> 82:00.0 Ethernet controller [0200]: Mellanox Technologies MT27500
>> Family [ConnectX-3] [15b3:1003]
>>
>> $ ethtool -i eth0
>> driver: mlx4_en
>> version: 2.2-1 (Feb 2014)
>> firmware-version: 2.33.5220
>> bus-info: :82:00.0
>
> I used same HW and FW, but was not able to reproduce.
> I have kernel HEAD of net-next: 8a79813c1401 net: ethernet: dwc_eth_qos: use
> phy_ethtool_{get|set}_link_ksettings
>
> I ran regular ping, and ping6. Checksum complete counter increases, but no
> warnings in dmesg.

Thanks for testing it! This helps me a lot to identify the real bug.
Actually the bug is in mirred action instead of mlx4 driver, I have
a patch and just confirmed it fixes the bug. I will send it out in a
few minutes.

Again, thanks a lot for your help!

Re: [net-next PATCH v2 1/2] net: pktgen: support injecting packets for qdisc testing

On 16-06-30 01:37 AM, Jesper Dangaard Brouer wrote:
> On Wed, 29 Jun 2016 13:03:06 -0700
> John Fastabend  wrote:
> 
>> Add another xmit_mode to pktgen to allow testing xmit functionality
>> of qdiscs. The new mode "queue_xmit" injects packets at
>> __dev_queue_xmit() so that qdisc is called.
>>
>> Signed-off-by: John Fastabend 
> 
> I generally like this.
> 

[...]

>> @@ -3434,6 +3442,36 @@ static void pktgen_xmit(struct pktgen_dev *pkt_dev)
>>  #endif
>>  } while (--burst > 0);
>>  goto out; /* Skips xmit_mode M_START_XMIT */
>> +} else if (pkt_dev->xmit_mode == M_QUEUE_XMIT) {
>> +local_bh_disable();
>> +atomic_add(burst, &pkt_dev->skb->users);
> 
> Reading the code, people might think that "burst" is allowed for this
> mode, which it is not. (You do handle this earlier in this patch when
> configuring this mode).

Right, we never get here without burst == 1, but sure, it does read
a bit strangely. I'll use atomic_inc().


Thanks,
John
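In the new M_QUEUE_XMIT branch, each __dev_queue_xmit() call takes ownership of one reference to the skb. To keep reusing the same skb, pktgen must therefore add one reference per packet it sends. Burst is always 1 in this mode, so that is a single increment, which is why atomic_inc() reads better than atomic_add(burst, ...). Below is a toy model with C11 atomics. The struct and function names are hypothetical, and the kernel uses atomic_t and its own skb helpers instead.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical stand-in for an sk_buff: only the reference count. */
struct skb_sketch {
	atomic_int users;
};

/* Stand-in for a transmit path that consumes one reference. */
static void consume_skb_sketch(struct skb_sketch *skb)
{
	atomic_fetch_sub(&skb->users, 1);
}

/* Send the same skb n times, taking one extra reference before each
 * transmit (the atomic_inc() case) so that the skb survives for reuse. */
static void xmit_repeatedly(struct skb_sketch *skb, int n)
{
	assert(n >= 0);
	for (int i = 0; i < n; i++) {
		atomic_fetch_add(&skb->users, 1);
		consume_skb_sketch(skb);
	}
}
```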

Re: [net-next PATCH 1/2] net: pktgen: support injecting packets for qdisc testing

On 16-06-30 03:21 AM, Jamal Hadi Salim wrote:
> On 16-06-29 03:47 PM, John Fastabend wrote:
>> Add another xmit_mode to pktgen to allow testing xmit functionality
>> of qdiscs. The new mode "queue_xmit" injects packets at
>> __dev_queue_xmit() so that qdisc is called.
>>
>> Signed-off-by: John Fastabend 
>> ---

[...]

> 
> Acked-by: Jamal Hadi Salim 
> 
> In travel mode, dont have much cycles right now - but can you review
> again:
> http://www.spinics.net/lists/netdev/msg359545.html
> I think you should disallow clone for example and i wasnt sure if you
> covered all error scenarios etc.
> 

Taking a look at the link, a couple of differences exist. First, the patch
linked does a 'netif_xmit_frozen_or_drv_stopped(txq)' check, but this
really shouldn't be needed; it is handled by the sch_direct_xmit()
logic in ./net/sched.

Also, in this patch I made it much more conservative about when to back
off than my original patch, and now it's closer to the one linked, except
I also back off on return code NET_XMIT_CN.

As for clones what is the concern exactly? We allow them through the
ingress pktgen mode that can hit classifiers and I don't see any issues
testing with them.

.John

> cheers,
> jamal

Re: [PATCH net-next 19/19] rxrpc: Use RCU to access a peer's service connection tree

2016-06-30 Thread David Howells

David Howells  wrote:

> > You want rb_link_node_rcu() here.
> 
> Should there be an rb_replace_node_rcu() also?

Or I could make rb_replace_node() RCU friendly.  What do you think of the
attached changes (split into appropriate patches)?  It's a case of changing
the order in which pointers are set in the rbtree code and inserting a
barrier.

I also wonder if rb_insert_color() needs some attention - though possibly
that's okay as it doesn't start with unset pointers (since you call
rb_link_node_rcu() first).

David
---
diff --git a/include/linux/rbtree_augmented.h b/include/linux/rbtree_augmented.h
index 14d7b831b63a..d076183e49be 100644
--- a/include/linux/rbtree_augmented.h
+++ b/include/linux/rbtree_augmented.h
@@ -130,6 +130,19 @@ __rb_change_child(struct rb_node *old, struct rb_node *new,
WRITE_ONCE(root->rb_node, new);
 }
 
+static inline void
+__rb_change_child_rcu(struct rb_node *old, struct rb_node *new,
+ struct rb_node *parent, struct rb_root *root)
+{
+   if (parent) {
+   if (parent->rb_left == old)
+   rcu_assign_pointer(parent->rb_left, new);
+   else
+   rcu_assign_pointer(parent->rb_right, new);
+   } else
+   rcu_assign_pointer(root->rb_node, new);
+}
+
 extern void __rb_erase_color(struct rb_node *parent, struct rb_root *root,
void (*augment_rotate)(struct rb_node *old, struct rb_node *new));
 
diff --git a/lib/rbtree.c b/lib/rbtree.c
index 1356454e36de..2b1a190c737c 100644
--- a/lib/rbtree.c
+++ b/lib/rbtree.c
@@ -539,15 +539,17 @@ void rb_replace_node(struct rb_node *victim, struct rb_node *new,
 {
struct rb_node *parent = rb_parent(victim);
 
+   /* Copy the pointers/colour from the victim to the replacement */
+   *new = *victim;
+
/* Set the surrounding nodes to point to the replacement */
-   __rb_change_child(victim, new, parent, root);
if (victim->rb_left)
rb_set_parent(victim->rb_left, new);
if (victim->rb_right)
rb_set_parent(victim->rb_right, new);
 
-   /* Copy the pointers/colour from the victim to the replacement */
-   *new = *victim;
+   /* Set the onward pointer last with an RCU barrier */
+   __rb_change_child_rcu(victim, new, parent, root);
 }
 EXPORT_SYMBOL(rb_replace_node);
 
diff --git a/net/rxrpc/conn_service.c b/net/rxrpc/conn_service.c
index dc64211c5ee8..298ec300cfcc 100644
--- a/net/rxrpc/conn_service.c
+++ b/net/rxrpc/conn_service.c
@@ -41,14 +41,14 @@ struct rxrpc_connection *rxrpc_find_service_conn_rcu(struct rxrpc_peer *peer,
 */
read_seqbegin_or_lock(&peer->service_conn_lock, );
 
-   p = peer->service_conns.rb_node;
+   p = rcu_dereference(peer->service_conns.rb_node);
while (p) {
conn = rb_entry(p, struct rxrpc_connection, service_node);
 
if (conn->proto.index_key < k.index_key)
-   p = p->rb_left;
+   p = rcu_dereference(p->rb_left);
else if (conn->proto.index_key > k.index_key)
-   p = p->rb_right;
+   p = rcu_dereference(p->rb_right);
else
goto done;
conn = NULL;
@@ -90,7 +90,7 @@ rxrpc_publish_service_conn(struct rxrpc_peer *peer,
goto found_extant_conn;
}
 
-   rb_link_node(&conn->service_node, parent, pp);
+   rb_link_node_rcu(&conn->service_node, parent, pp);
rb_insert_color(&conn->service_node, &peer->service_conns);
 conn_published:
set_bit(RXRPC_CONN_IN_SERVICE_CONNS, &conn->flags);
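The reordered rb_replace_node() relies on publication order. The replacement node is fully set up first: its links are copied and the children are re-parented. Only then does a single store make it reachable. Readers such as rxrpc_find_service_conn_rcu() must load the root and every child pointer through rcu_dereference(), as Peter points out. The sketch models this with C11 release/acquire atomics standing in for rcu_assign_pointer() and rcu_dereference(). It is a plain binary search tree with no colours or rebalancing, and it assumes a single writer. It also omits grace periods: in real code the victim may only be freed after synchronize_rcu() or via call_rcu().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Uncoloured BST node; child links are atomic so lockless readers can
 * follow them, the parent link is only touched by the writer. */
struct node {
	int key;
	struct node *parent;
	_Atomic(struct node *) left, right;
};

/* Reader: every pointer load is an acquire load (rcu_dereference()). */
static struct node *lookup(_Atomic(struct node *) *rootp, int key)
{
	struct node *n = atomic_load_explicit(rootp, memory_order_acquire);

	while (n) {
		if (key < n->key)
			n = atomic_load_explicit(&n->left, memory_order_acquire);
		else if (key > n->key)
			n = atomic_load_explicit(&n->right, memory_order_acquire);
		else
			return n;
	}
	return NULL;
}

/* Writer: initialise 'new', fix up the children, publish last with a
 * release store (rcu_assign_pointer()). */
static void replace_node(struct node *victim, struct node *new,
			 _Atomic(struct node *) *rootp)
{
	struct node *l = atomic_load_explicit(&victim->left, memory_order_relaxed);
	struct node *r = atomic_load_explicit(&victim->right, memory_order_relaxed);
	struct node *parent = victim->parent;

	assert(new->key == victim->key); /* the caller supplies an equal key */

	/* 1. copy the links into the replacement */
	new->parent = parent;
	atomic_store_explicit(&new->left, l, memory_order_relaxed);
	atomic_store_explicit(&new->right, r, memory_order_relaxed);

	/* 2. children point back at the replacement */
	if (l)
		l->parent = new;
	if (r)
		r->parent = new;

	/* 3. make it reachable with one release store */
	if (!parent)
		atomic_store_explicit(rootp, new, memory_order_release);
	else if (atomic_load_explicit(&parent->left, memory_order_relaxed) == victim)
		atomic_store_explicit(&parent->left, new, memory_order_release);
	else
		atomic_store_explicit(&parent->right, new, memory_order_release);
}
```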

Re: [net-next PATCH v2 2/2] net: samples: pktgen mode samples/tests for qdisc layer

On 16-06-30 01:23 AM, Jesper Dangaard Brouer wrote:
> On Wed, 29 Jun 2016 13:03:26 -0700
> John Fastabend  wrote:
> 
>> This adds samples for pktgen to use with new mode to inject pkts into
>> the qdisc layer. This also doubles as nice test cases to test any
>> patches against qdisc layer.

[...]

>> +#
>> +# Benchmark script:
>> +#  - developed for benchmarking egress qdisc path, derived from
>> +#ingress benchmark script.
>> +#

As you probably gathered, 'derived' is giving me too much credit here;
it's more like cut'n'pasted from the ingress benchmark script :)

>> +# Script for injecting packets into egress qdisc path of the stack
>> +# with pktgen "xmit_mode queue_xmit".
>> +#
>> +basedir=`dirname $0`
>> +source ${basedir}/functions.sh
>> +root_check_run_with_sudo "$@"
>> +
>> +# Parameter parsing via include
>> +source ${basedir}/parameters.sh
>> +# Using invalid DST_MAC will cause the packets to get dropped in
>> +# ip_rcv() which is part of the test
>> +[ -z "$DEST_IP" ] && DEST_IP="198.18.0.42"
>> +[ -z "$DST_MAC" ] && DST_MAC="90:e2:ba:ff:ff:ff"
>> +
>> +# Burst greater than 1 are invalid but allow users to specify it and
>> +# get an error instead of silently ignoring it.
>> +[ -z "$BURST" ] && BURST=1
> 
> In other scripts I've rejected this at this step, instead of depending
> on failure when sending the burst option to pktgen. Like:
> 
> https://github.com/netoptimizer/network-testing/blob/master/pktgen/pktgen_sample04_many_flows.sh#L31-L33
> 

Agreed, that is nicer. I had originally left it to make sure I was
catching the burst > 1 case in pktgen, but will remove it.

>> +
>> +# Base Config
>> +DELAY="0"    # Zero means max speed
>> +COUNT="1000" # Zero means indefinitely
>> +
>> +# General cleanup everything since last run
>> +pg_ctrl "reset"
>> +
>> +# Threads are specified with parameter -t value in $THREADS
>> +for ((thread = 0; thread < $THREADS; thread++)); do
>> +# The device name is extended with @name, using thread number to
>> +# make then unique, but any name will do.
>> +dev=${DEV}@${thread}
>> +
>> +# Add remove all other devices and add_device $dev to thread
>> +pg_thread $thread "rem_device_all"
>> +pg_thread $thread "add_device" $dev
>> +
>> +# Base config of dev
>> +pg_set $dev "flag QUEUE_MAP_CPU"
>> +pg_set $dev "count $COUNT"
>> +pg_set $dev "pkt_size $PKT_SIZE"
>> +pg_set $dev "delay $DELAY"
>> +pg_set $dev "flag NO_TIMESTAMP"
>> +
>> +# Destination
>> +pg_set $dev "dst_mac $DST_MAC"
>> +pg_set $dev "dst $DEST_IP"
>> +
>> +# Inject packet into RX path of stack
> 
> Hmmm, maybe above comment need to be adjusted...

Yep.

> 
>> +pg_set $dev "xmit_mode queue_xmit"
>> +
>> +# Burst allow us to avoid measuring SKB alloc/free overhead
> 
> This comment is confusing, maybe just remove. Didn't think burst is a
> valid use-case.

Yep.

Re: [PATCH net-next V4 5/6] net: introduce NETDEV_CHANGE_TX_QUEUE_LEN

On 16-06-29 11:45 PM, Jason Wang wrote:
> This patch introduces a new event, NETDEV_CHANGE_TX_QUEUE_LEN, which
> will be triggered when tx_queue_len is changed. It could be used by net
> devices that want to do some processing at that time. An example is tun,
> which may want to resize its tx array when tx_queue_len is changed.
> 
> Cc: John Fastabend 
> Signed-off-by: Jason Wang 
> ---


Thanks for adding the setlink case.

Acked-by: John Fastabend
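The event gives a driver a hook to react after tx_queue_len changes. For example, a tun-like driver could resize a private array. The model below is a hypothetical userland illustration, not the netdev notifier API. The event enum and struct names are invented, and rolling back the length when the driver refuses is a choice of this sketch.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical event codes standing in for netdev notifier events. */
enum dev_event_sketch { DEV_EVENT_UP, DEV_EVENT_CHANGE_TX_QUEUE_LEN };

struct dev_sketch {
	unsigned int tx_queue_len;
	void **tx_ring;            /* driver-private array, one slot per packet */
	unsigned int tx_ring_size;
};

/* Driver-side handler (tun-like): resize the private array to match.
 * Newly added slots are left uninitialised. */
static int tun_like_event(struct dev_sketch *dev, enum dev_event_sketch ev)
{
	void **ring;

	assert(dev != NULL);
	if (ev != DEV_EVENT_CHANGE_TX_QUEUE_LEN ||
	    dev->tx_queue_len == dev->tx_ring_size)
		return 0;
	if (dev->tx_queue_len == 0) {
		free(dev->tx_ring);
		ring = NULL;
	} else {
		ring = realloc(dev->tx_ring, dev->tx_queue_len * sizeof(*ring));
		if (!ring)
			return -1;     /* old array is still valid */
	}
	dev->tx_ring = ring;
	dev->tx_ring_size = dev->tx_queue_len;
	return 0;
}

/* Core-side setter: store the new length, notify, roll back on refusal. */
static int set_tx_queue_len_sketch(struct dev_sketch *dev, unsigned int len)
{
	unsigned int old = dev->tx_queue_len;

	if (len == old)
		return 0;
	dev->tx_queue_len = len;
	if (tun_like_event(dev, DEV_EVENT_CHANGE_TX_QUEUE_LEN) != 0) {
		dev->tx_queue_len = old;
		return -1;
	}
	return 0;
}
```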

Re: [PATCH net-next 08/16] net/devlink: Add E-Switch mode control

On 16-06-30 08:53 AM, Jiri Pirko wrote:
> Thu, Jun 30, 2016 at 05:40:57PM CEST, john.fastab...@gmail.com wrote:
>> On 16-06-30 03:52 AM, Jiri Pirko wrote:
>>> Thu, Jun 30, 2016 at 09:57:21AM CEST, john.fastab...@gmail.com wrote:
 On 16-06-30 12:41 AM, Jiri Pirko wrote:
> Thu, Jun 30, 2016 at 09:13:55AM CEST, sridhar.samudr...@intel.com wrote:
>>
>>
>> On 6/29/2016 11:25 PM, Jiri Pirko wrote:
>>> Thu, Jun 30, 2016 at 06:04:39AM CEST, john.fastab...@gmail.com wrote:
 On 16-06-29 08:35 PM, John Fastabend wrote:
> On 16-06-29 03:09 PM, John Fastabend wrote:
>> On 16-06-29 02:33 PM, Or Gerlitz wrote:
>>> On Wed, Jun 29, 2016 at 7:35 PM, John Fastabend
>>>  wrote:
 On 16-06-29 07:48 AM, Or Gerlitz wrote:
> On 6/28/2016 10:31 PM, John Fastabend wrote:
>> On 16-06-28 12:12 PM, Jiri Pirko wrote:
>>> Why?! Please, leave legacy be legacy. Use the new mode for
>>> implementing new features. Don't make things any more 
>>> complicated :(
>>> [...]
>> Maybe I'm reading too much into the devlink flag names and if instead
>> you use a switch like the following,
>>    VF representer : enable/disable the creation of VF netdevs to represent
>> the virtual functions on the PF
>> Much less complicated than magic switching between forwarding 
>> logic IMO
>> and you don't whack a default configuration that an entire stack 
>> (e.g.
>> libvirt) has been built to use.
> Re letting the user observe/modify the rules added by the
> driver/firmware while in legacy mode: even if possible with bridge/fdb, it
> will be really pragmatical and doesn't make sense to get that done for
> the TC subsystem. So this isn't a well-defined solution and anyway, as
> you said, legacy mode enhancements are a different exercise. Personally,
> I agree with Jiri that we should let legacy be legacy and focus on adding
> the new model.
 The ixgbe driver already supports bridge and tc commands without 
 the VF
 representer.  Adding the VF representer to these drivers just 
 extends
 the existing support so we have an identifier for VFs and now the
 redirect action works and the fdb commands can specify the VF 
 netdevs.
 I don't see this as a problem because we already do it today with
 'ip' and bridge tools.
>>> To be precise, for both ixgbe and mlx5, the existing tc support
>>> (u32/ixgbe, flower/mlx5) is not for switching functionality but 
>>> rather
>>> for NIC-ish one, e.g drop, mark, etc. Indeed in ixgbe you added
>>> redirect to VF, but this is only for south --> north (wire --> VF)
>>> traffic, w.o the VF rep you can't do the other way around.
>>>
>> Correct which is why we need the VF rep. So we are completely in
>> sync there.
>>
>>> Just to clarify, to what exact bridge command support did you refer 
>>> for ixgbe?
>> 'bridge fdb' commands are supported today on the PF. But it's the
>> same story as above we need the VF rep to also use it on the
>> VF representer
>>
>> Also 'bridge link' command for veb/vepa modes is supported and the
>> other link attributes could be supported with additional driver
>> support. No need for core changes here. But again yes only on the
>> PF so again we need the VF reps.
>>
>>> The forwarding done in the legacy mode is not well defined, and
>>> different across vendors, adding there the VF reps will not make it
>>> any better b/c some steering rules will be set by tc/bridge offloads
>>> while other rules will be put by the driver.
>>> I don't see how this takes us to better place.
>> In legacy mode or any other mode you are defining some default policy
>> and rules.
>>
>> In the legacy mode we use mac/vlan assigned l2 forwarding entries in 
>> the
>> hardware fdb which are seen when you query 'ip link' and 'bridge fdb'
>> today. And similarly can be modified today using 'ip link' and 
>> 'bridge
>> fdb' at least on the Intel devices. It's not undefined in any way; with
>> a quick query of the tools we can learn exactly what the 
>> configuration
>> is and even change it. This works fairly well with existing 
>> controllers
>> and stacks.
>>
>> The limitations are 'ip' only supports a single MAC address per

Re: It's back! (Re: [REGRESSION] NFS is creating a hidden port (left over from xs_bind() ))

2016-06-30 Thread Steven Rostedt

On Thu, 30 Jun 2016 11:23:41 -0400
Steven Rostedt  wrote:


> I can add more trace_printk()s if it would help.

I added a trace_printk() in inet_bind_bucket_destroy() to print out
some information on the socket used by xs_bind(), and it shows that the
bind destroy is called, but the list is not empty.



/*
 * Caller must hold hashbucket lock for this tb with local BH disabled
 */
void inet_bind_bucket_destroy(struct kmem_cache *cachep, struct inet_bind_bucket *tb)
{
	if (!current->mm && xs_port == tb->port) {
		trace_printk("destroy %d empty=%d %p\n",
			     tb->port, hlist_empty(&tb->owners), tb);
		trace_dump_stack(1);
	}
	if (hlist_empty(&tb->owners)) {
		__hlist_del(&tb->node);
		kmem_cache_free(cachep, tb);
	}
}

I created "xs_port" to hold the port of the variable used by xs_bind,
and when it is called, the hlist_empty(&tb->owners) returns false.

I'll add more trace_printks to find out where those owners are being
added.

-- Steve

Re: [PATCH net-next 19/19] rxrpc: Use RCU to access a peer's service connection tree

2016-06-30 Thread David Howells

Peter Zijlstra  wrote:

> > +   if (conn->proto.index_key < k.index_key)
> > +   p = p->rb_left;
> > +   else if (conn->proto.index_key > k.index_key)
> > +   p = p->rb_right;
> 
> You still very much need rcu_dereference() for both left and right
> pointers. As well as the first p load.

Bah...  Yes.  Good point.

> > +   rb_link_node(&conn->service_node, parent, pp);
> 
> You want rb_link_node_rcu() here.

Should there be an rb_replace_node_rcu() also?

David

RE: [PATCH v14 net-next 1/1] hv_sock: introduce Hyper-V Sockets

2016-06-30 Thread Dexuan Cui

> From: Olaf Hering [mailto:o...@aepfle.de]
> Sent: Friday, July 1, 2016 0:12
> To: Dexuan Cui 
> Cc: da...@davemloft.net; gre...@linuxfoundation.org;
> netdev@vger.kernel.org; linux-ker...@vger.kernel.org;
> de...@linuxdriverproject.org; a...@canonical.com; jasow...@redhat.com;
> Vitaly Kuznetsov ; Cathy Avery ;
> KY Srinivasan ; Haiyang Zhang
> ; j...@perches.com; Rolf Neugebauer
> 
> Subject: Re: [PATCH v14 net-next 1/1] hv_sock: introduce Hyper-V Sockets
> 
> On Thu, Jun 30, Dexuan Cui wrote:
> 
> > -#define AF_MAX 43  /* For now.. */
> > +#define AF_MAX 44  /* For now.. */
> 
> Should this patch also change the places where AF_MAX is used,
> like all the arrays in net/core/sock.c?
> 
> Olaf

Thanks for the reminder, Olaf!

I think we may as well make a separate patch for this. 
It is in my To-Do list.

Thanks,
-- Dexuan

Re: [PATCH v14 net-next 1/1] hv_sock: introduce Hyper-V Sockets

2016-06-30 Thread Olaf Hering

On Thu, Jun 30, Dexuan Cui wrote:

> -#define AF_MAX   43  /* For now.. */
> +#define AF_MAX   44  /* For now.. */

Should this patch also change the places where AF_MAX is used,
like all the arrays in net/core/sock.c?

Olaf
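Olaf's question is about tables that are indexed by address family and sized from AF_MAX. As far as I know, the lockdep class-name arrays in net/core/sock.c are sized AF_MAX+1 and filled positionally. If that is the case, raising AF_MAX without adding a name for AF_HYPERV would shift the "AF_MAX" string into slot 43 and leave the last slot empty. The sketch below is hypothetical: the family values come from the socket.h hunk but are renamed to avoid clashing with system headers, and only a few entries are filled. It shows how designated initialisers keep each name at its family's index, so a forgotten entry shows up as NULL instead of shifting its neighbours.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Values from the socket.h hunk, renamed for this sketch. */
#define SKETCH_AF_QIPCRTR 42
#define SKETCH_AF_HYPERV  43
#define SKETCH_AF_MAX     44   /* always one past the highest family */

/* Per-family table sized AF_MAX + 1; the last slot names AF_MAX itself.
 * Only a few entries are filled here for brevity. */
static const char *const af_names_sketch[SKETCH_AF_MAX + 1] = {
	[SKETCH_AF_QIPCRTR] = "AF_QIPCRTR",
	[SKETCH_AF_HYPERV]  = "AF_HYPERV",
	[SKETCH_AF_MAX]     = "AF_MAX",
};

_Static_assert(SKETCH_AF_HYPERV < SKETCH_AF_MAX,
	       "every address family must be below AF_MAX");

static const char *af_name(int family)
{
	if (family < 0 || family > SKETCH_AF_MAX)
		return NULL;
	return af_names_sketch[family];
}

/* 1 if the table holds 'name' at index 'family'. */
static int af_has_name(int family, const char *name)
{
	const char *n = af_name(family);

	return n != NULL && strcmp(n, name) == 0;
}
```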



[PATCH v14 net-next 0/1] introduce Hyper-V VM Sockets(hv_sock)

2016-06-30 Thread Dexuan Cui

Hyper-V Sockets (hv_sock) supplies a byte-stream based communication
mechanism between the host and the guest. It's somewhat like TCP over
VMBus, but the transport layer (VMBus) is much simpler than IP.

With Hyper-V Sockets, applications on the host and in the guest can talk
to each other directly using the traditional BSD-style socket APIs.

Hyper-V Sockets is only available on new Windows hosts, like Windows Server
2016. More info is in this article "Make your own integration services":
https://msdn.microsoft.com/en-us/virtualization/hyperv_on_windows/develop/make_mgmt_service

The patch implements the necessary support in the guest side by
introducing a new socket address family AF_HYPERV.

You can also get the patch by:
https://github.com/dcui/linux/commits/decui/hv_sock/net-next/20160629_v14

Note: the VMBus driver side's supporting patches have been in the mainline
tree.

I know the kernel already has a VM Sockets driver (AF_VSOCK) based
on VMware VMCI (net/vmw_vsock/, drivers/misc/vmw_vmci), and KVM is
proposing a virtio version of AF_VSOCK:
http://marc.info/?l=linux-netdev=145952064004765=2

However, though Hyper-V Sockets may seem conceptually similar to
AF_VSOCK, there are differences in the transport layer, and IMO these
make direct code reuse impractical:

1. In AF_VSOCK, the endpoint type is: , but in
AF_HYPERV, the endpoint type is: . Here GUID
is 128-bit.

2. AF_VSOCK supports SOCK_DGRAM, while AF_HYPERV doesn't.

3. AF_VSOCK supports some special sock opts, like SO_VM_SOCKETS_BUFFER_SIZE,
SO_VM_SOCKETS_BUFFER_MIN/MAX_SIZE and SO_VM_SOCKETS_CONNECT_TIMEOUT.
These are meaningless to AF_HYPERV.

4. Some of AF_VSOCK's VMCI transport ops are meaningless to AF_HYPERV/VMBus,
like .notify_recv_init
.notify_recv_pre_block
.notify_recv_pre_dequeue
.notify_recv_post_dequeue
.notify_send_init
.notify_send_pre_block
.notify_send_pre_enqueue
.notify_send_post_enqueue
etc.

So I think we'd better introduce a new address family: AF_HYPERV.

Please review the patch.

Looking forward to your comments, especially comments from David. :-)

Changes since v1:
- updated "[PATCH 6/7] hvsock: introduce Hyper-V VM Sockets feature"
- added __init and __exit for the module init/exit functions
- net/hv_sock/Kconfig: "default m" -> "default m if HYPERV"
- MODULE_LICENSE: "Dual MIT/GPL" -> "Dual BSD/GPL"

Changes since v2:
- fixed various coding issues pointed out by David Miller
- fixed indentation issues
- removed pr_debug in net/hv_sock/af_hvsock.c
- used reverse-Christmas-tree style for local variables.
- EXPORT_SYMBOL -> EXPORT_SYMBOL_GPL

Changes since v3:
- fixed a few coding issues pointed out by Vitaly Kuznetsov and Dan Carpenter
- fixed the ret value in vmbus_recvpacket_hvsock on error
- fixed the style of multi-line comment: vmbus_get_hvsock_rw_status()

Changes since v4 (https://lkml.org/lkml/2015/7/28/404):
- addressed all the comments about V4.
- treat the hvsock offers/channels as special VMBus devices
- add a mechanism to pass hvsock events to the hvsock driver
- fixed some corner cases with proper locking when a connection is closed
- rebased to the latest Greg's tree

Changes since v5 (https://lkml.org/lkml/2015/12/24/103):
- addressed the coding style issues (Vitaly Kuznetsov & David Miller, thanks!)
- used better code for the per-channel rescind callback (thanks, Vitaly!)
- avoided the introduction of new VMBUS driver APIs vmbus_sendpacket_hvsock()
and vmbus_recvpacket_hvsock() and used vmbus_sendpacket()/vmbus_recvpacket()
in the higher level (i.e., the vmsock driver). Thanks, Vitaly!

Changes since v6 (http://lkml.iu.edu/hypermail/linux/kernel/1601.3/01813.html)
- only a few minor changes of coding style and comments

Changes since v7
- a few minor changes of coding style: thanks, Joe Perches!
- added some lines of comments about GUID/UUID before the struct sockaddr_hv.

Changes since v8
- removed the unnecessary __packed for some definitions: thanks, David!
- hvsock_open_connection: use offer.u.pipe.user_def[0] to know the connection
direction, and reorganized the function
- reorganized the code according to suggestions from Cathy Avery: split big
functions into small ones, set .setsockopt and getsockopt to
sock_no_setsockopt/sock_no_getsockopt
- inline'd some small list helper functions

Changes since v9
- minimized struct hvsock_sock by making the send/recv buffers pointers.
the buffers are allocated by kmalloc() in __hvsock_create() now.
- minimized the sizes of the send/recv buffers and the vmbus ringbuffers.

Changes since v10

1) add module params: send_ring_page, recv_ring_page. They can be used to
enlarge the ringbuffer size to get better performance, e.g.,
# modprobe hv_sock recv_ring_page=16 send_ring_page=16
By default, recv_ring_page is 3 and send_ring_page is 2.

2) add module param max_socket_number (the default is 1024).
A user can enlarge the number to create more than 1024 hv_sock sockets.
By default, 1024 sockets take about 1024 * (3+2+1+1) * 4KB = 28M bytes.
(Here

[PATCH v14 net-next 1/1] hv_sock: introduce Hyper-V Sockets

2016-06-30 Thread Dexuan Cui

Hyper-V Sockets (hv_sock) supplies a byte-stream based communication
mechanism between the host and the guest. It's somewhat like TCP over
VMBus, but the transport layer (VMBus) is much simpler than IP.

With Hyper-V Sockets, applications on the host and in the guest can talk
to each other directly using the traditional BSD-style socket APIs.

Hyper-V Sockets is only available on new Windows hosts, like Windows Server
2016. More info is in this article "Make your own integration services":
https://msdn.microsoft.com/en-us/virtualization/hyperv_on_windows/develop/make_mgmt_service

The patch implements the necessary support in the guest side by introducing
a new socket address family AF_HYPERV.

Signed-off-by: Dexuan Cui 
Cc: "K. Y. Srinivasan" 
Cc: Haiyang Zhang 
Cc: Vitaly Kuznetsov 
Cc: Cathy Avery 
---

You can also get the patch here (8ba95c8ec9):
https://github.com/dcui/linux/commits/decui/hv_sock/net-next/20160629_v14

For the change log before v12, please see https://lkml.org/lkml/2016/5/15/31

In v12, the changes are mainly the following:

1) remove the module params as David suggested.

2) use exactly 5 pages each for the VMBus send/recv rings.
The host side's design of the feature requires exactly 5 pages each for the
recv and send rings. This is suboptimal in terms of memory consumption, but
unfortunately we have to live with it until the host comes up with a new
design. :-(

3) remove the per-connection static send/recv buffers
Instead, we allocate and free the buffers dynamically only when we recv/send
data. This means: when a connection is idle, no memory is consumed as
recv/send buffers at all.

In v13:
I return ENOMEM on buffer allocation failure

   Actually "man read/write" says "Other errors may occur, depending on the
object connected to fd". "man send/recv" indeed lists ENOMEM.
   Considering AF_HYPERV is a new socket type, ENOMEM seems OK here.
   In the long run, I think we should add a new API in the VMBus driver,
allowing data copy from VMBus ringbuffer into user mode buffer directly.
This way, we can even eliminate this temporary buffer.
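The v12/v13 notes describe receive buffers that are allocated only for the duration of a read and freed right after, with ENOMEM reported when the allocation fails. Below is a hypothetical userland sketch of that flow. The function name is invented, and the plain arrays stand in for the VMBus ring buffer and the user buffer. It is not the driver's code.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

/* 'ring' stands in for readable data in the VMBus ring buffer, 'user' for
 * the caller's buffer. The bounce buffer lives only for this call, so an
 * idle connection holds no buffer memory. */
static ssize_t hvsock_recv_sketch(const unsigned char *ring, size_t avail,
				  unsigned char *user, size_t len)
{
	size_t n = avail < len ? avail : len;
	unsigned char *tmp;

	assert(user != NULL || len == 0);
	if (n == 0)
		return 0;
	tmp = malloc(n);
	if (!tmp)
		return -ENOMEM;       /* surfaced to read()/recv() as ENOMEM */
	memcpy(tmp, ring, n);         /* 1: copy out of the ring */
	memcpy(user, tmp, n);         /* 2: copy to the caller (copy_to_user in a kernel) */
	free(tmp);
	return (ssize_t)n;
}
```

Copying from the ring straight into the user buffer, the long-term idea mentioned above, would amount to dropping step 1 together with the allocation.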

In v14:
fix some coding style issues pointed out by David.

Looking forward to your comments!
 MAINTAINERS                 |    2 +
 include/linux/hyperv.h      |   13 +
 include/linux/socket.h      |    4 +-
 include/net/af_hvsock.h     |   59 ++
 include/uapi/linux/hyperv.h |   24 +
 net/Kconfig                 |    1 +
 net/Makefile                |    1 +
 net/hv_sock/Kconfig         |   10 +
 net/hv_sock/Makefile        |    3 +
 net/hv_sock/af_hvsock.c     | 1519 +++
 10 files changed, 1635 insertions(+), 1 deletion(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index 50f69ba..6eaa26f 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -5514,7 +5514,9 @@ F:drivers/pci/host/pci-hyperv.c
 F: drivers/net/hyperv/
 F: drivers/scsi/storvsc_drv.c
 F: drivers/video/fbdev/hyperv_fb.c
+F: net/hv_sock/
 F: include/linux/hyperv.h
+F: include/net/af_hvsock.h
 F: tools/hv/
 F: Documentation/ABI/stable/sysfs-bus-vmbus
 
diff --git a/include/linux/hyperv.h b/include/linux/hyperv.h
index 50f493e..1cda6ea5 100644
--- a/include/linux/hyperv.h
+++ b/include/linux/hyperv.h
@@ -1508,5 +1508,18 @@ static inline void commit_rd_index(struct vmbus_channel *channel)
vmbus_set_event(channel);
 }
 
+struct vmpipe_proto_header {
+   u32 pkt_type;
+   u32 data_size;
+};
+
+#define HVSOCK_HEADER_LEN	(sizeof(struct vmpacket_descriptor) + \
+				 sizeof(struct vmpipe_proto_header))
+
+/* See 'prev_indices' in hv_ringbuffer_read(), hv_ringbuffer_write() */
+#define PREV_INDICES_LEN   (sizeof(u64))
 
+#define HVSOCK_PKT_LEN(payload_len)	(HVSOCK_HEADER_LEN + \
+   ALIGN((payload_len), 8) + \
+   PREV_INDICES_LEN)
 #endif /* _HYPERV_H */
diff --git a/include/linux/socket.h b/include/linux/socket.h
index b5cc5a6..0b68b58 100644
--- a/include/linux/socket.h
+++ b/include/linux/socket.h
@@ -202,8 +202,9 @@ struct ucred {
 #define AF_VSOCK   40  /* vSockets */
 #define AF_KCM 41  /* Kernel Connection Multiplexor*/
 #define AF_QIPCRTR 42  /* Qualcomm IPC Router  */
+#define AF_HYPERV  43  /* Hyper-V Sockets  */
 
-#define AF_MAX 43  /* For now.. */
+#define AF_MAX 44  /* For now.. */
 
 /* Protocol families, same as address families. */
 #define PF_UNSPEC  AF_UNSPEC
@@ -251,6 +252,7 @@ struct ucred {
 #define PF_VSOCK   AF_VSOCK
 #define PF_KCM AF_KCM
 #define PF_QIPCRTR AF_QIPCRTR
+#define PF_HYPERV  AF_HYPERV
 #define PF_MAX AF_MAX
 
 /* Maximum queue length specifiable by listen.  */
diff --git a/include/net/af_hvsock.h
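As a worked example of the HVSOCK_PKT_LEN() arithmetic added to include/linux/hyperv.h above, the userspace sketch below recomputes it with stand-in structs. It assumes the stand-ins match the kernel layouts: struct vmpacket_descriptor as four u16 fields plus a u64 (16 bytes) and struct vmpipe_proto_header as two u32 fields (8 bytes), giving a 24-byte header. The _sketch names and ALIGN8() are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Userspace stand-ins assumed to mirror the kernel struct layouts. */
struct vmpacket_descriptor_sketch {
	uint16_t type;
	uint16_t offset8;
	uint16_t len8;
	uint16_t flags;
	uint64_t trans_id;
};

struct vmpipe_proto_header_sketch {
	uint32_t pkt_type;
	uint32_t data_size;
};

/* Round x up to the next multiple of 8, like the kernel's ALIGN(x, 8). */
#define ALIGN8(x) (((x) + 7u) & ~(size_t)7u)

#define SKETCH_HEADER_LEN (sizeof(struct vmpacket_descriptor_sketch) + \
			   sizeof(struct vmpipe_proto_header_sketch))

/* The trailing u64 written after each packet ('prev_indices'). */
#define SKETCH_PREV_INDICES_LEN (sizeof(uint64_t))

/* Ring space consumed by one packet carrying payload_len bytes. */
static size_t sketch_pkt_len(size_t payload_len)
{
	return SKETCH_HEADER_LEN + ALIGN8(payload_len) +
	       SKETCH_PREV_INDICES_LEN;
}
```

Under those assumptions, a 4096-byte payload occupies 4128 bytes of ring space, and even a 1-byte payload costs 40 bytes because of the 8-byte alignment and the trailing prev_indices word.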

Re: [PATCH net-next 08/16] net/devlink: Add E-Switch mode control

2016-06-30 Thread Jiri Pirko

Thu, Jun 30, 2016 at 05:40:57PM CEST, john.fastab...@gmail.com wrote:
>On 16-06-30 03:52 AM, Jiri Pirko wrote:
>> Thu, Jun 30, 2016 at 09:57:21AM CEST, john.fastab...@gmail.com wrote:
>>> On 16-06-30 12:41 AM, Jiri Pirko wrote:
 Thu, Jun 30, 2016 at 09:13:55AM CEST, sridhar.samudr...@intel.com wrote:
>
>
> On 6/29/2016 11:25 PM, Jiri Pirko wrote:
>> Thu, Jun 30, 2016 at 06:04:39AM CEST, john.fastab...@gmail.com wrote:
>>> On 16-06-29 08:35 PM, John Fastabend wrote:
 On 16-06-29 03:09 PM, John Fastabend wrote:
> On 16-06-29 02:33 PM, Or Gerlitz wrote:
>> On Wed, Jun 29, 2016 at 7:35 PM, John Fastabend
>>  wrote:
>>> On 16-06-29 07:48 AM, Or Gerlitz wrote:
 On 6/28/2016 10:31 PM, John Fastabend wrote:
> On 16-06-28 12:12 PM, Jiri Pirko wrote:
>> Why?! Please, leave legacy be legacy. Use the new mode for
>> implementing new features. Don't make things any more complicated :(
>> [...]
> Maybe I'm reading too much into the devlink flag names and if instead
> you use a switch like the following,
>    VF representer : enable/disable the creation of VF netdevs to represent
>                     the virtual functions on the PF
> Much less complicated than magic switching between forwarding logic IMO
> and you don't whack a default configuration that an entire stack (e.g.
> libvirt) has been built to use.
 Re letting the user observe/modify the rules added by the driver/firmware
 while in legacy mode: even if possible with bridge/fdb, it will be really
 pragmatical and doesn't make sense to get that done for the TC subsystem.
 So this isn't a well defined solution and anyway, as you said, legacy mode
 enhancements are a different exercise. Personally, I agree with Jiri that
 we should let legacy be legacy and focus on adding the new model.
>>> The ixgbe driver already supports bridge and tc commands without the VF
>>> representer.  Adding the VF representer to these drivers just extends
>>> the existing support so we have an identifier for VFs and now the
>>> redirect action works and the fdb commands can specify the VF netdevs.
>>> I don't see this as a problem because we already do it today with
>>> 'ip' and bridge tools.
>> To be precise, for both ixgbe and mlx5, the existing tc support
>> (u32/ixgbe, flower/mlx5) is not for switching functionality but rather
>> for NIC-ish functionality, e.g. drop, mark, etc. Indeed in ixgbe you added
>> redirect to VF, but this is only for south --> north (wire --> VF)
>> traffic; without the VF rep you can't do the other way around.
>>
> Correct which is why we need the VF rep. So we are completely in
> sync there.
>
>> Just to clarify, to what exact bridge command support did you refer for ixgbe?
> 'bridge fdb' commands are supported today on the PF. But it's the
> same story as above: we need the VF rep to also use it on the
> VF representer.
>
> Also 'bridge link' command for veb/vepa modes is supported and the
> other link attributes could be supported with additional driver
> support. No need for core changes here. But again yes only on the
> PF so again we need the VF reps.
>
>> The forwarding done in the legacy mode is not well defined, and
>> different across vendors; adding the VF reps there will not make it
>> any better, because some steering rules will be set by tc/bridge offloads
>> while other rules will be put by the driver.
>> I don't see how this takes us to a better place.
> In legacy mode or any other mode you are defining some default policy
> and rules.
>
> In the legacy mode we use mac/vlan assigned l2 forwarding entries in the
> hardware fdb which are seen when you query 'ip link' and 'bridge fdb'
> today. They can similarly be modified today using 'ip link' and 'bridge
> fdb', at least on the Intel devices. It's not undefined in any way; with
> a quick query of the tools we can learn exactly what the configuration
> is and even change it. This works fairly well with existing controllers
> and stacks.
>
> The limitations are that 'ip' only supports a single MAC address per VF and
> 'tc' doesn't work on VF ports because when the VF is assigned to a VM
> or namespace we lose visibility of it. Providing a VF rep for

Re: [PATCH net-next 2/3] cxgb4/cxgb4vf: Add set VF mac address support

2016-06-30 Thread Hariprasad Shenai

On Thu, Jun 30, 2016 at 13:13:15 +, Yuval Mintz wrote:
> > +   /* verify MAC addr is valid */
> > +   if (!is_zero_ether_addr(mac) && !is_valid_ether_addr(mac) &&
> > +   is_multicast_ether_addr(mac)) {
> 
> This is really odd as verification goes; currently this is a very elaborate
> way of checking for multicast, but I guess it's probably a mistake.
> 
My bad, will send a V2
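Yuval's point can be checked mechanically. The sketch below restates the documented semantics of is_zero_ether_addr(), is_multicast_ether_addr() and is_valid_ether_addr() as userspace helpers (hypothetical names) and shows that the posted condition always agrees with a plain multicast test. strict_reject() is only one possible fix, an assumption about the intent that also rejects the all-zero address; it is not necessarily what the V2 does.

```c
#include <stdbool.h>
#include <stdint.h>

/* Userspace restatements of the kernel helpers' semantics. */
static bool zero_mac(const uint8_t *a)
{
	return !(a[0] | a[1] | a[2] | a[3] | a[4] | a[5]);
}

static bool multicast_mac(const uint8_t *a)
{
	return a[0] & 0x01;	/* I/G bit; broadcast counts as multicast */
}

static bool valid_mac(const uint8_t *a)
{
	return !multicast_mac(a) && !zero_mac(a);
}

/*
 * The condition as posted. Since !valid == (multicast || zero), and a
 * multicast address is never all-zero, this reduces to multicast_mac(a).
 */
static bool posted_reject(const uint8_t *a)
{
	return !zero_mac(a) && !valid_mac(a) && multicast_mac(a);
}

/* One possible stricter check (assumption): accept only valid unicast. */
static bool strict_reject(const uint8_t *a)
{
	return !valid_mac(a);
}
```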

Re: [PATCH net-next V4 0/6] switch to use tx skb array in tun

2016-06-30 Thread Michael S. Tsirkin

On Thu, Jun 30, 2016 at 02:45:30PM +0800, Jason Wang wrote:
> Hi all:
> 
> This series tries to switch to use skb array in tun. This is used to
> eliminate the spinlock contention between producer and consumer. The
> conversion was straightforward: just introduce a tx skb array and use
> it instead of sk_receive_queue.
> 
> A minor issue is keeping the tx_queue_len behaviour, since tun used to
> use it for the length of sk_receive_queue. This is done through:
> 
> - add the ability to resize multiple rings at once to avoid handling
>   partial resize failure for multiple rings.
> - add support for zero-length rings.
> - introduce a notifier which is triggered when tx_queue_len is
>   changed for a netdev.
> - resize all queues when tx_queue_len changes.
> 
> Tests show about 15% improvement in guest rx pps:
> 
> Before: ~130pps
> After : ~150pps

Acked-by: Michael S. Tsirkin 

Acked-from-altitude: 34697 feet.


> Changes from V3:
> - fix kbuild warnings
> - call NETDEV_CHANGE_TX_QUEUE_LEN on IFLA_TXQLEN
> 
> Changes from V2:
> - add multiple rings resizing support for ptr_ring/skb_array
> - add zero length ring support
> - introduce a NETDEV_CHANGE_TX_QUEUE_LEN
> - drop new flags
> 
> Changes from V1:
> - switch to use skb array instead of a customized circular buffer
> - add non-blocking support
> - rename .peek to .peek_len
> - drop lockless peeking since tests show very minor improvement
> 
> Jason Wang (5):
>   ptr_ring: support zero length ring
>   skb_array: minor tweak
>   skb_array: add wrappers for resizing
>   net: introduce NETDEV_CHANGE_TX_QUEUE_LEN
>   tun: switch to use skb array for tx
> 
> Michael S. Tsirkin (1):
>   ptr_ring: support resizing multiple queues
> 
>  drivers/net/tun.c                | 138 ---
>  drivers/vhost/net.c              |  16 -
>  include/linux/net.h              |   1 +
>  include/linux/netdevice.h        |   1 +
>  include/linux/ptr_ring.h         |  77 ++
>  include/linux/skb_array.h        |  13 +++-
>  net/core/net-sysfs.c             |  15 -
>  net/core/rtnetlink.c             |  16 +++--
>  tools/virtio/ringtest/ptr_ring.c |   5 ++
>  9 files changed, 255 insertions(+), 27 deletions(-)
> 
> -- 
> 2.7.4
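The ring changes listed in the cover letter (zero-length rings, and resizing several rings at once so that a failure cannot leave them half-resized) can be sketched as a simplified, single-threaded pointer ring. The names are hypothetical and this is not the include/linux/ptr_ring.h API: locking and memory barriers are omitted. The only ideas carried over are that a NULL slot marks an empty entry, a zero-length ring is always full, and every new array is allocated before any ring is touched.

```c
#include <errno.h>
#include <stdlib.h>

/* Simplified pointer ring: a NULL slot means "empty". */
struct ring_sketch {
	void **queue;
	int size;
	int producer;
	int consumer;
};

static int ring_produce(struct ring_sketch *r, void *ptr)
{
	/* A zero-length ring is always full. */
	if (!r->size || r->queue[r->producer])
		return -ENOSPC;
	r->queue[r->producer++] = ptr;
	if (r->producer >= r->size)
		r->producer = 0;
	return 0;
}

static void *ring_consume(struct ring_sketch *r)
{
	void *ptr;

	if (!r->size)
		return NULL;	/* a zero-length ring is always empty */
	ptr = r->queue[r->consumer];
	if (ptr) {
		r->queue[r->consumer++] = NULL;
		if (r->consumer >= r->size)
			r->consumer = 0;
	}
	return ptr;
}

/* Move queued entries into a preallocated array; destroy what won't fit. */
static void ring_swap_queue(struct ring_sketch *r, void **queue, int size,
			    void (*destroy)(void *))
{
	int n = 0;
	void *ptr;

	while ((ptr = ring_consume(r))) {
		if (n < size)
			queue[n++] = ptr;
		else if (destroy)
			destroy(ptr);
	}
	free(r->queue);
	r->queue = queue;
	r->size = size;
	r->producer = n < size ? n : 0;	/* wrap if the new ring is full */
	r->consumer = 0;
}

/* All-or-nothing: if any allocation fails, no ring is modified. */
static int ring_resize_multiple(struct ring_sketch **rings, int nrings,
				int size, void (*destroy)(void *))
{
	void ***queues;
	int i;

	if (nrings <= 0)
		return 0;
	queues = calloc(nrings, sizeof(*queues));
	if (!queues)
		return -ENOMEM;
	for (i = 0; i < nrings; i++) {
		queues[i] = size ? calloc(size, sizeof(void *)) : NULL;
		if (size && !queues[i]) {
			while (i--)
				free(queues[i]);
			free(queues);
			return -ENOMEM;
		}
	}
	for (i = 0; i < nrings; i++)
		ring_swap_queue(rings[i], queues[i], size, destroy);
	free(queues);
	return 0;
}
```

Allocating first and swapping second is what lets a tx_queue_len change either resize every queue or leave them all as they were.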

Re: [PATCH net-next V2 08/16] net/devlink: Add E-Switch mode control

2016-06-30 Thread Jiri Pirko

Thu, Jun 30, 2016 at 05:23:27PM CEST, sae...@mellanox.com wrote:
>From: Or Gerlitz 
>
>Add the commands to set and show the mode of SRIOV E-Switch, two modes
>are supported:
>
>* legacy: operating in the "old" L2 based mode (DMAC --> VF vport)
>
>* switchdev: the E-Switch is referred to as a whitebox switch configured
>using standard tools such as tc, bridge, openvswitch etc. To allow
>working with the tools, for each VF, a VF representor netdevice is
>created by the E-Switch manager vendor device driver instance (e.g. PF).
>
>Signed-off-by: Or Gerlitz 
>Signed-off-by: Saeed Mahameed 

Acked-by: Jiri Pirko
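To make the two modes concrete, here is a small userspace model of what the commit message describes: switching to switchdev creates one VF representor per VF, switching back to legacy removes them, and any other mode is rejected. The enum, struct and function names are hypothetical; this is not the devlink kernel API or any vendor driver's implementation, only a sketch under those assumptions.

```c
#include <errno.h>
#include <string.h>

/* Hypothetical names modelling the two modes from the commit message. */
enum eswitch_mode_sketch {
	ESW_MODE_LEGACY,	/* L2 based: DMAC --> VF vport */
	ESW_MODE_SWITCHDEV,	/* whitebox switch, one representor per VF */
};

struct eswitch_sketch {
	enum eswitch_mode_sketch mode;
	int num_vfs;
	int num_reps;		/* VF representor netdevs currently created */
};

/* Map a user-supplied mode name to the enum; unknown names fail. */
static int eswitch_mode_from_str(const char *s, enum eswitch_mode_sketch *mode)
{
	if (strcmp(s, "legacy") == 0)
		*mode = ESW_MODE_LEGACY;
	else if (strcmp(s, "switchdev") == 0)
		*mode = ESW_MODE_SWITCHDEV;
	else
		return -EINVAL;
	return 0;
}

static int eswitch_mode_set(struct eswitch_sketch *esw,
			    enum eswitch_mode_sketch mode)
{
	switch (mode) {
	case ESW_MODE_LEGACY:
		esw->num_reps = 0;		/* tear down representors */
		break;
	case ESW_MODE_SWITCHDEV:
		esw->num_reps = esw->num_vfs;	/* one representor per VF */
		break;
	default:
		return -EINVAL;
	}
	esw->mode = mode;
	return 0;
}
```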

Re: [PATCH net-next 08/16] net/devlink: Add E-Switch mode control