Re: [PATCH net-next v7 0/3] nfp act_vlan: Rewrite of the TC vlan action to use the RCU, and incorporated review comments

2017-11-10 Thread Manish Kurup
Hi Dave,

On Fri, Nov 10, 2017 at 8:39 AM, David Miller <da...@davemloft.net> wrote:
> From: Manish Kurup <kurup.man...@gmail.com>
> Date: Fri, 10 Nov 2017 07:30:09 -0500
>
>> The last version I sent out (with changes for comments) was v10, but
>> I noticed that you applied v7 (without the reverse xmas tree patch on
>> patch #3).
>
> Check my tree, I actually applied a later version than v7.
>
> Sorry for the confusion.
Yes, you're right. I did a 'git pull' this morning and saw my patches
in the log, but the changes didn't include the reverse xmas tree
changes that I made in v10, my latest review version, after comments
from Pieter, Alexander, Jamal and you. What I see in the repo
(net-next) after the git pull is v9.

I guess you dropped my reverse xmas tree changes (v10). Sorry for the confusion.

Thanks,


Re: [PATCH net-next v7 0/3] nfp act_vlan: Rewrite of the TC vlan action to use the RCU, and incorporated review comments

2017-11-10 Thread Manish Kurup
Hi Dave,

On Fri, Nov 10, 2017 at 12:53 AM, David Miller <da...@davemloft.net> wrote:
> From: Manish Kurup <kurup.man...@gmail.com>
> Date: Tue,  7 Nov 2017 08:50:00 -0500
>
>> This commit consists of 3 patches:
>>
>> patch1 (1/3):
>> The VLAN action maintains one set of stats across all cores, and uses a
>> spinlock to synchronize updates to it from the same. Changed this to use a
>> per-CPU stats context instead.
>> This change will result in better performance.
>>
>> patch2 (2/3):
>> Modified netronome nfp flower action to use VLAN helper functions instead
>> of accessing/referencing TC act_vlan private structures directly.
>>
>> patch3 (3/3):
>> Using a spinlock in the VLAN action causes performance issues when the VLAN
>> action is used on multiple cores. Rewrote the VLAN action to use RCU read
>> locking for reads and updates instead.
>> All functions now use an RCU dereferenced pointer to access the VLAN action
>> context. Modified helper functions used by other modules, to use the RCU as
>> opposed to directly accessing the structure.
>>
>> As part of this review, there were some changes suggested by reviewers.
>> I have incorporated all the changes that were requested.
>  ...
>
> Series applied, thank you.

Thanks for applying the patch, however -

The last version I sent out (with changes for comments) was v10, but
I noticed that you applied v7 (without the reverse xmas tree patch on
patch #3).

Was there something wrong with that patch? Please let me know.

Thanks.


Re: [PATCH net-next v10 1/3] act_vlan: Change stats update to use per-core stats

2017-11-08 Thread Manish Kurup
Hi Or,

On Wed, Nov 8, 2017 at 9:20 AM, Or Gerlitz <gerlitz...@gmail.com> wrote:
> On Wed, Nov 8, 2017 at 9:03 PM, Manish Kurup <kurup.man...@gmail.com> wrote:
>> The VLAN action maintains one set of stats across all cores, and uses a
>> spinlock to synchronize updates to it from the same. Changed this to use a
>> per-CPU stats context instead.
>> This change will result in better performance.
>>
>> Acked-by: Jamal Hadi Salim <j...@mojatatu.com>
>> Acked-by: Jiri Pirko <j...@mellanox.com>
>> Signed-off-by: Manish Kurup <manish.ku...@verizon.com>
>> ---
>>  net/sched/act_vlan.c | 10 ++++++----
>>  1 file changed, 6 insertions(+), 4 deletions(-)
>>
>> diff --git a/net/sched/act_vlan.c b/net/sched/act_vlan.c
>> index 115fc33..8a35efe 100644
>> --- a/net/sched/act_vlan.c
>> +++ b/net/sched/act_vlan.c
>> @@ -30,9 +30,10 @@ static int tcf_vlan(struct sk_buff *skb, const struct tc_action *a,
>> int err;
>> u16 tci;
>>
>> -   spin_lock(&v->tcf_lock);
>> tcf_lastuse_update(&v->tcf_tm);
>> -   bstats_update(&v->tcf_bstats, skb);
>> +   bstats_cpu_update(this_cpu_ptr(v->common.cpu_bstats), skb);
>> +
>> +   spin_lock(&v->tcf_lock);
>> action = v->tcf_action;
>
> (if this was asked && answered in earlier Vs, sorry for that, if not and I got
> some small real problem here && you're @ netdev, maybe buy me Korean beer?)
>
> before your changes the spin lock also protected the lastuse update call but
> now it doesn't, why?
Phase I of my changes was to get rid of the spin_locks and convert the
stats to a per-CPU model to get better forwarding performance.
While doing this, I looked at a few 'model TC actions' within
net/sched (tcf_mirred, for example). None of them protected the
tcf_lastuse_update() call. I assumed this was because it is a
'display-only' field, and as long as it changes to the latest
timestamp based on packets received, it is OK.
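
As a rough illustration of the per-CPU stats model described above, here
is a minimal userspace sketch. It is not the kernel code: the sketch_*
names, the fixed CPU count and the explicit cpu argument are assumptions
made up for this example (the kernel uses this_cpu_ptr() on per-CPU
allocations instead). The point is only that each CPU writes its own
slot without a lock, and the slots are folded together when the stats
are read.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_NR_CPUS 4	/* hypothetical fixed CPU count for the sketch */

/* One counter pair per CPU; each CPU only ever writes its own slot. */
struct sketch_bstats {
	uint64_t bytes;
	uint64_t packets;
};

struct sketch_action {
	struct sketch_bstats cpu_bstats[SKETCH_NR_CPUS];
};

/* Fast path: update the local CPU's slot, no lock taken. */
static void sketch_bstats_cpu_update(struct sketch_action *act,
				     unsigned int cpu, uint32_t len)
{
	struct sketch_bstats *b = &act->cpu_bstats[cpu % SKETCH_NR_CPUS];

	b->bytes += len;
	b->packets++;
}

/* Slow path (e.g. a stats dump): fold all per-CPU slots into one total. */
static struct sketch_bstats sketch_bstats_sum(const struct sketch_action *act)
{
	struct sketch_bstats total = { 0, 0 };
	size_t i;

	for (i = 0; i < SKETCH_NR_CPUS; i++) {
		total.bytes += act->cpu_bstats[i].bytes;
		total.packets += act->cpu_bstats[i].packets;
	}
	return total;
}
```

The trade-off is that a reader pays for the summation, which is
acceptable when reads (dumps) are far rarer than per-packet updates.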

I tested this using our suite of traffic tests, and verified that the
last-use field did update, and did not cause any other problems.

Do you foresee any issues that this could cause?

Thanks,

-Manish


Re: [PATCH net-next v6 3/3] act_vlan: VLAN action rewrite to use RCU lock/unlock and update

2017-11-08 Thread Manish Kurup
Hi Dave,

On Wed, Nov 8, 2017 at 6:40 AM, Manish Kurup <kurup.man...@gmail.com> wrote:
> Hi Dave,
>
> On Tue, Nov 7, 2017 at 7:07 PM, David Miller <da...@davemloft.net> wrote:
>>
>> From: Alexander Duyck <alexander.du...@gmail.com>
>> Date: Tue, 7 Nov 2017 08:54:20 -0800
>>
>> > Are we really going to be so strict about the reverse xmas-tree that
>> > we won't allow for assignment w/ variable declaration because the
>> > dependency order won't fit into that format?
>>
>> Yes.
>>
>> > Last I knew this kind of setup was an exception to the reverse
>> > xmas-tree layout requirement because in this case 'p' relies on 'v' so
>> > we can't reorder these without having to kick the assignment of 'p'
>> > off onto a line by itself.
>>
>> Please just declare the variable naked without the assignment and do
>> the assignment down in the code.
>
> I have a changeset that I had made to incorporate the reverse xmas tree,
> doing the very thing you talk about, above.
> The only reason I didn't send it out is that it made more than minimal
> changes, especially to how the 'opt' struct is defined.
>
> I will make the changes and send the review around once more.
>
> Thanks,
>
I have made the required changes, and sent the review around once more
(v10). Please let me know if this looks OK.
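
For reference, the layout Dave asks for can be sketched as follows. This
is only an illustration with hypothetical stand-in types (sketch_vlan,
sketch_params), not the actual act_vlan code. The locals are ordered from
the longest declaration line to the shortest, and 'p', which depends on
'v', is declared without an initializer and assigned in the body.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the kernel structures, for illustration only. */
struct sketch_params {
	int action;
};

struct sketch_vlan {
	struct sketch_params *params;
};

static int sketch_get_action(void *priv)
{
	struct sketch_vlan *v = priv;	/* longest declaration line first */
	struct sketch_params *p;	/* declared "naked": depends on v */
	int action;

	/* The dependent assignment moves down into the code. */
	p = v->params;
	action = p ? p->action : -1;

	return action;
}
```

Initializing 'p' at its declaration would make that line longer than the
declaration of 'v', so keeping both the initializer and the ordering
would require 'p' to come before the variable it depends on.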

Thanks!

-Manish


[PATCH net-next v10 3/3] act_vlan: VLAN action rewrite to use RCU lock/unlock and update

2017-11-08 Thread Manish Kurup
Using a spinlock in the VLAN action causes performance issues when the VLAN
action is used on multiple cores. Rewrote the VLAN action to use RCU read
locking for reads and updates instead.
All functions now use an RCU dereferenced pointer to access the VLAN action
context. Modified helper functions used by other modules, to use the RCU as
opposed to directly accessing the structure.

Acked-by: Jamal Hadi Salim <j...@mojatatu.com>
Acked-by: Jiri Pirko <j...@mellanox.com>
Signed-off-by: Manish Kurup <manish.ku...@verizon.com>
---
 include/net/tc_act/tc_vlan.h | 46 ++-
 net/sched/act_vlan.c | 88 +---
 2 files changed, 95 insertions(+), 39 deletions(-)

diff --git a/include/net/tc_act/tc_vlan.h b/include/net/tc_act/tc_vlan.h
index c2090df..22ae260 100644
--- a/include/net/tc_act/tc_vlan.h
+++ b/include/net/tc_act/tc_vlan.h
@@ -13,12 +13,17 @@
 #include <net/act_api.h>
 #include <linux/tc_act/tc_vlan.h>
 
+struct tcf_vlan_params {
+   int               tcfv_action;
+   u16               tcfv_push_vid;
+   __be16            tcfv_push_proto;
+   u8                tcfv_push_prio;
+   struct rcu_head   rcu;
+};
+
 struct tcf_vlan {
struct tc_action common;
-   int     tcfv_action;
-   u16     tcfv_push_vid;
-   __be16  tcfv_push_proto;
-   u8      tcfv_push_prio;
+   struct tcf_vlan_params __rcu *vlan_p;
 };
 #define to_vlan(a) ((struct tcf_vlan *)a)
 
@@ -33,22 +38,45 @@ static inline bool is_tcf_vlan(const struct tc_action *a)
 
 static inline u32 tcf_vlan_action(const struct tc_action *a)
 {
-   return to_vlan(a)->tcfv_action;
+   u32 tcfv_action;
+
+   rcu_read_lock();
+   tcfv_action = rcu_dereference(to_vlan(a)->vlan_p)->tcfv_action;
+   rcu_read_unlock();
+
+   return tcfv_action;
 }
 
 static inline u16 tcf_vlan_push_vid(const struct tc_action *a)
 {
-   return to_vlan(a)->tcfv_push_vid;
+   u16 tcfv_push_vid;
+
+   rcu_read_lock();
+   tcfv_push_vid = rcu_dereference(to_vlan(a)->vlan_p)->tcfv_push_vid;
+   rcu_read_unlock();
+
+   return tcfv_push_vid;
 }
 
 static inline __be16 tcf_vlan_push_proto(const struct tc_action *a)
 {
-   return to_vlan(a)->tcfv_push_proto;
+   __be16 tcfv_push_proto;
+
+   rcu_read_lock();
+   tcfv_push_proto = rcu_dereference(to_vlan(a)->vlan_p)->tcfv_push_proto;
+   rcu_read_unlock();
+
+   return tcfv_push_proto;
 }
 
 static inline u8 tcf_vlan_push_prio(const struct tc_action *a)
 {
-   return to_vlan(a)->tcfv_push_prio;
-}
+   u8 tcfv_push_prio;
 
+   rcu_read_lock();
+   tcfv_push_prio = rcu_dereference(to_vlan(a)->vlan_p)->tcfv_push_prio;
+   rcu_read_unlock();
+
+   return tcfv_push_prio;
+}
 #endif /* __NET_TC_VLAN_H */
diff --git a/net/sched/act_vlan.c b/net/sched/act_vlan.c
index 8a35efe..f9e6b80 100644
--- a/net/sched/act_vlan.c
+++ b/net/sched/act_vlan.c
@@ -26,6 +26,7 @@ static int tcf_vlan(struct sk_buff *skb, const struct tc_action *a,
struct tcf_result *res)
 {
struct tcf_vlan *v = to_vlan(a);
+   struct tcf_vlan_params *p;
int action;
int err;
u16 tci;
@@ -33,24 +34,27 @@ static int tcf_vlan(struct sk_buff *skb, const struct tc_action *a,
tcf_lastuse_update(&v->tcf_tm);
bstats_cpu_update(this_cpu_ptr(v->common.cpu_bstats), skb);
 
-   spin_lock(&v->tcf_lock);
-   action = v->tcf_action;
-
/* Ensure 'data' points at mac_header prior calling vlan manipulating
 * functions.
 */
if (skb_at_tc_ingress(skb))
skb_push_rcsum(skb, skb->mac_len);
 
-   switch (v->tcfv_action) {
+   rcu_read_lock();
+
+   action = READ_ONCE(v->tcf_action);
+
+   p = rcu_dereference(v->vlan_p);
+
+   switch (p->tcfv_action) {
case TCA_VLAN_ACT_POP:
err = skb_vlan_pop(skb);
if (err)
goto drop;
break;
case TCA_VLAN_ACT_PUSH:
-   err = skb_vlan_push(skb, v->tcfv_push_proto, v->tcfv_push_vid |
-   (v->tcfv_push_prio << VLAN_PRIO_SHIFT));
+   err = skb_vlan_push(skb, p->tcfv_push_proto, p->tcfv_push_vid |
+   (p->tcfv_push_prio << VLAN_PRIO_SHIFT));
if (err)
goto drop;
break;
@@ -69,14 +73,14 @@ static int tcf_vlan(struct sk_buff *skb, const struct tc_action *a,
goto drop;
}
/* replace the vid */
-   tci = (tci & ~VLAN_VID_MASK) | v->tcfv_push_vid;
+   tci = (tci & ~VLAN_VID_MASK) | p->tcfv_push_vid;
/* replace prio bits, if tcfv_push_prio specified */
-  

[PATCH net-next v10 2/3] nfp flower action: Modified to use VLAN helper functions

2017-11-08 Thread Manish Kurup
Modified netronome nfp flower action to use VLAN helper functions instead
of accessing/referencing TC act_vlan private structures directly.

Reviewed-by: Pieter Jansen van Vuuren <pieter.jansenvanvuu...@netronome.com>
Signed-off-by: Manish Kurup <manish.ku...@verizon.com>
---
 drivers/net/ethernet/netronome/nfp/flower/action.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/flower/action.c b/drivers/net/ethernet/netronome/nfp/flower/action.c
index de64ced..c1c595f 100644
--- a/drivers/net/ethernet/netronome/nfp/flower/action.c
+++ b/drivers/net/ethernet/netronome/nfp/flower/action.c
@@ -58,7 +58,6 @@ nfp_fl_push_vlan(struct nfp_fl_push_vlan *push_vlan,
 const struct tc_action *action)
 {
size_t act_size = sizeof(struct nfp_fl_push_vlan);
-   struct tcf_vlan *vlan = to_vlan(action);
u16 tmp_push_vlan_tci;
 
push_vlan->head.jump_id = NFP_FL_ACTION_OPCODE_PUSH_VLAN;
@@ -67,8 +66,8 @@ nfp_fl_push_vlan(struct nfp_fl_push_vlan *push_vlan,
push_vlan->vlan_tpid = tcf_vlan_push_proto(action);
 
tmp_push_vlan_tci =
-   FIELD_PREP(NFP_FL_PUSH_VLAN_PRIO, vlan->tcfv_push_prio) |
-   FIELD_PREP(NFP_FL_PUSH_VLAN_VID, vlan->tcfv_push_vid) |
+   FIELD_PREP(NFP_FL_PUSH_VLAN_PRIO, tcf_vlan_push_prio(action)) |
+   FIELD_PREP(NFP_FL_PUSH_VLAN_VID, tcf_vlan_push_vid(action)) |
NFP_FL_PUSH_VLAN_CFI;
push_vlan->vlan_tci = cpu_to_be16(tmp_push_vlan_tci);
 }
-- 
2.7.4
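
Both this driver change and the act_vlan software path assemble a 16-bit
VLAN TCI from the VID and the priority. As a minimal standalone sketch of
that bit layout (the sketch_* names are made up for this example, and the
explicit masking of the inputs is an extra safety assumption that the
patches themselves do not show):

```c
#include <stdint.h>

/* 802.1Q TCI layout: 3 priority bits, 1 CFI/DEI bit, 12 VID bits. */
#define SKETCH_VLAN_PRIO_SHIFT	13
#define SKETCH_VLAN_VID_MASK	0x0fff

/* Combine a VID and a priority into a host-order TCI. */
static uint16_t sketch_build_tci(uint16_t vid, uint8_t prio)
{
	return (uint16_t)((vid & SKETCH_VLAN_VID_MASK) |
			  ((prio & 0x7) << SKETCH_VLAN_PRIO_SHIFT));
}

/* Replace only the VID bits of an existing TCI, keeping prio and CFI. */
static uint16_t sketch_replace_vid(uint16_t tci, uint16_t vid)
{
	return (uint16_t)((tci & ~SKETCH_VLAN_VID_MASK) |
			  (vid & SKETCH_VLAN_VID_MASK));
}
```

The second helper mirrors the "replace the vid" step in the act_vlan
modify path, where the priority bits of the existing tag are preserved.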



[PATCH net-next v10 0/3] net_sched actions: act_vlan now uses RCU

2017-11-08 Thread Manish Kurup
This commit consists of 3 patches:

patch1 (1/3):
The VLAN action maintains one set of stats across all cores, and uses a
spinlock to synchronize updates to it from the same. Changed this to use a
per-CPU stats context instead.
This change will result in better performance.

patch2 (2/3):
Modified netronome nfp flower action to use VLAN helper functions instead
of accessing/referencing TC act_vlan private structures directly. 

patch3 (3/3):
Using a spinlock in the VLAN action causes performance issues when the VLAN
action is used on multiple cores. Rewrote the VLAN action to use RCU read
locking for reads and updates instead.
All functions now use an RCU dereferenced pointer to access the VLAN action
context. Modified helper functions used by other modules, to use the RCU as
opposed to directly accessing the structure.

As part of this review, there were some changes suggested by reviewers.
I have incorporated all the changes that were requested.

Here're the changes:
v2: Fixed all helper functions to use RCU (rtnl_dereference) - Eric, Jamal
v2: Fixed indentation, extra line nits - Jamal, Jiri
v2: Moved rcu_head to the end of the struct - Jiri
v2: Re-formatted locals to reverse-christmas-tree - Jiri
v2: Removed mismatched spin_lock() - Cong
v2: Removed spin_lock_bh() in tcf_vlan_init, rtnl_dereference() should
suffice - Cong, Jiri
v4: Modified the nfp flower action code to use the VLAN helper functions
instead of referencing the structure directly. Isolated this into a
separate patch - Pieter Jansen
v5: Got rid of the unlikely() for the allocation case - Simon Horman
v6: Had forgotten cleanup functions for RCU alloc, added them - Dave Miller
v7: Re-formatted more locals to reverse-christmas-tree - Pieter V
v8: Reverted reverse-christmas-tree(v7), not required when dependencies
make it difficult to implement - Alexander D
v9: Cover letter subject change - Jamal
v10: Re-formatted locals in v7 back to using reverse xmas tree - Dave M

Reviewed-by: Pieter Jansen van Vuuren <pieter.jansenvanvuu...@netronome.com>
Acked-by: Jamal Hadi Salim <j...@mojatatu.com>
Acked-by: Jiri Pirko <j...@mellanox.com>
Signed-off-by: Manish Kurup <manish.ku...@verizon.com>

Manish Kurup (3):
  act_vlan: Change stats update to use per-core stats
  nfp flower action: Modified to use VLAN helper functions
  act_vlan: VLAN action rewrite to use RCU lock/unlock and update

 drivers/net/ethernet/netronome/nfp/flower/action.c |  5 +-
 include/net/tc_act/tc_vlan.h   | 46 ---
 net/sched/act_vlan.c   | 94 ++
 3 files changed, 101 insertions(+), 44 deletions(-)

-- 
2.7.4
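
To make the read/update split in patch 3 easier to follow, here is a
minimal userspace sketch of the "publish a new copy, read a snapshot"
pattern, written with C11 atomics. It is an analogy only and NOT kernel
RCU: the sketch_* names are hypothetical, the acquire load stands in for
rcu_dereference(), the release exchange stands in for
rcu_assign_pointer(), and there is no grace period. The old copy is
simply handed back to the caller, whereas kernel code would have to defer
freeing it until all readers are done (for example with kfree_rcu()).

```c
#include <stdatomic.h>
#include <stdint.h>
#include <stdlib.h>

struct sketch_vlan_params {
	int      action;
	uint16_t push_vid;
	uint8_t  push_prio;
};

struct sketch_vlan {
	_Atomic(struct sketch_vlan_params *) params;
};

/* Reader: a single acquire load gives one consistent parameter snapshot. */
static uint16_t sketch_read_vid(struct sketch_vlan *v)
{
	struct sketch_vlan_params *p =
		atomic_load_explicit(&v->params, memory_order_acquire);

	return p ? p->push_vid : 0;
}

/*
 * Updater: fill in a fresh copy, then publish it in one step.
 * Returns 0 on success and stores the previous copy in *old,
 * or returns -1 if the allocation fails (nothing is published).
 */
static int sketch_update(struct sketch_vlan *v, int action, uint16_t vid,
			 uint8_t prio, struct sketch_vlan_params **old)
{
	struct sketch_vlan_params *p = malloc(sizeof(*p));

	if (!p)
		return -1;

	p->action = action;
	p->push_vid = vid;
	p->push_prio = prio;

	*old = atomic_exchange_explicit(&v->params, p, memory_order_acq_rel);
	return 0;
}
```

Because readers never see a half-written parameter set, the data path
needs no spinlock; the cost moves to the (rare) update, which allocates
a new copy each time.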



[PATCH net-next v10 1/3] act_vlan: Change stats update to use per-core stats

2017-11-08 Thread Manish Kurup
The VLAN action maintains one set of stats across all cores, and uses a
spinlock to synchronize updates to it from the same. Changed this to use a
per-CPU stats context instead.
This change will result in better performance.

Acked-by: Jamal Hadi Salim <j...@mojatatu.com>
Acked-by: Jiri Pirko <j...@mellanox.com>
Signed-off-by: Manish Kurup <manish.ku...@verizon.com>
---
 net/sched/act_vlan.c | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/net/sched/act_vlan.c b/net/sched/act_vlan.c
index 115fc33..8a35efe 100644
--- a/net/sched/act_vlan.c
+++ b/net/sched/act_vlan.c
@@ -30,9 +30,10 @@ static int tcf_vlan(struct sk_buff *skb, const struct tc_action *a,
int err;
u16 tci;
 
-   spin_lock(&v->tcf_lock);
tcf_lastuse_update(&v->tcf_tm);
-   bstats_update(&v->tcf_bstats, skb);
+   bstats_cpu_update(this_cpu_ptr(v->common.cpu_bstats), skb);
+
+   spin_lock(&v->tcf_lock);
action = v->tcf_action;
 
/* Ensure 'data' points at mac_header prior calling vlan manipulating
@@ -85,7 +86,8 @@ static int tcf_vlan(struct sk_buff *skb, const struct tc_action *a,
 
 drop:
action = TC_ACT_SHOT;
-   v->tcf_qstats.drops++;
+   qstats_drop_inc(this_cpu_ptr(v->common.cpu_qstats));
+
 unlock:
if (skb_at_tc_ingress(skb))
skb_pull_rcsum(skb, skb->mac_len);
@@ -172,7 +174,7 @@ static int tcf_vlan_init(struct net *net, struct nlattr *nla,
 
if (!exists) {
ret = tcf_idr_create(tn, parm->index, est, a,
-   &act_vlan_ops, bind, false);
+   &act_vlan_ops, bind, true);
if (ret)
return ret;
 
-- 
2.7.4



Fwd: [PATCH net-next v6 3/3] act_vlan: VLAN action rewrite to use RCU lock/unlock and update

2017-11-08 Thread Manish Kurup
Hi Dave,

On Tue, Nov 7, 2017 at 7:07 PM, David Miller  wrote:
>
> From: Alexander Duyck 
> Date: Tue, 7 Nov 2017 08:54:20 -0800
>
> > Are we really going to be so strict about the reverse xmas-tree that
> > we won't allow for assignment w/ variable declaration because the
> > dependency order won't fit into that format?
>
> Yes.
>
> > Last I knew this kind of setup was an exception to the reverse
> > xmas-tree layout requirement because in this case 'p' relies on 'v' so
> > we can't reorder these without having to kick the assignment of 'p'
> > off onto a line by itself.
>
> Please just declare the variable naked without the assignment and do
> the assignment down in the code.

I have a changeset that I had made to incorporate the reverse xmas
tree, doing the very thing you talk about, above.
The only reason I didn't send it out is that it made more than
minimal changes, especially to how the 'opt' struct is defined.

I will make the changes and send the review around once more.

Thanks,


[PATCH net-next v9 3/3] act_vlan: VLAN action rewrite to use RCU lock/unlock and update

2017-11-07 Thread Manish Kurup
Using a spinlock in the VLAN action causes performance issues when the VLAN
action is used on multiple cores. Rewrote the VLAN action to use RCU read
locking for reads and updates instead.
All functions now use an RCU dereferenced pointer to access the VLAN action
context. Modified helper functions used by other modules, to use the RCU as
opposed to directly accessing the structure.

Acked-by: Jamal Hadi Salim <j...@mojatatu.com>
Acked-by: Jiri Pirko <j...@mellanox.com>
Signed-off-by: Manish Kurup <manish.ku...@verizon.com>
---
 include/net/tc_act/tc_vlan.h | 46 +--
 net/sched/act_vlan.c | 75 ++--
 2 files changed, 88 insertions(+), 33 deletions(-)

diff --git a/include/net/tc_act/tc_vlan.h b/include/net/tc_act/tc_vlan.h
index c2090df..22ae260 100644
--- a/include/net/tc_act/tc_vlan.h
+++ b/include/net/tc_act/tc_vlan.h
@@ -13,12 +13,17 @@
 #include <net/act_api.h>
 #include <linux/tc_act/tc_vlan.h>
 
+struct tcf_vlan_params {
+   int               tcfv_action;
+   u16               tcfv_push_vid;
+   __be16            tcfv_push_proto;
+   u8                tcfv_push_prio;
+   struct rcu_head   rcu;
+};
+
 struct tcf_vlan {
struct tc_action common;
-   int     tcfv_action;
-   u16     tcfv_push_vid;
-   __be16  tcfv_push_proto;
-   u8      tcfv_push_prio;
+   struct tcf_vlan_params __rcu *vlan_p;
 };
 #define to_vlan(a) ((struct tcf_vlan *)a)
 
@@ -33,22 +38,45 @@ static inline bool is_tcf_vlan(const struct tc_action *a)
 
 static inline u32 tcf_vlan_action(const struct tc_action *a)
 {
-   return to_vlan(a)->tcfv_action;
+   u32 tcfv_action;
+
+   rcu_read_lock();
+   tcfv_action = rcu_dereference(to_vlan(a)->vlan_p)->tcfv_action;
+   rcu_read_unlock();
+
+   return tcfv_action;
 }
 
 static inline u16 tcf_vlan_push_vid(const struct tc_action *a)
 {
-   return to_vlan(a)->tcfv_push_vid;
+   u16 tcfv_push_vid;
+
+   rcu_read_lock();
+   tcfv_push_vid = rcu_dereference(to_vlan(a)->vlan_p)->tcfv_push_vid;
+   rcu_read_unlock();
+
+   return tcfv_push_vid;
 }
 
 static inline __be16 tcf_vlan_push_proto(const struct tc_action *a)
 {
-   return to_vlan(a)->tcfv_push_proto;
+   __be16 tcfv_push_proto;
+
+   rcu_read_lock();
+   tcfv_push_proto = rcu_dereference(to_vlan(a)->vlan_p)->tcfv_push_proto;
+   rcu_read_unlock();
+
+   return tcfv_push_proto;
 }
 
 static inline u8 tcf_vlan_push_prio(const struct tc_action *a)
 {
-   return to_vlan(a)->tcfv_push_prio;
-}
+   u8 tcfv_push_prio;
 
+   rcu_read_lock();
+   tcfv_push_prio = rcu_dereference(to_vlan(a)->vlan_p)->tcfv_push_prio;
+   rcu_read_unlock();
+
+   return tcfv_push_prio;
+}
 #endif /* __NET_TC_VLAN_H */
diff --git a/net/sched/act_vlan.c b/net/sched/act_vlan.c
index 8a35efe..156a552 100644
--- a/net/sched/act_vlan.c
+++ b/net/sched/act_vlan.c
@@ -26,6 +26,7 @@ static int tcf_vlan(struct sk_buff *skb, const struct tc_action *a,
struct tcf_result *res)
 {
struct tcf_vlan *v = to_vlan(a);
+   struct tcf_vlan_params *p;
int action;
int err;
u16 tci;
@@ -33,24 +34,27 @@ static int tcf_vlan(struct sk_buff *skb, const struct tc_action *a,
tcf_lastuse_update(&v->tcf_tm);
bstats_cpu_update(this_cpu_ptr(v->common.cpu_bstats), skb);
 
-   spin_lock(&v->tcf_lock);
-   action = v->tcf_action;
-
/* Ensure 'data' points at mac_header prior calling vlan manipulating
 * functions.
 */
if (skb_at_tc_ingress(skb))
skb_push_rcsum(skb, skb->mac_len);
 
-   switch (v->tcfv_action) {
+   rcu_read_lock();
+
+   action = READ_ONCE(v->tcf_action);
+
+   p = rcu_dereference(v->vlan_p);
+
+   switch (p->tcfv_action) {
case TCA_VLAN_ACT_POP:
err = skb_vlan_pop(skb);
if (err)
goto drop;
break;
case TCA_VLAN_ACT_PUSH:
-   err = skb_vlan_push(skb, v->tcfv_push_proto, v->tcfv_push_vid |
-   (v->tcfv_push_prio << VLAN_PRIO_SHIFT));
+   err = skb_vlan_push(skb, p->tcfv_push_proto, p->tcfv_push_vid |
+   (p->tcfv_push_prio << VLAN_PRIO_SHIFT));
if (err)
goto drop;
break;
@@ -69,14 +73,14 @@ static int tcf_vlan(struct sk_buff *skb, const struct tc_action *a,
goto drop;
}
/* replace the vid */
-   tci = (tci & ~VLAN_VID_MASK) | v->tcfv_push_vid;
+   tci = (tci & ~VLAN_VID_MASK) | p->tcfv_push_vid;
/* replace prio bits, if tcfv_push_prio specified */
-  

[PATCH net-next v9 2/3] nfp flower action: Modified to use VLAN helper functions

2017-11-07 Thread Manish Kurup
Modified netronome nfp flower action to use VLAN helper functions instead
of accessing/referencing TC act_vlan private structures directly.

Reviewed-by: Pieter Jansen van Vuuren <pieter.jansenvanvuu...@netronome.com>
Signed-off-by: Manish Kurup <manish.ku...@verizon.com>
---
 drivers/net/ethernet/netronome/nfp/flower/action.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/flower/action.c b/drivers/net/ethernet/netronome/nfp/flower/action.c
index de64ced..c1c595f 100644
--- a/drivers/net/ethernet/netronome/nfp/flower/action.c
+++ b/drivers/net/ethernet/netronome/nfp/flower/action.c
@@ -58,7 +58,6 @@ nfp_fl_push_vlan(struct nfp_fl_push_vlan *push_vlan,
 const struct tc_action *action)
 {
size_t act_size = sizeof(struct nfp_fl_push_vlan);
-   struct tcf_vlan *vlan = to_vlan(action);
u16 tmp_push_vlan_tci;
 
push_vlan->head.jump_id = NFP_FL_ACTION_OPCODE_PUSH_VLAN;
@@ -67,8 +66,8 @@ nfp_fl_push_vlan(struct nfp_fl_push_vlan *push_vlan,
push_vlan->vlan_tpid = tcf_vlan_push_proto(action);
 
tmp_push_vlan_tci =
-   FIELD_PREP(NFP_FL_PUSH_VLAN_PRIO, vlan->tcfv_push_prio) |
-   FIELD_PREP(NFP_FL_PUSH_VLAN_VID, vlan->tcfv_push_vid) |
+   FIELD_PREP(NFP_FL_PUSH_VLAN_PRIO, tcf_vlan_push_prio(action)) |
+   FIELD_PREP(NFP_FL_PUSH_VLAN_VID, tcf_vlan_push_vid(action)) |
NFP_FL_PUSH_VLAN_CFI;
push_vlan->vlan_tci = cpu_to_be16(tmp_push_vlan_tci);
 }
-- 
2.7.4



[PATCH net-next v9 1/3] act_vlan: Change stats update to use per-core stats

2017-11-07 Thread Manish Kurup
The VLAN action maintains one set of stats across all cores, and uses a
spinlock to synchronize updates to it from the same. Changed this to use a
per-CPU stats context instead.
This change will result in better performance.

Acked-by: Jamal Hadi Salim <j...@mojatatu.com>
Acked-by: Jiri Pirko <j...@mellanox.com>
Signed-off-by: Manish Kurup <manish.ku...@verizon.com>
---
 net/sched/act_vlan.c | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/net/sched/act_vlan.c b/net/sched/act_vlan.c
index 115fc33..8a35efe 100644
--- a/net/sched/act_vlan.c
+++ b/net/sched/act_vlan.c
@@ -30,9 +30,10 @@ static int tcf_vlan(struct sk_buff *skb, const struct tc_action *a,
int err;
u16 tci;
 
-   spin_lock(&v->tcf_lock);
tcf_lastuse_update(&v->tcf_tm);
-   bstats_update(&v->tcf_bstats, skb);
+   bstats_cpu_update(this_cpu_ptr(v->common.cpu_bstats), skb);
+
+   spin_lock(&v->tcf_lock);
action = v->tcf_action;
 
/* Ensure 'data' points at mac_header prior calling vlan manipulating
@@ -85,7 +86,8 @@ static int tcf_vlan(struct sk_buff *skb, const struct tc_action *a,
 
 drop:
action = TC_ACT_SHOT;
-   v->tcf_qstats.drops++;
+   qstats_drop_inc(this_cpu_ptr(v->common.cpu_qstats));
+
 unlock:
if (skb_at_tc_ingress(skb))
skb_pull_rcsum(skb, skb->mac_len);
@@ -172,7 +174,7 @@ static int tcf_vlan_init(struct net *net, struct nlattr *nla,
 
if (!exists) {
ret = tcf_idr_create(tn, parm->index, est, a,
-   &act_vlan_ops, bind, false);
+   &act_vlan_ops, bind, true);
if (ret)
return ret;
 
-- 
2.7.4



[PATCH net-next v9 0/3] net_sched actions: act_vlan now uses RCU

2017-11-07 Thread Manish Kurup
This commit consists of 3 patches:

patch1 (1/3):
The VLAN action maintains one set of stats across all cores, and uses a
spinlock to synchronize updates to it from the same. Changed this to use a
per-CPU stats context instead.
This change will result in better performance.

patch2 (2/3):
Modified netronome nfp flower action to use VLAN helper functions instead
of accessing/referencing TC act_vlan private structures directly. 

patch3 (3/3):
Using a spinlock in the VLAN action causes performance issues when the VLAN
action is used on multiple cores. Rewrote the VLAN action to use RCU read
locking for reads and updates instead.
All functions now use an RCU dereferenced pointer to access the VLAN action
context. Modified helper functions used by other modules, to use the RCU as
opposed to directly accessing the structure.

As part of this review, there were some changes suggested by reviewers.
I have incorporated all the changes that were requested.

Here're the changes:
v2: Fixed all helper functions to use RCU (rtnl_dereference) - Eric, Jamal
v2: Fixed indentation, extra line nits - Jamal, Jiri
v2: Moved rcu_head to the end of the struct - Jiri
v2: Re-formatted locals to reverse-christmas-tree - Jiri
v2: Removed mismatched spin_lock() - Cong
v2: Removed spin_lock_bh() in tcf_vlan_init, rtnl_dereference() should
suffice - Cong, Jiri
v4: Modified the nfp flower action code to use the VLAN helper functions
instead of referencing the structure directly. Isolated this into a
separate patch - Pieter Jansen
v5: Got rid of the unlikely() for the allocation case - Simon Horman
v6: Had forgotten cleanup functions for RCU alloc, added them - Dave Miller
v7: Re-formatted more locals to reverse-christmas-tree - Pieter V
v8: Reverted reverse-christmas-tree(v7), not required when dependencies
make it difficult to implement - Alexander D
v9: Cover letter subject change - Jamal

Reviewed-by: Pieter Jansen van Vuuren <pieter.jansenvanvuu...@netronome.com>
Acked-by: Jamal Hadi Salim <j...@mojatatu.com>
Acked-by: Jiri Pirko <j...@mellanox.com>
Signed-off-by: Manish Kurup <manish.ku...@verizon.com>

Manish Kurup (3):
  act_vlan: Change stats update to use per-core stats
  nfp flower action: Modified to use VLAN helper functions
  act_vlan: VLAN action rewrite to use RCU lock/unlock and update

 drivers/net/ethernet/netronome/nfp/flower/action.c |  5 +-
 include/net/tc_act/tc_vlan.h   | 46 +---
 net/sched/act_vlan.c   | 81 +++---
 3 files changed, 94 insertions(+), 38 deletions(-)

-- 
2.7.4



[PATCH net-next v8 2/3] nfp flower action: Modified to use VLAN helper functions

2017-11-07 Thread Manish Kurup
Modified netronome nfp flower action to use VLAN helper functions instead
of accessing/referencing TC act_vlan private structures directly.

Reviewed-by: Pieter Jansen van Vuuren <pieter.jansenvanvuu...@netronome.com>
Signed-off-by: Manish Kurup <manish.ku...@verizon.com>
---
 drivers/net/ethernet/netronome/nfp/flower/action.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/flower/action.c b/drivers/net/ethernet/netronome/nfp/flower/action.c
index de64ced..c1c595f 100644
--- a/drivers/net/ethernet/netronome/nfp/flower/action.c
+++ b/drivers/net/ethernet/netronome/nfp/flower/action.c
@@ -58,7 +58,6 @@ nfp_fl_push_vlan(struct nfp_fl_push_vlan *push_vlan,
 const struct tc_action *action)
 {
size_t act_size = sizeof(struct nfp_fl_push_vlan);
-   struct tcf_vlan *vlan = to_vlan(action);
u16 tmp_push_vlan_tci;
 
push_vlan->head.jump_id = NFP_FL_ACTION_OPCODE_PUSH_VLAN;
@@ -67,8 +66,8 @@ nfp_fl_push_vlan(struct nfp_fl_push_vlan *push_vlan,
push_vlan->vlan_tpid = tcf_vlan_push_proto(action);
 
tmp_push_vlan_tci =
-   FIELD_PREP(NFP_FL_PUSH_VLAN_PRIO, vlan->tcfv_push_prio) |
-   FIELD_PREP(NFP_FL_PUSH_VLAN_VID, vlan->tcfv_push_vid) |
+   FIELD_PREP(NFP_FL_PUSH_VLAN_PRIO, tcf_vlan_push_prio(action)) |
+   FIELD_PREP(NFP_FL_PUSH_VLAN_VID, tcf_vlan_push_vid(action)) |
NFP_FL_PUSH_VLAN_CFI;
push_vlan->vlan_tci = cpu_to_be16(tmp_push_vlan_tci);
 }
-- 
2.7.4



[PATCH net-next v8 1/3] act_vlan: Change stats update to use per-core stats

2017-11-07 Thread Manish Kurup
The VLAN action maintains one set of stats across all cores, and uses a
spinlock to synchronize updates to it from the same. Changed this to use a
per-CPU stats context instead.
This change will result in better performance.

Acked-by: Jamal Hadi Salim <j...@mojatatu.com>
Acked-by: Jiri Pirko <j...@mellanox.com>
Signed-off-by: Manish Kurup <manish.ku...@verizon.com>
---
 net/sched/act_vlan.c | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/net/sched/act_vlan.c b/net/sched/act_vlan.c
index 115fc33..8a35efe 100644
--- a/net/sched/act_vlan.c
+++ b/net/sched/act_vlan.c
@@ -30,9 +30,10 @@ static int tcf_vlan(struct sk_buff *skb, const struct tc_action *a,
int err;
u16 tci;
 
-   spin_lock(&v->tcf_lock);
tcf_lastuse_update(&v->tcf_tm);
-   bstats_update(&v->tcf_bstats, skb);
+   bstats_cpu_update(this_cpu_ptr(v->common.cpu_bstats), skb);
+
+   spin_lock(&v->tcf_lock);
action = v->tcf_action;
 
/* Ensure 'data' points at mac_header prior calling vlan manipulating
@@ -85,7 +86,8 @@ static int tcf_vlan(struct sk_buff *skb, const struct tc_action *a,
 
 drop:
action = TC_ACT_SHOT;
-   v->tcf_qstats.drops++;
+   qstats_drop_inc(this_cpu_ptr(v->common.cpu_qstats));
+
 unlock:
if (skb_at_tc_ingress(skb))
skb_pull_rcsum(skb, skb->mac_len);
@@ -172,7 +174,7 @@ static int tcf_vlan_init(struct net *net, struct nlattr *nla,
 
if (!exists) {
ret = tcf_idr_create(tn, parm->index, est, a,
-   &act_vlan_ops, bind, false);
+   &act_vlan_ops, bind, true);
if (ret)
return ret;
 
-- 
2.7.4



[PATCH net-next v8 3/3] act_vlan: VLAN action rewrite to use RCU lock/unlock and update

2017-11-07 Thread Manish Kurup
Using a spinlock in the VLAN action causes performance issues when the VLAN
action is used on multiple cores. Rewrote the VLAN action to use RCU read
locking for reads and updates instead.
All functions now use an RCU dereferenced pointer to access the VLAN action
context. Modified helper functions used by other modules, to use the RCU as
opposed to directly accessing the structure.

Acked-by: Jamal Hadi Salim <j...@mojatatu.com>
Acked-by: Jiri Pirko <j...@mellanox.com>
Signed-off-by: Manish Kurup <manish.ku...@verizon.com>
---
 include/net/tc_act/tc_vlan.h | 46 +--
 net/sched/act_vlan.c | 75 ++--
 2 files changed, 88 insertions(+), 33 deletions(-)

diff --git a/include/net/tc_act/tc_vlan.h b/include/net/tc_act/tc_vlan.h
index c2090df..22ae260 100644
--- a/include/net/tc_act/tc_vlan.h
+++ b/include/net/tc_act/tc_vlan.h
@@ -13,12 +13,17 @@
 #include <net/act_api.h>
 #include <linux/tc_act/tc_vlan.h>
 
+struct tcf_vlan_params {
+   int               tcfv_action;
+   u16               tcfv_push_vid;
+   __be16            tcfv_push_proto;
+   u8                tcfv_push_prio;
+   struct rcu_head   rcu;
+};
+
 struct tcf_vlan {
struct tc_action common;
-   int     tcfv_action;
-   u16     tcfv_push_vid;
-   __be16  tcfv_push_proto;
-   u8      tcfv_push_prio;
+   struct tcf_vlan_params __rcu *vlan_p;
 };
 #define to_vlan(a) ((struct tcf_vlan *)a)
 
@@ -33,22 +38,45 @@ static inline bool is_tcf_vlan(const struct tc_action *a)
 
 static inline u32 tcf_vlan_action(const struct tc_action *a)
 {
-   return to_vlan(a)->tcfv_action;
+   u32 tcfv_action;
+
+   rcu_read_lock();
+   tcfv_action = rcu_dereference(to_vlan(a)->vlan_p)->tcfv_action;
+   rcu_read_unlock();
+
+   return tcfv_action;
 }
 
 static inline u16 tcf_vlan_push_vid(const struct tc_action *a)
 {
-   return to_vlan(a)->tcfv_push_vid;
+   u16 tcfv_push_vid;
+
+   rcu_read_lock();
+   tcfv_push_vid = rcu_dereference(to_vlan(a)->vlan_p)->tcfv_push_vid;
+   rcu_read_unlock();
+
+   return tcfv_push_vid;
 }
 
 static inline __be16 tcf_vlan_push_proto(const struct tc_action *a)
 {
-   return to_vlan(a)->tcfv_push_proto;
+   __be16 tcfv_push_proto;
+
+   rcu_read_lock();
+   tcfv_push_proto = rcu_dereference(to_vlan(a)->vlan_p)->tcfv_push_proto;
+   rcu_read_unlock();
+
+   return tcfv_push_proto;
 }
 
 static inline u8 tcf_vlan_push_prio(const struct tc_action *a)
 {
-   return to_vlan(a)->tcfv_push_prio;
-}
+   u8 tcfv_push_prio;
 
+   rcu_read_lock();
+   tcfv_push_prio = rcu_dereference(to_vlan(a)->vlan_p)->tcfv_push_prio;
+   rcu_read_unlock();
+
+   return tcfv_push_prio;
+}
 #endif /* __NET_TC_VLAN_H */
diff --git a/net/sched/act_vlan.c b/net/sched/act_vlan.c
index 8a35efe..156a552 100644
--- a/net/sched/act_vlan.c
+++ b/net/sched/act_vlan.c
@@ -26,6 +26,7 @@ static int tcf_vlan(struct sk_buff *skb, const struct 
tc_action *a,
struct tcf_result *res)
 {
struct tcf_vlan *v = to_vlan(a);
+   struct tcf_vlan_params *p;
int action;
int err;
u16 tci;
@@ -33,24 +34,27 @@ static int tcf_vlan(struct sk_buff *skb, const struct 
tc_action *a,
tcf_lastuse_update(&v->tcf_tm);
bstats_cpu_update(this_cpu_ptr(v->common.cpu_bstats), skb);
 
-   spin_lock(&v->tcf_lock);
-   action = v->tcf_action;
-
/* Ensure 'data' points at mac_header prior calling vlan manipulating
 * functions.
 */
if (skb_at_tc_ingress(skb))
skb_push_rcsum(skb, skb->mac_len);
 
-   switch (v->tcfv_action) {
+   rcu_read_lock();
+
+   action = READ_ONCE(v->tcf_action);
+
+   p = rcu_dereference(v->vlan_p);
+
+   switch (p->tcfv_action) {
case TCA_VLAN_ACT_POP:
err = skb_vlan_pop(skb);
if (err)
goto drop;
break;
case TCA_VLAN_ACT_PUSH:
-   err = skb_vlan_push(skb, v->tcfv_push_proto, v->tcfv_push_vid |
-   (v->tcfv_push_prio << VLAN_PRIO_SHIFT));
+   err = skb_vlan_push(skb, p->tcfv_push_proto, p->tcfv_push_vid |
+   (p->tcfv_push_prio << VLAN_PRIO_SHIFT));
if (err)
goto drop;
break;
@@ -69,14 +73,14 @@ static int tcf_vlan(struct sk_buff *skb, const struct 
tc_action *a,
goto drop;
}
/* replace the vid */
-   tci = (tci & ~VLAN_VID_MASK) | v->tcfv_push_vid;
+   tci = (tci & ~VLAN_VID_MASK) | p->tcfv_push_vid;
/* replace prio bits, if tcfv_push_prio specified */
-  

[PATCH net-next v8 0/3] nfp act_vlan: Rewrite of the TC vlan action to use the RCU, and incorporated review comments

2017-11-07 Thread Manish Kurup
This commit consists of 3 patches:

patch1 (1/3):
The VLAN action maintains one set of stats across all cores, and uses a
spinlock to synchronize updates to it from the same. Changed this to use a
per-CPU stats context instead.
This change will result in better performance.

patch2 (2/3):
Modified netronome nfp flower action to use VLAN helper functions instead
of accessing/referencing TC act_vlan private structures directly. 

patch3 (3/3):
Using a spinlock in the VLAN action causes performance issues when the VLAN
action is used on multiple cores. Rewrote the VLAN action to use RCU read
locking for reads and updates instead.
All functions now use an RCU dereferenced pointer to access the VLAN action
context. Modified helper functions used by other modules, to use the RCU as
opposed to directly accessing the structure.

As part of this review, there were some changes suggested by reviewers.
I have incorporated all the changes that were requested.

Here are the changes:
v2: Fixed all helper functions to use RCU (rtnl_dereference) - Eric, Jamal
v2: Fixed indentation, extra line nits - Jamal, Jiri
v2: Moved rcu_head to the end of the struct - Jiri
v2: Re-formatted locals to reverse-christmas-tree - Jiri
v2: Removed mismatched spin_lock() - Cong
v2: Removed spin_lock_bh() in tcf_vlan_init, rtnl_dereference() should
suffice - Cong, Jiri
v4: Modified the nfp flower action code to use the VLAN helper functions
instead of referencing the structure directly. Isolated this into a
separate patch - Pieter Jansen
v5: Got rid of the unlikely() for the allocation case - Simon Horman
v6: Had forgotten cleanup functions for RCU alloc, added them - Dave Miller
v7: Re-formatted more locals to reverse-christmas-tree - Pieter V
v8: Reverted reverse-christmas-tree(v7), not required when dependencies
make it difficult to implement - Alexander D

Reviewed-by: Pieter Jansen van Vuuren <pieter.jansenvanvuu...@netronome.com>
Acked-by: Jamal Hadi Salim <j...@mojatatu.com>
Acked-by: Jiri Pirko <j...@mellanox.com>
Signed-off-by: Manish Kurup <manish.ku...@verizon.com>

Manish Kurup (3):
  act_vlan: Change stats update to use per-core stats
  nfp flower action: Modified to use VLAN helper functions
  act_vlan: VLAN action rewrite to use RCU lock/unlock and update

 drivers/net/ethernet/netronome/nfp/flower/action.c |  5 +-
 include/net/tc_act/tc_vlan.h   | 46 +---
 net/sched/act_vlan.c   | 81 +++---
 3 files changed, 94 insertions(+), 38 deletions(-)

-- 
2.7.4



[PATCH net-next v7 3/3] act_vlan: VLAN action rewrite to use RCU lock/unlock and update

2017-11-07 Thread Manish Kurup
Using a spinlock in the VLAN action causes performance issues when the VLAN
action is used on multiple cores. Rewrote the VLAN action to use RCU read
locking for reads and updates instead.
All functions now use an RCU dereferenced pointer to access the VLAN action
context. Modified helper functions used by other modules, to use the RCU as
opposed to directly accessing the structure.

Acked-by: Jamal Hadi Salim <j...@mojatatu.com>
Acked-by: Jiri Pirko <j...@mellanox.com>
Signed-off-by: Manish Kurup <manish.ku...@verizon.com>
---
 include/net/tc_act/tc_vlan.h | 46 +--
 net/sched/act_vlan.c | 75 ++--
 2 files changed, 88 insertions(+), 33 deletions(-)

diff --git a/include/net/tc_act/tc_vlan.h b/include/net/tc_act/tc_vlan.h
index c2090df..22ae260 100644
--- a/include/net/tc_act/tc_vlan.h
+++ b/include/net/tc_act/tc_vlan.h
@@ -13,12 +13,17 @@
 #include 
 #include 
 
+struct tcf_vlan_params {
+   int   tcfv_action;
+   u16   tcfv_push_vid;
+   __be16tcfv_push_proto;
+   u8tcfv_push_prio;
+   struct rcu_head   rcu;
+};
+
 struct tcf_vlan {
struct tc_actioncommon;
-   int tcfv_action;
-   u16 tcfv_push_vid;
-   __be16  tcfv_push_proto;
-   u8  tcfv_push_prio;
+   struct tcf_vlan_params __rcu *vlan_p;
 };
 #define to_vlan(a) ((struct tcf_vlan *)a)
 
@@ -33,22 +38,45 @@ static inline bool is_tcf_vlan(const struct tc_action *a)
 
 static inline u32 tcf_vlan_action(const struct tc_action *a)
 {
-   return to_vlan(a)->tcfv_action;
+   u32 tcfv_action;
+
+   rcu_read_lock();
+   tcfv_action = rcu_dereference(to_vlan(a)->vlan_p)->tcfv_action;
+   rcu_read_unlock();
+
+   return tcfv_action;
 }
 
 static inline u16 tcf_vlan_push_vid(const struct tc_action *a)
 {
-   return to_vlan(a)->tcfv_push_vid;
+   u16 tcfv_push_vid;
+
+   rcu_read_lock();
+   tcfv_push_vid = rcu_dereference(to_vlan(a)->vlan_p)->tcfv_push_vid;
+   rcu_read_unlock();
+
+   return tcfv_push_vid;
 }
 
 static inline __be16 tcf_vlan_push_proto(const struct tc_action *a)
 {
-   return to_vlan(a)->tcfv_push_proto;
+   __be16 tcfv_push_proto;
+
+   rcu_read_lock();
+   tcfv_push_proto = rcu_dereference(to_vlan(a)->vlan_p)->tcfv_push_proto;
+   rcu_read_unlock();
+
+   return tcfv_push_proto;
 }
 
 static inline u8 tcf_vlan_push_prio(const struct tc_action *a)
 {
-   return to_vlan(a)->tcfv_push_prio;
-}
+   u8 tcfv_push_prio;
 
+   rcu_read_lock();
+   tcfv_push_prio = rcu_dereference(to_vlan(a)->vlan_p)->tcfv_push_prio;
+   rcu_read_unlock();
+
+   return tcfv_push_prio;
+}
 #endif /* __NET_TC_VLAN_H */
diff --git a/net/sched/act_vlan.c b/net/sched/act_vlan.c
index 8a35efe..1f80dfc 100644
--- a/net/sched/act_vlan.c
+++ b/net/sched/act_vlan.c
@@ -26,6 +26,7 @@ static int tcf_vlan(struct sk_buff *skb, const struct 
tc_action *a,
struct tcf_result *res)
 {
struct tcf_vlan *v = to_vlan(a);
+   struct tcf_vlan_params *p;
int action;
int err;
u16 tci;
@@ -33,24 +34,27 @@ static int tcf_vlan(struct sk_buff *skb, const struct 
tc_action *a,
tcf_lastuse_update(&v->tcf_tm);
bstats_cpu_update(this_cpu_ptr(v->common.cpu_bstats), skb);
 
-   spin_lock(&v->tcf_lock);
-   action = v->tcf_action;
-
/* Ensure 'data' points at mac_header prior calling vlan manipulating
 * functions.
 */
if (skb_at_tc_ingress(skb))
skb_push_rcsum(skb, skb->mac_len);
 
-   switch (v->tcfv_action) {
+   rcu_read_lock();
+
+   action = READ_ONCE(v->tcf_action);
+
+   p = rcu_dereference(v->vlan_p);
+
+   switch (p->tcfv_action) {
case TCA_VLAN_ACT_POP:
err = skb_vlan_pop(skb);
if (err)
goto drop;
break;
case TCA_VLAN_ACT_PUSH:
-   err = skb_vlan_push(skb, v->tcfv_push_proto, v->tcfv_push_vid |
-   (v->tcfv_push_prio << VLAN_PRIO_SHIFT));
+   err = skb_vlan_push(skb, p->tcfv_push_proto, p->tcfv_push_vid |
+   (p->tcfv_push_prio << VLAN_PRIO_SHIFT));
if (err)
goto drop;
break;
@@ -69,14 +73,14 @@ static int tcf_vlan(struct sk_buff *skb, const struct 
tc_action *a,
goto drop;
}
/* replace the vid */
-   tci = (tci & ~VLAN_VID_MASK) | v->tcfv_push_vid;
+   tci = (tci & ~VLAN_VID_MASK) | p->tcfv_push_vid;
/* replace prio bits, if tcfv_push_prio specified */
-  

[PATCH net-next v7 2/3] nfp flower action: Modified to use VLAN helper functions

2017-11-07 Thread Manish Kurup
Modified netronome nfp flower action to use VLAN helper functions instead
of accessing/referencing TC act_vlan private structures directly.

Reviewed-by: Pieter Jansen van Vuuren <pieter.jansenvanvuu...@netronome.com>
Signed-off-by: Manish Kurup <manish.ku...@verizon.com>
---
 drivers/net/ethernet/netronome/nfp/flower/action.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/flower/action.c b/drivers/net/ethernet/netronome/nfp/flower/action.c
index de64ced..c1c595f 100644
--- a/drivers/net/ethernet/netronome/nfp/flower/action.c
+++ b/drivers/net/ethernet/netronome/nfp/flower/action.c
@@ -58,7 +58,6 @@ nfp_fl_push_vlan(struct nfp_fl_push_vlan *push_vlan,
 const struct tc_action *action)
 {
size_t act_size = sizeof(struct nfp_fl_push_vlan);
-   struct tcf_vlan *vlan = to_vlan(action);
u16 tmp_push_vlan_tci;
 
push_vlan->head.jump_id = NFP_FL_ACTION_OPCODE_PUSH_VLAN;
@@ -67,8 +66,8 @@ nfp_fl_push_vlan(struct nfp_fl_push_vlan *push_vlan,
push_vlan->vlan_tpid = tcf_vlan_push_proto(action);
 
tmp_push_vlan_tci =
-   FIELD_PREP(NFP_FL_PUSH_VLAN_PRIO, vlan->tcfv_push_prio) |
-   FIELD_PREP(NFP_FL_PUSH_VLAN_VID, vlan->tcfv_push_vid) |
+   FIELD_PREP(NFP_FL_PUSH_VLAN_PRIO, tcf_vlan_push_prio(action)) |
+   FIELD_PREP(NFP_FL_PUSH_VLAN_VID, tcf_vlan_push_vid(action)) |
NFP_FL_PUSH_VLAN_CFI;
push_vlan->vlan_tci = cpu_to_be16(tmp_push_vlan_tci);
 }
-- 
2.7.4



[PATCH net-next v7 1/3] act_vlan: Change stats update to use per-core stats

2017-11-07 Thread Manish Kurup
The VLAN action maintains one set of stats across all cores, and uses a
spinlock to synchronize updates to it from the same. Changed this to use a
per-CPU stats context instead.
This change will result in better performance.
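
The idea can be sketched in plain C as follows. This is a simplified user-space model, not the kernel implementation: the sketch_* names and the fixed CPU count are assumptions, and the CPU index is passed in explicitly where the kernel would use this_cpu_ptr().

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_NR_CPUS 4  /* hypothetical fixed CPU count for the sketch */

/*
 * One slot per CPU, aligned to 64 bytes (an assumed cache-line size) so
 * that two CPUs do not write to the same line.
 */
struct sketch_cpu_stats {
	_Alignas(64) uint64_t packets;
	uint64_t bytes;
	uint64_t drops;
};

struct sketch_stats {
	struct sketch_cpu_stats cpu[SKETCH_NR_CPUS];
};

/* Fast path: a CPU updates only its own slot, so no lock is taken. */
static void sketch_stats_update(struct sketch_stats *s, unsigned int cpu,
				uint64_t len)
{
	assert(cpu < SKETCH_NR_CPUS);
	s->cpu[cpu].packets++;
	s->cpu[cpu].bytes += len;
}

static void sketch_stats_drop(struct sketch_stats *s, unsigned int cpu)
{
	assert(cpu < SKETCH_NR_CPUS);
	s->cpu[cpu].drops++;
}

/* Slow path: whoever reports the stats sums over all slots. */
static uint64_t sketch_stats_total_packets(const struct sketch_stats *s)
{
	uint64_t sum = 0;

	for (unsigned int i = 0; i < SKETCH_NR_CPUS; i++)
		sum += s->cpu[i].packets;
	return sum;
}

static uint64_t sketch_stats_total_bytes(const struct sketch_stats *s)
{
	uint64_t sum = 0;

	for (unsigned int i = 0; i < SKETCH_NR_CPUS; i++)
		sum += s->cpu[i].bytes;
	return sum;
}

static uint64_t sketch_stats_total_drops(const struct sketch_stats *s)
{
	uint64_t sum = 0;

	for (unsigned int i = 0; i < SKETCH_NR_CPUS; i++)
		sum += s->cpu[i].drops;
	return sum;
}
```

The trade-off is that updates become cheap and contention-free, while reading a total costs one pass over all slots; that suits packet counters, which are updated far more often than they are dumped.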

Acked-by: Jamal Hadi Salim <j...@mojatatu.com>
Acked-by: Jiri Pirko <j...@mellanox.com>
Signed-off-by: Manish Kurup <manish.ku...@verizon.com>
---
 net/sched/act_vlan.c | 10 ++
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/net/sched/act_vlan.c b/net/sched/act_vlan.c
index 115fc33..8a35efe 100644
--- a/net/sched/act_vlan.c
+++ b/net/sched/act_vlan.c
@@ -30,9 +30,10 @@ static int tcf_vlan(struct sk_buff *skb, const struct 
tc_action *a,
int err;
u16 tci;
 
-   spin_lock(&v->tcf_lock);
tcf_lastuse_update(&v->tcf_tm);
-   bstats_update(&v->tcf_bstats, skb);
+   bstats_cpu_update(this_cpu_ptr(v->common.cpu_bstats), skb);
+
+   spin_lock(&v->tcf_lock);
action = v->tcf_action;
 
/* Ensure 'data' points at mac_header prior calling vlan manipulating
@@ -85,7 +86,8 @@ static int tcf_vlan(struct sk_buff *skb, const struct 
tc_action *a,
 
 drop:
action = TC_ACT_SHOT;
-   v->tcf_qstats.drops++;
+   qstats_drop_inc(this_cpu_ptr(v->common.cpu_qstats));
+
 unlock:
if (skb_at_tc_ingress(skb))
skb_pull_rcsum(skb, skb->mac_len);
@@ -172,7 +174,7 @@ static int tcf_vlan_init(struct net *net, struct nlattr 
*nla,
 
if (!exists) {
ret = tcf_idr_create(tn, parm->index, est, a,
-   &act_vlan_ops, bind, false);
+   &act_vlan_ops, bind, true);
if (ret)
return ret;
 
-- 
2.7.4



[PATCH net-next v7 0/3] nfp act_vlan: Rewrite of the TC vlan action to use the RCU, and incorporated review comments

2017-11-07 Thread Manish Kurup
This commit consists of 3 patches:

patch1 (1/3):
The VLAN action maintains one set of stats across all cores, and uses a
spinlock to synchronize updates to it from the same. Changed this to use a
per-CPU stats context instead.
This change will result in better performance.

patch2 (2/3):
Modified netronome nfp flower action to use VLAN helper functions instead
of accessing/referencing TC act_vlan private structures directly. 

patch3 (3/3):
Using a spinlock in the VLAN action causes performance issues when the VLAN
action is used on multiple cores. Rewrote the VLAN action to use RCU read
locking for reads and updates instead.
All functions now use an RCU dereferenced pointer to access the VLAN action
context. Modified helper functions used by other modules, to use the RCU as
opposed to directly accessing the structure.

As part of this review, there were some changes suggested by reviewers.
I have incorporated all the changes that were requested.

Here are the changes:
v2: Fixed all helper functions to use RCU (rtnl_dereference) - Eric, Jamal
v2: Fixed indentation, extra line nits - Jamal, Jiri
v2: Moved rcu_head to the end of the struct - Jiri
v2: Re-formatted locals to reverse-christmas-tree - Jiri
v2: Removed mismatched spin_lock() - Cong
v2: Removed spin_lock_bh() in tcf_vlan_init, rtnl_dereference() should
suffice - Cong, Jiri
v4: Modified the nfp flower action code to use the VLAN helper functions
instead of referencing the structure directly. Isolated this into a
separate patch - Pieter Jansen
v5: Got rid of the unlikely() for the allocation case - Simon Horman
v6: Had forgotten cleanup functions for RCU alloc, added them - Dave Miller
v7: Re-formatted more locals to reverse-christmas-tree - Pieter V

Reviewed-by: Pieter Jansen van Vuuren <pieter.jansenvanvuu...@netronome.com>
Acked-by: Jamal Hadi Salim <j...@mojatatu.com>
Acked-by: Jiri Pirko <j...@mellanox.com>
Signed-off-by: Manish Kurup <manish.ku...@verizon.com>


Manish Kurup (3):
  act_vlan: Change stats update to use per-core stats
  nfp flower action: Modified to use VLAN helper functions
  act_vlan: VLAN action rewrite to use RCU lock/unlock and update

 drivers/net/ethernet/netronome/nfp/flower/action.c |  5 +-
 include/net/tc_act/tc_vlan.h   | 46 +---
 net/sched/act_vlan.c   | 81 +++---
 3 files changed, 94 insertions(+), 38 deletions(-)

-- 
2.7.4



[PATCH net-next v6 3/3] act_vlan: VLAN action rewrite to use RCU lock/unlock and update

2017-11-03 Thread Manish Kurup
Using a spinlock in the VLAN action causes performance issues when the VLAN
action is used on multiple cores. Rewrote the VLAN action to use RCU read
locking for reads and updates instead.

Acked-by: Jamal Hadi Salim <j...@mojatatu.com>
Acked-by: Jiri Pirko <j...@mellanox.com>
Signed-off-by: Manish Kurup <manish.ku...@verizon.com>
---
 include/net/tc_act/tc_vlan.h | 46 +--
 net/sched/act_vlan.c | 75 ++--
 2 files changed, 88 insertions(+), 33 deletions(-)

diff --git a/include/net/tc_act/tc_vlan.h b/include/net/tc_act/tc_vlan.h
index c2090df..22ae260 100644
--- a/include/net/tc_act/tc_vlan.h
+++ b/include/net/tc_act/tc_vlan.h
@@ -13,12 +13,17 @@
 #include 
 #include 
 
+struct tcf_vlan_params {
+   int   tcfv_action;
+   u16   tcfv_push_vid;
+   __be16tcfv_push_proto;
+   u8tcfv_push_prio;
+   struct rcu_head   rcu;
+};
+
 struct tcf_vlan {
struct tc_actioncommon;
-   int tcfv_action;
-   u16 tcfv_push_vid;
-   __be16  tcfv_push_proto;
-   u8  tcfv_push_prio;
+   struct tcf_vlan_params __rcu *vlan_p;
 };
 #define to_vlan(a) ((struct tcf_vlan *)a)
 
@@ -33,22 +38,45 @@ static inline bool is_tcf_vlan(const struct tc_action *a)
 
 static inline u32 tcf_vlan_action(const struct tc_action *a)
 {
-   return to_vlan(a)->tcfv_action;
+   u32 tcfv_action;
+
+   rcu_read_lock();
+   tcfv_action = rcu_dereference(to_vlan(a)->vlan_p)->tcfv_action;
+   rcu_read_unlock();
+
+   return tcfv_action;
 }
 
 static inline u16 tcf_vlan_push_vid(const struct tc_action *a)
 {
-   return to_vlan(a)->tcfv_push_vid;
+   u16 tcfv_push_vid;
+
+   rcu_read_lock();
+   tcfv_push_vid = rcu_dereference(to_vlan(a)->vlan_p)->tcfv_push_vid;
+   rcu_read_unlock();
+
+   return tcfv_push_vid;
 }
 
 static inline __be16 tcf_vlan_push_proto(const struct tc_action *a)
 {
-   return to_vlan(a)->tcfv_push_proto;
+   __be16 tcfv_push_proto;
+
+   rcu_read_lock();
+   tcfv_push_proto = rcu_dereference(to_vlan(a)->vlan_p)->tcfv_push_proto;
+   rcu_read_unlock();
+
+   return tcfv_push_proto;
 }
 
 static inline u8 tcf_vlan_push_prio(const struct tc_action *a)
 {
-   return to_vlan(a)->tcfv_push_prio;
-}
+   u8 tcfv_push_prio;
 
+   rcu_read_lock();
+   tcfv_push_prio = rcu_dereference(to_vlan(a)->vlan_p)->tcfv_push_prio;
+   rcu_read_unlock();
+
+   return tcfv_push_prio;
+}
 #endif /* __NET_TC_VLAN_H */
diff --git a/net/sched/act_vlan.c b/net/sched/act_vlan.c
index b093bad..97f717a 100644
--- a/net/sched/act_vlan.c
+++ b/net/sched/act_vlan.c
@@ -26,6 +26,7 @@ static int tcf_vlan(struct sk_buff *skb, const struct 
tc_action *a,
struct tcf_result *res)
 {
struct tcf_vlan *v = to_vlan(a);
+   struct tcf_vlan_params *p;
int action;
int err;
u16 tci;
@@ -33,24 +34,27 @@ static int tcf_vlan(struct sk_buff *skb, const struct 
tc_action *a,
tcf_lastuse_update(&v->tcf_tm);
bstats_cpu_update(this_cpu_ptr(v->common.cpu_bstats), skb);
 
-   spin_lock(&v->tcf_lock);
-   action = v->tcf_action;
-
/* Ensure 'data' points at mac_header prior calling vlan manipulating
 * functions.
 */
if (skb_at_tc_ingress(skb))
skb_push_rcsum(skb, skb->mac_len);
 
-   switch (v->tcfv_action) {
+   rcu_read_lock();
+
+   action = READ_ONCE(v->tcf_action);
+
+   p = rcu_dereference(v->vlan_p);
+
+   switch (p->tcfv_action) {
case TCA_VLAN_ACT_POP:
err = skb_vlan_pop(skb);
if (err)
goto drop;
break;
case TCA_VLAN_ACT_PUSH:
-   err = skb_vlan_push(skb, v->tcfv_push_proto, v->tcfv_push_vid |
-   (v->tcfv_push_prio << VLAN_PRIO_SHIFT));
+   err = skb_vlan_push(skb, p->tcfv_push_proto, p->tcfv_push_vid |
+   (p->tcfv_push_prio << VLAN_PRIO_SHIFT));
if (err)
goto drop;
break;
@@ -69,14 +73,14 @@ static int tcf_vlan(struct sk_buff *skb, const struct 
tc_action *a,
goto drop;
}
/* replace the vid */
-   tci = (tci & ~VLAN_VID_MASK) | v->tcfv_push_vid;
+   tci = (tci & ~VLAN_VID_MASK) | p->tcfv_push_vid;
/* replace prio bits, if tcfv_push_prio specified */
-   if (v->tcfv_push_prio) {
+   if (p->tcfv_push_prio) {
tci &= ~VLAN_PRIO_MASK;
-   tci |= v->tcfv_push_prio <

[PATCH net-next v6 2/3] nfp flower action: Modified to use VLAN helper functions

2017-11-03 Thread Manish Kurup
Modified netronome nfp flower action to use VLAN helper functions instead
of accessing the structure directly.

Signed-off-by: Manish Kurup <manish.ku...@verizon.com>
---
 drivers/net/ethernet/netronome/nfp/flower/action.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/flower/action.c b/drivers/net/ethernet/netronome/nfp/flower/action.c
index de64ced..c1c595f 100644
--- a/drivers/net/ethernet/netronome/nfp/flower/action.c
+++ b/drivers/net/ethernet/netronome/nfp/flower/action.c
@@ -58,7 +58,6 @@ nfp_fl_push_vlan(struct nfp_fl_push_vlan *push_vlan,
 const struct tc_action *action)
 {
size_t act_size = sizeof(struct nfp_fl_push_vlan);
-   struct tcf_vlan *vlan = to_vlan(action);
u16 tmp_push_vlan_tci;
 
push_vlan->head.jump_id = NFP_FL_ACTION_OPCODE_PUSH_VLAN;
@@ -67,8 +66,8 @@ nfp_fl_push_vlan(struct nfp_fl_push_vlan *push_vlan,
push_vlan->vlan_tpid = tcf_vlan_push_proto(action);
 
tmp_push_vlan_tci =
-   FIELD_PREP(NFP_FL_PUSH_VLAN_PRIO, vlan->tcfv_push_prio) |
-   FIELD_PREP(NFP_FL_PUSH_VLAN_VID, vlan->tcfv_push_vid) |
+   FIELD_PREP(NFP_FL_PUSH_VLAN_PRIO, tcf_vlan_push_prio(action)) |
+   FIELD_PREP(NFP_FL_PUSH_VLAN_VID, tcf_vlan_push_vid(action)) |
NFP_FL_PUSH_VLAN_CFI;
push_vlan->vlan_tci = cpu_to_be16(tmp_push_vlan_tci);
 }
-- 
2.7.4



[PATCH net-next v6 0/3] Incorporated all required changes

2017-11-03 Thread Manish Kurup
Hi everyone,

Modified the netronome drivers (flower action) to use the VLAN helper
functions instead of dereferencing the structure directly. This is
required for the VLAN action patch.

Could you please review?

Here are the changes:
v2: Fixed all helper functions to use RCU (rtnl_dereference) - Eric, Jamal
v2: Fixed indentation, extra line nits - Jamal, Jiri
v2: Moved rcu_head to the end of the struct - Jiri
v2: Re-formatted locals to reverse-christmas-tree - Jiri
v2: Removed mismatched spin_lock() - Cong
v2: Removed spin_lock_bh() in tcf_vlan_init, rtnl_dereference() should
suffice - Cong, Jiri
v4: Modified the nfp flower action code to use the VLAN helper functions
instead of referencing the structure directly. Isolated this into a
separate patch - Pieter Jansen
v5: Got rid of the unlikely() for the allocation case - Simon Horman
v6: Added cleanup functions for RCU alloc - Dave Miller

Acked-by: Jamal Hadi Salim <j...@mojatatu.com>
Acked-by: Jiri Pirko <j...@mellanox.com>
Signed-off-by: Manish Kurup <manish.ku...@verizon.com>

Manish Kurup (3):
  act_vlan: Change stats update to use per-core stats
  nfp flower action: Modified to use VLAN helper functions
  act_vlan: VLAN action rewrite to use RCU lock/unlock and update

 drivers/net/ethernet/netronome/nfp/flower/action.c |  5 +-
 include/net/tc_act/tc_vlan.h   | 46 +---
 net/sched/act_vlan.c   | 81 +++---
 3 files changed, 94 insertions(+), 38 deletions(-)

-- 
2.7.4



[PATCH net-next v6 1/3] act_vlan: Change stats update to use per-core stats

2017-11-03 Thread Manish Kurup
The VLAN action maintains one set of stats across all cores, and uses a
spinlock to synchronize updates to it from the same. Changed this to use a
per-CPU stats context instead.
This change will result in better performance.

Acked-by: Jamal Hadi Salim <j...@mojatatu.com>
Acked-by: Jiri Pirko <j...@mellanox.com>
Signed-off-by: Manish Kurup <manish.ku...@verizon.com>
---
 net/sched/act_vlan.c | 10 ++
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/net/sched/act_vlan.c b/net/sched/act_vlan.c
index 16eb067..b093bad 100644
--- a/net/sched/act_vlan.c
+++ b/net/sched/act_vlan.c
@@ -30,9 +30,10 @@ static int tcf_vlan(struct sk_buff *skb, const struct 
tc_action *a,
int err;
u16 tci;
 
-   spin_lock(&v->tcf_lock);
tcf_lastuse_update(&v->tcf_tm);
-   bstats_update(&v->tcf_bstats, skb);
+   bstats_cpu_update(this_cpu_ptr(v->common.cpu_bstats), skb);
+
+   spin_lock(&v->tcf_lock);
action = v->tcf_action;
 
/* Ensure 'data' points at mac_header prior calling vlan manipulating
@@ -85,7 +86,8 @@ static int tcf_vlan(struct sk_buff *skb, const struct 
tc_action *a,
 
 drop:
action = TC_ACT_SHOT;
-   v->tcf_qstats.drops++;
+   qstats_drop_inc(this_cpu_ptr(v->common.cpu_qstats));
+
 unlock:
if (skb_at_tc_ingress(skb))
skb_pull_rcsum(skb, skb->mac_len);
@@ -172,7 +174,7 @@ static int tcf_vlan_init(struct net *net, struct nlattr 
*nla,
 
if (!exists) {
ret = tcf_idr_create(tn, parm->index, est, a,
-   &act_vlan_ops, bind, false);
+   &act_vlan_ops, bind, true);
if (ret)
return ret;
 
-- 
2.7.4



[PATCH net-next v5 3/3] act_vlan: VLAN action rewrite to use RCU lock/unlock and update

2017-11-01 Thread Manish Kurup
Using a spinlock in the VLAN action causes performance issues when the VLAN
action is used on multiple cores. Rewrote the VLAN action to use RCU read
locking for reads and updates instead.

Acked-by: Jamal Hadi Salim <j...@mojatatu.com>
Acked-by: Jiri Pirko <j...@mellanox.com>
Signed-off-by: Manish Kurup <manish.ku...@verizon.com>
---
 include/net/tc_act/tc_vlan.h | 46 +--
 net/sched/act_vlan.c | 65 
 2 files changed, 78 insertions(+), 33 deletions(-)

diff --git a/include/net/tc_act/tc_vlan.h b/include/net/tc_act/tc_vlan.h
index c2090df..22ae260 100644
--- a/include/net/tc_act/tc_vlan.h
+++ b/include/net/tc_act/tc_vlan.h
@@ -13,12 +13,17 @@
 #include 
 #include 
 
+struct tcf_vlan_params {
+   int   tcfv_action;
+   u16   tcfv_push_vid;
+   __be16tcfv_push_proto;
+   u8tcfv_push_prio;
+   struct rcu_head   rcu;
+};
+
 struct tcf_vlan {
struct tc_actioncommon;
-   int tcfv_action;
-   u16 tcfv_push_vid;
-   __be16  tcfv_push_proto;
-   u8  tcfv_push_prio;
+   struct tcf_vlan_params __rcu *vlan_p;
 };
 #define to_vlan(a) ((struct tcf_vlan *)a)
 
@@ -33,22 +38,45 @@ static inline bool is_tcf_vlan(const struct tc_action *a)
 
 static inline u32 tcf_vlan_action(const struct tc_action *a)
 {
-   return to_vlan(a)->tcfv_action;
+   u32 tcfv_action;
+
+   rcu_read_lock();
+   tcfv_action = rcu_dereference(to_vlan(a)->vlan_p)->tcfv_action;
+   rcu_read_unlock();
+
+   return tcfv_action;
 }
 
 static inline u16 tcf_vlan_push_vid(const struct tc_action *a)
 {
-   return to_vlan(a)->tcfv_push_vid;
+   u16 tcfv_push_vid;
+
+   rcu_read_lock();
+   tcfv_push_vid = rcu_dereference(to_vlan(a)->vlan_p)->tcfv_push_vid;
+   rcu_read_unlock();
+
+   return tcfv_push_vid;
 }
 
 static inline __be16 tcf_vlan_push_proto(const struct tc_action *a)
 {
-   return to_vlan(a)->tcfv_push_proto;
+   __be16 tcfv_push_proto;
+
+   rcu_read_lock();
+   tcfv_push_proto = rcu_dereference(to_vlan(a)->vlan_p)->tcfv_push_proto;
+   rcu_read_unlock();
+
+   return tcfv_push_proto;
 }
 
 static inline u8 tcf_vlan_push_prio(const struct tc_action *a)
 {
-   return to_vlan(a)->tcfv_push_prio;
-}
+   u8 tcfv_push_prio;
 
+   rcu_read_lock();
+   tcfv_push_prio = rcu_dereference(to_vlan(a)->vlan_p)->tcfv_push_prio;
+   rcu_read_unlock();
+
+   return tcfv_push_prio;
+}
 #endif /* __NET_TC_VLAN_H */
diff --git a/net/sched/act_vlan.c b/net/sched/act_vlan.c
index b093bad..148efc6d 100644
--- a/net/sched/act_vlan.c
+++ b/net/sched/act_vlan.c
@@ -26,6 +26,7 @@ static int tcf_vlan(struct sk_buff *skb, const struct 
tc_action *a,
struct tcf_result *res)
 {
struct tcf_vlan *v = to_vlan(a);
+   struct tcf_vlan_params *p;
int action;
int err;
u16 tci;
@@ -33,24 +34,27 @@ static int tcf_vlan(struct sk_buff *skb, const struct 
tc_action *a,
tcf_lastuse_update(&v->tcf_tm);
bstats_cpu_update(this_cpu_ptr(v->common.cpu_bstats), skb);
 
-   spin_lock(&v->tcf_lock);
-   action = v->tcf_action;
-
/* Ensure 'data' points at mac_header prior calling vlan manipulating
 * functions.
 */
if (skb_at_tc_ingress(skb))
skb_push_rcsum(skb, skb->mac_len);
 
-   switch (v->tcfv_action) {
+   rcu_read_lock();
+
+   action = READ_ONCE(v->tcf_action);
+
+   p = rcu_dereference(v->vlan_p);
+
+   switch (p->tcfv_action) {
case TCA_VLAN_ACT_POP:
err = skb_vlan_pop(skb);
if (err)
goto drop;
break;
case TCA_VLAN_ACT_PUSH:
-   err = skb_vlan_push(skb, v->tcfv_push_proto, v->tcfv_push_vid |
-   (v->tcfv_push_prio << VLAN_PRIO_SHIFT));
+   err = skb_vlan_push(skb, p->tcfv_push_proto, p->tcfv_push_vid |
+   (p->tcfv_push_prio << VLAN_PRIO_SHIFT));
if (err)
goto drop;
break;
@@ -69,14 +73,14 @@ static int tcf_vlan(struct sk_buff *skb, const struct 
tc_action *a,
goto drop;
}
/* replace the vid */
-   tci = (tci & ~VLAN_VID_MASK) | v->tcfv_push_vid;
+   tci = (tci & ~VLAN_VID_MASK) | p->tcfv_push_vid;
/* replace prio bits, if tcfv_push_prio specified */
-   if (v->tcfv_push_prio) {
+   if (p->tcfv_push_prio) {
tci &= ~VLAN_PRIO_MASK;
-   tci |= v->tcfv_push_prio <

[PATCH net-next v5 2/3] nfp flower action: Modified to use VLAN helper functions

2017-11-01 Thread Manish Kurup
Modified netronome nfp flower action to use VLAN helper functions instead
of accessing the structure directly.

Signed-off-by: Manish Kurup <manish.ku...@verizon.com>
---
 drivers/net/ethernet/netronome/nfp/flower/action.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/flower/action.c b/drivers/net/ethernet/netronome/nfp/flower/action.c
index de64ced..c1c595f 100644
--- a/drivers/net/ethernet/netronome/nfp/flower/action.c
+++ b/drivers/net/ethernet/netronome/nfp/flower/action.c
@@ -58,7 +58,6 @@ nfp_fl_push_vlan(struct nfp_fl_push_vlan *push_vlan,
 const struct tc_action *action)
 {
size_t act_size = sizeof(struct nfp_fl_push_vlan);
-   struct tcf_vlan *vlan = to_vlan(action);
u16 tmp_push_vlan_tci;
 
push_vlan->head.jump_id = NFP_FL_ACTION_OPCODE_PUSH_VLAN;
@@ -67,8 +66,8 @@ nfp_fl_push_vlan(struct nfp_fl_push_vlan *push_vlan,
push_vlan->vlan_tpid = tcf_vlan_push_proto(action);
 
tmp_push_vlan_tci =
-   FIELD_PREP(NFP_FL_PUSH_VLAN_PRIO, vlan->tcfv_push_prio) |
-   FIELD_PREP(NFP_FL_PUSH_VLAN_VID, vlan->tcfv_push_vid) |
+   FIELD_PREP(NFP_FL_PUSH_VLAN_PRIO, tcf_vlan_push_prio(action)) |
+   FIELD_PREP(NFP_FL_PUSH_VLAN_VID, tcf_vlan_push_vid(action)) |
NFP_FL_PUSH_VLAN_CFI;
push_vlan->vlan_tci = cpu_to_be16(tmp_push_vlan_tci);
 }
-- 
2.7.4



[PATCH net-next v5 1/3] act_vlan: Change stats update to use per-core stats

2017-11-01 Thread Manish Kurup
The VLAN action maintains one set of stats across all cores, and uses a
spinlock to synchronize updates to it from the same. Changed this to use a
per-CPU stats context instead.
This change will result in better performance.

Acked-by: Jamal Hadi Salim <j...@mojatatu.com>
Acked-by: Jiri Pirko <j...@mellanox.com>
Signed-off-by: Manish Kurup <manish.ku...@verizon.com>
---
 net/sched/act_vlan.c | 10 ++
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/net/sched/act_vlan.c b/net/sched/act_vlan.c
index 16eb067..b093bad 100644
--- a/net/sched/act_vlan.c
+++ b/net/sched/act_vlan.c
@@ -30,9 +30,10 @@ static int tcf_vlan(struct sk_buff *skb, const struct 
tc_action *a,
int err;
u16 tci;
 
-   spin_lock(&v->tcf_lock);
tcf_lastuse_update(&v->tcf_tm);
-   bstats_update(&v->tcf_bstats, skb);
+   bstats_cpu_update(this_cpu_ptr(v->common.cpu_bstats), skb);
+
+   spin_lock(&v->tcf_lock);
action = v->tcf_action;
 
/* Ensure 'data' points at mac_header prior calling vlan manipulating
@@ -85,7 +86,8 @@ static int tcf_vlan(struct sk_buff *skb, const struct 
tc_action *a,
 
 drop:
action = TC_ACT_SHOT;
-   v->tcf_qstats.drops++;
+   qstats_drop_inc(this_cpu_ptr(v->common.cpu_qstats));
+
 unlock:
if (skb_at_tc_ingress(skb))
skb_pull_rcsum(skb, skb->mac_len);
@@ -172,7 +174,7 @@ static int tcf_vlan_init(struct net *net, struct nlattr 
*nla,
 
if (!exists) {
ret = tcf_idr_create(tn, parm->index, est, a,
-   &act_vlan_ops, bind, false);
+   &act_vlan_ops, bind, true);
if (ret)
return ret;
 
-- 
2.7.4



[PATCH net-next v4 1/3] net sched act_vlan: Change stats update to use per-core stats

2017-11-01 Thread Manish Kurup
The VLAN action maintains one set of stats across all cores, and uses a
spinlock to synchronize updates to it from the same. Changed this to use a
per-CPU stats context instead.
This change will result in better performance.

Acked-by: Jamal Hadi Salim <j...@mojatatu.com>
Acked-by: Jiri Pirko <j...@mellanox.com>
Signed-off-by: Manish Kurup <manish.ku...@verizon.com>
---
 net/sched/act_vlan.c | 10 ++
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/net/sched/act_vlan.c b/net/sched/act_vlan.c
index 16eb067..b093bad 100644
--- a/net/sched/act_vlan.c
+++ b/net/sched/act_vlan.c
@@ -30,9 +30,10 @@ static int tcf_vlan(struct sk_buff *skb, const struct 
tc_action *a,
int err;
u16 tci;
 
-   spin_lock(&v->tcf_lock);
tcf_lastuse_update(&v->tcf_tm);
-   bstats_update(&v->tcf_bstats, skb);
+   bstats_cpu_update(this_cpu_ptr(v->common.cpu_bstats), skb);
+
+   spin_lock(&v->tcf_lock);
action = v->tcf_action;
 
/* Ensure 'data' points at mac_header prior calling vlan manipulating
@@ -85,7 +86,8 @@ static int tcf_vlan(struct sk_buff *skb, const struct 
tc_action *a,
 
 drop:
action = TC_ACT_SHOT;
-   v->tcf_qstats.drops++;
+   qstats_drop_inc(this_cpu_ptr(v->common.cpu_qstats));
+
 unlock:
if (skb_at_tc_ingress(skb))
skb_pull_rcsum(skb, skb->mac_len);
@@ -172,7 +174,7 @@ static int tcf_vlan_init(struct net *net, struct nlattr 
*nla,
 
if (!exists) {
ret = tcf_idr_create(tn, parm->index, est, a,
-   &act_vlan_ops, bind, false);
+   &act_vlan_ops, bind, true);
if (ret)
return ret;
 
-- 
2.7.4
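
The per-CPU stats change in the patch above can be pictured with a small userspace sketch. This is only an illustration, not the kernel implementation: NSLOTS, struct slot_stats, stats_update(), stats_drop_inc() and stats_sum() are hypothetical names, and a plain array index stands in for this_cpu_ptr(). The idea is that each core only ever writes its own slot, so the hot path needs no lock, and the slots are added up only when the stats are read.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical slot count standing in for the number of CPUs. */
#define NSLOTS 4

/* One set of counters per core (assumed layout, for illustration only). */
struct slot_stats {
	uint64_t bytes;
	uint64_t packets;
	uint32_t drops;
};

struct action_stats {
	struct slot_stats slot[NSLOTS];
};

/* Hot path: touch only the caller's own slot, so no lock is taken. */
static void stats_update(struct action_stats *s, unsigned int cpu, size_t len)
{
	struct slot_stats *mine = &s->slot[cpu % NSLOTS];

	mine->bytes += len;
	mine->packets++;
}

/* Drop accounting, again confined to the caller's own slot. */
static void stats_drop_inc(struct action_stats *s, unsigned int cpu)
{
	s->slot[cpu % NSLOTS].drops++;
}

/* Slow path: fold every slot into one total when stats are dumped. */
static struct slot_stats stats_sum(const struct action_stats *s)
{
	struct slot_stats total = { 0, 0, 0 };
	unsigned int i;

	for (i = 0; i < NSLOTS; i++) {
		total.bytes += s->slot[i].bytes;
		total.packets += s->slot[i].packets;
		total.drops += s->slot[i].drops;
	}
	return total;
}
```

The trade-off is that a reader pays for walking all slots, which is usually acceptable because stats dumps are rare compared with per-packet updates.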



[PATCH net-next v5 0/3] act_vlan rewrite, review comments incorporated

2017-11-01 Thread Manish Kurup
Hi everyone,

Modified the netronome drivers (flower action) to use the VLAN helper functions
instead of dereferencing the structure directly. This is required for the VLAN
action patch.

Could you please review?

Here're the changes:
v2: Fixed all helper functions to use RCU (rtnl_dereference) - Eric, Jamal
v2: Fixed indentation, extra line nits - Jamal, Jiri
v2: Moved rcu_head to the end of the struct - Jiri
v2: Re-formatted locals to reverse-christmas-tree - Jiri
v2: Removed mismatched spin_lock() - Cong
v2: Removed spin_lock_bh() in tcf_vlan_init, rtnl_dereference() should
suffice - Cong, Jiri
v4: Modified the nfp flower action code to use the VLAN helper functions
instead of referencing the structure directly. Isolated this into a
separate patch - Pieter Jansen
v5: Got rid of the unlikely() for the allocation case - Simon Horman

Acked-by: Jamal Hadi Salim <j...@mojatatu.com>
Acked-by: Jiri Pirko <j...@mellanox.com>
Signed-off-by: Manish Kurup <manish.ku...@verizon.com>

Manish Kurup (3):
  act_vlan: Change stats update to use per-core stats
  nfp flower action: Modified to use VLAN helper functions
  act_vlan: VLAN action rewrite to use RCU lock/unlock and update

 drivers/net/ethernet/netronome/nfp/flower/action.c |  5 +-
 include/net/tc_act/tc_vlan.h   | 46 +++---
 net/sched/act_vlan.c   | 71 ++
 3 files changed, 84 insertions(+), 38 deletions(-)

-- 
2.7.4
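
For readers unfamiliar with the pattern behind this series, here is a rough userspace analogue of replacing a lock-protected struct with a pointer that is swapped atomically. It is a sketch under stated assumptions, not the patch itself: it uses C11 atomics instead of the kernel RCU API, and struct vlan_params, struct vlan_action, params_get() and params_replace() are hypothetical names. It also leaves out the grace period; the updater simply hands the old copy back to the caller, who must decide when no reader can still be using it.

```c
#include <stdatomic.h>
#include <stdint.h>
#include <stdlib.h>

/* Parameters that readers must always see as one consistent set. */
struct vlan_params {
	int      action;
	uint16_t push_vid;
	uint8_t  push_prio;
};

/* The action holds only a pointer to the current parameter set. */
struct vlan_action {
	_Atomic(struct vlan_params *) params;
};

/* Reader: a single acquire load yields a complete snapshot, no lock. */
static const struct vlan_params *params_get(struct vlan_action *a)
{
	return atomic_load_explicit(&a->params, memory_order_acquire);
}

/*
 * Updater: fill in a fresh copy, then publish it with one atomic swap.
 * The previous copy is returned through 'old'; freeing it is left to the
 * caller because this sketch has no grace-period mechanism.
 * Returns 0 on success, -1 if the allocation fails.
 */
static int params_replace(struct vlan_action *a, int action,
			  uint16_t vid, uint8_t prio,
			  struct vlan_params **old)
{
	struct vlan_params *p = malloc(sizeof(*p));

	if (!p)
		return -1;
	p->action = action;
	p->push_vid = vid;
	p->push_prio = prio;
	*old = atomic_exchange_explicit(&a->params, p, memory_order_acq_rel);
	return 0;
}
```

Because readers never see a half-updated struct, the fast path needs no spinlock; the cost moves to the updater, which allocates a new copy on every change.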



[PATCH net-next v4 3/3] net sched act_vlan: VLAN action rewrite to use RCU lock/unlock and update

2017-10-31 Thread Manish Kurup
Using a spinlock in the VLAN action causes performance issues when the VLAN
action is used on multiple cores. Rewrote the VLAN action to use RCU read
locking for reads and updates instead.

Acked-by: Jamal Hadi Salim <j...@mojatatu.com>
Acked-by: Jiri Pirko <j...@mellanox.com>
Signed-off-by: Manish Kurup <manish.ku...@verizon.com>
---
 include/net/tc_act/tc_vlan.h | 46 +--
 net/sched/act_vlan.c | 65 
 2 files changed, 78 insertions(+), 33 deletions(-)

diff --git a/include/net/tc_act/tc_vlan.h b/include/net/tc_act/tc_vlan.h
index c2090df..22ae260 100644
--- a/include/net/tc_act/tc_vlan.h
+++ b/include/net/tc_act/tc_vlan.h
@@ -13,12 +13,17 @@
 #include <net/act_api.h>
 #include <linux/tc_act/tc_vlan.h>
 
+struct tcf_vlan_params {
+   int   tcfv_action;
+   u16   tcfv_push_vid;
+   __be16tcfv_push_proto;
+   u8tcfv_push_prio;
+   struct rcu_head   rcu;
+};
+
 struct tcf_vlan {
struct tc_actioncommon;
-   int tcfv_action;
-   u16 tcfv_push_vid;
-   __be16  tcfv_push_proto;
-   u8  tcfv_push_prio;
+   struct tcf_vlan_params __rcu *vlan_p;
 };
 #define to_vlan(a) ((struct tcf_vlan *)a)
 
@@ -33,22 +38,45 @@ static inline bool is_tcf_vlan(const struct tc_action *a)
 
 static inline u32 tcf_vlan_action(const struct tc_action *a)
 {
-   return to_vlan(a)->tcfv_action;
+   u32 tcfv_action;
+
+   rcu_read_lock();
+   tcfv_action = rcu_dereference(to_vlan(a)->vlan_p)->tcfv_action;
+   rcu_read_unlock();
+
+   return tcfv_action;
 }
 
 static inline u16 tcf_vlan_push_vid(const struct tc_action *a)
 {
-   return to_vlan(a)->tcfv_push_vid;
+   u16 tcfv_push_vid;
+
+   rcu_read_lock();
+   tcfv_push_vid = rcu_dereference(to_vlan(a)->vlan_p)->tcfv_push_vid;
+   rcu_read_unlock();
+
+   return tcfv_push_vid;
 }
 
 static inline __be16 tcf_vlan_push_proto(const struct tc_action *a)
 {
-   return to_vlan(a)->tcfv_push_proto;
+   __be16 tcfv_push_proto;
+
+   rcu_read_lock();
+   tcfv_push_proto = rcu_dereference(to_vlan(a)->vlan_p)->tcfv_push_proto;
+   rcu_read_unlock();
+
+   return tcfv_push_proto;
 }
 
 static inline u8 tcf_vlan_push_prio(const struct tc_action *a)
 {
-   return to_vlan(a)->tcfv_push_prio;
-}
+   u8 tcfv_push_prio;
 
+   rcu_read_lock();
+   tcfv_push_prio = rcu_dereference(to_vlan(a)->vlan_p)->tcfv_push_prio;
+   rcu_read_unlock();
+
+   return tcfv_push_prio;
+}
 #endif /* __NET_TC_VLAN_H */
diff --git a/net/sched/act_vlan.c b/net/sched/act_vlan.c
index b093bad..7f461f9 100644
--- a/net/sched/act_vlan.c
+++ b/net/sched/act_vlan.c
@@ -26,6 +26,7 @@ static int tcf_vlan(struct sk_buff *skb, const struct 
tc_action *a,
struct tcf_result *res)
 {
struct tcf_vlan *v = to_vlan(a);
+   struct tcf_vlan_params *p;
int action;
int err;
u16 tci;
@@ -33,24 +34,27 @@ static int tcf_vlan(struct sk_buff *skb, const struct 
tc_action *a,
tcf_lastuse_update(&v->tcf_tm);
bstats_cpu_update(this_cpu_ptr(v->common.cpu_bstats), skb);
 
-   spin_lock(&v->tcf_lock);
-   action = v->tcf_action;
-
/* Ensure 'data' points at mac_header prior calling vlan manipulating
 * functions.
 */
if (skb_at_tc_ingress(skb))
skb_push_rcsum(skb, skb->mac_len);
 
-   switch (v->tcfv_action) {
+   rcu_read_lock();
+
+   action = READ_ONCE(v->tcf_action);
+
+   p = rcu_dereference(v->vlan_p);
+
+   switch (p->tcfv_action) {
case TCA_VLAN_ACT_POP:
err = skb_vlan_pop(skb);
if (err)
goto drop;
break;
case TCA_VLAN_ACT_PUSH:
-   err = skb_vlan_push(skb, v->tcfv_push_proto, v->tcfv_push_vid |
-   (v->tcfv_push_prio << VLAN_PRIO_SHIFT));
+   err = skb_vlan_push(skb, p->tcfv_push_proto, p->tcfv_push_vid |
+   (p->tcfv_push_prio << VLAN_PRIO_SHIFT));
if (err)
goto drop;
break;
@@ -69,14 +73,14 @@ static int tcf_vlan(struct sk_buff *skb, const struct 
tc_action *a,
goto drop;
}
/* replace the vid */
-   tci = (tci & ~VLAN_VID_MASK) | v->tcfv_push_vid;
+   tci = (tci & ~VLAN_VID_MASK) | p->tcfv_push_vid;
/* replace prio bits, if tcfv_push_prio specified */
-   if (v->tcfv_push_prio) {
+   if (p->tcfv_push_prio) {
tci &= ~VLAN_PRIO_MASK;
-   tci |= v->tcfv_push_prio <

[PATCH net-next v4 2/3] drivers net ethernet netronome nfp flower: netronome nfp flower action

2017-10-31 Thread Manish Kurup
Modified netronome nfp flower action to use VLAN helper functions instead
of accessing the structure directly.

Signed-off-by: Manish Kurup <manish.ku...@verizon.com>
---
 drivers/net/ethernet/netronome/nfp/flower/action.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/flower/action.c 
b/drivers/net/ethernet/netronome/nfp/flower/action.c
index de64ced..c1c595f 100644
--- a/drivers/net/ethernet/netronome/nfp/flower/action.c
+++ b/drivers/net/ethernet/netronome/nfp/flower/action.c
@@ -58,7 +58,6 @@ nfp_fl_push_vlan(struct nfp_fl_push_vlan *push_vlan,
 const struct tc_action *action)
 {
size_t act_size = sizeof(struct nfp_fl_push_vlan);
-   struct tcf_vlan *vlan = to_vlan(action);
u16 tmp_push_vlan_tci;
 
push_vlan->head.jump_id = NFP_FL_ACTION_OPCODE_PUSH_VLAN;
@@ -67,8 +66,8 @@ nfp_fl_push_vlan(struct nfp_fl_push_vlan *push_vlan,
push_vlan->vlan_tpid = tcf_vlan_push_proto(action);
 
tmp_push_vlan_tci =
-   FIELD_PREP(NFP_FL_PUSH_VLAN_PRIO, vlan->tcfv_push_prio) |
-   FIELD_PREP(NFP_FL_PUSH_VLAN_VID, vlan->tcfv_push_vid) |
+   FIELD_PREP(NFP_FL_PUSH_VLAN_PRIO, tcf_vlan_push_prio(action)) |
+   FIELD_PREP(NFP_FL_PUSH_VLAN_VID, tcf_vlan_push_vid(action)) |
NFP_FL_PUSH_VLAN_CFI;
push_vlan->vlan_tci = cpu_to_be16(tmp_push_vlan_tci);
 }
-- 
2.7.4
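
The point of the patch above, that a driver should go through accessor helpers rather than reach into the action's struct, can be shown with a small userspace sketch. Everything here is an assumption for illustration: the struct layouts, vlan_push_vid(), vlan_push_prio() and driver_build_tci() are hypothetical names, and the bit positions are the standard 802.1Q TCI layout (12-bit VID, 3-bit priority starting at bit 13), not the nfp firmware field definitions.

```c
#include <stdint.h>

#define VLAN_VID_MASK   0x0fffu /* low 12 bits carry the VLAN ID */
#define VLAN_PRIO_SHIFT 13      /* priority occupies the top 3 bits */

/* Parameters moved behind a pointer, as in the RCU rewrite. */
struct vlan_params {
	uint16_t push_vid;
	uint8_t  push_prio;
};

struct vlan_action {
	const struct vlan_params *params;
};

/* Accessors: the only code that knows where the fields live. */
static inline uint16_t vlan_push_vid(const struct vlan_action *a)
{
	return a->params->push_vid;
}

static inline uint8_t vlan_push_prio(const struct vlan_action *a)
{
	return a->params->push_prio;
}

/*
 * Driver-side code: builds the TCI purely through the accessors, so it
 * keeps compiling if the fields move again.
 */
static uint16_t driver_build_tci(const struct vlan_action *a)
{
	uint16_t vid = vlan_push_vid(a) & VLAN_VID_MASK;
	uint16_t prio = (uint16_t)((vlan_push_prio(a) & 0x7u) << VLAN_PRIO_SHIFT);

	return (uint16_t)(vid | prio);
}
```

With this split, moving the fields from the action struct into a separately allocated parameter block only requires touching the two accessors.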



[PATCH net-next v4 1/3] net sched act_vlan: Change stats update to use per-core stats

2017-10-31 Thread Manish Kurup
The VLAN action maintains one set of stats across all cores, and uses a
spinlock to synchronize updates to it from the same. Changed this to use a
per-CPU stats context instead.
This change will result in better performance.

Acked-by: Jamal Hadi Salim <j...@mojatatu.com>
Acked-by: Jiri Pirko <j...@mellanox.com>
Signed-off-by: Manish Kurup <manish.ku...@verizon.com>
---
 net/sched/act_vlan.c | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/net/sched/act_vlan.c b/net/sched/act_vlan.c
index 16eb067..b093bad 100644
--- a/net/sched/act_vlan.c
+++ b/net/sched/act_vlan.c
@@ -30,9 +30,10 @@ static int tcf_vlan(struct sk_buff *skb, const struct 
tc_action *a,
int err;
u16 tci;
 
-   spin_lock(&v->tcf_lock);
tcf_lastuse_update(&v->tcf_tm);
-   bstats_update(&v->tcf_bstats, skb);
+   bstats_cpu_update(this_cpu_ptr(v->common.cpu_bstats), skb);
+
+   spin_lock(&v->tcf_lock);
action = v->tcf_action;
 
/* Ensure 'data' points at mac_header prior calling vlan manipulating
@@ -85,7 +86,8 @@ static int tcf_vlan(struct sk_buff *skb, const struct 
tc_action *a,
 
 drop:
action = TC_ACT_SHOT;
-   v->tcf_qstats.drops++;
+   qstats_drop_inc(this_cpu_ptr(v->common.cpu_qstats));
+
 unlock:
if (skb_at_tc_ingress(skb))
skb_pull_rcsum(skb, skb->mac_len);
@@ -172,7 +174,7 @@ static int tcf_vlan_init(struct net *net, struct nlattr 
*nla,
 
if (!exists) {
ret = tcf_idr_create(tn, parm->index, est, a,
-   &act_vlan_ops, bind, false);
+   &act_vlan_ops, bind, true);
if (ret)
return ret;
 
-- 
2.7.4



[PATCH net-next v4 0/3] Completed requested changes

2017-10-31 Thread Manish Kurup
Hi everyone,

Modified the netronome drivers (flower action) to use the VLAN helper functions
instead of dereferencing the structure directly. This is required for the VLAN
action patch.

Could you please review?

Here're the changes:
v2: Fixed all helper functions to use RCU (rtnl_dereference) - Eric, Jamal
v2: Fixed indentation, extra line nits - Jamal, Jiri
v2: Moved rcu_head to the end of the struct - Jiri
v2: Re-formatted locals to reverse-christmas-tree - Jiri
v2: Removed mismatched spin_lock() - Cong
v2: Removed spin_lock_bh() in tcf_vlan_init, rtnl_dereference() should
suffice - Cong, Jiri
v4: Modified the nfp flower action code to use the VLAN helper functions
instead of referencing the structure directly. Isolated this into a
separate patch - Pieter Jansen

Acked-by: Jamal Hadi Salim <j...@mojatatu.com>
Acked-by: Jiri Pirko <j...@mellanox.com>
Signed-off-by: Manish Kurup <manish.ku...@verizon.com>

Manish Kurup (3):
  net sched act_vlan: Change stats update to use per-core stats
  drivers net ethernet netronome nfp flower: netronome nfp flower action
  net sched act_vlan: VLAN action rewrite to use RCU lock/unlock and
update

 drivers/net/ethernet/netronome/nfp/flower/action.c |  5 +-
 include/net/tc_act/tc_vlan.h   | 46 +++---
 net/sched/act_vlan.c   | 71 ++
 3 files changed, 84 insertions(+), 38 deletions(-)

-- 
2.7.4



[PATCH net-next v3 2/2] net sched act_vlan: VLAN action rewrite to use RCU lock/unlock and update

2017-10-29 Thread Manish Kurup
Using a spinlock in the VLAN action causes performance issues when the VLAN
action is used on multiple cores. Rewrote the VLAN action to use RCU read
locking for reads and updates instead.
Fixed the nfp flower action to use the VLAN helper functions instead of accessing
the structure directly (this fixes a build break).

Acked-by: Jamal Hadi Salim <j...@mojatatu.com>
Acked-by: Jiri Pirko <j...@mellanox.com>
Signed-off-by: Manish Kurup <manish.ku...@verizon.com>
---
 drivers/net/ethernet/netronome/nfp/flower/action.c |  5 +-
 include/net/tc_act/tc_vlan.h   | 46 ---
 net/sched/act_vlan.c   | 65 ++
 3 files changed, 80 insertions(+), 36 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/flower/action.c 
b/drivers/net/ethernet/netronome/nfp/flower/action.c
index 0a5fc9f..cbaa8ea 100644
--- a/drivers/net/ethernet/netronome/nfp/flower/action.c
+++ b/drivers/net/ethernet/netronome/nfp/flower/action.c
@@ -58,7 +58,6 @@ nfp_fl_push_vlan(struct nfp_fl_push_vlan *push_vlan,
 const struct tc_action *action)
 {
size_t act_size = sizeof(struct nfp_fl_push_vlan);
-   struct tcf_vlan *vlan = to_vlan(action);
u16 tmp_push_vlan_tci;
 
push_vlan->head.jump_id = NFP_FL_ACTION_OPCODE_PUSH_VLAN;
@@ -67,8 +66,8 @@ nfp_fl_push_vlan(struct nfp_fl_push_vlan *push_vlan,
push_vlan->vlan_tpid = tcf_vlan_push_proto(action);
 
tmp_push_vlan_tci =
-   FIELD_PREP(NFP_FL_PUSH_VLAN_PRIO, vlan->tcfv_push_prio) |
-   FIELD_PREP(NFP_FL_PUSH_VLAN_VID, vlan->tcfv_push_vid) |
+   FIELD_PREP(NFP_FL_PUSH_VLAN_PRIO, tcf_vlan_push_prio(action)) |
+   FIELD_PREP(NFP_FL_PUSH_VLAN_VID, tcf_vlan_push_vid(action)) |
NFP_FL_PUSH_VLAN_CFI;
push_vlan->vlan_tci = cpu_to_be16(tmp_push_vlan_tci);
 }
diff --git a/include/net/tc_act/tc_vlan.h b/include/net/tc_act/tc_vlan.h
index c2090df..22ae260 100644
--- a/include/net/tc_act/tc_vlan.h
+++ b/include/net/tc_act/tc_vlan.h
@@ -13,12 +13,17 @@
 #include <net/act_api.h>
 #include <linux/tc_act/tc_vlan.h>
 
+struct tcf_vlan_params {
+   int   tcfv_action;
+   u16   tcfv_push_vid;
+   __be16tcfv_push_proto;
+   u8tcfv_push_prio;
+   struct rcu_head   rcu;
+};
+
 struct tcf_vlan {
struct tc_actioncommon;
-   int tcfv_action;
-   u16 tcfv_push_vid;
-   __be16  tcfv_push_proto;
-   u8  tcfv_push_prio;
+   struct tcf_vlan_params __rcu *vlan_p;
 };
 #define to_vlan(a) ((struct tcf_vlan *)a)
 
@@ -33,22 +38,45 @@ static inline bool is_tcf_vlan(const struct tc_action *a)
 
 static inline u32 tcf_vlan_action(const struct tc_action *a)
 {
-   return to_vlan(a)->tcfv_action;
+   u32 tcfv_action;
+
+   rcu_read_lock();
+   tcfv_action = rcu_dereference(to_vlan(a)->vlan_p)->tcfv_action;
+   rcu_read_unlock();
+
+   return tcfv_action;
 }
 
 static inline u16 tcf_vlan_push_vid(const struct tc_action *a)
 {
-   return to_vlan(a)->tcfv_push_vid;
+   u16 tcfv_push_vid;
+
+   rcu_read_lock();
+   tcfv_push_vid = rcu_dereference(to_vlan(a)->vlan_p)->tcfv_push_vid;
+   rcu_read_unlock();
+
+   return tcfv_push_vid;
 }
 
 static inline __be16 tcf_vlan_push_proto(const struct tc_action *a)
 {
-   return to_vlan(a)->tcfv_push_proto;
+   __be16 tcfv_push_proto;
+
+   rcu_read_lock();
+   tcfv_push_proto = rcu_dereference(to_vlan(a)->vlan_p)->tcfv_push_proto;
+   rcu_read_unlock();
+
+   return tcfv_push_proto;
 }
 
 static inline u8 tcf_vlan_push_prio(const struct tc_action *a)
 {
-   return to_vlan(a)->tcfv_push_prio;
-}
+   u8 tcfv_push_prio;
 
+   rcu_read_lock();
+   tcfv_push_prio = rcu_dereference(to_vlan(a)->vlan_p)->tcfv_push_prio;
+   rcu_read_unlock();
+
+   return tcfv_push_prio;
+}
 #endif /* __NET_TC_VLAN_H */
diff --git a/net/sched/act_vlan.c b/net/sched/act_vlan.c
index b093bad..7f461f9 100644
--- a/net/sched/act_vlan.c
+++ b/net/sched/act_vlan.c
@@ -26,6 +26,7 @@ static int tcf_vlan(struct sk_buff *skb, const struct 
tc_action *a,
struct tcf_result *res)
 {
struct tcf_vlan *v = to_vlan(a);
+   struct tcf_vlan_params *p;
int action;
int err;
u16 tci;
@@ -33,24 +34,27 @@ static int tcf_vlan(struct sk_buff *skb, const struct 
tc_action *a,
tcf_lastuse_update(&v->tcf_tm);
bstats_cpu_update(this_cpu_ptr(v->common.cpu_bstats), skb);
 
-   spin_lock(&v->tcf_lock);
-   action = v->tcf_action;
-
/* Ensure 'data' points at mac_header prior calling vlan manipulating
 * functions.
 */
if (skb_at_tc_ingress(skb))
skb_push_rcsum(skb, skb->mac_len);
 
-   switch (v->tcfv_acti

[PATCH net-next v3 1/2] net sched act_vlan: Change stats update to use per-core stats

2017-10-29 Thread Manish Kurup
The VLAN action maintains one set of stats across all cores, and uses a
spinlock to synchronize updates to it from the same. Changed this to use a
per-CPU stats context instead.
This change will result in better performance.

Acked-by: Jamal Hadi Salim <j...@mojatatu.com>
Acked-by: Jiri Pirko <j...@mellanox.com>
Signed-off-by: Manish Kurup <manish.ku...@verizon.com>
---
 net/sched/act_vlan.c | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/net/sched/act_vlan.c b/net/sched/act_vlan.c
index 16eb067..b093bad 100644
--- a/net/sched/act_vlan.c
+++ b/net/sched/act_vlan.c
@@ -30,9 +30,10 @@ static int tcf_vlan(struct sk_buff *skb, const struct 
tc_action *a,
int err;
u16 tci;
 
-   spin_lock(&v->tcf_lock);
tcf_lastuse_update(&v->tcf_tm);
-   bstats_update(&v->tcf_bstats, skb);
+   bstats_cpu_update(this_cpu_ptr(v->common.cpu_bstats), skb);
+
+   spin_lock(&v->tcf_lock);
action = v->tcf_action;
 
/* Ensure 'data' points at mac_header prior calling vlan manipulating
@@ -85,7 +86,8 @@ static int tcf_vlan(struct sk_buff *skb, const struct 
tc_action *a,
 
 drop:
action = TC_ACT_SHOT;
-   v->tcf_qstats.drops++;
+   qstats_drop_inc(this_cpu_ptr(v->common.cpu_qstats));
+
 unlock:
if (skb_at_tc_ingress(skb))
skb_pull_rcsum(skb, skb->mac_len);
@@ -172,7 +174,7 @@ static int tcf_vlan_init(struct net *net, struct nlattr 
*nla,
 
if (!exists) {
ret = tcf_idr_create(tn, parm->index, est, a,
-   &act_vlan_ops, bind, false);
+   &act_vlan_ops, bind, true);
if (ret)
return ret;
 
-- 
2.7.4



[PATCH net-next v3 0/2] Completed requested changes

2017-10-29 Thread Manish Kurup
Hi everyone,

The kbuild robot detected an issue with building the netronome drivers (flower
action). Fixed them to use the VLAN helper functions instead of dereferencing
the structure directly.

Could you please review?

Here're the changes:
v2: Fixed all helper functions to use RCU (rtnl_dereference) - Eric, Jamal
v2: Fixed indentation, extra line nits - Jamal, Jiri
v2: Moved rcu_head to the end of the struct - Jiri
v2: Re-formatted locals to reverse-christmas-tree - Jiri
v2: Removed mismatched spin_lock() - Cong
v2: Removed spin_lock_bh() in tcf_vlan_init, rtnl_dereference() should
suffice - Cong, Jiri
v3: Fixed a build break in the nfp flower action. Modified the nfp flower
action code to use the VLAN helper functions instead of referencing the
structure directly. Found by the kbuild robot.

Acked-by: Jamal Hadi Salim <j...@mojatatu.com>
Acked-by: Jiri Pirko <j...@mellanox.com>
Signed-off-by: Manish Kurup <manish.ku...@verizon.com>

Manish Kurup (2):
  net sched act_vlan: Change stats update to use per-core stats
  net sched act_vlan: VLAN action rewrite to use RCU lock/unlock and
update

 drivers/net/ethernet/netronome/nfp/flower/action.c |  5 +-
 include/net/tc_act/tc_vlan.h   | 46 +++---
 net/sched/act_vlan.c   | 71 ++
 3 files changed, 84 insertions(+), 38 deletions(-)

-- 
2.7.4



[PATCH net-next v2 2/2] net sched act_vlan: VLAN action rewrite to use RCU lock/unlock and update

2017-10-27 Thread Manish Kurup
Using a spinlock in the VLAN action causes performance issues when the VLAN
action is used on multiple cores. Rewrote the VLAN action to use RCU read
locking for reads and updates instead.

Acked-by: Jamal Hadi Salim <j...@mojatatu.com>
Acked-by: Jiri Pirko <j...@mellanox.com>
Signed-off-by: Manish Kurup <manish.ku...@verizon.com>
---
 include/net/tc_act/tc_vlan.h | 46 +--
 net/sched/act_vlan.c | 65 
 2 files changed, 78 insertions(+), 33 deletions(-)

diff --git a/include/net/tc_act/tc_vlan.h b/include/net/tc_act/tc_vlan.h
index c2090df..22ae260 100644
--- a/include/net/tc_act/tc_vlan.h
+++ b/include/net/tc_act/tc_vlan.h
@@ -13,12 +13,17 @@
 #include <net/act_api.h>
 #include <linux/tc_act/tc_vlan.h>
 
+struct tcf_vlan_params {
+   int   tcfv_action;
+   u16   tcfv_push_vid;
+   __be16tcfv_push_proto;
+   u8tcfv_push_prio;
+   struct rcu_head   rcu;
+};
+
 struct tcf_vlan {
struct tc_actioncommon;
-   int tcfv_action;
-   u16 tcfv_push_vid;
-   __be16  tcfv_push_proto;
-   u8  tcfv_push_prio;
+   struct tcf_vlan_params __rcu *vlan_p;
 };
 #define to_vlan(a) ((struct tcf_vlan *)a)
 
@@ -33,22 +38,45 @@ static inline bool is_tcf_vlan(const struct tc_action *a)
 
 static inline u32 tcf_vlan_action(const struct tc_action *a)
 {
-   return to_vlan(a)->tcfv_action;
+   u32 tcfv_action;
+
+   rcu_read_lock();
+   tcfv_action = rcu_dereference(to_vlan(a)->vlan_p)->tcfv_action;
+   rcu_read_unlock();
+
+   return tcfv_action;
 }
 
 static inline u16 tcf_vlan_push_vid(const struct tc_action *a)
 {
-   return to_vlan(a)->tcfv_push_vid;
+   u16 tcfv_push_vid;
+
+   rcu_read_lock();
+   tcfv_push_vid = rcu_dereference(to_vlan(a)->vlan_p)->tcfv_push_vid;
+   rcu_read_unlock();
+
+   return tcfv_push_vid;
 }
 
 static inline __be16 tcf_vlan_push_proto(const struct tc_action *a)
 {
-   return to_vlan(a)->tcfv_push_proto;
+   __be16 tcfv_push_proto;
+
+   rcu_read_lock();
+   tcfv_push_proto = rcu_dereference(to_vlan(a)->vlan_p)->tcfv_push_proto;
+   rcu_read_unlock();
+
+   return tcfv_push_proto;
 }
 
 static inline u8 tcf_vlan_push_prio(const struct tc_action *a)
 {
-   return to_vlan(a)->tcfv_push_prio;
-}
+   u8 tcfv_push_prio;
 
+   rcu_read_lock();
+   tcfv_push_prio = rcu_dereference(to_vlan(a)->vlan_p)->tcfv_push_prio;
+   rcu_read_unlock();
+
+   return tcfv_push_prio;
+}
 #endif /* __NET_TC_VLAN_H */
diff --git a/net/sched/act_vlan.c b/net/sched/act_vlan.c
index b093bad..7f461f9 100644
--- a/net/sched/act_vlan.c
+++ b/net/sched/act_vlan.c
@@ -26,6 +26,7 @@ static int tcf_vlan(struct sk_buff *skb, const struct 
tc_action *a,
struct tcf_result *res)
 {
struct tcf_vlan *v = to_vlan(a);
+   struct tcf_vlan_params *p;
int action;
int err;
u16 tci;
@@ -33,24 +34,27 @@ static int tcf_vlan(struct sk_buff *skb, const struct 
tc_action *a,
tcf_lastuse_update(&v->tcf_tm);
bstats_cpu_update(this_cpu_ptr(v->common.cpu_bstats), skb);
 
-   spin_lock(&v->tcf_lock);
-   action = v->tcf_action;
-
/* Ensure 'data' points at mac_header prior calling vlan manipulating
 * functions.
 */
if (skb_at_tc_ingress(skb))
skb_push_rcsum(skb, skb->mac_len);
 
-   switch (v->tcfv_action) {
+   rcu_read_lock();
+
+   action = READ_ONCE(v->tcf_action);
+
+   p = rcu_dereference(v->vlan_p);
+
+   switch (p->tcfv_action) {
case TCA_VLAN_ACT_POP:
err = skb_vlan_pop(skb);
if (err)
goto drop;
break;
case TCA_VLAN_ACT_PUSH:
-   err = skb_vlan_push(skb, v->tcfv_push_proto, v->tcfv_push_vid |
-   (v->tcfv_push_prio << VLAN_PRIO_SHIFT));
+   err = skb_vlan_push(skb, p->tcfv_push_proto, p->tcfv_push_vid |
+   (p->tcfv_push_prio << VLAN_PRIO_SHIFT));
if (err)
goto drop;
break;
@@ -69,14 +73,14 @@ static int tcf_vlan(struct sk_buff *skb, const struct 
tc_action *a,
goto drop;
}
/* replace the vid */
-   tci = (tci & ~VLAN_VID_MASK) | v->tcfv_push_vid;
+   tci = (tci & ~VLAN_VID_MASK) | p->tcfv_push_vid;
/* replace prio bits, if tcfv_push_prio specified */
-   if (v->tcfv_push_prio) {
+   if (p->tcfv_push_prio) {
tci &= ~VLAN_PRIO_MASK;
-   tci |= v->tcfv_push_prio <

[PATCH net-next v2 1/2] net sched act_vlan: Change stats update to use per-core stats

2017-10-27 Thread Manish Kurup
The VLAN action maintains one set of stats across all cores, and uses a
spinlock to synchronize updates to it from the same. Changed this to use a
per-CPU stats context instead.
This change will result in better performance.

Acked-by: Jamal Hadi Salim <j...@mojatatu.com>
Acked-by: Jiri Pirko <j...@mellanox.com>
Signed-off-by: Manish Kurup <manish.ku...@verizon.com>
---
 net/sched/act_vlan.c | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/net/sched/act_vlan.c b/net/sched/act_vlan.c
index 16eb067..b093bad 100644
--- a/net/sched/act_vlan.c
+++ b/net/sched/act_vlan.c
@@ -30,9 +30,10 @@ static int tcf_vlan(struct sk_buff *skb, const struct 
tc_action *a,
int err;
u16 tci;
 
-   spin_lock(&v->tcf_lock);
tcf_lastuse_update(&v->tcf_tm);
-   bstats_update(&v->tcf_bstats, skb);
+   bstats_cpu_update(this_cpu_ptr(v->common.cpu_bstats), skb);
+
+   spin_lock(&v->tcf_lock);
action = v->tcf_action;
 
/* Ensure 'data' points at mac_header prior calling vlan manipulating
@@ -85,7 +86,8 @@ static int tcf_vlan(struct sk_buff *skb, const struct 
tc_action *a,
 
 drop:
action = TC_ACT_SHOT;
-   v->tcf_qstats.drops++;
+   qstats_drop_inc(this_cpu_ptr(v->common.cpu_qstats));
+
 unlock:
if (skb_at_tc_ingress(skb))
skb_pull_rcsum(skb, skb->mac_len);
@@ -172,7 +174,7 @@ static int tcf_vlan_init(struct net *net, struct nlattr 
*nla,
 
if (!exists) {
ret = tcf_idr_create(tn, parm->index, est, a,
-   &act_vlan_ops, bind, false);
+   &act_vlan_ops, bind, true);
if (ret)
return ret;
 
-- 
2.7.4



[PATCH net-next v2 0/2] Completed requested changes

2017-10-27 Thread Manish Kurup
Hi everyone,

Apologies for the resend (newbie mistakes), but I made the following
corrections to the cover letter and patches:
* Added version and comment info to subjects of patches and all the changes
  (below). I used v2, since my earlier mails had no versions.
* Added ACKs from Jamal and Jiri to all patches and this cover letter

I've made all requested changes to the act_vlan TC action, tested them, and
here's the new changeset. Thanks a lot!

Here're the changes:
v2: Fixed all helper functions to use RCU (rtnl_dereference) - Eric, Jamal
v2: Fixed indentation, extra line nits - Jamal, Jiri
v2: Moved rcu_head to the end of the struct - Jiri
v2: Re-formatted locals to reverse-christmas-tree - Jiri
v2: Removed mismatched spin_lock() - Cong
v2: Removed spin_lock_bh() in tcf_vlan_init, rtnl_dereference() should
suffice - Cong, Jiri

Thanks for all the comments!
 
Acked-by: Jamal Hadi Salim <j...@mojatatu.com>
Acked-by: Jiri Pirko <j...@mellanox.com>
Signed-off-by: Manish Kurup <manish.ku...@verizon.com>

Manish Kurup (2):
  net sched act_vlan: Change stats update to use per-core stats
  net sched act_vlan: VLAN action rewrite to use RCU lock/unlock and
update

 include/net/tc_act/tc_vlan.h | 46 ++--
 net/sched/act_vlan.c | 71 
 2 files changed, 82 insertions(+), 35 deletions(-)

-- 
2.7.4



[PATCH net-next 2/2] net sched act_vlan: VLAN action rewrite to use RCU lock/unlock and update

2017-10-27 Thread Manish Kurup
Using a spinlock in the VLAN action causes performance issues when the VLAN
action is used on multiple cores. Rewrote the VLAN action to use RCU read
locking for reads and updates instead.

Acked-by: Jamal Hadi Salim <j...@mojatatu.com>
Acked-by: Jiri Pirko <j...@mellanox.com>
Signed-off-by: Manish Kurup <manish.ku...@verizon.com>
---
 include/net/tc_act/tc_vlan.h | 46 +--
 net/sched/act_vlan.c | 65 
 2 files changed, 78 insertions(+), 33 deletions(-)

diff --git a/include/net/tc_act/tc_vlan.h b/include/net/tc_act/tc_vlan.h
index c2090df..22ae260 100644
--- a/include/net/tc_act/tc_vlan.h
+++ b/include/net/tc_act/tc_vlan.h
@@ -13,12 +13,17 @@
 #include <net/act_api.h>
 #include <linux/tc_act/tc_vlan.h>
 
+struct tcf_vlan_params {
+   int   tcfv_action;
+   u16   tcfv_push_vid;
+   __be16tcfv_push_proto;
+   u8tcfv_push_prio;
+   struct rcu_head   rcu;
+};
+
 struct tcf_vlan {
struct tc_actioncommon;
-   int tcfv_action;
-   u16 tcfv_push_vid;
-   __be16  tcfv_push_proto;
-   u8  tcfv_push_prio;
+   struct tcf_vlan_params __rcu *vlan_p;
 };
 #define to_vlan(a) ((struct tcf_vlan *)a)
 
@@ -33,22 +38,45 @@ static inline bool is_tcf_vlan(const struct tc_action *a)
 
 static inline u32 tcf_vlan_action(const struct tc_action *a)
 {
-   return to_vlan(a)->tcfv_action;
+   u32 tcfv_action;
+
+   rcu_read_lock();
+   tcfv_action = rcu_dereference(to_vlan(a)->vlan_p)->tcfv_action;
+   rcu_read_unlock();
+
+   return tcfv_action;
 }
 
 static inline u16 tcf_vlan_push_vid(const struct tc_action *a)
 {
-   return to_vlan(a)->tcfv_push_vid;
+   u16 tcfv_push_vid;
+
+   rcu_read_lock();
+   tcfv_push_vid = rcu_dereference(to_vlan(a)->vlan_p)->tcfv_push_vid;
+   rcu_read_unlock();
+
+   return tcfv_push_vid;
 }
 
 static inline __be16 tcf_vlan_push_proto(const struct tc_action *a)
 {
-   return to_vlan(a)->tcfv_push_proto;
+   __be16 tcfv_push_proto;
+
+   rcu_read_lock();
+   tcfv_push_proto = rcu_dereference(to_vlan(a)->vlan_p)->tcfv_push_proto;
+   rcu_read_unlock();
+
+   return tcfv_push_proto;
 }
 
 static inline u8 tcf_vlan_push_prio(const struct tc_action *a)
 {
-   return to_vlan(a)->tcfv_push_prio;
-}
+   u8 tcfv_push_prio;
 
+   rcu_read_lock();
+   tcfv_push_prio = rcu_dereference(to_vlan(a)->vlan_p)->tcfv_push_prio;
+   rcu_read_unlock();
+
+   return tcfv_push_prio;
+}
 #endif /* __NET_TC_VLAN_H */
diff --git a/net/sched/act_vlan.c b/net/sched/act_vlan.c
index b093bad..7f461f9 100644
--- a/net/sched/act_vlan.c
+++ b/net/sched/act_vlan.c
@@ -26,6 +26,7 @@ static int tcf_vlan(struct sk_buff *skb, const struct 
tc_action *a,
struct tcf_result *res)
 {
struct tcf_vlan *v = to_vlan(a);
+   struct tcf_vlan_params *p;
int action;
int err;
u16 tci;
@@ -33,24 +34,27 @@ static int tcf_vlan(struct sk_buff *skb, const struct 
tc_action *a,
tcf_lastuse_update(&v->tcf_tm);
bstats_cpu_update(this_cpu_ptr(v->common.cpu_bstats), skb);
 
-   spin_lock(&v->tcf_lock);
-   action = v->tcf_action;
-
/* Ensure 'data' points at mac_header prior calling vlan manipulating
 * functions.
 */
if (skb_at_tc_ingress(skb))
skb_push_rcsum(skb, skb->mac_len);
 
-   switch (v->tcfv_action) {
+   rcu_read_lock();
+
+   action = READ_ONCE(v->tcf_action);
+
+   p = rcu_dereference(v->vlan_p);
+
+   switch (p->tcfv_action) {
case TCA_VLAN_ACT_POP:
err = skb_vlan_pop(skb);
if (err)
goto drop;
break;
case TCA_VLAN_ACT_PUSH:
-   err = skb_vlan_push(skb, v->tcfv_push_proto, v->tcfv_push_vid |
-   (v->tcfv_push_prio << VLAN_PRIO_SHIFT));
+   err = skb_vlan_push(skb, p->tcfv_push_proto, p->tcfv_push_vid |
+   (p->tcfv_push_prio << VLAN_PRIO_SHIFT));
if (err)
goto drop;
break;
@@ -69,14 +73,14 @@ static int tcf_vlan(struct sk_buff *skb, const struct 
tc_action *a,
goto drop;
}
/* replace the vid */
-   tci = (tci & ~VLAN_VID_MASK) | v->tcfv_push_vid;
+   tci = (tci & ~VLAN_VID_MASK) | p->tcfv_push_vid;
/* replace prio bits, if tcfv_push_prio specified */
-   if (v->tcfv_push_prio) {
+   if (p->tcfv_push_prio) {
tci &= ~VLAN_PRIO_MASK;
-   tci |= v->tcfv_push_prio <

[PATCH net-next 0/2] Completed requested changes - RESEND with ACKS

2017-10-27 Thread Manish Kurup
Hi everyone,

Apologies for the resend. I wanted to add the acks from Jamal and Jiri, which
were contingent on the changes made here; I had forgotten to add them the
first time around.

I've made all requested changes to the act_vlan TC action, tested them, and
here's the new changeset. Could you please review and let me know
if I can commit this? Thanks a lot!

Here're the fixes:
*> Fixed all helper functions to use RCU (rtnl_dereference).
*> Fixed indentation, extra line nits
*> Moved rcu_head to the end of the struct
*> Converted to reverse-christmas-tree
*> Removed mismatched spin_lock()
*> Removed spin_lock_bh() in tcf_vlan_init, rtnl_dereference() should suffice.

Thanks for all the comments!
 
Acked-by: Jamal Hadi Salim <j...@mojatatu.com>
Acked-by: Jiri Pirko <j...@mellanox.com>
Signed-off-by: Manish Kurup <manish.ku...@verizon.com>

Manish Kurup (2):
  net sched act_vlan: Change stats update to use per-core stats
  net sched act_vlan: VLAN action rewrite to use RCU lock/unlock and
update

 include/net/tc_act/tc_vlan.h | 46 ++--
 net/sched/act_vlan.c | 71 
 2 files changed, 82 insertions(+), 35 deletions(-)

-- 
2.7.4



[PATCH net-next 1/2] net sched act_vlan: Change stats update to use per-core stats

2017-10-27 Thread Manish Kurup
The VLAN action maintains one set of stats across all cores, and uses a
spinlock to synchronize updates to it from the same. Changed this to use a
per-CPU stats context instead.
This change will result in better performance.

Acked-by: Jamal Hadi Salim <j...@mojatatu.com>
Acked-by: Jiri Pirko <j...@mellanox.com>
Signed-off-by: Manish Kurup <manish.ku...@verizon.com>
---
 net/sched/act_vlan.c | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/net/sched/act_vlan.c b/net/sched/act_vlan.c
index 16eb067..b093bad 100644
--- a/net/sched/act_vlan.c
+++ b/net/sched/act_vlan.c
@@ -30,9 +30,10 @@ static int tcf_vlan(struct sk_buff *skb, const struct 
tc_action *a,
int err;
u16 tci;
 
-   spin_lock(&v->tcf_lock);
tcf_lastuse_update(&v->tcf_tm);
-   bstats_update(&v->tcf_bstats, skb);
+   bstats_cpu_update(this_cpu_ptr(v->common.cpu_bstats), skb);
+
+   spin_lock(&v->tcf_lock);
action = v->tcf_action;
 
/* Ensure 'data' points at mac_header prior calling vlan manipulating
@@ -85,7 +86,8 @@ static int tcf_vlan(struct sk_buff *skb, const struct 
tc_action *a,
 
 drop:
action = TC_ACT_SHOT;
-   v->tcf_qstats.drops++;
+   qstats_drop_inc(this_cpu_ptr(v->common.cpu_qstats));
+
 unlock:
if (skb_at_tc_ingress(skb))
skb_pull_rcsum(skb, skb->mac_len);
@@ -172,7 +174,7 @@ static int tcf_vlan_init(struct net *net, struct nlattr 
*nla,
 
if (!exists) {
ret = tcf_idr_create(tn, parm->index, est, a,
-   &act_vlan_ops, bind, false);
+   &act_vlan_ops, bind, true);
if (ret)
return ret;
 
-- 
2.7.4



[PATCH net-next 0/2] Completed requested changes

2017-10-26 Thread Manish Kurup
Hi everyone,

I've made all requested changes to the act_vlan TC action, tested them, and
here's the new changeset. Could you please review and let me know
if I can commit this? Thanks a lot!

Here are the fixes:
*> Fixed all helper functions to use RCU (rtnl_dereference).
*> Fixed indentation and extra-line nits.
*> Moved rcu_head to the end of the struct.
*> Converted local variable declarations to reverse Christmas tree order.
*> Removed mismatched spin_lock().
*> Removed spin_lock_bh() in tcf_vlan_init(); rtnl_dereference() should suffice.
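
To illustrate the rcu_head item, here is a small user-space sketch that
compares the two layouts. With rcu_head last, the fields read on the packet
path start at offset 0. The reviewers' reasoning is not shown in this
message, so treat that as one plausible reading only. struct rcu_head_sketch
is a hypothetical stand-in for the kernel's two-pointer struct rcu_head, and
the helper names are made up.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the kernel's struct rcu_head (two pointers). */
struct rcu_head_sketch {
	struct rcu_head_sketch *next;
	void (*func)(struct rcu_head_sketch *head);
};

/* Layout with rcu_head at the end of the struct. */
struct params_tail {
	int			tcfv_action;
	uint16_t		tcfv_push_vid;
	uint16_t		tcfv_push_proto;	/* __be16 in the kernel */
	uint8_t			tcfv_push_prio;
	struct rcu_head_sketch	rcu;
};

/* Layout with rcu_head at the start of the struct. */
struct params_head {
	struct rcu_head_sketch	rcu;
	int			tcfv_action;
	uint16_t		tcfv_push_vid;
	uint16_t		tcfv_push_proto;
	uint8_t			tcfv_push_prio;
};

/* Offset of the first packet-path field in each layout. */
static size_t action_offset_tail(void)
{
	return offsetof(struct params_tail, tcfv_action);
}

static size_t action_offset_head(void)
{
	return offsetof(struct params_head, tcfv_action);
}
```

In this sketch action_offset_tail() is 0, while action_offset_head() equals
sizeof(struct rcu_head_sketch).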

Thanks for all the comments!
 

Manish Kurup (2):
  net sched act_vlan: Change stats update to use per-core stats
  net sched act_vlan: VLAN action rewrite to use RCU lock/unlock and
update

 include/net/tc_act/tc_vlan.h | 46 ++--
 net/sched/act_vlan.c | 71 
 2 files changed, 82 insertions(+), 35 deletions(-)

-- 
2.7.4



[PATCH net-next 2/2] net sched act_vlan: VLAN action rewrite to use RCU lock/unlock and update

2017-10-26 Thread Manish Kurup
Using a spinlock in the VLAN action causes performance issues when the VLAN
action is used on multiple cores. Rewrote the VLAN action to use RCU instead:
the action parameters now live in a separate RCU-protected structure, and the
packet path reads them under rcu_read_lock().
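
As a rough user-space analogy of the RCU pattern, and not the patch itself,
the sketch below uses C11 atomics. The reader loads the parameter pointer
once and works on that snapshot. The updater fills a fresh block, publishes
it, and hands the old block back so it can be freed later. All names here
(vlan_params, vlan_action, vlan_read_vid, vlan_update) are made up for the
example. The acquire load stands in for rcu_dereference() and the exchange
for rcu_assign_pointer(). The sketch has no grace-period mechanism, so the
caller must free the old block only once no reader can still hold it. In
the kernel that deferral is what kfree_rcu() or call_rcu() provides.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>
#include <stdlib.h>

/* Illustrative parameter block, swapped as a whole on update. */
struct vlan_params {
	int		action;
	uint16_t	push_vid;
	uint8_t		push_prio;
};

struct vlan_action {
	_Atomic(struct vlan_params *) params;	/* plays the role of vlan_p */
};

/* Reader: load the pointer once, then use only that snapshot. */
static uint16_t vlan_read_vid(struct vlan_action *v)
{
	struct vlan_params *p;

	p = atomic_load_explicit(&v->params, memory_order_acquire);
	return p->push_vid;
}

/*
 * Updater: build a new block, publish it, and return the previous block
 * through *old. Returns 0 on success, -1 if allocation fails.
 */
static int vlan_update(struct vlan_action *v, int action, uint16_t vid,
		       uint8_t prio, struct vlan_params **old)
{
	struct vlan_params *p = malloc(sizeof(*p));

	if (!p)
		return -1;

	p->action = action;
	p->push_vid = vid;
	p->push_prio = prio;

	*old = atomic_exchange_explicit(&v->params, p, memory_order_acq_rel);
	return 0;
}
```

A reader never sees a half-updated block in this scheme, because the fields
are filled in before the pointer is published.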

Signed-off-by: Manish Kurup <manish.ku...@verizon.com>
---
 include/net/tc_act/tc_vlan.h | 46 +--
 net/sched/act_vlan.c | 65 
 2 files changed, 78 insertions(+), 33 deletions(-)

diff --git a/include/net/tc_act/tc_vlan.h b/include/net/tc_act/tc_vlan.h
index c2090df..22ae260 100644
--- a/include/net/tc_act/tc_vlan.h
+++ b/include/net/tc_act/tc_vlan.h
@@ -13,12 +13,17 @@
 #include <net/act_api.h>
 #include <linux/tc_act/tc_vlan.h>
 
+struct tcf_vlan_params {
+   int   tcfv_action;
+   u16   tcfv_push_vid;
+   __be16tcfv_push_proto;
+   u8tcfv_push_prio;
+   struct rcu_head   rcu;
+};
+
 struct tcf_vlan {
struct tc_actioncommon;
-   int tcfv_action;
-   u16 tcfv_push_vid;
-   __be16  tcfv_push_proto;
-   u8  tcfv_push_prio;
+   struct tcf_vlan_params __rcu *vlan_p;
 };
 #define to_vlan(a) ((struct tcf_vlan *)a)
 
@@ -33,22 +38,45 @@ static inline bool is_tcf_vlan(const struct tc_action *a)
 
 static inline u32 tcf_vlan_action(const struct tc_action *a)
 {
-   return to_vlan(a)->tcfv_action;
+   u32 tcfv_action;
+
+   rcu_read_lock();
+   tcfv_action = rcu_dereference(to_vlan(a)->vlan_p)->tcfv_action;
+   rcu_read_unlock();
+
+   return tcfv_action;
 }
 
 static inline u16 tcf_vlan_push_vid(const struct tc_action *a)
 {
-   return to_vlan(a)->tcfv_push_vid;
+   u16 tcfv_push_vid;
+
+   rcu_read_lock();
+   tcfv_push_vid = rcu_dereference(to_vlan(a)->vlan_p)->tcfv_push_vid;
+   rcu_read_unlock();
+
+   return tcfv_push_vid;
 }
 
 static inline __be16 tcf_vlan_push_proto(const struct tc_action *a)
 {
-   return to_vlan(a)->tcfv_push_proto;
+   __be16 tcfv_push_proto;
+
+   rcu_read_lock();
+   tcfv_push_proto = rcu_dereference(to_vlan(a)->vlan_p)->tcfv_push_proto;
+   rcu_read_unlock();
+
+   return tcfv_push_proto;
 }
 
 static inline u8 tcf_vlan_push_prio(const struct tc_action *a)
 {
-   return to_vlan(a)->tcfv_push_prio;
-}
+   u8 tcfv_push_prio;
 
+   rcu_read_lock();
+   tcfv_push_prio = rcu_dereference(to_vlan(a)->vlan_p)->tcfv_push_prio;
+   rcu_read_unlock();
+
+   return tcfv_push_prio;
+}
 #endif /* __NET_TC_VLAN_H */
diff --git a/net/sched/act_vlan.c b/net/sched/act_vlan.c
index b093bad..7f461f9 100644
--- a/net/sched/act_vlan.c
+++ b/net/sched/act_vlan.c
@@ -26,6 +26,7 @@ static int tcf_vlan(struct sk_buff *skb, const struct tc_action *a,
struct tcf_result *res)
 {
struct tcf_vlan *v = to_vlan(a);
+   struct tcf_vlan_params *p;
int action;
int err;
u16 tci;
@@ -33,24 +34,27 @@ static int tcf_vlan(struct sk_buff *skb, const struct tc_action *a,
tcf_lastuse_update(&v->tcf_tm);
bstats_cpu_update(this_cpu_ptr(v->common.cpu_bstats), skb);
 
-   spin_lock(&v->tcf_lock);
-   action = v->tcf_action;
-
/* Ensure 'data' points at mac_header prior calling vlan manipulating
 * functions.
 */
if (skb_at_tc_ingress(skb))
skb_push_rcsum(skb, skb->mac_len);
 
-   switch (v->tcfv_action) {
+   rcu_read_lock();
+
+   action = READ_ONCE(v->tcf_action);
+
+   p = rcu_dereference(v->vlan_p);
+
+   switch (p->tcfv_action) {
case TCA_VLAN_ACT_POP:
err = skb_vlan_pop(skb);
if (err)
goto drop;
break;
case TCA_VLAN_ACT_PUSH:
-   err = skb_vlan_push(skb, v->tcfv_push_proto, v->tcfv_push_vid |
-   (v->tcfv_push_prio << VLAN_PRIO_SHIFT));
+   err = skb_vlan_push(skb, p->tcfv_push_proto, p->tcfv_push_vid |
+   (p->tcfv_push_prio << VLAN_PRIO_SHIFT));
if (err)
goto drop;
break;
@@ -69,14 +73,14 @@ static int tcf_vlan(struct sk_buff *skb, const struct tc_action *a,
goto drop;
}
/* replace the vid */
-   tci = (tci & ~VLAN_VID_MASK) | v->tcfv_push_vid;
+   tci = (tci & ~VLAN_VID_MASK) | p->tcfv_push_vid;
/* replace prio bits, if tcfv_push_prio specified */
-   if (v->tcfv_push_prio) {
+   if (p->tcfv_push_prio) {
tci &= ~VLAN_PRIO_MASK;
-   tci |= v->tcfv_push_prio << VLAN_PRIO_SHIFT;
+   tci |= p->tcfv_push_prio << VLAN_PRIO_SHIF

[PATCH net-next 1/2] net sched act_vlan: Change stats update to use per-core stats

2017-10-26 Thread Manish Kurup
The VLAN action maintains one set of stats across all cores and uses a
spinlock to synchronize updates to it from those cores. Changed this to use a
per-CPU stats context instead, so cores no longer contend on shared counters.
This should improve performance when the action runs on multiple cores.

Signed-off-by: Manish Kurup <manish.ku...@verizon.com>
---
 net/sched/act_vlan.c | 10 ++
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/net/sched/act_vlan.c b/net/sched/act_vlan.c
index 16eb067..b093bad 100644
--- a/net/sched/act_vlan.c
+++ b/net/sched/act_vlan.c
@@ -30,9 +30,10 @@ static int tcf_vlan(struct sk_buff *skb, const struct tc_action *a,
int err;
u16 tci;
 
-   spin_lock(&v->tcf_lock);
tcf_lastuse_update(&v->tcf_tm);
-   bstats_update(&v->tcf_bstats, skb);
+   bstats_cpu_update(this_cpu_ptr(v->common.cpu_bstats), skb);
+
+   spin_lock(&v->tcf_lock);
action = v->tcf_action;
 
/* Ensure 'data' points at mac_header prior calling vlan manipulating
@@ -85,7 +86,8 @@ static int tcf_vlan(struct sk_buff *skb, const struct tc_action *a,
 
 drop:
action = TC_ACT_SHOT;
-   v->tcf_qstats.drops++;
+   qstats_drop_inc(this_cpu_ptr(v->common.cpu_qstats));
+
 unlock:
if (skb_at_tc_ingress(skb))
skb_pull_rcsum(skb, skb->mac_len);
@@ -172,7 +174,7 @@ static int tcf_vlan_init(struct net *net, struct nlattr *nla,
 
if (!exists) {
ret = tcf_idr_create(tn, parm->index, est, a,
-   &act_vlan_ops, bind, false);
+   &act_vlan_ops, bind, true);
if (ret)
return ret;
 
-- 
2.7.4



[PATCH net-next 2/2] net sched act_vlan: VLAN action rewrite to use RCU lock/unlock and update

2017-10-10 Thread Manish Kurup
Using a spinlock in the VLAN action causes performance issues when the VLAN
action is used on multiple cores. Rewrote the VLAN action to use RCU instead:
the action parameters now live in a separate RCU-protected structure, and the
packet path reads them under rcu_read_lock().
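
The MODIFY branch in the diff below rewrites the tag's TCI from the
RCU-protected parameters. As a worked example of that arithmetic only, here
is a stand-alone sketch. vlan_modify_tci() is a made-up helper name; the
mask and shift values are the ones the kernel defines in <linux/if_vlan.h>.

```c
#include <assert.h>
#include <stdint.h>

/* Same values as the kernel's <linux/if_vlan.h> definitions. */
#define VLAN_PRIO_MASK	0xe000
#define VLAN_PRIO_SHIFT	13
#define VLAN_VID_MASK	0x0fff

/*
 * Replace the VID in tci, and replace the priority bits only when a
 * non-zero prio is given, mirroring the MODIFY case in the diff.
 */
static uint16_t vlan_modify_tci(uint16_t tci, uint16_t vid, uint8_t prio)
{
	tci = (uint16_t)((tci & ~VLAN_VID_MASK) | vid);
	if (prio) {
		tci = (uint16_t)(tci & ~VLAN_PRIO_MASK);
		tci = (uint16_t)(tci | (prio << VLAN_PRIO_SHIFT));
	}
	return tci;
}
```

For a tag with priority 3 and VID 100 (TCI 0x6064), setting VID 200 and
priority 5 gives 0xa0c8. With priority 0 the old priority bits are kept and
the result is 0x60c8.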

Signed-off-by: Manish Kurup <manish.ku...@verizon.com>
---
 include/net/tc_act/tc_vlan.h | 21 -
 net/sched/act_vlan.c | 73 ++--
 2 files changed, 63 insertions(+), 31 deletions(-)

diff --git a/include/net/tc_act/tc_vlan.h b/include/net/tc_act/tc_vlan.h
index c2090df..67fd355 100644
--- a/include/net/tc_act/tc_vlan.h
+++ b/include/net/tc_act/tc_vlan.h
@@ -13,12 +13,17 @@
 #include <net/act_api.h>
 #include <linux/tc_act/tc_vlan.h>
 
+struct tcf_vlan_params {
+   struct rcu_head   rcu;
+   int   tcfv_action;
+   u16   tcfv_push_vid;
+   __be16tcfv_push_proto;
+   u8tcfv_push_prio;
+};
+
 struct tcf_vlan {
struct tc_actioncommon;
-   int tcfv_action;
-   u16 tcfv_push_vid;
-   __be16  tcfv_push_proto;
-   u8  tcfv_push_prio;
+   struct tcf_vlan_params __rcu *vlan_p;
 };
 #define to_vlan(a) ((struct tcf_vlan *)a)
 
@@ -33,22 +38,22 @@ static inline bool is_tcf_vlan(const struct tc_action *a)
 
 static inline u32 tcf_vlan_action(const struct tc_action *a)
 {
-   return to_vlan(a)->tcfv_action;
+   return to_vlan(a)->vlan_p->tcfv_action;
 }
 
 static inline u16 tcf_vlan_push_vid(const struct tc_action *a)
 {
-   return to_vlan(a)->tcfv_push_vid;
+   return to_vlan(a)->vlan_p->tcfv_push_vid;
 }
 
 static inline __be16 tcf_vlan_push_proto(const struct tc_action *a)
 {
-   return to_vlan(a)->tcfv_push_proto;
+   return to_vlan(a)->vlan_p->tcfv_push_proto;
 }
 
 static inline u8 tcf_vlan_push_prio(const struct tc_action *a)
 {
-   return to_vlan(a)->tcfv_push_prio;
+   return to_vlan(a)->vlan_p->tcfv_push_prio;
 }
 
 #endif /* __NET_TC_VLAN_H */
diff --git a/net/sched/act_vlan.c b/net/sched/act_vlan.c
index 14c262c..9bb0236 100644
--- a/net/sched/act_vlan.c
+++ b/net/sched/act_vlan.c
@@ -29,31 +29,37 @@ static int tcf_vlan(struct sk_buff *skb, const struct tc_action *a,
int action;
int err;
u16 tci;
+   struct tcf_vlan_params *p;
 
tcf_lastuse_update(&v->tcf_tm);
bstats_cpu_update(this_cpu_ptr(v->common.cpu_bstats), skb);
 
-   spin_lock(&v->tcf_lock);
-   action = v->tcf_action;
-
/* Ensure 'data' points at mac_header prior calling vlan manipulating
 * functions.
 */
if (skb_at_tc_ingress(skb))
skb_push_rcsum(skb, skb->mac_len);
 
-   switch (v->tcfv_action) {
+   rcu_read_lock();
+
+   action = READ_ONCE(v->tcf_action);
+
+   p = rcu_dereference(v->vlan_p);
+
+   switch (p->tcfv_action) {
case TCA_VLAN_ACT_POP:
err = skb_vlan_pop(skb);
if (err)
goto drop;
break;
+
case TCA_VLAN_ACT_PUSH:
-   err = skb_vlan_push(skb, v->tcfv_push_proto, v->tcfv_push_vid |
-   (v->tcfv_push_prio << VLAN_PRIO_SHIFT));
+   err = skb_vlan_push(skb, p->tcfv_push_proto, p->tcfv_push_vid |
+   (p->tcfv_push_prio << VLAN_PRIO_SHIFT));
if (err)
goto drop;
break;
+
case TCA_VLAN_ACT_MODIFY:
/* No-op if no vlan tag (either hw-accel or in-payload) */
if (!skb_vlan_tagged(skb))
@@ -69,15 +75,16 @@ static int tcf_vlan(struct sk_buff *skb, const struct tc_action *a,
goto drop;
}
/* replace the vid */
-   tci = (tci & ~VLAN_VID_MASK) | v->tcfv_push_vid;
+   tci = (tci & ~VLAN_VID_MASK) | p->tcfv_push_vid;
/* replace prio bits, if tcfv_push_prio specified */
-   if (v->tcfv_push_prio) {
+   if (p->tcfv_push_prio) {
tci &= ~VLAN_PRIO_MASK;
-   tci |= v->tcfv_push_prio << VLAN_PRIO_SHIFT;
+   tci |= p->tcfv_push_prio << VLAN_PRIO_SHIFT;
}
/* put updated tci as hwaccel tag */
-   __vlan_hwaccel_put_tag(skb, v->tcfv_push_proto, tci);
+   __vlan_hwaccel_put_tag(skb, p->tcfv_push_proto, tci);
break;
+
default:
BUG();
}
@@ -89,6 +96,7 @@ static int tcf_vlan(struct sk_buff *skb, const struct tc_action *a,
qstats_drop_inc(this_cpu_ptr(v->common.cpu_qstats));
 
 unlock:
+   rcu_read_unlock();
if (skb_at_tc_ingress(skb))
skb_pull_rcsum(skb, skb->mac_len);
 
@@ -111,6 +119,7 @@ static i

[PATCH net-next 1/2] net sched act_vlan: Change stats update to use per-core stats

2017-10-10 Thread Manish Kurup
The VLAN action maintains one set of stats across all cores and uses a
spinlock to synchronize updates to it from those cores. Changed this to use a
per-CPU stats context instead, so cores no longer contend on shared counters.
This should improve performance when the action runs on multiple cores.

Signed-off-by: Manish Kurup <manish.ku...@verizon.com>
---
 net/sched/act_vlan.c | 10 ++
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/net/sched/act_vlan.c b/net/sched/act_vlan.c
index 16eb067..14c262c 100644
--- a/net/sched/act_vlan.c
+++ b/net/sched/act_vlan.c
@@ -30,9 +30,10 @@ static int tcf_vlan(struct sk_buff *skb, const struct tc_action *a,
int err;
u16 tci;
 
-   spin_lock(&v->tcf_lock);
tcf_lastuse_update(&v->tcf_tm);
-   bstats_update(&v->tcf_bstats, skb);
+   bstats_cpu_update(this_cpu_ptr(v->common.cpu_bstats), skb);
+
+   spin_lock(&v->tcf_lock);
action = v->tcf_action;
 
/* Ensure 'data' points at mac_header prior calling vlan manipulating
@@ -85,7 +86,8 @@ static int tcf_vlan(struct sk_buff *skb, const struct tc_action *a,
 
 drop:
action = TC_ACT_SHOT;
-   v->tcf_qstats.drops++;
+   qstats_drop_inc(this_cpu_ptr(v->common.cpu_qstats));
+
 unlock:
if (skb_at_tc_ingress(skb))
skb_pull_rcsum(skb, skb->mac_len);
@@ -172,7 +174,7 @@ static int tcf_vlan_init(struct net *net, struct nlattr *nla,
 
if (!exists) {
ret = tcf_idr_create(tn, parm->index, est, a,
-   &act_vlan_ops, bind, false);
+   &act_vlan_ops, bind, true);
if (ret)
return ret;
 
-- 
2.7.4