Re: [PATCH iproute2/net-next v2] tc_util: Add support for showing TCA_STATS_BASIC_HW statistics

2018-10-02 Thread Eelco Chaudron



On 1 Oct 2018, at 17:12, David Ahern wrote:

> On 10/1/18 4:29 AM, Eelco Chaudron wrote:
>>>> Hi Stephen, anything else required for this patch to be accepted?
>>>>
>>>> FYI the kernel side of this patch has been excepted on net-next.
>>>>
>>>> Cheers,
>>>>
>>>> Eelco
>>>
>>> David Ahern handles net-next see patchwork
>>>   https://patchwork.ozlabs.org/patch/956225/
>>>
>>> I think he was just waiting for the kernel part to merge.
>>
>> Thanks for making me aware of the patchwork for iproute.
>>
>
> Now that the kernel side is in, please re-send the iproute2 patches.

Rebased the patch on the latest iproute2-next and sent a v3.

//Eelco


[PATCH iproute2/net-next v3] tc_util: Add support for showing TCA_STATS_BASIC_HW statistics

2018-10-02 Thread Eelco Chaudron
Add support for showing hardware specific counters to easy
troubleshooting hardware offload.

$ tc -s filter show dev enp3s0np0 parent :
filter protocol ip pref 1 flower chain 0
filter protocol ip pref 1 flower chain 0 handle 0x1
  eth_type ipv4
  dst_ip 2.0.0.0
  src_ip 1.0.0.0
  ip_flags nofrag
  in_hw
action order 1: mirred (Egress Redirect to device eth1) stolen
index 1 ref 1 bind 1 installed 0 sec used 0 sec
Action statistics:
Sent 534884742 bytes 8915697 pkt (dropped 0, overlimits 0 requeues 0)
Sent software 187542 bytes 4077 pkt
Sent hardware 534697200 bytes 8911620 pkt
backlog 0b 0p requeues 0
cookie 89173e6a7001becfd486bda17e29


Signed-off-by: Eelco Chaudron 

---
v3:
 * Resent as kernel side is now in
 * Rebased on latest iproute2-next branch

v2:
 * Removed unnecessary initialization
 * Made not displaying of missing TCA_STATS_BASIC_HW more obvious
 * Use _SL_ macro for single line output

 tc/tc_util.c |   41 +
 1 file changed, 41 insertions(+)

diff --git a/tc/tc_util.c b/tc/tc_util.c
index cafbe49..a082c73 100644
--- a/tc/tc_util.c
+++ b/tc/tc_util.c
@@ -754,6 +754,44 @@ void print_tm(FILE *f, const struct tcf_t *tm)
}
 }
 
+static void print_tcstats_basic_hw(struct rtattr **tbs, char *prefix)
+{
+   struct gnet_stats_basic bs_hw;
+
+   if (!tbs[TCA_STATS_BASIC_HW])
+   return;
+
+   memcpy(_hw, RTA_DATA(tbs[TCA_STATS_BASIC_HW]),
+  MIN(RTA_PAYLOAD(tbs[TCA_STATS_BASIC_HW]), sizeof(bs_hw)));
+
+   if (bs_hw.bytes == 0 && bs_hw.packets == 0)
+   return;
+
+   if (tbs[TCA_STATS_BASIC]) {
+   struct gnet_stats_basic bs;
+
+   memcpy(, RTA_DATA(tbs[TCA_STATS_BASIC]),
+  MIN(RTA_PAYLOAD(tbs[TCA_STATS_BASIC]),
+  sizeof(bs)));
+
+   if (bs.bytes >= bs_hw.bytes && bs.packets >= bs_hw.packets) {
+   print_string(PRINT_FP, NULL, "%s", _SL_);
+   print_string(PRINT_FP, NULL, "%s", prefix);
+   print_lluint(PRINT_ANY, "sw_bytes",
+"Sent software %llu bytes",
+bs.bytes - bs_hw.bytes);
+   print_uint(PRINT_ANY, "sw_packets", " %u pkt",
+  bs.packets - bs_hw.packets);
+   }
+   }
+
+   print_string(PRINT_FP, NULL, "%s", _SL_);
+   print_string(PRINT_FP, NULL, "%s", prefix);
+   print_lluint(PRINT_ANY, "hw_bytes", "Sent hardware %llu bytes",
+bs_hw.bytes);
+   print_uint(PRINT_ANY, "hw_packets", " %u pkt", bs_hw.packets);
+}
+
 void print_tcstats2_attr(FILE *fp, struct rtattr *rta, char *prefix, struct 
rtattr **xstats)
 {
SPRINT_BUF(b1);
@@ -780,6 +818,9 @@ void print_tcstats2_attr(FILE *fp, struct rtattr *rta, char 
*prefix, struct rtat
print_uint(PRINT_ANY, "requeues", " requeues %u) ", q.requeues);
}
 
+   if (tbs[TCA_STATS_BASIC_HW])
+   print_tcstats_basic_hw(tbs, prefix);
+
if (tbs[TCA_STATS_RATE_EST64]) {
struct gnet_stats_rate_est64 re = {0};
 



Re: [PATCH iproute2/net-next v2] tc_util: Add support for showing TCA_STATS_BASIC_HW statistics

2018-10-01 Thread Eelco Chaudron



On 1 Oct 2018, at 11:10, Stephen Hemminger wrote:

> On Mon, 01 Oct 2018 09:08:32 +0200
> "Eelco Chaudron"  wrote:
>
>> On 10 Aug 2018, at 16:48, Eelco Chaudron wrote:
>>
>>> On 10 Aug 2018, at 16:44, Stephen Hemminger wrote:
>>>
>>>> On Fri, 10 Aug 2018 07:59:30 -0400
>>>> Eelco Chaudron  wrote:
>>>>
>>>>> + if (bs.bytes >= bs_hw.bytes && bs.packets >= bs_hw.packets) {
>>>>> + print_string(PRINT_FP, NULL, "%s", _SL_);
>>>>> + print_string(PRINT_FP, NULL, "%s", prefix);
>>>>> + print_lluint(PRINT_ANY, "sw_bytes",
>>>>> +  "Sent software %llu bytes",
>>>>> +  bs.bytes - bs_hw.bytes);
>>>>> + print_uint(PRINT_ANY, "sw_packets", " %u pkt",
>>>>> +bs.packets - bs_hw.packets);
>>>>> + }
>>>>> + }
>>>>> +
>>>>> + print_string(PRINT_FP, NULL, "%s", _SL_);
>>>>> + print_string(PRINT_FP, NULL, "%s", prefix);
>>>>> + print_lluint(PRINT_ANY, "hw_bytes", "Sent hardware %llu bytes",
>>>>> +  bs_hw.bytes);
>>>>
>>>> What does the output look like?
>>>
>>> See the two +’es below:
>>>
>>> $ tc -s filter show dev enp3s0np0 parent :
>>> filter protocol ip pref 1 flower chain 0
>>> filter protocol ip pref 1 flower chain 0 handle 0x1
>>>   eth_type ipv4
>>>   dst_ip 2.0.0.0
>>>   src_ip 1.0.0.0
>>>   ip_flags nofrag
>>>   in_hw
>>> action order 1: mirred (Egress Redirect to device eth1) stolen
>>> index 1 ref 1 bind 1 installed 0 sec used 0 sec
>>> Action statistics:
>>> Sent 534884742 bytes 8915697 pkt (dropped 0, overlimits 0
>>> requeues 0)
>>> +   Sent software 187542 bytes 4077 pkt
>>> +   Sent hardware 534697200 bytes 8911620 pkt
>>> backlog 0b 0p requeues 0
>>> cookie 89173e6a7001becfd486bda17e29
>>
>> Hi Stephen, anything else required for this patch to be accepted?
>>
>> FYI the kernel side of this patch has been excepted on net-next.
>>
>> Cheers,
>>
>> Eelco
>
> David Ahern handles net-next see patchwork
>   https://patchwork.ozlabs.org/patch/956225/
>
> I think he was just waiting for the kernel part to merge.

Thanks for making me aware of the patchwork for iproute.


Re: [PATCH iproute2/net-next v2] tc_util: Add support for showing TCA_STATS_BASIC_HW statistics

2018-10-01 Thread Eelco Chaudron



On 10 Aug 2018, at 16:48, Eelco Chaudron wrote:


On 10 Aug 2018, at 16:44, Stephen Hemminger wrote:


On Fri, 10 Aug 2018 07:59:30 -0400
Eelco Chaudron  wrote:


+   if (bs.bytes >= bs_hw.bytes && bs.packets >= bs_hw.packets) {
+   print_string(PRINT_FP, NULL, "%s", _SL_);
+   print_string(PRINT_FP, NULL, "%s", prefix);
+   print_lluint(PRINT_ANY, "sw_bytes",
+"Sent software %llu bytes",
+bs.bytes - bs_hw.bytes);
+   print_uint(PRINT_ANY, "sw_packets", " %u pkt",
+  bs.packets - bs_hw.packets);
+   }
+   }
+
+   print_string(PRINT_FP, NULL, "%s", _SL_);
+   print_string(PRINT_FP, NULL, "%s", prefix);
+   print_lluint(PRINT_ANY, "hw_bytes", "Sent hardware %llu bytes",
+bs_hw.bytes);


What does the output look like?


See the two +’es below:

$ tc -s filter show dev enp3s0np0 parent :
filter protocol ip pref 1 flower chain 0
filter protocol ip pref 1 flower chain 0 handle 0x1
  eth_type ipv4
  dst_ip 2.0.0.0
  src_ip 1.0.0.0
  ip_flags nofrag
  in_hw
action order 1: mirred (Egress Redirect to device eth1) stolen
index 1 ref 1 bind 1 installed 0 sec used 0 sec
Action statistics:
Sent 534884742 bytes 8915697 pkt (dropped 0, overlimits 0 
requeues 0)

+   Sent software 187542 bytes 4077 pkt
+   Sent hardware 534697200 bytes 8911620 pkt
backlog 0b 0p requeues 0
cookie 89173e6a7001becfd486bda17e29


Hi Stephen, anything else required for this patch to be accepted?

FYI the kernel side of this patch has been excepted on net-next.

Cheers,

Eelco


Re: [PATCH net-next] net/core: make function ___gnet_stats_copy_basic() static

2018-09-26 Thread Eelco Chaudron



On 26 Sep 2018, at 14:09, Wei Yongjun wrote:

> Fixes the following sparse warning:
>
> net/core/gen_stats.c:166:1: warning:
>  symbol '___gnet_stats_copy_basic' was not declared. Should it be static?
>

Thanks for fixing my commit ;)

Acked-by: Eelco Chaudron 

> Fixes: 5e111210a443 ("net/core: Add new basic hardware counter")
> Signed-off-by: Wei Yongjun 
> ---
>  net/core/gen_stats.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/net/core/gen_stats.c b/net/core/gen_stats.c
> index 65a2e82..9bf1b9a 100644
> --- a/net/core/gen_stats.c
> +++ b/net/core/gen_stats.c
> @@ -162,7 +162,7 @@
>  }
>  EXPORT_SYMBOL(__gnet_stats_copy_basic);
>
> -int
> +static int
>  ___gnet_stats_copy_basic(const seqcount_t *running,
>struct gnet_dump *d,
>struct gnet_stats_basic_cpu __percpu *cpu,


Re: [PATCH 0/2] net/sched: Add hardware specific counters to TC actions

2018-09-21 Thread Eelco Chaudron



On 20 Sep 2018, at 16:14, Jakub Kicinski wrote:

> On Thu, 20 Sep 2018 09:14:08 +0200, Eelco Chaudron wrote:
>> Is there anything else blocking from getting this into net-next?
>>
>> I still think this patch is beneficial for the full user experience, and
>> I’ve got requests from QA and others for this.
>
> Not from my perspective, the numbers look okay, feel free to repost!

Thanks, I sent out a v2 re-based on the latest net-next.


[PATCH net-next V2 2/2] net/sched: Add hardware specific counters to TC actions

2018-09-21 Thread Eelco Chaudron
Add additional counters that will store the bytes/packets processed by
hardware. These will be exported through the netlink interface for
displaying by the iproute2 tc tool

Signed-off-by: Eelco Chaudron 
---
 include/net/act_api.h  |8 +---
 include/net/pkt_cls.h  |2 +-
 net/sched/act_api.c|   14 +++---
 net/sched/act_gact.c   |6 +-
 net/sched/act_mirred.c |5 -
 5 files changed, 26 insertions(+), 9 deletions(-)

diff --git a/include/net/act_api.h b/include/net/act_api.h
index c6f195b..1ddff33 100644
--- a/include/net/act_api.h
+++ b/include/net/act_api.h
@@ -31,10 +31,12 @@ struct tc_action {
int tcfa_action;
struct tcf_ttcfa_tm;
struct gnet_stats_basic_packed  tcfa_bstats;
+   struct gnet_stats_basic_packed  tcfa_bstats_hw;
struct gnet_stats_queue tcfa_qstats;
struct net_rate_estimator __rcu *tcfa_rate_est;
spinlock_t  tcfa_lock;
struct gnet_stats_basic_cpu __percpu *cpu_bstats;
+   struct gnet_stats_basic_cpu __percpu *cpu_bstats_hw;
struct gnet_stats_queue __percpu *cpu_qstats;
struct tc_cookie__rcu *act_cookie;
struct tcf_chain*goto_chain;
@@ -94,7 +96,7 @@ struct tc_action_ops {
struct netlink_callback *, int,
const struct tc_action_ops *,
struct netlink_ext_ack *);
-   void(*stats_update)(struct tc_action *, u64, u32, u64);
+   void(*stats_update)(struct tc_action *, u64, u32, u64, bool);
size_t  (*get_fill_size)(const struct tc_action *act);
struct net_device *(*get_dev)(const struct tc_action *a);
void(*put_dev)(struct net_device *dev);
@@ -182,13 +184,13 @@ int tcf_action_dump(struct sk_buff *skb, struct tc_action 
*actions[], int bind,
 #endif /* CONFIG_NET_CLS_ACT */
 
 static inline void tcf_action_stats_update(struct tc_action *a, u64 bytes,
-  u64 packets, u64 lastuse)
+  u64 packets, u64 lastuse, bool hw)
 {
 #ifdef CONFIG_NET_CLS_ACT
if (!a->ops->stats_update)
return;
 
-   a->ops->stats_update(a, bytes, packets, lastuse);
+   a->ops->stats_update(a, bytes, packets, lastuse, hw);
 #endif
 }
 
diff --git a/include/net/pkt_cls.h b/include/net/pkt_cls.h
index 75a3f3f..bbfe27f 100644
--- a/include/net/pkt_cls.h
+++ b/include/net/pkt_cls.h
@@ -318,7 +318,7 @@ static inline void tcf_exts_put_net(struct tcf_exts *exts)
for (i = 0; i < exts->nr_actions; i++) {
struct tc_action *a = exts->actions[i];
 
-   tcf_action_stats_update(a, bytes, packets, lastuse);
+   tcf_action_stats_update(a, bytes, packets, lastuse, true);
}
 
preempt_enable();
diff --git a/net/sched/act_api.c b/net/sched/act_api.c
index 6f118d6..1acd3a8 100644
--- a/net/sched/act_api.c
+++ b/net/sched/act_api.c
@@ -81,6 +81,7 @@ static void tcf_set_action_cookie(struct tc_cookie __rcu 
**old_cookie,
 static void free_tcf(struct tc_action *p)
 {
free_percpu(p->cpu_bstats);
+   free_percpu(p->cpu_bstats_hw);
free_percpu(p->cpu_qstats);
 
tcf_set_action_cookie(>act_cookie, NULL);
@@ -372,9 +373,12 @@ int tcf_idr_create(struct tc_action_net *tn, u32 index, 
struct nlattr *est,
p->cpu_bstats = netdev_alloc_pcpu_stats(struct 
gnet_stats_basic_cpu);
if (!p->cpu_bstats)
goto err1;
+   p->cpu_bstats_hw = netdev_alloc_pcpu_stats(struct 
gnet_stats_basic_cpu);
+   if (!p->cpu_bstats_hw)
+   goto err2;
p->cpu_qstats = alloc_percpu(struct gnet_stats_queue);
if (!p->cpu_qstats)
-   goto err2;
+   goto err3;
}
spin_lock_init(>tcfa_lock);
p->tcfa_index = index;
@@ -386,15 +390,17 @@ int tcf_idr_create(struct tc_action_net *tn, u32 index, 
struct nlattr *est,
>tcfa_rate_est,
>tcfa_lock, NULL, est);
if (err)
-   goto err3;
+   goto err4;
}
 
p->idrinfo = idrinfo;
p->ops = ops;
*a = p;
return 0;
-err3:
+err4:
free_percpu(p->cpu_qstats);
+err3:
+   free_percpu(p->cpu_bstats_hw);
 err2:
free_percpu(p->cpu_bstats);
 err1:
@@ -979,6 +985,8 @@ int tcf_action_copy_stats(struct sk_buff *skb, struct 
tc_action *p,
goto errout;
 
if (gnet_stats_copy_basic(NULL, , p->cpu_bstats, >tcfa_bstats) < 0 
||
+   gnet_stats_copy_basic_hw(NULL, , p->cpu_bstats_hw,
+>tcfa_bstats_hw) < 0 

[PATCH net-next V2 1/2] net/core: Add new basic hardware counter

2018-09-21 Thread Eelco Chaudron
Add a new hardware specific basic counter, TCA_STATS_BASIC_HW. This can
be used to count packets/bytes processed by hardware offload.


Signed-off-by: Eelco Chaudron 
---
 include/net/gen_stats.h|4 ++
 include/uapi/linux/gen_stats.h |1 +
 net/core/gen_stats.c   |   73 ++--
 3 files changed, 59 insertions(+), 19 deletions(-)

diff --git a/include/net/gen_stats.h b/include/net/gen_stats.h
index 883bb90..946bd53 100644
--- a/include/net/gen_stats.h
+++ b/include/net/gen_stats.h
@@ -44,6 +44,10 @@ void __gnet_stats_copy_basic(const seqcount_t *running,
 struct gnet_stats_basic_packed *bstats,
 struct gnet_stats_basic_cpu __percpu *cpu,
 struct gnet_stats_basic_packed *b);
+int gnet_stats_copy_basic_hw(const seqcount_t *running,
+struct gnet_dump *d,
+struct gnet_stats_basic_cpu __percpu *cpu,
+struct gnet_stats_basic_packed *b);
 int gnet_stats_copy_rate_est(struct gnet_dump *d,
 struct net_rate_estimator __rcu **ptr);
 int gnet_stats_copy_queue(struct gnet_dump *d,
diff --git a/include/uapi/linux/gen_stats.h b/include/uapi/linux/gen_stats.h
index 24a861c..065408e 100644
--- a/include/uapi/linux/gen_stats.h
+++ b/include/uapi/linux/gen_stats.h
@@ -12,6 +12,7 @@ enum {
TCA_STATS_APP,
TCA_STATS_RATE_EST64,
TCA_STATS_PAD,
+   TCA_STATS_BASIC_HW,
__TCA_STATS_MAX,
 };
 #define TCA_STATS_MAX (__TCA_STATS_MAX - 1)
diff --git a/net/core/gen_stats.c b/net/core/gen_stats.c
index 188d693..65a2e82 100644
--- a/net/core/gen_stats.c
+++ b/net/core/gen_stats.c
@@ -162,30 +162,18 @@
 }
 EXPORT_SYMBOL(__gnet_stats_copy_basic);
 
-/**
- * gnet_stats_copy_basic - copy basic statistics into statistic TLV
- * @running: seqcount_t pointer
- * @d: dumping handle
- * @cpu: copy statistic per cpu
- * @b: basic statistics
- *
- * Appends the basic statistics to the top level TLV created by
- * gnet_stats_start_copy().
- *
- * Returns 0 on success or -1 with the statistic lock released
- * if the room in the socket buffer was not sufficient.
- */
 int
-gnet_stats_copy_basic(const seqcount_t *running,
- struct gnet_dump *d,
- struct gnet_stats_basic_cpu __percpu *cpu,
- struct gnet_stats_basic_packed *b)
+___gnet_stats_copy_basic(const seqcount_t *running,
+struct gnet_dump *d,
+struct gnet_stats_basic_cpu __percpu *cpu,
+struct gnet_stats_basic_packed *b,
+int type)
 {
struct gnet_stats_basic_packed bstats = {0};
 
__gnet_stats_copy_basic(running, , cpu, b);
 
-   if (d->compat_tc_stats) {
+   if (d->compat_tc_stats && type == TCA_STATS_BASIC) {
d->tc_stats.bytes = bstats.bytes;
d->tc_stats.packets = bstats.packets;
}
@@ -196,14 +184,61 @@
memset(, 0, sizeof(sb));
sb.bytes = bstats.bytes;
sb.packets = bstats.packets;
-   return gnet_stats_copy(d, TCA_STATS_BASIC, , sizeof(sb),
+   return gnet_stats_copy(d, type, , sizeof(sb),
   TCA_STATS_PAD);
}
return 0;
 }
+
+/**
+ * gnet_stats_copy_basic - copy basic statistics into statistic TLV
+ * @running: seqcount_t pointer
+ * @d: dumping handle
+ * @cpu: copy statistic per cpu
+ * @b: basic statistics
+ *
+ * Appends the basic statistics to the top level TLV created by
+ * gnet_stats_start_copy().
+ *
+ * Returns 0 on success or -1 with the statistic lock released
+ * if the room in the socket buffer was not sufficient.
+ */
+int
+gnet_stats_copy_basic(const seqcount_t *running,
+ struct gnet_dump *d,
+ struct gnet_stats_basic_cpu __percpu *cpu,
+ struct gnet_stats_basic_packed *b)
+{
+   return ___gnet_stats_copy_basic(running, d, cpu, b,
+   TCA_STATS_BASIC);
+}
 EXPORT_SYMBOL(gnet_stats_copy_basic);
 
 /**
+ * gnet_stats_copy_basic_hw - copy basic hw statistics into statistic TLV
+ * @running: seqcount_t pointer
+ * @d: dumping handle
+ * @cpu: copy statistic per cpu
+ * @b: basic statistics
+ *
+ * Appends the basic statistics to the top level TLV created by
+ * gnet_stats_start_copy().
+ *
+ * Returns 0 on success or -1 with the statistic lock released
+ * if the room in the socket buffer was not sufficient.
+ */
+int
+gnet_stats_copy_basic_hw(const seqcount_t *running,
+struct gnet_dump *d,
+struct gnet_stats_basic_cpu __percpu *cpu,
+struct gnet_stats_basic_packed *b)
+{
+   return ___gnet_stats_copy_basic(running, d, cpu, b,
+ 

[PATCH net-next V2 0/2] net/sched: Add hardware specific counters to TC actions

2018-09-21 Thread Eelco Chaudron
Add hardware specific counters to TC actions which will be exported
through the netlink API. This makes troubleshooting TC flower offload
easier, as it possible to differentiate the packets being offloaded.

Signed-off-by: Eelco Chaudron 

v2 - Rebased on latest net-next

Eelco Chaudron (2):
  net/core: Add new basic hardware counter
  net/sched: Add hardware specific counters to TC actions


 include/net/act_api.h  |8 +++-
 include/net/gen_stats.h|4 ++
 include/net/pkt_cls.h  |2 +
 include/uapi/linux/gen_stats.h |1 +
 net/core/gen_stats.c   |   73 ++--
 net/sched/act_api.c|   14 ++--
 net/sched/act_gact.c   |6 +++
 net/sched/act_mirred.c |5 ++-
 8 files changed, 85 insertions(+), 28 deletions(-)


Re: [PATCH 0/2] net/sched: Add hardware specific counters to TC actions

2018-09-20 Thread Eelco Chaudron




On 29 Aug 2018, at 20:12, Jakub Kicinski wrote:


On Wed, 29 Aug 2018 11:43:47 +0200, Eelco Chaudron wrote:

On 23 Aug 2018, at 20:14, Jakub Kicinski wrote:


On Mon, 20 Aug 2018 16:03:40 +0200, Eelco Chaudron wrote:

On 17 Aug 2018, at 13:27, Jakub Kicinski wrote:

On Thu, 16 Aug 2018 14:02:44 +0200, Eelco Chaudron wrote:

On 11 Aug 2018, at 21:06, David Miller wrote:


From: Jakub Kicinski 
Date: Thu, 9 Aug 2018 20:26:08 -0700


It is not immediately clear why this is needed.  The memory and
updating two sets of counters won't come for free, so perhaps a
stronger justification than troubleshooting is due? :S

Netdev has counters for fallback vs forwarded traffic, so you'd
know
that traffic hits the SW datapath, plus the rules which are 
in_hw

will
most likely not match as of today for flower (assuming
correctness).


I strongly believe that these counters are a requirement for a
mixed
software/hardware (flow) based forwarding environment. The global
counters will not help much here as you might have chosen to have
certain traffic forwarded by software.

These counters are probably the only option you have to figure 
out

why
forwarding is not as fast as expected, and you want to blame the 
TC

offload NIC.


The suggested debugging flow would be:
 (1) check the global counter for fallback are incrementing;
 (2) find a flow with high stats but no in_hw flag set.

The in_hw indication should be sufficient in most cases (unless
there
are shared blocks between netdevs of different ASICs...).


I guess the aim is to find miss behaving hardware, i.e. having the
in_hw
flag set, but flows still coming to the kernel.


For misbehaving hardware in_hw will not work indeed.  Whether we 
need

these extra always-on stats for such use case could be debated :)

I'm slightly concerned about potential performance impact, 
would

you
be able to share some numbers for non-trivial number of flows
(100k
active?)?


Agreed, features used for diagnostics cannot have a harmful
penalty
for fast path performance.


Fast path performance is not affected as these counters are not
incremented there. They are only incremented by the nic driver 
when

they
gather their statistics from hardware.


Not by much, you are adding state to performance-critical
structures,
though, for what is effectively debugging purposes.

I was mostly talking about the HW offload stat updates (sorry for
not
being clear).

We can have some hundreds of thousands active offloaded flows, 
each

of
them can have multiple actions, and stats have to be updated
multiple
times per second and dumped probably around once a second, too.  
On

a
busy system the stats will get evicted from cache between each
round.

But I'm speculating let's see if I can get some numbers on it (if
you
could get some too, that would be great!).


I’ll try to measure some of this later this week/early next week.


I asked Louis to run some tests while I'm travelling, and he reports
that my worry about reporting the extra stats was unfounded.  Update
function does not show up in traces at all.  It seems under stress
(generated with stress-ng) the thread dumping the stats in userspace
(in OvS it would be the revalidator) actually consumes less CPU in
__gnet_stats_copy_basic (0.4% less for ~2.0% total).

Would this match with your results?  I'm not sure why dumping would 
be

faster with your change..


Tested with OVS and https://github.com/chaudron/ovs_perf using 300K 
TC

rules installed in HW.

For __gnet_stats_copy_basic() being faster I have (had) a theory. Now
this function is called twice, and I assumed the first call would 
cache

memory and the second call would be faster.

Sampling a lot of perf data, I get an average of 1115ns with the base
kernel and 954ns with the fix applied, so about ~14%.

Thought I would perf tcf_action_copy_stats() as it is the place 
updating

the additional counter. But even in this case, I see a better
performance with the patch applied.

In average 13581ns with the fix, vs base kernel at 1391ns, so about
2.3%.

I guess the changes to the tc_action structure got better cache
alignment.


Interesting you could reproduce the speed up too!  +1 for the guess.
Seems like my caution about slowing down SW paths to support HW 
offload

landed on a very unfortunate patch :)


Is there anything else blocking from getting this into net-next?

I still think this patch is beneficial for the full user experience, and 
I’ve got requests from QA and others for this.




Re: [PATCH 0/2] net/sched: Add hardware specific counters to TC actions

2018-08-29 Thread Eelco Chaudron




On 23 Aug 2018, at 20:14, Jakub Kicinski wrote:


On Mon, 20 Aug 2018 16:03:40 +0200, Eelco Chaudron wrote:

On 17 Aug 2018, at 13:27, Jakub Kicinski wrote:

On Thu, 16 Aug 2018 14:02:44 +0200, Eelco Chaudron wrote:

On 11 Aug 2018, at 21:06, David Miller wrote:


From: Jakub Kicinski 
Date: Thu, 9 Aug 2018 20:26:08 -0700


It is not immediately clear why this is needed.  The memory and
updating two sets of counters won't come for free, so perhaps a
stronger justification than troubleshooting is due? :S

Netdev has counters for fallback vs forwarded traffic, so you'd
know
that traffic hits the SW datapath, plus the rules which are in_hw
will
most likely not match as of today for flower (assuming
correctness).


I strongly believe that these counters are a requirement for a 
mixed

software/hardware (flow) based forwarding environment. The global
counters will not help much here as you might have chosen to have
certain traffic forwarded by software.

These counters are probably the only option you have to figure out
why
forwarding is not as fast as expected, and you want to blame the TC
offload NIC.


The suggested debugging flow would be:
 (1) check the global counter for fallback are incrementing;
 (2) find a flow with high stats but no in_hw flag set.

The in_hw indication should be sufficient in most cases (unless 
there

are shared blocks between netdevs of different ASICs...).


I guess the aim is to find miss behaving hardware, i.e. having the 
in_hw

flag set, but flows still coming to the kernel.


For misbehaving hardware in_hw will not work indeed.  Whether we need
these extra always-on stats for such use case could be debated :)


I'm slightly concerned about potential performance impact, would
you
be able to share some numbers for non-trivial number of flows 
(100k

active?)?


Agreed, features used for diagnostics cannot have a harmful 
penalty

for fast path performance.


Fast path performance is not affected as these counters are not
incremented there. They are only incremented by the nic driver when
they
gather their statistics from hardware.


Not by much, you are adding state to performance-critical 
structures,

though, for what is effectively debugging purposes.

I was mostly talking about the HW offload stat updates (sorry for 
not

being clear).

We can have some hundreds of thousands active offloaded flows, each 
of
them can have multiple actions, and stats have to be updated 
multiple
times per second and dumped probably around once a second, too.  On 
a
busy system the stats will get evicted from cache between each 
round.


But I'm speculating let's see if I can get some numbers on it (if 
you

could get some too, that would be great!).


I’ll try to measure some of this later this week/early next week.


I asked Louis to run some tests while I'm travelling, and he reports
that my worry about reporting the extra stats was unfounded.  Update
function does not show up in traces at all.  It seems under stress
(generated with stress-ng) the thread dumping the stats in userspace
(in OvS it would be the revalidator) actually consumes less CPU in
__gnet_stats_copy_basic (0.4% less for ~2.0% total).

Would this match with your results?  I'm not sure why dumping would be
faster with your change..


Tested with OVS and https://github.com/chaudron/ovs_perf using 300K TC 
rules installed in HW.


For __gnet_stats_copy_basic() being faster I have (had) a theory. Now 
this function is called twice, and I assumed the first call would cache 
memory and the second call would be faster.


Sampling a lot of perf data, I get an average of 1115ns with the base 
kernel and 954ns with the fix applied, so about ~14%.


Thought I would perf tcf_action_copy_stats() as it is the place updating 
the additional counter. But even in this case, I see a better 
performance with the patch applied.


In average 13581ns with the fix, vs base kernel at 1391ns, so about 
2.3%.


I guess the changes to the tc_action structure got better cache 
alignment.




However, the flow creation is effected, as this is where the extra
memory gets allocated. I had done some 40K flow tests before and 
did

not
see any noticeable change in flow insertion performance. As 
requested

by
Jakub I did it again for 100K (and threw a Netronome blade in the 
mix

;). I used Marcelo’s test tool,
https://github.com/marceloleitner/perf-flower.git.

Here are the numbers (time in seconds) for 10 runs in sorted order:

+-++
| Base_kernel | Change_applied |
+-++
|5.684019 |   5.656388 |
|5.699658 |   5.674974 |
|5.725220 |   5.722107 |
|5.739285 |   5.839855 |
|5.748088 |   5.865238 |
|5.766231 |   5.873913 |
|5.842264 |   5.909259 |
|5.902202 |   5.912685 |
|5.905391 |   5.947138 |
|6.032997 |   5.997779 |
+-++

I guess the deviation is in the userspace part, which is where

Re: [PATCH 0/2] net/sched: Add hardware specific counters to TC actions

2018-08-20 Thread Eelco Chaudron




On 17 Aug 2018, at 13:27, Jakub Kicinski wrote:


On Thu, 16 Aug 2018 14:02:44 +0200, Eelco Chaudron wrote:

On 11 Aug 2018, at 21:06, David Miller wrote:


From: Jakub Kicinski 
Date: Thu, 9 Aug 2018 20:26:08 -0700


It is not immediately clear why this is needed.  The memory and
updating two sets of counters won't come for free, so perhaps a
stronger justification than troubleshooting is due? :S

Netdev has counters for fallback vs forwarded traffic, so you'd 
know

that traffic hits the SW datapath, plus the rules which are in_hw
will
most likely not match as of today for flower (assuming 
correctness).


I strongly believe that these counters are a requirement for a mixed
software/hardware (flow) based forwarding environment. The global
counters will not help much here as you might have chosen to have
certain traffic forwarded by software.

These counters are probably the only option you have to figure out 
why

forwarding is not as fast as expected, and you want to blame the TC
offload NIC.


The suggested debugging flow would be:
 (1) check the global counter for fallback are incrementing;
 (2) find a flow with high stats but no in_hw flag set.

The in_hw indication should be sufficient in most cases (unless there
are shared blocks between netdevs of different ASICs...).


I guess the aim is to find miss behaving hardware, i.e. having the in_hw 
flag set, but flows still coming to the kernel.


I'm slightly concerned about potential performance impact, would 
you

be able to share some numbers for non-trivial number of flows (100k
active?)?


Agreed, features used for diagnostics cannot have a harmful penalty
for fast path performance.


Fast path performance is not affected as these counters are not
incremented there. They are only incremented by the nic driver when 
they

gather their statistics from hardware.


Not by much, you are adding state to performance-critical structures,
though, for what is effectively debugging purposes.

I was mostly talking about the HW offload stat updates (sorry for not
being clear).

We can have some hundreds of thousands active offloaded flows, each of
them can have multiple actions, and stats have to be updated multiple
times per second and dumped probably around once a second, too.  On a
busy system the stats will get evicted from cache between each round.

But I'm speculating let's see if I can get some numbers on it (if you
could get some too, that would be great!).


I’ll try to measure some of this later this week/early next week.


However, the flow creation is effected, as this is where the extra
memory gets allocated. I had done some 40K flow tests before and did 
not
see any noticeable change in flow insertion performance. As requested 
by

Jakub I did it again for 100K (and threw a Netronome blade in the mix
;). I used Marcelo’s test tool,
https://github.com/marceloleitner/perf-flower.git.

Here are the numbers (time in seconds) for 10 runs in sorted order:

+-++
| Base_kernel | Change_applied |
+-++
|5.684019 |   5.656388 |
|5.699658 |   5.674974 |
|5.725220 |   5.722107 |
|5.739285 |   5.839855 |
|5.748088 |   5.865238 |
|5.766231 |   5.873913 |
|5.842264 |   5.909259 |
|5.902202 |   5.912685 |
|5.905391 |   5.947138 |
|6.032997 |   5.997779 |
+-++

I guess the deviation is in the userspace part, which is where in 
real

life flows get added anyway.

Let me know if more is unclear.


Re: [PATCH 0/2] net/sched: Add hardware specific counters to TC actions

2018-08-16 Thread Eelco Chaudron

On 11 Aug 2018, at 21:06, David Miller wrote:


From: Jakub Kicinski 
Date: Thu, 9 Aug 2018 20:26:08 -0700


It is not immediately clear why this is needed.  The memory and
updating two sets of counters won't come for free, so perhaps a
stronger justification than troubleshooting is due? :S

Netdev has counters for fallback vs forwarded traffic, so you'd know
that traffic hits the SW datapath, plus the rules which are in_hw 
will

most likely not match as of today for flower (assuming correctness).


I strongly believe that these counters are a requirement for a mixed 
software/hardware (flow) based forwarding environment. The global 
counters will not help much here as you might have chosen to have 
certain traffic forwarded by software.


These counters are probably the only option you have to figure out why 
forwarding is not as fast as expected, and you want to blame the TC 
offload NIC.



I'm slightly concerned about potential performance impact, would you
be able to share some numbers for non-trivial number of flows (100k
active?)?


Agreed, features used for diagnostics cannot have a harmful penalty 
for

fast path performance.


Fast path performance is not affected as these counters are not 
incremented there. They are only incremented by the nic driver when they 
gather their statistics from hardware.


However, the flow creation is effected, as this is where the extra 
memory gets allocated. I had done some 40K flow tests before and did not 
see any noticeable change in flow insertion performance. As requested by 
Jakub I did it again for 100K (and threw a Netronome blade in the mix 
;). I used Marcelo’s test tool, 
https://github.com/marceloleitner/perf-flower.git.


Here are the numbers (time in seconds) for 10 runs in sorted order:

+-++
| Base_kernel | Change_applied |
+-++
|5.684019 |   5.656388 |
|5.699658 |   5.674974 |
|5.725220 |   5.722107 |
|5.739285 |   5.839855 |
|5.748088 |   5.865238 |
|5.766231 |   5.873913 |
|5.842264 |   5.909259 |
|5.902202 |   5.912685 |
|5.905391 |   5.947138 |
|6.032997 |   5.997779 |
+-++

I guess the deviation is in the userspace part, which is where in real 
life flows get added anyway.



Let me know if more is unclear.


//Eelco


Re: [PATCH iproute2/net-next v2] tc_util: Add support for showing TCA_STATS_BASIC_HW statistics

2018-08-10 Thread Eelco Chaudron



On 10 Aug 2018, at 16:44, Stephen Hemminger wrote:

> On Fri, 10 Aug 2018 07:59:30 -0400
> Eelco Chaudron  wrote:
>
>> +if (bs.bytes >= bs_hw.bytes && bs.packets >= bs_hw.packets) {
>> +print_string(PRINT_FP, NULL, "%s", _SL_);
>> +print_string(PRINT_FP, NULL, "%s", prefix);
>> +print_lluint(PRINT_ANY, "sw_bytes",
>> + "Sent software %llu bytes",
>> + bs.bytes - bs_hw.bytes);
>> +print_uint(PRINT_ANY, "sw_packets", " %u pkt",
>> +   bs.packets - bs_hw.packets);
>> +}
>> +}
>> +
>> +print_string(PRINT_FP, NULL, "%s", _SL_);
>> +print_string(PRINT_FP, NULL, "%s", prefix);
>> +print_lluint(PRINT_ANY, "hw_bytes", "Sent hardware %llu bytes",
>> + bs_hw.bytes);
>
> What does the output look like?

See the two +’es below:

$ tc -s filter show dev enp3s0np0 parent :
filter protocol ip pref 1 flower chain 0
filter protocol ip pref 1 flower chain 0 handle 0x1
  eth_type ipv4
  dst_ip 2.0.0.0
  src_ip 1.0.0.0
  ip_flags nofrag
  in_hw
action order 1: mirred (Egress Redirect to device eth1) stolen
index 1 ref 1 bind 1 installed 0 sec used 0 sec
Action statistics:
Sent 534884742 bytes 8915697 pkt (dropped 0, overlimits 0 requeues 0)
+   Sent software 187542 bytes 4077 pkt
+   Sent hardware 534697200 bytes 8911620 pkt
backlog 0b 0p requeues 0
cookie 89173e6a7001becfd486bda17e29


[PATCH iproute2/net-next v2] tc_util: Add support for showing TCA_STATS_BASIC_HW statistics

2018-08-10 Thread Eelco Chaudron
Add support for showing hardware specific counters to easy
troubleshooting hardware offload.

$ tc -s filter show dev enp3s0np0 parent :
filter protocol ip pref 1 flower chain 0
filter protocol ip pref 1 flower chain 0 handle 0x1
  eth_type ipv4
  dst_ip 2.0.0.0
  src_ip 1.0.0.0
  ip_flags nofrag
  in_hw
action order 1: mirred (Egress Redirect to device eth1) stolen
index 1 ref 1 bind 1 installed 0 sec used 0 sec
Action statistics:
Sent 534884742 bytes 8915697 pkt (dropped 0, overlimits 0 requeues 0)
Sent software 187542 bytes 4077 pkt
Sent hardware 534697200 bytes 8911620 pkt
backlog 0b 0p requeues 0
cookie 89173e6a7001becfd486bda17e29


Signed-off-by: Eelco Chaudron 
---
v2:
 * Removed unnecessary initialization
 * Made not displaying of missing TCA_STATS_BASIC_HW more obvious
 * Use _SL_ macro for single line output

 include/uapi/linux/gen_stats.h |1 +
 tc/tc_util.c   |   41 
 2 files changed, 42 insertions(+)

diff --git a/include/uapi/linux/gen_stats.h b/include/uapi/linux/gen_stats.h
index 24a861c..065408e 100644
--- a/include/uapi/linux/gen_stats.h
+++ b/include/uapi/linux/gen_stats.h
@@ -12,6 +12,7 @@ enum {
TCA_STATS_APP,
TCA_STATS_RATE_EST64,
TCA_STATS_PAD,
+   TCA_STATS_BASIC_HW,
__TCA_STATS_MAX,
 };
 #define TCA_STATS_MAX (__TCA_STATS_MAX - 1)
diff --git a/tc/tc_util.c b/tc/tc_util.c
index d757852..5a1bbf2 100644
--- a/tc/tc_util.c
+++ b/tc/tc_util.c
@@ -800,6 +800,44 @@ void print_tm(FILE *f, const struct tcf_t *tm)
}
 }
 
+static void print_tcstats_basic_hw(struct rtattr **tbs, char *prefix)
+{
+   struct gnet_stats_basic bs_hw;
+
+   if (!tbs[TCA_STATS_BASIC_HW])
+   return;
+
+   memcpy(_hw, RTA_DATA(tbs[TCA_STATS_BASIC_HW]),
+  MIN(RTA_PAYLOAD(tbs[TCA_STATS_BASIC_HW]), sizeof(bs_hw)));
+
+   if (bs_hw.bytes == 0 && bs_hw.packets == 0)
+   return;
+
+   if (tbs[TCA_STATS_BASIC]) {
+   struct gnet_stats_basic bs;
+
+   memcpy(, RTA_DATA(tbs[TCA_STATS_BASIC]),
+  MIN(RTA_PAYLOAD(tbs[TCA_STATS_BASIC]),
+  sizeof(bs)));
+
+   if (bs.bytes >= bs_hw.bytes && bs.packets >= bs_hw.packets) {
+   print_string(PRINT_FP, NULL, "%s", _SL_);
+   print_string(PRINT_FP, NULL, "%s", prefix);
+   print_lluint(PRINT_ANY, "sw_bytes",
+"Sent software %llu bytes",
+bs.bytes - bs_hw.bytes);
+   print_uint(PRINT_ANY, "sw_packets", " %u pkt",
+  bs.packets - bs_hw.packets);
+   }
+   }
+
+   print_string(PRINT_FP, NULL, "%s", _SL_);
+   print_string(PRINT_FP, NULL, "%s", prefix);
+   print_lluint(PRINT_ANY, "hw_bytes", "Sent hardware %llu bytes",
+bs_hw.bytes);
+   print_uint(PRINT_ANY, "hw_packets", " %u pkt", bs_hw.packets);
+}
+
 void print_tcstats2_attr(FILE *fp, struct rtattr *rta, char *prefix, struct 
rtattr **xstats)
 {
SPRINT_BUF(b1);
@@ -826,6 +864,9 @@ void print_tcstats2_attr(FILE *fp, struct rtattr *rta, char 
*prefix, struct rtat
print_uint(PRINT_ANY, "requeues", " requeues %u) ", q.requeues);
}
 
+   if (tbs[TCA_STATS_BASIC_HW])
+   print_tcstats_basic_hw(tbs, prefix);
+
if (tbs[TCA_STATS_RATE_EST64]) {
struct gnet_stats_rate_est64 re = {0};
 



Re: [PATCH iproute2/net-next] tc_util: Add support for showing TCA_STATS_BASIC_HW statistics

2018-08-09 Thread Eelco Chaudron

Thanks for the quick reply, see inline responses.

On 9 Aug 2018, at 18:07, Stephen Hemminger wrote:


On Thu,  9 Aug 2018 11:16:02 -0400
Eelco Chaudron  wrote:



+static void print_tcstats_basic_hw(struct rtattr **tbs, char 
*prefix)

+{
+   struct gnet_stats_basic bs = {0};


If not present don't print it rather than printing zero.



This is used to print separate SW counters below, which is not displayed 
if 0, i.e. not present.
However I will move it under the “if (tbs[TCA_STATS_BASIC])” 
statement, so it’s more explicit.



+   struct gnet_stats_basic bs_hw = {0};


This initialization is unnecessary since you always overwrite it.



Thanks will remove it in the v2


+
+   if (!tbs[TCA_STATS_BASIC_HW])
+   return;
+
+   memcpy(_hw, RTA_DATA(tbs[TCA_STATS_BASIC_HW]),
+  MIN(RTA_PAYLOAD(tbs[TCA_STATS_BASIC_HW]), sizeof(bs_hw)));
+
+   if (bs_hw.bytes == 0 && bs_hw.packets == 0)
+   return;
+
+   if (tbs[TCA_STATS_BASIC]) {
+   memcpy(, RTA_DATA(tbs[TCA_STATS_BASIC]),
+  MIN(RTA_PAYLOAD(tbs[TCA_STATS_BASIC]),
+  sizeof(bs)));
+   }
+
+   if (bs.bytes >= bs_hw.bytes && bs.packets >= bs_hw.packets) {
+   print_string(PRINT_FP, NULL, "\n%s", prefix);


Please use the magic string _SL_ to allow supporting single line 
output mode.




Will do in the V2.


+   print_lluint(PRINT_ANY, "sw_bytes",
+"Sent software %llu bytes",
+bs.bytes - bs_hw.bytes);
+   print_uint(PRINT_ANY, "sw_packets", " %u pkt",
+  bs.packets - bs_hw.packets);
+   }
+
+   print_string(PRINT_FP, NULL, "\n%s", prefix);
+   print_lluint(PRINT_ANY, "hw_bytes", "Sent hardware %llu bytes",
+bs_hw.bytes);
+   print_uint(PRINT_ANY, "hw_packets", " %u pkt", bs_hw.packets);
+}


[PATCH iproute2/net-next] tc_util: Add support for showing TCA_STATS_BASIC_HW statistics

2018-08-09 Thread Eelco Chaudron
Add support for showing hardware specific counters to easily
troubleshoot hardware offload.

$ tc -s filter show dev enp3s0np0 parent :
filter protocol ip pref 1 flower chain 0
filter protocol ip pref 1 flower chain 0 handle 0x1
  eth_type ipv4
  dst_ip 2.0.0.0
  src_ip 1.0.0.0
  ip_flags nofrag
  in_hw
action order 1: mirred (Egress Redirect to device eth1) stolen
index 1 ref 1 bind 1 installed 0 sec used 0 sec
Action statistics:
Sent 534884742 bytes 8915697 pkt (dropped 0, overlimits 0 requeues 0)
Sent software 187542 bytes 4077 pkt
Sent hardware 534697200 bytes 8911620 pkt
backlog 0b 0p requeues 0
cookie 89173e6a7001becfd486bda17e29


Signed-off-by: Eelco Chaudron 
---
 include/uapi/linux/gen_stats.h |1 +
 tc/tc_util.c   |   38 ++
 2 files changed, 39 insertions(+)

diff --git a/include/uapi/linux/gen_stats.h b/include/uapi/linux/gen_stats.h
index 24a861c..065408e 100644
--- a/include/uapi/linux/gen_stats.h
+++ b/include/uapi/linux/gen_stats.h
@@ -12,6 +12,7 @@ enum {
TCA_STATS_APP,
TCA_STATS_RATE_EST64,
TCA_STATS_PAD,
+   TCA_STATS_BASIC_HW,
__TCA_STATS_MAX,
 };
 #define TCA_STATS_MAX (__TCA_STATS_MAX - 1)
diff --git a/tc/tc_util.c b/tc/tc_util.c
index d757852..43a2013 100644
--- a/tc/tc_util.c
+++ b/tc/tc_util.c
@@ -800,6 +800,41 @@ void print_tm(FILE *f, const struct tcf_t *tm)
}
 }
 
+static void print_tcstats_basic_hw(struct rtattr **tbs, char *prefix)
+{
+   struct gnet_stats_basic bs = {0};
+   struct gnet_stats_basic bs_hw = {0};
+
+   if (!tbs[TCA_STATS_BASIC_HW])
+   return;
+
+   memcpy(_hw, RTA_DATA(tbs[TCA_STATS_BASIC_HW]),
+  MIN(RTA_PAYLOAD(tbs[TCA_STATS_BASIC_HW]), sizeof(bs_hw)));
+
+   if (bs_hw.bytes == 0 && bs_hw.packets == 0)
+   return;
+
+   if (tbs[TCA_STATS_BASIC]) {
+   memcpy(, RTA_DATA(tbs[TCA_STATS_BASIC]),
+  MIN(RTA_PAYLOAD(tbs[TCA_STATS_BASIC]),
+  sizeof(bs)));
+   }
+
+   if (bs.bytes >= bs_hw.bytes && bs.packets >= bs_hw.packets) {
+   print_string(PRINT_FP, NULL, "\n%s", prefix);
+   print_lluint(PRINT_ANY, "sw_bytes",
+"Sent software %llu bytes",
+bs.bytes - bs_hw.bytes);
+   print_uint(PRINT_ANY, "sw_packets", " %u pkt",
+  bs.packets - bs_hw.packets);
+   }
+
+   print_string(PRINT_FP, NULL, "\n%s", prefix);
+   print_lluint(PRINT_ANY, "hw_bytes", "Sent hardware %llu bytes",
+bs_hw.bytes);
+   print_uint(PRINT_ANY, "hw_packets", " %u pkt", bs_hw.packets);
+}
+
 void print_tcstats2_attr(FILE *fp, struct rtattr *rta, char *prefix, struct 
rtattr **xstats)
 {
SPRINT_BUF(b1);
@@ -826,6 +861,9 @@ void print_tcstats2_attr(FILE *fp, struct rtattr *rta, char 
*prefix, struct rtat
print_uint(PRINT_ANY, "requeues", " requeues %u) ", q.requeues);
}
 
+   if (tbs[TCA_STATS_BASIC_HW])
+   print_tcstats_basic_hw(tbs, prefix);
+
if (tbs[TCA_STATS_RATE_EST64]) {
struct gnet_stats_rate_est64 re = {0};
 



[PATCH 2/2] net/sched: Add hardware specific counters to TC actions

2018-08-09 Thread Eelco Chaudron
Add additional counters that will store the bytes/packets processed by
hardware. These will be exported through the netlink interface for
displaying by the iproute2 tc tool

Signed-off-by: Eelco Chaudron 
---
 include/net/act_api.h  |8 +---
 include/net/pkt_cls.h  |2 +-
 net/sched/act_api.c|   14 +++---
 net/sched/act_gact.c   |6 +-
 net/sched/act_mirred.c |5 -
 5 files changed, 26 insertions(+), 9 deletions(-)

diff --git a/include/net/act_api.h b/include/net/act_api.h
index 8c9bc02..9931d8a 100644
--- a/include/net/act_api.h
+++ b/include/net/act_api.h
@@ -33,10 +33,12 @@ struct tc_action {
int tcfa_action;
struct tcf_ttcfa_tm;
struct gnet_stats_basic_packed  tcfa_bstats;
+   struct gnet_stats_basic_packed  tcfa_bstats_hw;
struct gnet_stats_queue tcfa_qstats;
struct net_rate_estimator __rcu *tcfa_rate_est;
spinlock_t  tcfa_lock;
struct gnet_stats_basic_cpu __percpu *cpu_bstats;
+   struct gnet_stats_basic_cpu __percpu *cpu_bstats_hw;
struct gnet_stats_queue __percpu *cpu_qstats;
struct tc_cookie__rcu *act_cookie;
struct tcf_chain*goto_chain;
@@ -98,7 +100,7 @@ struct tc_action_ops {
struct netlink_callback *, int,
const struct tc_action_ops *,
struct netlink_ext_ack *);
-   void(*stats_update)(struct tc_action *, u64, u32, u64);
+   void(*stats_update)(struct tc_action *, u64, u32, u64, bool);
size_t  (*get_fill_size)(const struct tc_action *act);
struct net_device *(*get_dev)(const struct tc_action *a);
int (*delete)(struct net *net, u32 index);
@@ -189,13 +191,13 @@ int tcf_action_dump(struct sk_buff *skb, struct tc_action 
*actions[], int bind,
 #endif /* CONFIG_NET_CLS_ACT */
 
 static inline void tcf_action_stats_update(struct tc_action *a, u64 bytes,
-  u64 packets, u64 lastuse)
+  u64 packets, u64 lastuse, bool hw)
 {
 #ifdef CONFIG_NET_CLS_ACT
if (!a->ops->stats_update)
return;
 
-   a->ops->stats_update(a, bytes, packets, lastuse);
+   a->ops->stats_update(a, bytes, packets, lastuse, hw);
 #endif
 }
 
diff --git a/include/net/pkt_cls.h b/include/net/pkt_cls.h
index ef727f7..de1f06a 100644
--- a/include/net/pkt_cls.h
+++ b/include/net/pkt_cls.h
@@ -324,7 +324,7 @@ static inline void tcf_exts_to_list(const struct tcf_exts 
*exts,
for (i = 0; i < exts->nr_actions; i++) {
struct tc_action *a = exts->actions[i];
 
-   tcf_action_stats_update(a, bytes, packets, lastuse);
+   tcf_action_stats_update(a, bytes, packets, lastuse, true);
}
 
preempt_enable();
diff --git a/net/sched/act_api.c b/net/sched/act_api.c
index 229d63c..9ab3061 100644
--- a/net/sched/act_api.c
+++ b/net/sched/act_api.c
@@ -81,6 +81,7 @@ static void tcf_set_action_cookie(struct tc_cookie __rcu 
**old_cookie,
 static void free_tcf(struct tc_action *p)
 {
free_percpu(p->cpu_bstats);
+   free_percpu(p->cpu_bstats_hw);
free_percpu(p->cpu_qstats);
 
tcf_set_action_cookie(>act_cookie, NULL);
@@ -390,9 +391,12 @@ int tcf_idr_create(struct tc_action_net *tn, u32 index, 
struct nlattr *est,
p->cpu_bstats = netdev_alloc_pcpu_stats(struct 
gnet_stats_basic_cpu);
if (!p->cpu_bstats)
goto err1;
+   p->cpu_bstats_hw = netdev_alloc_pcpu_stats(struct 
gnet_stats_basic_cpu);
+   if (!p->cpu_bstats_hw)
+   goto err2;
p->cpu_qstats = alloc_percpu(struct gnet_stats_queue);
if (!p->cpu_qstats)
-   goto err2;
+   goto err3;
}
spin_lock_init(>tcfa_lock);
p->tcfa_index = index;
@@ -404,7 +408,7 @@ int tcf_idr_create(struct tc_action_net *tn, u32 index, 
struct nlattr *est,
>tcfa_rate_est,
>tcfa_lock, NULL, est);
if (err)
-   goto err3;
+   goto err4;
}
 
p->idrinfo = idrinfo;
@@ -412,8 +416,10 @@ int tcf_idr_create(struct tc_action_net *tn, u32 index, 
struct nlattr *est,
INIT_LIST_HEAD(>list);
*a = p;
return 0;
-err3:
+err4:
free_percpu(p->cpu_qstats);
+err3:
+   free_percpu(p->cpu_bstats_hw);
 err2:
free_percpu(p->cpu_bstats);
 err1:
@@ -988,6 +994,8 @@ int tcf_action_copy_stats(struct sk_buff *skb, struct 
tc_action *p,
goto errout;
 
if (gnet_stats_copy_basic(NULL, , p->cpu_bsta

[PATCH 0/2] net/sched: Add hardware specific counters to TC actions

2018-08-09 Thread Eelco Chaudron
Add hardware specific counters to TC actions which will be exported
through the netlink API. This makes troubleshooting TC flower offload
easier, as it possible to differentiate the packets being offloaded.

Signed-off-by: Eelco Chaudron 

Eelco Chaudron (2):
  net/core: Add new basic hardware counter
  net/sched: Add hardware specific counters to TC actions


 include/net/act_api.h  |8 +++-
 include/net/gen_stats.h|4 ++
 include/net/pkt_cls.h  |2 +
 include/uapi/linux/gen_stats.h |1 +
 net/core/gen_stats.c   |   73 ++--
 net/sched/act_api.c|   14 ++--
 net/sched/act_gact.c   |6 +++
 net/sched/act_mirred.c |5 ++-
 8 files changed, 85 insertions(+), 28 deletions(-)



[PATCH 1/2] net/core: Add new basic hardware counter

2018-08-09 Thread Eelco Chaudron
Add a new hardware specific basic counter, TCA_STATS_BASIC_HW. This can
be used to count packets/bytes processed by hardware offload.


Signed-off-by: Eelco Chaudron 
---
 include/net/gen_stats.h|4 ++
 include/uapi/linux/gen_stats.h |1 +
 net/core/gen_stats.c   |   73 ++--
 3 files changed, 59 insertions(+), 19 deletions(-)

diff --git a/include/net/gen_stats.h b/include/net/gen_stats.h
index 0304ba2..7e54a9a 100644
--- a/include/net/gen_stats.h
+++ b/include/net/gen_stats.h
@@ -44,6 +44,10 @@ void __gnet_stats_copy_basic(const seqcount_t *running,
 struct gnet_stats_basic_packed *bstats,
 struct gnet_stats_basic_cpu __percpu *cpu,
 struct gnet_stats_basic_packed *b);
+int gnet_stats_copy_basic_hw(const seqcount_t *running,
+struct gnet_dump *d,
+struct gnet_stats_basic_cpu __percpu *cpu,
+struct gnet_stats_basic_packed *b);
 int gnet_stats_copy_rate_est(struct gnet_dump *d,
 struct net_rate_estimator __rcu **ptr);
 int gnet_stats_copy_queue(struct gnet_dump *d,
diff --git a/include/uapi/linux/gen_stats.h b/include/uapi/linux/gen_stats.h
index 24a861c..065408e 100644
--- a/include/uapi/linux/gen_stats.h
+++ b/include/uapi/linux/gen_stats.h
@@ -12,6 +12,7 @@ enum {
TCA_STATS_APP,
TCA_STATS_RATE_EST64,
TCA_STATS_PAD,
+   TCA_STATS_BASIC_HW,
__TCA_STATS_MAX,
 };
 #define TCA_STATS_MAX (__TCA_STATS_MAX - 1)
diff --git a/net/core/gen_stats.c b/net/core/gen_stats.c
index 188d693..65a2e82 100644
--- a/net/core/gen_stats.c
+++ b/net/core/gen_stats.c
@@ -162,30 +162,18 @@
 }
 EXPORT_SYMBOL(__gnet_stats_copy_basic);
 
-/**
- * gnet_stats_copy_basic - copy basic statistics into statistic TLV
- * @running: seqcount_t pointer
- * @d: dumping handle
- * @cpu: copy statistic per cpu
- * @b: basic statistics
- *
- * Appends the basic statistics to the top level TLV created by
- * gnet_stats_start_copy().
- *
- * Returns 0 on success or -1 with the statistic lock released
- * if the room in the socket buffer was not sufficient.
- */
 int
-gnet_stats_copy_basic(const seqcount_t *running,
- struct gnet_dump *d,
- struct gnet_stats_basic_cpu __percpu *cpu,
- struct gnet_stats_basic_packed *b)
+___gnet_stats_copy_basic(const seqcount_t *running,
+struct gnet_dump *d,
+struct gnet_stats_basic_cpu __percpu *cpu,
+struct gnet_stats_basic_packed *b,
+int type)
 {
struct gnet_stats_basic_packed bstats = {0};
 
__gnet_stats_copy_basic(running, , cpu, b);
 
-   if (d->compat_tc_stats) {
+   if (d->compat_tc_stats && type == TCA_STATS_BASIC) {
d->tc_stats.bytes = bstats.bytes;
d->tc_stats.packets = bstats.packets;
}
@@ -196,14 +184,61 @@
memset(, 0, sizeof(sb));
sb.bytes = bstats.bytes;
sb.packets = bstats.packets;
-   return gnet_stats_copy(d, TCA_STATS_BASIC, , sizeof(sb),
+   return gnet_stats_copy(d, type, , sizeof(sb),
   TCA_STATS_PAD);
}
return 0;
 }
+
+/**
+ * gnet_stats_copy_basic - copy basic statistics into statistic TLV
+ * @running: seqcount_t pointer
+ * @d: dumping handle
+ * @cpu: copy statistic per cpu
+ * @b: basic statistics
+ *
+ * Appends the basic statistics to the top level TLV created by
+ * gnet_stats_start_copy().
+ *
+ * Returns 0 on success or -1 with the statistic lock released
+ * if the room in the socket buffer was not sufficient.
+ */
+int
+gnet_stats_copy_basic(const seqcount_t *running,
+ struct gnet_dump *d,
+ struct gnet_stats_basic_cpu __percpu *cpu,
+ struct gnet_stats_basic_packed *b)
+{
+   return ___gnet_stats_copy_basic(running, d, cpu, b,
+   TCA_STATS_BASIC);
+}
 EXPORT_SYMBOL(gnet_stats_copy_basic);
 
 /**
+ * gnet_stats_copy_basic_hw - copy basic hw statistics into statistic TLV
+ * @running: seqcount_t pointer
+ * @d: dumping handle
+ * @cpu: copy statistic per cpu
+ * @b: basic statistics
+ *
+ * Appends the basic statistics to the top level TLV created by
+ * gnet_stats_start_copy().
+ *
+ * Returns 0 on success or -1 with the statistic lock released
+ * if the room in the socket buffer was not sufficient.
+ */
+int
+gnet_stats_copy_basic_hw(const seqcount_t *running,
+struct gnet_dump *d,
+struct gnet_stats_basic_cpu __percpu *cpu,
+struct gnet_stats_basic_packed *b)
+{
+   return ___gnet_stats_copy_basic(running, d, cpu, b,
+ 

Re: [patch net resend] net: sched: cbq: create block for q->link.block

2017-11-28 Thread Eelco Chaudron

On 27/11/17 18:37, Jiri Pirko wrote:

From: Jiri Pirko <j...@mellanox.com>

q->link.block is not initialized, that leads to EINVAL when one tries to
add filter there. So initialize it properly.

This can be reproduced by:
$ tc qdisc add dev eth0 root handle 1: cbq avpkt 1000 rate 1000Mbit bandwidth 
1000Mbit
$ tc filter add dev eth0 parent 1: protocol ip prio 100 u32 match ip protocol 0 
0x00 flowid 1:1

Reported-by: Jaroslav Aster <jas...@redhat.com>
Reported-by: Ivan Vecera <ivec...@redhat.com>
Fixes: 6529eaba33f0 ("net: sched: introduce tcf block infractructure")
Signed-off-by: Jiri Pirko <j...@mellanox.com>
---
  net/sched/sch_cbq.c | 9 -
  1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/net/sched/sch_cbq.c b/net/sched/sch_cbq.c
index 6361be7..525eb3a 100644
--- a/net/sched/sch_cbq.c
+++ b/net/sched/sch_cbq.c
@@ -1158,9 +1158,13 @@ static int cbq_init(struct Qdisc *sch, struct nlattr 
*opt)
if ((q->link.R_tab = qdisc_get_rtab(r, tb[TCA_CBQ_RTAB])) == NULL)
return -EINVAL;
  
+	err = tcf_block_get(>link.block, >link.filter_list, sch);

+   if (err)
+   goto put_rtab;
+
err = qdisc_class_hash_init(>clhash);
if (err < 0)
-   goto put_rtab;
+   goto put_block;
  
  	q->link.sibling = >link;

q->link.common.classid = sch->handle;
@@ -1194,6 +1198,9 @@ static int cbq_init(struct Qdisc *sch, struct nlattr *opt)
cbq_addprio(q, >link);
return 0;
  
+put_block:

+   tcf_block_put(q->link.block);
+
  put_rtab:
qdisc_put_rtab(q->link.R_tab);
    return err;


I was about to submit a similar patch...


Acked-by: Eelco Chaudron <echau...@redhat.com>




Re: [PATCH net-next] arp: Ignore packets with an all zero sender mac address

2017-11-03 Thread Eelco Chaudron

On 27/10/17 15:48, David Miller wrote:

From: Eelco Chaudron <echau...@redhat.com>
Date: Thu, 26 Oct 2017 10:37:01 +0200


Some applications/devices seem to forget their MAC address when
performing some kind of a failover which triggers (something that
looks like) a gratuities arp.

The ARP packet looks something like this:

   Address Resolution Protocol (reply)
   Hardware type: Ethernet (1)
   Protocol type: IPv4 (0x0800)
   Hardware size: 6
   Protocol size: 4
   Opcode: reply (2)
   Sender MAC address: 00:00:00:00:00:00
   Sender IP address: 10.0.0.1
   Target MAC address: 00:00:00:00:00:00
   Target IP address: 255.255.255.255

This will result in existing arp entries being overwritten with an all
zero mac address. Until the arp entry times out this host can no
longer initiate a connection to this device.

Checking for and ignoring invalid mac addresses will solve this
problem.

Signed-off-by: Eelco Chaudron <echau...@redhat.com>

I really have trouble justifying this fully.

I looked at a bunch of ARP implementations, and I see no special
checks about the link level address other than to make sure it isn't
"our" address.

Whatever is generating these weird ARP packets should be fixed
instead.
Looking for any mentioning of an all-zero MAC address being invalid, the 
only reference I could find was in the original first Xerox Wire 
Specification. The IEEE specifications do not mention this at all, and 
according to it, the all-zero address is a valid MA-L address assigned 
to Xerox.


Looking at the packet more, it might be an attempt to do an unARP (RFC 
1868) but forgot to implement to set the Hardware Address Length to zero.


I'm sure adding an arptables entry can be used to solve this instead, in 
case the offending device cannot be fixed.


Please ignore this patch...

Cheers,

Eelco


[PATCH net-next] arp: Ignore packets with an all zero sender mac address

2017-10-26 Thread Eelco Chaudron
Some applications/devices seem to forget their MAC address when
performing some kind of a failover which triggers (something that
looks like) a gratuities arp.

The ARP packet looks something like this:

  Address Resolution Protocol (reply)
  Hardware type: Ethernet (1)
  Protocol type: IPv4 (0x0800)
  Hardware size: 6
  Protocol size: 4
  Opcode: reply (2)
  Sender MAC address: 00:00:00:00:00:00
  Sender IP address: 10.0.0.1
  Target MAC address: 00:00:00:00:00:00
  Target IP address: 255.255.255.255

This will result in existing arp entries being overwritten with an all
zero mac address. Until the arp entry times out this host can no
longer initiate a connection to this device.

Checking for and ignoring invalid mac addresses will solve this
problem.

Signed-off-by: Eelco Chaudron <echau...@redhat.com>
---
 net/ipv4/arp.c | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/net/ipv4/arp.c b/net/ipv4/arp.c
index a8d7c5a9fb05..e60c88b203e9 100644
--- a/net/ipv4/arp.c
+++ b/net/ipv4/arp.c
@@ -750,6 +750,16 @@ static int arp_process(struct net *net, struct sock *sk, 
struct sk_buff *skb)
case ARPHRD_IEEE1394:
break;
 #endif
+   case ARPHRD_ETHER:
+   case ARPHRD_FDDI:
+   case ARPHRD_IEEE802:
+   /*
+* Check for bad sender hardware addresses. An all zero MAC
+* address is not valid for Ethernet, FDDI or IEEE802.
+*/
+   if (is_zero_ether_addr(sha))
+   goto out_free_skb;
+   break;
default:
tha = arp_ptr;
arp_ptr += dev->addr_len;
-- 
2.13.6