Re: [RFC PATCH 06/17] net: sched: explicit locking in gso_cpu fallback

2017-11-16 Thread John Fastabend
On 11/15/2017 09:51 AM, Willem de Bruijn wrote:
> On Wed, Nov 15, 2017 at 10:11 AM, John Fastabend
>  wrote:
>> On 11/14/2017 04:41 PM, Willem de Bruijn wrote:
>>>>  /* use instead of qdisc->dequeue() for all qdiscs queried with ->peek() */
>>>>  static inline struct sk_buff *qdisc_dequeue_peeked(struct Qdisc *sch)
>>>>  {
>>>> -   struct sk_buff *skb = sch->gso_skb;
>>>> +   struct sk_buff *skb = skb_peek(&sch->gso_skb);
>>>>
>>>> if (skb) {
>>>> -   sch->gso_skb = NULL;
>>>> +   skb = __skb_dequeue(&sch->gso_skb);
>>>> qdisc_qstats_backlog_dec(sch, skb);
>>>> sch->q.qlen--;
>>>
>>> In lockless qdiscs, can this race, so that __skb_dequeue returns NULL?
>>> Same for its use in qdisc_peek_dequeued.
>>>
>>
>> Yes, agreed: if this were used in a lockless qdisc it could race. However,
>> I don't think it is actually used in the lockless cases yet. For pfifo_fast,
>> __skb_array_peek is used.
> 
> Oh right. That will be easy to miss when other qdiscs are converted
> to lockless. Perhaps another location to add lockdep annotations.
> 

Yep. Will add lockdep here.
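
Roughly something like this, on top of the queue conversion in this patch
(sketch only):

 static inline struct sk_buff *qdisc_dequeue_peeked(struct Qdisc *sch)
 {
	struct sk_buff *skb;

	/* not safe for TCQ_F_NOLOCK qdiscs; catch misuse early when
	 * more qdiscs get converted to lockless
	 */
	lockdep_assert_held(qdisc_lock(sch));

	skb = skb_peek(&sch->gso_skb);
	if (skb) {
		skb = __skb_dequeue(&sch->gso_skb);
		qdisc_qstats_backlog_dec(sch, skb);
		sch->q.qlen--;
	} else {
		skb = sch->dequeue(sch);
	}

	return skb;
 }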

> Related: what happens when pfifo_fast is used as a leaf in a non-lockless
> qdisc hierarchy, say htb? The individual leaves will still have
> TCQ_F_NOLOCK, so will try to take the qdisc lock in dequeue_skb
> and other locations, but that is already held?
> 

Right. So I guess TCQ_F_NOLOCK needs to be propagated through the chain
of qdiscs, and at attach time we can only set TCQ_F_NOLOCK if the entire
chain of qdiscs is NOLOCK. Will spin an update with this as well.
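
Something along these lines at graft time, perhaps (rough sketch only;
the hook point and the parent/new names are illustrative):

	/* a lockless child attached under a locked parent must fall
	 * back to locked operation
	 */
	if (parent && !(parent->flags & TCQ_F_NOLOCK))
		new->flags &= ~TCQ_F_NOLOCK;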

> Thanks for revising and resubmitting the patchset, btw!
> 

No problem, it will be good to finally get this out of my queue.


Re: [RFC PATCH 06/17] net: sched: explicit locking in gso_cpu fallback

2017-11-15 Thread Willem de Bruijn
On Wed, Nov 15, 2017 at 10:11 AM, John Fastabend
 wrote:
> On 11/14/2017 04:41 PM, Willem de Bruijn wrote:
>>>  /* use instead of qdisc->dequeue() for all qdiscs queried with ->peek() */
>>>  static inline struct sk_buff *qdisc_dequeue_peeked(struct Qdisc *sch)
>>>  {
>>> -   struct sk_buff *skb = sch->gso_skb;
>>> +   struct sk_buff *skb = skb_peek(&sch->gso_skb);
>>>
>>> if (skb) {
>>> -   sch->gso_skb = NULL;
>>> +   skb = __skb_dequeue(&sch->gso_skb);
>>> qdisc_qstats_backlog_dec(sch, skb);
>>> sch->q.qlen--;
>>
>> In lockless qdiscs, can this race, so that __skb_dequeue returns NULL?
>> Same for its use in qdisc_peek_dequeued.
>>
>
> Yes, agreed: if this were used in a lockless qdisc it could race. However,
> I don't think it is actually used in the lockless cases yet. For pfifo_fast,
> __skb_array_peek is used.

Oh right. That will be easy to miss when other qdiscs are converted
to lockless. Perhaps another location to add lockdep annotations.

Related: what happens when pfifo_fast is used as a leaf in a non-lockless
qdisc hierarchy, say htb? The individual leaves will still have
TCQ_F_NOLOCK, so will try to take the qdisc lock in dequeue_skb
and other locations, but that is already held?

Thanks for revising and resubmitting the patchset, btw!


Re: [RFC PATCH 06/17] net: sched: explicit locking in gso_cpu fallback

2017-11-15 Thread John Fastabend
On 11/14/2017 04:41 PM, Willem de Bruijn wrote:
>>  /* use instead of qdisc->dequeue() for all qdiscs queried with ->peek() */
>>  static inline struct sk_buff *qdisc_dequeue_peeked(struct Qdisc *sch)
>>  {
>> -   struct sk_buff *skb = sch->gso_skb;
>> +   struct sk_buff *skb = skb_peek(&sch->gso_skb);
>>
>> if (skb) {
>> -   sch->gso_skb = NULL;
>> +   skb = __skb_dequeue(&sch->gso_skb);
>> qdisc_qstats_backlog_dec(sch, skb);
>> sch->q.qlen--;
> 
> In lockless qdiscs, can this race, so that __skb_dequeue returns NULL?
> Same for its use in qdisc_peek_dequeued.
> 

Yes, agreed: if this were used in a lockless qdisc it could race. However,
I don't think it is actually used in the lockless cases yet. For pfifo_fast,
__skb_array_peek is used.

>> -static inline int dev_requeue_skb(struct sk_buff *skb, struct Qdisc *q)
>> +static inline int __dev_requeue_skb(struct sk_buff *skb, struct Qdisc *q)
>>  {
> 
> Perhaps dev_requeue_skb_qdisc_locked is more descriptive. Or
> adding a lockdep_is_held(..) also documents that the __locked variant
> below is not just a lock/unlock wrapper around this inner function.
> 

Adding lockdep annotation here makes sense to me.
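
E.g. something along these lines, mirroring the body of the helper in this
patch (sketch only):

static inline int __dev_requeue_skb(struct sk_buff *skb, struct Qdisc *q)
{
	/* caller is expected to hold the root qdisc lock */
	lockdep_assert_held(qdisc_lock(q));

	__skb_queue_head(&q->gso_skb, skb);
	q->qstats.requeues++;
	qdisc_qstats_backlog_inc(q, skb);
	q->q.qlen++;	/* it's still part of the queue */
	__netif_schedule(q);

	return 0;
}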


Re: [RFC PATCH 06/17] net: sched: explicit locking in gso_cpu fallback

2017-11-14 Thread Willem de Bruijn
>> -static inline int dev_requeue_skb(struct sk_buff *skb, struct Qdisc *q)
>> +static inline int __dev_requeue_skb(struct sk_buff *skb, struct Qdisc *q)
>>  {
>
> Perhaps dev_requeue_skb_qdisc_locked is more descriptive. Or
> adding a lockdep_is_held(..) also documents that the __locked variant
> below is not just a lock/unlock wrapper around this inner function.
>
>> -   q->gso_skb = skb;
>> +   __skb_queue_head(&q->gso_skb, skb);
>> q->qstats.requeues++;
>> qdisc_qstats_backlog_inc(q, skb);
>> q->q.qlen++;/* it's still part of the queue */
>> @@ -57,6 +56,30 @@ static inline int dev_requeue_skb(struct sk_buff *skb, struct Qdisc *q)
>> return 0;
>>  }
>>
>> +static inline int dev_requeue_skb_locked(struct sk_buff *skb, struct Qdisc *q)
>> +{
>> +   spinlock_t *lock = qdisc_lock(q);
>> +
>> +   spin_lock(lock);
>> +   __skb_queue_tail(&q->gso_skb, skb);
>
> why does this requeue at the tail, unlike __dev_requeue_skb?

I guess that requeue has to queue at the tail in the lockless case,
and it does not matter in the qdisc_locked case, as then there can
only ever be at most one outstanding gso_skb.
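
If so, perhaps worth capturing in a comment next to the tail insert,
something like (sketch only):

	spin_lock(lock);
	/* tail insert keeps FIFO order when several cpus requeue
	 * concurrently; with the qdisc lock held there is at most one
	 * outstanding gso_skb, so head vs. tail would not matter there
	 */
	__skb_queue_tail(&q->gso_skb, skb);
	spin_unlock(lock);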


Re: [RFC PATCH 06/17] net: sched: explicit locking in gso_cpu fallback

2017-11-14 Thread Willem de Bruijn
>  /* use instead of qdisc->dequeue() for all qdiscs queried with ->peek() */
>  static inline struct sk_buff *qdisc_dequeue_peeked(struct Qdisc *sch)
>  {
> -   struct sk_buff *skb = sch->gso_skb;
> +   struct sk_buff *skb = skb_peek(&sch->gso_skb);
>
> if (skb) {
> -   sch->gso_skb = NULL;
> +   skb = __skb_dequeue(&sch->gso_skb);
> qdisc_qstats_backlog_dec(sch, skb);
> sch->q.qlen--;

In lockless qdiscs, can this race, so that __skb_dequeue returns NULL?
Same for its use in qdisc_peek_dequeued.

> -static inline int dev_requeue_skb(struct sk_buff *skb, struct Qdisc *q)
> +static inline int __dev_requeue_skb(struct sk_buff *skb, struct Qdisc *q)
>  {

Perhaps dev_requeue_skb_qdisc_locked is more descriptive. Or
adding a lockdep_is_held(..) also documents that the __locked variant
below is not just a lock/unlock wrapper around this inner function.

> -   q->gso_skb = skb;
> +   __skb_queue_head(&q->gso_skb, skb);
> q->qstats.requeues++;
> qdisc_qstats_backlog_inc(q, skb);
> q->q.qlen++;/* it's still part of the queue */
> @@ -57,6 +56,30 @@ static inline int dev_requeue_skb(struct sk_buff *skb, struct Qdisc *q)
> return 0;
>  }
>
> +static inline int dev_requeue_skb_locked(struct sk_buff *skb, struct Qdisc *q)
> +{
> +   spinlock_t *lock = qdisc_lock(q);
> +
> +   spin_lock(lock);
> +   __skb_queue_tail(&q->gso_skb, skb);

why does this requeue at the tail, unlike __dev_requeue_skb?

> +   spin_unlock(lock);
> +
> +   qdisc_qstats_cpu_requeues_inc(q);
> +   qdisc_qstats_cpu_backlog_inc(q, skb);
> +   qdisc_qstats_cpu_qlen_inc(q);
> +   __netif_schedule(q);
> +
> +   return 0;
> +}
> +


[RFC PATCH 06/17] net: sched: explicit locking in gso_cpu fallback

2017-11-13 Thread John Fastabend
This work prepares the qdisc layer to support egress lockless
qdiscs. When we run the egress qdisc lockless and overrun the netdev,
for whatever reason, the netdev returns a busy error code and the skb
is parked on the gso_skb pointer. With many cores all hitting this
case at once it is possible to have multiple sk_buffs here, so we turn
gso_skb into a queue.

This should be the edge case; if we see it frequently then the
netdev/qdisc layer needs to back off.

Signed-off-by: John Fastabend 
---
 0 files changed

diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index 7bc2826..6e329f0 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -88,7 +88,7 @@ struct Qdisc {
/*
 * For performance sake on SMP, we put highly modified fields at the end
 */
-   struct sk_buff  *gso_skb ____cacheline_aligned_in_smp;
+   struct sk_buff_head gso_skb ____cacheline_aligned_in_smp;
struct qdisc_skb_head   q;
struct gnet_stats_basic_packed bstats;
seqcount_t  running;
@@ -795,26 +795,30 @@ static inline struct sk_buff *qdisc_peek_head(struct Qdisc *sch)
 /* generic pseudo peek method for non-work-conserving qdisc */
 static inline struct sk_buff *qdisc_peek_dequeued(struct Qdisc *sch)
 {
+   struct sk_buff *skb = skb_peek(&sch->gso_skb);
+
/* we can reuse ->gso_skb because peek isn't called for root qdiscs */
-   if (!sch->gso_skb) {
-   sch->gso_skb = sch->dequeue(sch);
-   if (sch->gso_skb) {
+   if (!skb) {
+   skb = sch->dequeue(sch);
+
+   if (skb) {
+   __skb_queue_head(&sch->gso_skb, skb);
/* it's still part of the queue */
-   qdisc_qstats_backlog_inc(sch, sch->gso_skb);
+   qdisc_qstats_backlog_inc(sch, skb);
sch->q.qlen++;
}
}
 
-   return sch->gso_skb;
+   return skb;
 }
 
 /* use instead of qdisc->dequeue() for all qdiscs queried with ->peek() */
 static inline struct sk_buff *qdisc_dequeue_peeked(struct Qdisc *sch)
 {
-   struct sk_buff *skb = sch->gso_skb;
+   struct sk_buff *skb = skb_peek(&sch->gso_skb);
 
if (skb) {
-   sch->gso_skb = NULL;
+   skb = __skb_dequeue(&sch->gso_skb);
qdisc_qstats_backlog_dec(sch, skb);
sch->q.qlen--;
} else {
diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
index 131eab8..1055bec 100644
--- a/net/sched/sch_generic.c
+++ b/net/sched/sch_generic.c
@@ -45,10 +45,9 @@
  * - ingress filtering is also serialized via qdisc root lock
  * - updates to tree and tree walking are only done under the rtnl mutex.
  */
-
-static inline int dev_requeue_skb(struct sk_buff *skb, struct Qdisc *q)
+static inline int __dev_requeue_skb(struct sk_buff *skb, struct Qdisc *q)
 {
-   q->gso_skb = skb;
+   __skb_queue_head(&q->gso_skb, skb);
q->qstats.requeues++;
qdisc_qstats_backlog_inc(q, skb);
q->q.qlen++;/* it's still part of the queue */
@@ -57,6 +56,30 @@ static inline int dev_requeue_skb(struct sk_buff *skb, struct Qdisc *q)
return 0;
 }
 
+static inline int dev_requeue_skb_locked(struct sk_buff *skb, struct Qdisc *q)
+{
+   spinlock_t *lock = qdisc_lock(q);
+
+   spin_lock(lock);
+   __skb_queue_tail(&q->gso_skb, skb);
+   spin_unlock(lock);
+
+   qdisc_qstats_cpu_requeues_inc(q);
+   qdisc_qstats_cpu_backlog_inc(q, skb);
+   qdisc_qstats_cpu_qlen_inc(q);
+   __netif_schedule(q);
+
+   return 0;
+}
+
+static inline int dev_requeue_skb(struct sk_buff *skb, struct Qdisc *q)
+{
+   if (q->flags & TCQ_F_NOLOCK)
+   return dev_requeue_skb_locked(skb, q);
+   else
+   return __dev_requeue_skb(skb, q);
+}
+
 static void try_bulk_dequeue_skb(struct Qdisc *q,
 struct sk_buff *skb,
 const struct netdev_queue *txq,
@@ -112,23 +135,50 @@ static void try_bulk_dequeue_skb_slow(struct Qdisc *q,
 static struct sk_buff *dequeue_skb(struct Qdisc *q, bool *validate,
   int *packets)
 {
-   struct sk_buff *skb = q->gso_skb;
const struct netdev_queue *txq = q->dev_queue;
+   struct sk_buff *skb;
 
*packets = 1;
-   if (unlikely(skb)) {
+   if (unlikely(!skb_queue_empty(&q->gso_skb))) {
+   spinlock_t *lock = NULL;
+
+   if (q->flags & TCQ_F_NOLOCK) {
+   lock = qdisc_lock(q);
+   spin_lock(lock);
+   }
+
+   skb = skb_peek(&q->gso_skb);
+
+   /* skb may be null if another cpu pulls gso_skb off in between
+* empty check and lock.
+*/
+   if (!skb) {
+   if (lock)
+

[RFC PATCH 06/17] net: sched: explicit locking in gso_cpu fallback

2017-05-02 Thread John Fastabend
This work prepares the qdisc layer to support egress lockless
qdiscs. When we run the egress qdisc lockless and overrun the netdev,
for whatever reason, the netdev returns a busy error code and the skb
is parked on the gso_skb pointer. With many cores all hitting this
case at once it is possible to have multiple sk_buffs here, so we turn
gso_skb into a queue.

This should be the edge case; if we see it frequently then the
netdev/qdisc layer needs to back off.

Signed-off-by: John Fastabend 
---
 include/net/sch_generic.h |   20 +++
 net/sched/sch_generic.c   |   81 ++---
 2 files changed, 80 insertions(+), 21 deletions(-)

diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index 83952af..3021bbb 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -85,7 +85,7 @@ struct Qdisc {
/*
 * For performance sake on SMP, we put highly modified fields at the end
 */
-   struct sk_buff  *gso_skb ____cacheline_aligned_in_smp;
+   struct sk_buff_head gso_skb ____cacheline_aligned_in_smp;
struct qdisc_skb_head   q;
struct gnet_stats_basic_packed bstats;
seqcount_t  running;
@@ -754,26 +754,30 @@ static inline struct sk_buff *qdisc_peek_head(struct Qdisc *sch)
 /* generic pseudo peek method for non-work-conserving qdisc */
 static inline struct sk_buff *qdisc_peek_dequeued(struct Qdisc *sch)
 {
+   struct sk_buff *skb = skb_peek(&sch->gso_skb);
+
/* we can reuse ->gso_skb because peek isn't called for root qdiscs */
-   if (!sch->gso_skb) {
-   sch->gso_skb = sch->dequeue(sch);
-   if (sch->gso_skb) {
+   if (!skb) {
+   skb = sch->dequeue(sch);
+
+   if (skb) {
+   __skb_queue_head(&sch->gso_skb, skb);
/* it's still part of the queue */
-   qdisc_qstats_backlog_inc(sch, sch->gso_skb);
+   qdisc_qstats_backlog_inc(sch, skb);
sch->q.qlen++;
}
}
 
-   return sch->gso_skb;
+   return skb;
 }
 
 /* use instead of qdisc->dequeue() for all qdiscs queried with ->peek() */
 static inline struct sk_buff *qdisc_dequeue_peeked(struct Qdisc *sch)
 {
-   struct sk_buff *skb = sch->gso_skb;
+   struct sk_buff *skb = skb_peek(&sch->gso_skb);
 
if (skb) {
-   sch->gso_skb = NULL;
+   skb = __skb_dequeue(&sch->gso_skb);
qdisc_qstats_backlog_dec(sch, skb);
sch->q.qlen--;
} else {
diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
index 8a03738..f179fc9 100644
--- a/net/sched/sch_generic.c
+++ b/net/sched/sch_generic.c
@@ -44,10 +44,9 @@
  * - ingress filtering is also serialized via qdisc root lock
  * - updates to tree and tree walking are only done under the rtnl mutex.
  */
-
-static inline int dev_requeue_skb(struct sk_buff *skb, struct Qdisc *q)
+static inline int __dev_requeue_skb(struct sk_buff *skb, struct Qdisc *q)
 {
-   q->gso_skb = skb;
+   __skb_queue_head(&q->gso_skb, skb);
q->qstats.requeues++;
qdisc_qstats_backlog_inc(q, skb);
q->q.qlen++;/* it's still part of the queue */
@@ -56,6 +55,30 @@ static inline int dev_requeue_skb(struct sk_buff *skb, struct Qdisc *q)
return 0;
 }
 
+static inline int dev_requeue_skb_locked(struct sk_buff *skb, struct Qdisc *q)
+{
+   spinlock_t *lock = qdisc_lock(q);
+
+   spin_lock(lock);
+   __skb_queue_tail(&q->gso_skb, skb);
+   spin_unlock(lock);
+
+   qdisc_qstats_cpu_requeues_inc(q);
+   qdisc_qstats_cpu_backlog_inc(q, skb);
+   qdisc_qstats_cpu_qlen_inc(q);
+   __netif_schedule(q);
+
+   return 0;
+}
+
+static inline int dev_requeue_skb(struct sk_buff *skb, struct Qdisc *q)
+{
+   if (q->flags & TCQ_F_NOLOCK)
+   return dev_requeue_skb_locked(skb, q);
+   else
+   return __dev_requeue_skb(skb, q);
+}
+
 static void try_bulk_dequeue_skb(struct Qdisc *q,
 struct sk_buff *skb,
 const struct netdev_queue *txq,
@@ -111,23 +134,50 @@ static void try_bulk_dequeue_skb_slow(struct Qdisc *q,
 static struct sk_buff *dequeue_skb(struct Qdisc *q, bool *validate,
   int *packets)
 {
-   struct sk_buff *skb = q->gso_skb;
const struct netdev_queue *txq = q->dev_queue;
+   struct sk_buff *skb;
 
*packets = 1;
-   if (unlikely(skb)) {
+   if (unlikely(!skb_queue_empty(&q->gso_skb))) {
+   spinlock_t *lock = NULL;
+
+   if (q->flags & TCQ_F_NOLOCK) {
+   lock = qdisc_lock(q);
+   spin_lock(lock);
+   }
+
+   skb = skb_peek(&q->gso_skb);
+
+   /* skb may be null if another cpu pulls gso_skb off in between