Re: [Xen-devel] [v3,11/41] mips: reuse asm-generic/barrier.h
Paul E. McKenney wrote: > > You could use SYNC_ACQUIRE() to implement read_barrier_depends() and > smp_read_barrier_depends(), but SYNC_RMB probably does not suffice. > The reason for this is that smp_read_barrier_depends() must order the > pointer load against any subsequent read or write through a dereference > of that pointer. For example: > >p = READ_ONCE(gp); >smp_rmb(); >r1 = p->a; /* ordered by smp_rmb(). */ >p->b = 42; /* NOT ordered by smp_rmb(), BUG!!! */ >r2 = x; /* ordered by smp_rmb(), but doesn't need to be. */ > > In contrast: > >p = READ_ONCE(gp); >smp_read_barrier_depends(); >r1 = p->a; /* ordered by smp_read_barrier_depends(). */ >p->b = 42; /* ordered by smp_read_barrier_depends(). */ >r2 = x; /* not ordered by smp_read_barrier_depends(), which is OK. */ > > Again, if your hardware maintains local ordering for address > and data dependencies, you can have read_barrier_depends() and > smp_read_barrier_depends() be no-ops like they are for most > architectures. > > Does that help? This is crazy! smp_rmb started out being strictly stronger than smp_read_barrier_depends, when did this stop being the case? -- Email: Herbert Xu Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] "tcp: refine TSO autosizing" causes performance regression on Xen
Eric Dumazet wrote: > > We already have netdev->gso_max_size and netdev->gso_max_segs > which are cached into sk->sk_gso_max_size & sk->sk_gso_max_segs It is quite dangerous to attempt tricks like this because a tc redirection or netfilter nat could change the destination device rendering such hints incorrect. Cheers, -- Email: Herbert Xu Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] caif: Fix napi poll list corruption
On Mon, Dec 22, 2014 at 04:18:33PM +0800, Jason Wang wrote: > > btw, looks like at least caif_virtio has the same issue. Good catch. -- >8 -- The commit d75b1ade567ffab085e8adbbdacf0092d10cd09c (net: less interrupt masking in NAPI) breaks caif. It is now required that if the entire budget is consumed when poll returns, the napi poll_list must remain empty. However, like some other drivers caif tries to do a last-ditch check and if there is more work it will call napi_schedule and then immediately process some of this new work. Should the entire budget be consumed while processing such new work then we will violate the new caller contract. This patch fixes this by not touching any work when we reschedule in caif. Signed-off-by: Herbert Xu diff --git a/drivers/net/caif/caif_virtio.c b/drivers/net/caif/caif_virtio.c index a5fefb9..b306210 100644 --- a/drivers/net/caif/caif_virtio.c +++ b/drivers/net/caif/caif_virtio.c @@ -257,7 +257,6 @@ static int cfv_rx_poll(struct napi_struct *napi, int quota) struct vringh_kiov *riov = &cfv->ctx.riov; unsigned int skb_len; -again: do { skb = NULL; @@ -322,7 +321,6 @@ exit: napi_schedule_prep(napi)) { vringh_notify_disable_kern(cfv->vr_rx); __napi_schedule(napi); - goto again; } break; Thanks, -- Email: Herbert Xu Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [PATCH 4/4] net: Rearrange loop in net_rx_action
This patch rearranges the loop in net_rx_action to reduce the amount of jumping back and forth when reading the code. Signed-off-by: Herbert Xu --- net/core/dev.c | 26 -- 1 file changed, 12 insertions(+), 14 deletions(-) diff --git a/net/core/dev.c b/net/core/dev.c index 22b0fb2..dd565a5 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -4631,9 +4631,15 @@ static void net_rx_action(struct softirq_action *h) list_splice_init(&sd->poll_list, &list); local_irq_enable(); - while (!list_empty(&list)) { + for (;;) { struct napi_struct *n; + if (list_empty(&list)) { + if (!sd_has_rps_ipi_waiting(sd) && list_empty(&repoll)) + return; + break; + } + n = list_first_entry(&list, struct napi_struct, poll_list); budget -= napi_poll(n, &repoll); @@ -4641,15 +4647,13 @@ static void net_rx_action(struct softirq_action *h) * Allow this to run for 2 jiffies since which will allow * an average latency of 1.5/HZ. */ - if (unlikely(budget <= 0 || time_after_eq(jiffies, time_limit))) - goto softnet_break; + if (unlikely(budget <= 0 || +time_after_eq(jiffies, time_limit))) { + sd->time_squeeze++; + break; + } } - if (!sd_has_rps_ipi_waiting(sd) && - list_empty(&list) && - list_empty(&repoll)) - return; -out: local_irq_disable(); list_splice_tail_init(&sd->poll_list, &list); @@ -4659,12 +4663,6 @@ out: __raise_softirq_irqoff(NET_RX_SOFTIRQ); net_rps_action_and_irq_enable(sd); - - return; - -softnet_break: - sd->time_squeeze++; - goto out; } struct netdev_adjacent { ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [PATCH 2/4] net: Detect drivers that reschedule NAPI and exhaust budget
The commit d75b1ade567ffab085e8adbbdacf0092d10cd09c (net: less interrupt masking in NAPI) required drivers to leave poll_list empty if the entire budget is consumed. We have already had two broken drivers so let's add a check for this. Signed-off-by: Herbert Xu --- net/core/dev.c |9 + 1 file changed, 9 insertions(+) diff --git a/net/core/dev.c b/net/core/dev.c index f7c4f4e..4a9c424 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -4602,6 +4602,15 @@ static int napi_poll(struct napi_struct *n, struct list_head *repoll) napi_gro_flush(n, HZ >= 1000); } + /* Some drivers may have called napi_schedule +* prior to exhausting their budget. +*/ + if (unlikely(!list_empty(&n->poll_list))) { + pr_warn_once("%s: Budget exhausted after napi rescheduled\n", +n->dev ? n->dev->name : "backlog"); + goto out_unlock; + } + list_add_tail(&n->poll_list, repoll); out_unlock: ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [PATCH 3/4] net: Always poll at least one device in net_rx_action
We should only perform the softnet_break check after we have polled at least one device in net_rx_action. Otherwise a zero or negative setting of netdev_budget can lock up the whole system. Signed-off-by: Herbert Xu --- net/core/dev.c |7 +++ 1 file changed, 3 insertions(+), 4 deletions(-) diff --git a/net/core/dev.c b/net/core/dev.c index 4a9c424..22b0fb2 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -4634,16 +4634,15 @@ static void net_rx_action(struct softirq_action *h) while (!list_empty(&list)) { struct napi_struct *n; + n = list_first_entry(&list, struct napi_struct, poll_list); + budget -= napi_poll(n, &repoll); + /* If softirq window is exhausted then punt. * Allow this to run for 2 jiffies since which will allow * an average latency of 1.5/HZ. */ if (unlikely(budget <= 0 || time_after_eq(jiffies, time_limit))) goto softnet_break; - - - n = list_first_entry(&list, struct napi_struct, poll_list); - budget -= napi_poll(n, &repoll); } if (!sd_has_rps_ipi_waiting(sd) && ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [PATCH 1/4] net: Move napi polling code out of net_rx_action
This patch creates a new function napi_poll and moves the napi polling code from net_rx_action into it. Signed-off-by: Herbert Xu --- net/core/dev.c | 98 +++-- 1 file changed, 54 insertions(+), 44 deletions(-) diff --git a/net/core/dev.c b/net/core/dev.c index f411c28..f7c4f4e 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -4557,6 +4557,59 @@ void netif_napi_del(struct napi_struct *napi) } EXPORT_SYMBOL(netif_napi_del); +static int napi_poll(struct napi_struct *n, struct list_head *repoll) +{ + void *have; + int work, weight; + + list_del_init(&n->poll_list); + + have = netpoll_poll_lock(n); + + weight = n->weight; + + /* This NAPI_STATE_SCHED test is for avoiding a race +* with netpoll's poll_napi(). Only the entity which +* obtains the lock and sees NAPI_STATE_SCHED set will +* actually make the ->poll() call. Therefore we avoid +* accidentally calling ->poll() when NAPI is not scheduled. +*/ + work = 0; + if (test_bit(NAPI_STATE_SCHED, &n->state)) { + work = n->poll(n, weight); + trace_napi_poll(n); + } + + WARN_ON_ONCE(work > weight); + + if (likely(work < weight)) + goto out_unlock; + + /* Drivers must not modify the NAPI state if they +* consume the entire weight. In such cases this code +* still "owns" the NAPI instance and therefore can +* move the instance around on the list at-will. +*/ + if (unlikely(napi_disable_pending(n))) { + napi_complete(n); + goto out_unlock; + } + + if (n->gro_list) { + /* flush too old packets +* If HZ < 1000, flush all packets. +*/ + napi_gro_flush(n, HZ >= 1000); + } + + list_add_tail(&n->poll_list, repoll); + +out_unlock: + netpoll_poll_unlock(have); + + return work; +} + static void net_rx_action(struct softirq_action *h) { struct softnet_data *sd = this_cpu_ptr(&softnet_data); @@ -4564,7 +4617,6 @@ static void net_rx_action(struct softirq_action *h) int budget = netdev_budget; LIST_HEAD(list); LIST_HEAD(repoll); - void *have; local_irq_disable(); list_splice_init(&sd->poll_list, &list); @@ -4572,7 +4624,6 @@ static void net_rx_action(struct softirq_action *h) while (!list_empty(&list)) { struct napi_struct *n; - int work, weight; /* If softirq window is exhausted then punt. * Allow this to run for 2 jiffies since which will allow @@ -4583,48 +4634,7 @@ static void net_rx_action(struct softirq_action *h) n = list_first_entry(&list, struct napi_struct, poll_list); - list_del_init(&n->poll_list); - - have = netpoll_poll_lock(n); - - weight = n->weight; - - /* This NAPI_STATE_SCHED test is for avoiding a race -* with netpoll's poll_napi(). Only the entity which -* obtains the lock and sees NAPI_STATE_SCHED set will -* actually make the ->poll() call. Therefore we avoid -* accidentally calling ->poll() when NAPI is not scheduled. -*/ - work = 0; - if (test_bit(NAPI_STATE_SCHED, &n->state)) { - work = n->poll(n, weight); - trace_napi_poll(n); - } - - WARN_ON_ONCE(work > weight); - - budget -= work; - - /* Drivers must not modify the NAPI state if they -* consume the entire weight. In such cases this code -* still "owns" the NAPI instance and therefore can -* move the instance around on the list at-will. -*/ - if (unlikely(work == weight)) { - if (unlikely(napi_disable_pending(n))) { - napi_complete(n); - } else { - if (n->gro_list) { - /* flush too old packets -* If HZ < 1000, flush all packets. -*/ - napi_gro_flush(n, HZ >= 1000); - } - list_add_tail(&n->poll_list, &repoll); - } - } - - netpoll_poll_unlock(have); + budget -= napi_poll(n, &repoll); } if (!sd_has_rps_ipi_waiting(sd) && ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [0/4] net: net_rx_action fixes and clean-ups
On Sat, Dec 20, 2014 at 10:00:12AM -0800, Eric Dumazet wrote: > > OK, but could you : > 1) use pr_warn_once() > 2) split the too long line > pr_warn_once("%s: Budget exhausted after napi rescheduled\n", >n->dev ? n->dev->name : "backlog"); Sure, I'll clean it up a bit too. Cheers, -- Email: Herbert Xu Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] net: Detect drivers that reschedule NAPI and exhaust budget
On Fri, Dec 19, 2014 at 09:40:00PM -0500, David Miller wrote: > From: Eric Dumazet > Date: Fri, 19 Dec 2014 17:34:48 -0800 > > >> @@ -4620,7 +4620,11 @@ static void net_rx_action(struct softirq_action *h) > >> */ > >>napi_gro_flush(n, HZ >= 1000); > >>} > >> - list_add_tail(&n->poll_list, &repoll); > >> + /* Some drivers may have called napi_schedule > >> + * prior to exhausting their budget. > >> + */ > >> + if (!WARN_ON_ONCE(!list_empty(&n->poll_list))) > >> + list_add_tail(&n->poll_list, &repoll); > >>} > >>} > >> > > > > I do not think stack trace will point to the buggy driver ? > > > > IMO it would be better to print a single line with the netdev name ? > > Right, we are already back from the poll routine and will just end > up seeing the call trace leading to the software interrupt. Good point Eric. -- >8 -- The commit d75b1ade567ffab085e8adbbdacf0092d10cd09c (net: less interrupt masking in NAPI) required drivers to leave poll_list empty if the entire budget is consumed. We have already had two broken drivers so let's add a check for this. Signed-off-by: Herbert Xu diff --git a/net/core/dev.c b/net/core/dev.c index f411c28..47fdc5c 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -4620,7 +4620,13 @@ static void net_rx_action(struct softirq_action *h) */ napi_gro_flush(n, HZ >= 1000); } - list_add_tail(&n->poll_list, &repoll); + /* Some drivers may have called napi_schedule +* prior to exhausting their budget. +*/ + if (unlikely(!list_empty(&n->poll_list))) + pr_warn("%s: Budget exhausted after napi rescheduled\n", n->dev ? n->dev->name : "backlog"); + else + list_add_tail(&n->poll_list, &repoll); } } Thanks, -- Email: Herbert Xu Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] net: Detect drivers that reschedule NAPI and exhaust budget
On Sat, Dec 20, 2014 at 11:23:27AM +1100, Herbert Xu wrote: > > A similar bug exists in virtio_net. In order to detect other drivers doing this we should add something like this. -- >8 -- The commit d75b1ade567ffab085e8adbbdacf0092d10cd09c (net: less interrupt masking in NAPI) required drivers to leave poll_list empty if the entire budget is consumed. We have already had two broken drivers so let's add a check for this. Signed-off-by: Herbert Xu diff --git a/net/core/dev.c b/net/core/dev.c index f411c28..88f9725 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -4620,7 +4620,11 @@ static void net_rx_action(struct softirq_action *h) */ napi_gro_flush(n, HZ >= 1000); } - list_add_tail(&n->poll_list, &repoll); + /* Some drivers may have called napi_schedule +* prior to exhausting their budget. +*/ + if (!WARN_ON_ONCE(!list_empty(&n->poll_list))) + list_add_tail(&n->poll_list, &repoll); } } Cheers, -- Email: Herbert Xu Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] virtio_net: Fix napi poll list corruption
David Vrabel wrote: > After d75b1ade567ffab085e8adbbdacf0092d10cd09c (net: less interrupt > masking in NAPI) the napi instance is removed from the per-cpu list > prior to calling the n->poll(), and is only requeued if all of the > budget was used. This inadvertently broke netfront because netfront > does not use NAPI correctly. A similar bug exists in virtio_net. -- >8 -- The commit d75b1ade567ffab085e8adbbdacf0092d10cd09c (net: less interrupt masking in NAPI) breaks virtio_net in an insidious way. It is now required that if the entire budget is consumed when poll returns, the napi poll_list must remain empty. However, like some other drivers virtio_net tries to do a last-ditch check and if there is more work it will call napi_schedule and then immediately process some of this new work. Should the entire budget be consumed while processing such new work then we will violate the new caller contract. This patch fixes this by not touching any work when we reschedule in virtio_net. The worst part of this bug is that the list corruption causes other napi users to be moved off-list. In my case I was chasing a stall in IPsec (IPsec uses netif_rx) and I only belatedly realised that it was virtio_net which caused the stall even though the virtio_net poll was still functioning perfectly after IPsec stalled. Signed-off-by: Herbert Xu diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c index b8bd719..5ca9771 100644 --- a/drivers/net/virtio_net.c +++ b/drivers/net/virtio_net.c @@ -760,7 +760,6 @@ static int virtnet_poll(struct napi_struct *napi, int budget) container_of(napi, struct receive_queue, napi); unsigned int r, received = 0; -again: received += virtnet_receive(rq, budget - received); /* Out of packets? */ @@ -771,7 +770,6 @@ again: napi_schedule_prep(napi)) { virtqueue_disable_cb(rq->vq); __napi_schedule(napi); - goto again; } } Cheers, -- Email: Herbert Xu Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel