Re: [PATCHv7 bpf-next 1/4] bpf: run devmap xdp_prog on flush instead of bulk enqueue
On Thu, 15 Apr 2021 17:39:13 -0700 Martin KaFai Lau wrote: > On Thu, Apr 15, 2021 at 10:29:40PM +0200, Toke Høiland-Jørgensen wrote: > > Jesper Dangaard Brouer writes: > > > > > On Thu, 15 Apr 2021 10:35:51 -0700 > > > Martin KaFai Lau wrote: > > > > > >> On Thu, Apr 15, 2021 at 11:22:19AM +0200, Toke Høiland-Jørgensen wrote: > > >> > Hangbin Liu writes: > > >> > > > >> > > On Wed, Apr 14, 2021 at 05:17:11PM -0700, Martin KaFai Lau wrote: > > >> > >> > static void bq_xmit_all(struct xdp_dev_bulk_queue *bq, u32 flags) > > >> > >> > { > > >> > >> > struct net_device *dev = bq->dev; > > >> > >> > - int sent = 0, err = 0; > > >> > >> > + int sent = 0, drops = 0, err = 0; > > >> > >> > + unsigned int cnt = bq->count; > > >> > >> > + int to_send = cnt; > > >> > >> > int i; > > >> > >> > > > >> > >> > - if (unlikely(!bq->count)) > > >> > >> > + if (unlikely(!cnt)) > > >> > >> > return; > > >> > >> > > > >> > >> > - for (i = 0; i < bq->count; i++) { > > >> > >> > + for (i = 0; i < cnt; i++) { > > >> > >> > struct xdp_frame *xdpf = bq->q[i]; > > >> > >> > > > >> > >> > prefetch(xdpf); > > >> > >> > } > > >> > >> > > > >> > >> > - sent = dev->netdev_ops->ndo_xdp_xmit(dev, bq->count, bq->q, > > >> > >> > flags); > > >> > >> > + if (bq->xdp_prog) { > > >> > >> bq->xdp_prog is used here > > >> > >> > > >> > >> > + to_send = dev_map_bpf_prog_run(bq->xdp_prog, bq->q, > > >> > >> > cnt, dev); > > >> > >> > + if (!to_send) > > >> > >> > + goto out; > > >> > >> > + > > >> > >> > + drops = cnt - to_send; > > >> > >> > + } > > >> > >> > + > > >> > >> > > >> > >> [ ... ] > > >> > >> > > >> > >> > static void bq_enqueue(struct net_device *dev, struct xdp_frame > > >> > >> > *xdpf, > > >> > >> > -struct net_device *dev_rx) > > >> > >> > +struct net_device *dev_rx, struct bpf_prog > > >> > >> > *xdp_prog) > > >> > >> > { > > >> > >> > struct list_head *flush_list = this_cpu_ptr(&dev_flush_list); > > >> > >> > struct xdp_dev_bulk_queue *bq = this_cpu_ptr(dev->xdp_bulkq); > > >> > >> > @@ -412,18 +466,22 @@ static void bq_enqueue(struct net_device > > >> > >> > *dev, struct xdp_frame *xdpf, > > >> > >> > /* Ingress dev_rx will be the same for all xdp_frame's in > > >> > >> >* bulk_queue, because bq stored per-CPU and must be flushed > > >> > >> >* from net_device drivers NAPI func end. > > >> > >> > + * > > >> > >> > + * Do the same with xdp_prog and flush_list since these fields > > >> > >> > + * are only ever modified together. > > >> > >> >*/ > > >> > >> > - if (!bq->dev_rx) > > >> > >> > + if (!bq->dev_rx) { > > >> > >> > bq->dev_rx = dev_rx; > > >> > >> > + bq->xdp_prog = xdp_prog; > > >> > >> bp->xdp_prog is assigned here and could be used later in > > >> > >> bq_xmit_all(). > > >> > >> How is bq->xdp_prog protected? Are they all under one > > >> > >> rcu_read_lock()? > > >> > >> It is not very obvious after taking a quick look at > > >> > >> xdp_do_flush[_map]. > > >> > >> > > >> > >> e.g. what if the devmap elem gets deleted. > > >> > > > > >> > > Jesper knows better than me. From my veiw, based on the description > > >> > > of > > >> > > __dev_flush(): > > >> > > > > >> > > On devmap tear down we ensure the flush list is empty before > > >> > > completing to > > >> > > ensure all flush operations have completed. When drivers update the > > >> > > bpf > > >> > > program they may need to ensure any flush ops are also complete. > > >> > > >> AFAICT, the bq->xdp_prog is not from the dev. It is from a devmap's elem. 
The bq->xdp_prog comes from the devmap "dev" element, and it is stored temporarily in the "bq" structure that is only valid for this softirq NAPI-cycle. I'm slightly worried that we copied this pointer to the xdp_prog here, more below (and Q for Paul). > > >> > > > >> > Yeah, drivers call xdp_do_flush() before exiting their NAPI poll loop, > > >> > which also runs under one big rcu_read_lock(). So the storage in the > > >> > bulk queue is quite temporary, it's just used for bulking to increase > > >> > performance :) > > >> > > >> I am missing the one big rcu_read_lock() part. For example, in > > >> i40e_txrx.c, > > >> i40e_run_xdp() has its own rcu_read_lock/unlock(). dst->xdp_prog used > > >> to run > > >> in i40e_run_xdp() and it is fine. > > >> > > >> In this patch, dst->xdp_prog is run outside of i40e_run_xdp() where the > > >> rcu_read_unlock() has already done. It is now run in xdp_do_flush_map(). > > >> or I missed the big rcu_read_lock() in i40e_napi_poll()? > > >> > > >> I do see the big rcu_read_lock() in mlx5e_napi_poll(). > > > > > > I believed/assumed xdp_do_flush_map() was already protected under an > > > rcu_read_lock. As the devmap and cpumap, which get called via > > > __dev_flush() and __cpu_map_flush(), have multiple RCU objects that we > > > are operating
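To make the window Jesper is worried about easier to see, the pointer flow boils down to the condensed sketch below. It is paraphrased from the patch hunks quoted in this thread; the struct is trimmed, the helper name is a stand-in, and the flush-when-full handling is omitted, so treat it as an illustration rather than the kernel code.

    /* Simplified sketch -- not the kernel's actual definitions. */
    struct net_device;
    struct bpf_prog;
    struct xdp_frame;

    struct xdp_dev_bulk_queue {
            struct xdp_frame *q[16];     /* DEV_MAP_BULK_SIZE in the real code */
            struct net_device *dev;
            struct net_device *dev_rx;
            struct bpf_prog *xdp_prog;   /* cached copy taken from the devmap element */
            unsigned int count;
    };

    /* Called per redirected frame from inside the driver's NAPI poll;
     * xdp_prog originates from the devmap element (dst->xdp_prog).
     */
    static void bq_enqueue_sketch(struct xdp_dev_bulk_queue *bq,
                                  struct xdp_frame *xdpf,
                                  struct net_device *dev_rx,
                                  struct bpf_prog *xdp_prog)
    {
            if (!bq->dev_rx) {           /* first frame queued in this NAPI cycle */
                    bq->dev_rx = dev_rx;
                    bq->xdp_prog = xdp_prog;
            }
            bq->q[bq->count++] = xdpf;   /* flush-when-full check omitted here */
    }

    /* The cached bq->xdp_prog is only dereferenced later, in bq_xmit_all(),
     * when the driver calls xdp_do_flush() at the end of its poll.  The open
     * question in this thread is whether that whole window -- from the
     * enqueue above to the flush -- sits under a single rcu_read_lock().
     */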
Re: [PATCHv7 bpf-next 1/4] bpf: run devmap xdp_prog on flush instead of bulk enqueue
Martin KaFai Lau writes: > On Thu, Apr 15, 2021 at 10:29:40PM +0200, Toke Høiland-Jørgensen wrote: >> Jesper Dangaard Brouer writes: >> >> > On Thu, 15 Apr 2021 10:35:51 -0700 >> > Martin KaFai Lau wrote: >> > >> >> On Thu, Apr 15, 2021 at 11:22:19AM +0200, Toke Høiland-Jørgensen wrote: >> >> > Hangbin Liu writes: >> >> > >> >> > > On Wed, Apr 14, 2021 at 05:17:11PM -0700, Martin KaFai Lau wrote: >> >> > >> > static void bq_xmit_all(struct xdp_dev_bulk_queue *bq, u32 flags) >> >> > >> > { >> >> > >> >struct net_device *dev = bq->dev; >> >> > >> > - int sent = 0, err = 0; >> >> > >> > + int sent = 0, drops = 0, err = 0; >> >> > >> > + unsigned int cnt = bq->count; >> >> > >> > + int to_send = cnt; >> >> > >> >int i; >> >> > >> > >> >> > >> > - if (unlikely(!bq->count)) >> >> > >> > + if (unlikely(!cnt)) >> >> > >> >return; >> >> > >> > >> >> > >> > - for (i = 0; i < bq->count; i++) { >> >> > >> > + for (i = 0; i < cnt; i++) { >> >> > >> >struct xdp_frame *xdpf = bq->q[i]; >> >> > >> > >> >> > >> >prefetch(xdpf); >> >> > >> >} >> >> > >> > >> >> > >> > - sent = dev->netdev_ops->ndo_xdp_xmit(dev, bq->count, bq->q, >> >> > >> > flags); >> >> > >> > + if (bq->xdp_prog) { >> >> > >> bq->xdp_prog is used here >> >> > >> >> >> > >> > + to_send = dev_map_bpf_prog_run(bq->xdp_prog, bq->q, >> >> > >> > cnt, dev); >> >> > >> > + if (!to_send) >> >> > >> > + goto out; >> >> > >> > + >> >> > >> > + drops = cnt - to_send; >> >> > >> > + } >> >> > >> > + >> >> > >> >> >> > >> [ ... ] >> >> > >> >> >> > >> > static void bq_enqueue(struct net_device *dev, struct xdp_frame >> >> > >> > *xdpf, >> >> > >> > - struct net_device *dev_rx) >> >> > >> > + struct net_device *dev_rx, struct bpf_prog >> >> > >> > *xdp_prog) >> >> > >> > { >> >> > >> >struct list_head *flush_list = this_cpu_ptr(&dev_flush_list); >> >> > >> >struct xdp_dev_bulk_queue *bq = this_cpu_ptr(dev->xdp_bulkq); >> >> > >> > @@ -412,18 +466,22 @@ static void bq_enqueue(struct net_device >> >> > >> > *dev, struct xdp_frame *xdpf, >> >> > >> >/* Ingress dev_rx will be the same for all xdp_frame's in >> >> > >> > * bulk_queue, because bq stored per-CPU and must be flushed >> >> > >> > * from net_device drivers NAPI func end. >> >> > >> > + * >> >> > >> > + * Do the same with xdp_prog and flush_list since these fields >> >> > >> > + * are only ever modified together. >> >> > >> > */ >> >> > >> > - if (!bq->dev_rx) >> >> > >> > + if (!bq->dev_rx) { >> >> > >> >bq->dev_rx = dev_rx; >> >> > >> > + bq->xdp_prog = xdp_prog; >> >> > >> bp->xdp_prog is assigned here and could be used later in >> >> > >> bq_xmit_all(). >> >> > >> How is bq->xdp_prog protected? Are they all under one >> >> > >> rcu_read_lock()? >> >> > >> It is not very obvious after taking a quick look at >> >> > >> xdp_do_flush[_map]. >> >> > >> >> >> > >> e.g. what if the devmap elem gets deleted. >> >> > > >> >> > > Jesper knows better than me. From my veiw, based on the description of >> >> > > __dev_flush(): >> >> > > >> >> > > On devmap tear down we ensure the flush list is empty before >> >> > > completing to >> >> > > ensure all flush operations have completed. When drivers update the >> >> > > bpf >> >> > > program they may need to ensure any flush ops are also complete. >> >> >> >> AFAICT, the bq->xdp_prog is not from the dev. It is from a devmap's elem. >> >> >> >> > >> >> > Yeah, drivers call xdp_do_flush() before exiting their NAPI poll loop, >> >> > which also runs under one big rcu_read_lock(). 
So the storage in the >> >> > bulk queue is quite temporary, it's just used for bulking to increase >> >> > performance :) >> >> >> >> I am missing the one big rcu_read_lock() part. For example, in >> >> i40e_txrx.c, >> >> i40e_run_xdp() has its own rcu_read_lock/unlock(). dst->xdp_prog used to >> >> run >> >> in i40e_run_xdp() and it is fine. >> >> >> >> In this patch, dst->xdp_prog is run outside of i40e_run_xdp() where the >> >> rcu_read_unlock() has already done. It is now run in xdp_do_flush_map(). >> >> or I missed the big rcu_read_lock() in i40e_napi_poll()? >> >> >> >> I do see the big rcu_read_lock() in mlx5e_napi_poll(). >> > >> > I believed/assumed xdp_do_flush_map() was already protected under an >> > rcu_read_lock. As the devmap and cpumap, which get called via >> > __dev_flush() and __cpu_map_flush(), have multiple RCU objects that we >> > are operating on. > What other rcu objects it is using during flush? The bq_enqueue() function in cpumap.c puts the 'bq' pointer onto the flush_list, and 'bq' lives inside struct bpf_cpu_map_entry, so that's a reference to the map entry as well. The devmap function used to work the same way, until we changed it in 75ccae62cb8d ("xdp: Move devmap bulk queue into struct net_device"). >> > Perhaps it is a bug
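As a rough illustration of that difference, the layouts below are abbreviated sketches; field names and sizes are approximate, not the exact definitions in kernel/bpf/cpumap.c, kernel/bpf/devmap.c and include/linux/netdevice.h.

    /* Minimal stand-ins so the sketch is self-contained. */
    struct list_head { struct list_head *next, *prev; };
    struct xdp_frame;
    struct net_device;

    /* cpumap: the bulk queue is allocated as part of the map entry, so a bq
     * sitting on the per-CPU flush list implies the bpf_cpu_map_entry it
     * belongs to is still around.
     */
    struct cpu_map_bulk_queue_sketch {
            struct xdp_frame *q[8];           /* roughly CPU_MAP_BULK_SIZE */
            struct list_head flush_node;      /* linked onto the cpumap flush list */
            unsigned int count;
    };

    struct bpf_cpu_map_entry_sketch {
            /* ... */
            struct cpu_map_bulk_queue_sketch *bulkq;   /* per-CPU in the real code */
    };

    /* devmap, after 75ccae62cb8d: the bulk queue hangs off the net_device
     * (net_device->xdp_bulkq) instead of the devmap element, so an entry on
     * dev_flush_list no longer pins the map entry itself -- only the cached
     * bq->xdp_prog still points back to state that came from the element.
     */
    struct xdp_dev_bulk_queue_sketch {
            struct xdp_frame *q[16];          /* roughly DEV_MAP_BULK_SIZE */
            struct list_head flush_node;      /* linked onto dev_flush_list */
            struct net_device *dev;
            /* ... */
    };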
Re: [PATCHv7 bpf-next 1/4] bpf: run devmap xdp_prog on flush instead of bulk enqueue
On Thu, Apr 15, 2021 at 10:29:40PM +0200, Toke Høiland-Jørgensen wrote: > Jesper Dangaard Brouer writes: > > > On Thu, 15 Apr 2021 10:35:51 -0700 > > Martin KaFai Lau wrote: > > > >> On Thu, Apr 15, 2021 at 11:22:19AM +0200, Toke Høiland-Jørgensen wrote: > >> > Hangbin Liu writes: > >> > > >> > > On Wed, Apr 14, 2021 at 05:17:11PM -0700, Martin KaFai Lau wrote: > >> > >> > static void bq_xmit_all(struct xdp_dev_bulk_queue *bq, u32 flags) > >> > >> > { > >> > >> > struct net_device *dev = bq->dev; > >> > >> > - int sent = 0, err = 0; > >> > >> > + int sent = 0, drops = 0, err = 0; > >> > >> > + unsigned int cnt = bq->count; > >> > >> > + int to_send = cnt; > >> > >> > int i; > >> > >> > > >> > >> > - if (unlikely(!bq->count)) > >> > >> > + if (unlikely(!cnt)) > >> > >> > return; > >> > >> > > >> > >> > - for (i = 0; i < bq->count; i++) { > >> > >> > + for (i = 0; i < cnt; i++) { > >> > >> > struct xdp_frame *xdpf = bq->q[i]; > >> > >> > > >> > >> > prefetch(xdpf); > >> > >> > } > >> > >> > > >> > >> > - sent = dev->netdev_ops->ndo_xdp_xmit(dev, bq->count, bq->q, > >> > >> > flags); > >> > >> > + if (bq->xdp_prog) { > >> > >> bq->xdp_prog is used here > >> > >> > >> > >> > + to_send = dev_map_bpf_prog_run(bq->xdp_prog, bq->q, > >> > >> > cnt, dev); > >> > >> > + if (!to_send) > >> > >> > + goto out; > >> > >> > + > >> > >> > + drops = cnt - to_send; > >> > >> > + } > >> > >> > + > >> > >> > >> > >> [ ... ] > >> > >> > >> > >> > static void bq_enqueue(struct net_device *dev, struct xdp_frame > >> > >> > *xdpf, > >> > >> > - struct net_device *dev_rx) > >> > >> > + struct net_device *dev_rx, struct bpf_prog > >> > >> > *xdp_prog) > >> > >> > { > >> > >> > struct list_head *flush_list = this_cpu_ptr(&dev_flush_list); > >> > >> > struct xdp_dev_bulk_queue *bq = this_cpu_ptr(dev->xdp_bulkq); > >> > >> > @@ -412,18 +466,22 @@ static void bq_enqueue(struct net_device > >> > >> > *dev, struct xdp_frame *xdpf, > >> > >> > /* Ingress dev_rx will be the same for all xdp_frame's in > >> > >> > * bulk_queue, because bq stored per-CPU and must be flushed > >> > >> > * from net_device drivers NAPI func end. > >> > >> > +* > >> > >> > +* Do the same with xdp_prog and flush_list since these fields > >> > >> > +* are only ever modified together. > >> > >> > */ > >> > >> > - if (!bq->dev_rx) > >> > >> > + if (!bq->dev_rx) { > >> > >> > bq->dev_rx = dev_rx; > >> > >> > + bq->xdp_prog = xdp_prog; > >> > >> bp->xdp_prog is assigned here and could be used later in > >> > >> bq_xmit_all(). > >> > >> How is bq->xdp_prog protected? Are they all under one rcu_read_lock()? > >> > >> It is not very obvious after taking a quick look at > >> > >> xdp_do_flush[_map]. > >> > >> > >> > >> e.g. what if the devmap elem gets deleted. > >> > > > >> > > Jesper knows better than me. From my veiw, based on the description of > >> > > __dev_flush(): > >> > > > >> > > On devmap tear down we ensure the flush list is empty before > >> > > completing to > >> > > ensure all flush operations have completed. When drivers update the bpf > >> > > program they may need to ensure any flush ops are also complete. > >> > >> AFAICT, the bq->xdp_prog is not from the dev. It is from a devmap's elem. > >> > >> > > >> > Yeah, drivers call xdp_do_flush() before exiting their NAPI poll loop, > >> > which also runs under one big rcu_read_lock(). So the storage in the > >> > bulk queue is quite temporary, it's just used for bulking to increase > >> > performance :) > >> > >> I am missing the one big rcu_read_lock() part. 
For example, in > >> i40e_txrx.c, > >> i40e_run_xdp() has its own rcu_read_lock/unlock(). dst->xdp_prog used to > >> run > >> in i40e_run_xdp() and it is fine. > >> > >> In this patch, dst->xdp_prog is run outside of i40e_run_xdp() where the > >> rcu_read_unlock() has already done. It is now run in xdp_do_flush_map(). > >> or I missed the big rcu_read_lock() in i40e_napi_poll()? > >> > >> I do see the big rcu_read_lock() in mlx5e_napi_poll(). > > > > I believed/assumed xdp_do_flush_map() was already protected under an > > rcu_read_lock. As the devmap and cpumap, which get called via > > __dev_flush() and __cpu_map_flush(), have multiple RCU objects that we > > are operating on. What other rcu objects is it using during flush? > > > > Perhaps it is a bug in i40e? A quick look into ixgbe falls into the same bucket. I didn't look at other drivers though. > > > > We are running in softirq in NAPI context, when xdp_do_flush_map() is > > call, which I think means that this CPU will not go-through a RCU grace > > period before we exit softirq, so in-practice it should be safe. > > Yup, this seems to be correct: rcu_softirq_qs() is only called between > full invocations of the softirq handle
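The two driver patterns being contrasted look roughly like the pseudocode below. It only illustrates the structural difference under discussion (per-packet locking, as i40e/ixgbe had at the time, versus one read-side section around the whole poll, as in mlx5e); napi_ctx, have_rx_work() and run_xdp_and_maybe_redirect() are invented stand-ins, and only rcu_read_lock()/rcu_read_unlock() and xdp_do_flush() are real kernel symbols.

    /* Pattern A: per-packet RCU section (the i40e/ixgbe-style concern).
     * A program looked up under the lock may be cached by bq_enqueue(),
     * but the flush runs after the last rcu_read_unlock().
     */
    static int napi_poll_per_packet(struct napi_ctx *napi, int budget)
    {
            int done = 0;

            while (done < budget && have_rx_work(napi)) {
                    rcu_read_lock();
                    run_xdp_and_maybe_redirect(napi);   /* may bq_enqueue() a frame */
                    rcu_read_unlock();
                    done++;
            }
            xdp_do_flush();     /* cached bq->xdp_prog dereferenced here, now
                                 * outside any read-side section in this pattern */
            return done;
    }

    /* Pattern B: one RCU read-side section spanning the whole poll (the
     * "big rcu_read_lock()" seen in mlx5e_napi_poll()).  The flush, and with
     * it bq->xdp_prog, stays inside the same section that looked the
     * program up.
     */
    static int napi_poll_whole_cycle(struct napi_ctx *napi, int budget)
    {
            int done = 0;

            rcu_read_lock();
            while (done < budget && have_rx_work(napi)) {
                    run_xdp_and_maybe_redirect(napi);
                    done++;
            }
            xdp_do_flush();
            rcu_read_unlock();
            return done;
    }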
Re: [PATCHv7 bpf-next 1/4] bpf: run devmap xdp_prog on flush instead of bulk enqueue
Jesper Dangaard Brouer writes: > On Thu, 15 Apr 2021 10:35:51 -0700 > Martin KaFai Lau wrote: > >> On Thu, Apr 15, 2021 at 11:22:19AM +0200, Toke Høiland-Jørgensen wrote: >> > Hangbin Liu writes: >> > >> > > On Wed, Apr 14, 2021 at 05:17:11PM -0700, Martin KaFai Lau wrote: >> > >> > static void bq_xmit_all(struct xdp_dev_bulk_queue *bq, u32 flags) >> > >> > { >> > >> > struct net_device *dev = bq->dev; >> > >> > - int sent = 0, err = 0; >> > >> > + int sent = 0, drops = 0, err = 0; >> > >> > + unsigned int cnt = bq->count; >> > >> > + int to_send = cnt; >> > >> > int i; >> > >> > >> > >> > - if (unlikely(!bq->count)) >> > >> > + if (unlikely(!cnt)) >> > >> > return; >> > >> > >> > >> > - for (i = 0; i < bq->count; i++) { >> > >> > + for (i = 0; i < cnt; i++) { >> > >> > struct xdp_frame *xdpf = bq->q[i]; >> > >> > >> > >> > prefetch(xdpf); >> > >> > } >> > >> > >> > >> > - sent = dev->netdev_ops->ndo_xdp_xmit(dev, bq->count, bq->q, >> > >> > flags); >> > >> > + if (bq->xdp_prog) { >> > >> bq->xdp_prog is used here >> > >> >> > >> > + to_send = dev_map_bpf_prog_run(bq->xdp_prog, bq->q, >> > >> > cnt, dev); >> > >> > + if (!to_send) >> > >> > + goto out; >> > >> > + >> > >> > + drops = cnt - to_send; >> > >> > + } >> > >> > + >> > >> >> > >> [ ... ] >> > >> >> > >> > static void bq_enqueue(struct net_device *dev, struct xdp_frame >> > >> > *xdpf, >> > >> > -struct net_device *dev_rx) >> > >> > +struct net_device *dev_rx, struct bpf_prog >> > >> > *xdp_prog) >> > >> > { >> > >> > struct list_head *flush_list = this_cpu_ptr(&dev_flush_list); >> > >> > struct xdp_dev_bulk_queue *bq = this_cpu_ptr(dev->xdp_bulkq); >> > >> > @@ -412,18 +466,22 @@ static void bq_enqueue(struct net_device *dev, >> > >> > struct xdp_frame *xdpf, >> > >> > /* Ingress dev_rx will be the same for all xdp_frame's in >> > >> >* bulk_queue, because bq stored per-CPU and must be flushed >> > >> >* from net_device drivers NAPI func end. >> > >> > + * >> > >> > + * Do the same with xdp_prog and flush_list since these fields >> > >> > + * are only ever modified together. >> > >> >*/ >> > >> > - if (!bq->dev_rx) >> > >> > + if (!bq->dev_rx) { >> > >> > bq->dev_rx = dev_rx; >> > >> > + bq->xdp_prog = xdp_prog; >> > >> bp->xdp_prog is assigned here and could be used later in bq_xmit_all(). >> > >> How is bq->xdp_prog protected? Are they all under one rcu_read_lock()? >> > >> It is not very obvious after taking a quick look at xdp_do_flush[_map]. >> > >> >> > >> e.g. what if the devmap elem gets deleted. >> > > >> > > Jesper knows better than me. From my veiw, based on the description of >> > > __dev_flush(): >> > > >> > > On devmap tear down we ensure the flush list is empty before completing >> > > to >> > > ensure all flush operations have completed. When drivers update the bpf >> > > program they may need to ensure any flush ops are also complete. >> >> AFAICT, the bq->xdp_prog is not from the dev. It is from a devmap's elem. >> >> > >> > Yeah, drivers call xdp_do_flush() before exiting their NAPI poll loop, >> > which also runs under one big rcu_read_lock(). So the storage in the >> > bulk queue is quite temporary, it's just used for bulking to increase >> > performance :) >> >> I am missing the one big rcu_read_lock() part. For example, in i40e_txrx.c, >> i40e_run_xdp() has its own rcu_read_lock/unlock(). dst->xdp_prog used to run >> in i40e_run_xdp() and it is fine. >> >> In this patch, dst->xdp_prog is run outside of i40e_run_xdp() where the >> rcu_read_unlock() has already done. It is now run in xdp_do_flush_map(). 
>> or I missed the big rcu_read_lock() in i40e_napi_poll()? >> >> I do see the big rcu_read_lock() in mlx5e_napi_poll(). > > I believed/assumed xdp_do_flush_map() was already protected under an > rcu_read_lock. As the devmap and cpumap, which get called via > __dev_flush() and __cpu_map_flush(), have multiple RCU objects that we > are operating on. > > Perhaps it is a bug in i40e? > > We are running in softirq in NAPI context, when xdp_do_flush_map() is > call, which I think means that this CPU will not go-through a RCU grace > period before we exit softirq, so in-practice it should be safe. Yup, this seems to be correct: rcu_softirq_qs() is only called between full invocations of the softirq handler, which for networking is net_rx_action(), and so translates into full NAPI poll cycles. -Toke
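Expressed as pseudocode, the ordering Toke describes is roughly the following; for_each_scheduled_napi() is a stand-in for the real poll-list walk in net_rx_action(), and the placement of the quiescent state is stated only as precisely as the sentence above.

    /* Rough shape of the networking softirq handler.  The point being made:
     * rcu_softirq_qs() is reported between full runs of handlers like
     * net_rx_action(), never in the middle of one napi->poll() invocation,
     * so a poll plus its final xdp_do_flush() always completes before any
     * RCU grace period can end on this CPU.
     */
    static void net_rx_action_shape(void)
    {
            int budget = 64;                    /* NAPI_POLL_WEIGHT-ish */

            for_each_scheduled_napi(napi)
                    napi->poll(napi, budget);   /* driver poll; ends with xdp_do_flush() */

            /* Only after the handler returns may the softirq core note a
             * quiescent state for RCU.
             */
    }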
Re: [PATCHv7 bpf-next 1/4] bpf: run devmap xdp_prog on flush instead of bulk enqueue
On Thu, 15 Apr 2021 10:35:51 -0700 Martin KaFai Lau wrote: > On Thu, Apr 15, 2021 at 11:22:19AM +0200, Toke Høiland-Jørgensen wrote: > > Hangbin Liu writes: > > > > > On Wed, Apr 14, 2021 at 05:17:11PM -0700, Martin KaFai Lau wrote: > > >> > static void bq_xmit_all(struct xdp_dev_bulk_queue *bq, u32 flags) > > >> > { > > >> >struct net_device *dev = bq->dev; > > >> > - int sent = 0, err = 0; > > >> > + int sent = 0, drops = 0, err = 0; > > >> > + unsigned int cnt = bq->count; > > >> > + int to_send = cnt; > > >> >int i; > > >> > > > >> > - if (unlikely(!bq->count)) > > >> > + if (unlikely(!cnt)) > > >> >return; > > >> > > > >> > - for (i = 0; i < bq->count; i++) { > > >> > + for (i = 0; i < cnt; i++) { > > >> >struct xdp_frame *xdpf = bq->q[i]; > > >> > > > >> >prefetch(xdpf); > > >> >} > > >> > > > >> > - sent = dev->netdev_ops->ndo_xdp_xmit(dev, bq->count, bq->q, > > >> > flags); > > >> > + if (bq->xdp_prog) { > > >> bq->xdp_prog is used here > > >> > > >> > + to_send = dev_map_bpf_prog_run(bq->xdp_prog, bq->q, > > >> > cnt, dev); > > >> > + if (!to_send) > > >> > + goto out; > > >> > + > > >> > + drops = cnt - to_send; > > >> > + } > > >> > + > > >> > > >> [ ... ] > > >> > > >> > static void bq_enqueue(struct net_device *dev, struct xdp_frame *xdpf, > > >> > - struct net_device *dev_rx) > > >> > + struct net_device *dev_rx, struct bpf_prog > > >> > *xdp_prog) > > >> > { > > >> >struct list_head *flush_list = this_cpu_ptr(&dev_flush_list); > > >> >struct xdp_dev_bulk_queue *bq = this_cpu_ptr(dev->xdp_bulkq); > > >> > @@ -412,18 +466,22 @@ static void bq_enqueue(struct net_device *dev, > > >> > struct xdp_frame *xdpf, > > >> >/* Ingress dev_rx will be the same for all xdp_frame's in > > >> > * bulk_queue, because bq stored per-CPU and must be flushed > > >> > * from net_device drivers NAPI func end. > > >> > + * > > >> > + * Do the same with xdp_prog and flush_list since these fields > > >> > + * are only ever modified together. > > >> > */ > > >> > - if (!bq->dev_rx) > > >> > + if (!bq->dev_rx) { > > >> >bq->dev_rx = dev_rx; > > >> > + bq->xdp_prog = xdp_prog; > > >> bp->xdp_prog is assigned here and could be used later in bq_xmit_all(). > > >> How is bq->xdp_prog protected? Are they all under one rcu_read_lock()? > > >> It is not very obvious after taking a quick look at xdp_do_flush[_map]. > > >> > > >> e.g. what if the devmap elem gets deleted. > > > > > > Jesper knows better than me. From my veiw, based on the description of > > > __dev_flush(): > > > > > > On devmap tear down we ensure the flush list is empty before completing to > > > ensure all flush operations have completed. When drivers update the bpf > > > program they may need to ensure any flush ops are also complete. > > AFAICT, the bq->xdp_prog is not from the dev. It is from a devmap's elem. > > > > > Yeah, drivers call xdp_do_flush() before exiting their NAPI poll loop, > > which also runs under one big rcu_read_lock(). So the storage in the > > bulk queue is quite temporary, it's just used for bulking to increase > > performance :) > > I am missing the one big rcu_read_lock() part. For example, in i40e_txrx.c, > i40e_run_xdp() has its own rcu_read_lock/unlock(). dst->xdp_prog used to run > in i40e_run_xdp() and it is fine. > > In this patch, dst->xdp_prog is run outside of i40e_run_xdp() where the > rcu_read_unlock() has already done. It is now run in xdp_do_flush_map(). > or I missed the big rcu_read_lock() in i40e_napi_poll()? > > I do see the big rcu_read_lock() in mlx5e_napi_poll(). 
I believed/assumed xdp_do_flush_map() was already protected under an rcu_read_lock, as the devmap and cpumap, which get called via __dev_flush() and __cpu_map_flush(), have multiple RCU objects that we are operating on.

Perhaps it is a bug in i40e?

We are running in softirq in NAPI context when xdp_do_flush_map() is called, which I think means that this CPU will not go through an RCU grace period before we exit softirq, so in practice it should be safe. But to be correct I do think we need a rcu_read_lock() around this call.

--
Best regards,
Jesper Dangaard Brouer
MSc.CS, Principal Kernel Engineer at Red Hat
LinkedIn: http://www.linkedin.com/in/brouer
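For concreteness, the extra protection suggested here could look something like the sketch below, with the lock taken around the flush at the end of the driver's poll. Whether such a lock belongs in each driver or inside the flush helpers themselves is exactly the open question, so this is a sketch of the idea rather than of any agreed fix; napi_ctx and process_rx_ring() are stand-ins.

    static int driver_napi_poll_sketch(struct napi_ctx *napi, int budget)
    {
            /* RX processing; XDP_REDIRECT'ed frames land in the per-CPU
             * bulk queues via bq_enqueue(), possibly under per-packet
             * rcu_read_lock()s only.
             */
            int done = process_rx_ring(napi, budget);

            /* Make the flush itself a read-side critical section, so that
             * bq->xdp_prog and the devmap/cpumap objects walked by
             * __dev_flush()/__cpu_map_flush() are covered regardless of how
             * the RX path took its locks.
             */
            rcu_read_lock();
            xdp_do_flush_map();
            rcu_read_unlock();

            return done;
    }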
Re: [PATCHv7 bpf-next 1/4] bpf: run devmap xdp_prog on flush instead of bulk enqueue
On Thu, Apr 15, 2021 at 11:22:19AM +0200, Toke Høiland-Jørgensen wrote: > Hangbin Liu writes: > > > On Wed, Apr 14, 2021 at 05:17:11PM -0700, Martin KaFai Lau wrote: > >> > static void bq_xmit_all(struct xdp_dev_bulk_queue *bq, u32 flags) > >> > { > >> > struct net_device *dev = bq->dev; > >> > -int sent = 0, err = 0; > >> > +int sent = 0, drops = 0, err = 0; > >> > +unsigned int cnt = bq->count; > >> > +int to_send = cnt; > >> > int i; > >> > > >> > -if (unlikely(!bq->count)) > >> > +if (unlikely(!cnt)) > >> > return; > >> > > >> > -for (i = 0; i < bq->count; i++) { > >> > +for (i = 0; i < cnt; i++) { > >> > struct xdp_frame *xdpf = bq->q[i]; > >> > > >> > prefetch(xdpf); > >> > } > >> > > >> > -sent = dev->netdev_ops->ndo_xdp_xmit(dev, bq->count, bq->q, > >> > flags); > >> > +if (bq->xdp_prog) { > >> bq->xdp_prog is used here > >> > >> > +to_send = dev_map_bpf_prog_run(bq->xdp_prog, bq->q, > >> > cnt, dev); > >> > +if (!to_send) > >> > +goto out; > >> > + > >> > +drops = cnt - to_send; > >> > +} > >> > + > >> > >> [ ... ] > >> > >> > static void bq_enqueue(struct net_device *dev, struct xdp_frame *xdpf, > >> > - struct net_device *dev_rx) > >> > + struct net_device *dev_rx, struct bpf_prog > >> > *xdp_prog) > >> > { > >> > struct list_head *flush_list = this_cpu_ptr(&dev_flush_list); > >> > struct xdp_dev_bulk_queue *bq = this_cpu_ptr(dev->xdp_bulkq); > >> > @@ -412,18 +466,22 @@ static void bq_enqueue(struct net_device *dev, > >> > struct xdp_frame *xdpf, > >> > /* Ingress dev_rx will be the same for all xdp_frame's in > >> > * bulk_queue, because bq stored per-CPU and must be flushed > >> > * from net_device drivers NAPI func end. > >> > + * > >> > + * Do the same with xdp_prog and flush_list since these fields > >> > + * are only ever modified together. > >> > */ > >> > -if (!bq->dev_rx) > >> > +if (!bq->dev_rx) { > >> > bq->dev_rx = dev_rx; > >> > +bq->xdp_prog = xdp_prog; > >> bp->xdp_prog is assigned here and could be used later in bq_xmit_all(). > >> How is bq->xdp_prog protected? Are they all under one rcu_read_lock()? > >> It is not very obvious after taking a quick look at xdp_do_flush[_map]. > >> > >> e.g. what if the devmap elem gets deleted. > > > > Jesper knows better than me. From my veiw, based on the description of > > __dev_flush(): > > > > On devmap tear down we ensure the flush list is empty before completing to > > ensure all flush operations have completed. When drivers update the bpf > > program they may need to ensure any flush ops are also complete. AFAICT, the bq->xdp_prog is not from the dev. It is from a devmap's elem. > > Yeah, drivers call xdp_do_flush() before exiting their NAPI poll loop, > which also runs under one big rcu_read_lock(). So the storage in the > bulk queue is quite temporary, it's just used for bulking to increase > performance :) I am missing the one big rcu_read_lock() part. For example, in i40e_txrx.c, i40e_run_xdp() has its own rcu_read_lock/unlock(). dst->xdp_prog used to run in i40e_run_xdp() and it is fine. In this patch, dst->xdp_prog is run outside of i40e_run_xdp() where the rcu_read_unlock() has already done. It is now run in xdp_do_flush_map(). or I missed the big rcu_read_lock() in i40e_napi_poll()? I do see the big rcu_read_lock() in mlx5e_napi_poll().
Re: [PATCHv7 bpf-next 1/4] bpf: run devmap xdp_prog on flush instead of bulk enqueue
Hangbin Liu writes: > On Wed, Apr 14, 2021 at 05:17:11PM -0700, Martin KaFai Lau wrote: >> > static void bq_xmit_all(struct xdp_dev_bulk_queue *bq, u32 flags) >> > { >> >struct net_device *dev = bq->dev; >> > - int sent = 0, err = 0; >> > + int sent = 0, drops = 0, err = 0; >> > + unsigned int cnt = bq->count; >> > + int to_send = cnt; >> >int i; >> > >> > - if (unlikely(!bq->count)) >> > + if (unlikely(!cnt)) >> >return; >> > >> > - for (i = 0; i < bq->count; i++) { >> > + for (i = 0; i < cnt; i++) { >> >struct xdp_frame *xdpf = bq->q[i]; >> > >> >prefetch(xdpf); >> >} >> > >> > - sent = dev->netdev_ops->ndo_xdp_xmit(dev, bq->count, bq->q, flags); >> > + if (bq->xdp_prog) { >> bq->xdp_prog is used here >> >> > + to_send = dev_map_bpf_prog_run(bq->xdp_prog, bq->q, cnt, dev); >> > + if (!to_send) >> > + goto out; >> > + >> > + drops = cnt - to_send; >> > + } >> > + >> >> [ ... ] >> >> > static void bq_enqueue(struct net_device *dev, struct xdp_frame *xdpf, >> > - struct net_device *dev_rx) >> > + struct net_device *dev_rx, struct bpf_prog *xdp_prog) >> > { >> >struct list_head *flush_list = this_cpu_ptr(&dev_flush_list); >> >struct xdp_dev_bulk_queue *bq = this_cpu_ptr(dev->xdp_bulkq); >> > @@ -412,18 +466,22 @@ static void bq_enqueue(struct net_device *dev, >> > struct xdp_frame *xdpf, >> >/* Ingress dev_rx will be the same for all xdp_frame's in >> > * bulk_queue, because bq stored per-CPU and must be flushed >> > * from net_device drivers NAPI func end. >> > + * >> > + * Do the same with xdp_prog and flush_list since these fields >> > + * are only ever modified together. >> > */ >> > - if (!bq->dev_rx) >> > + if (!bq->dev_rx) { >> >bq->dev_rx = dev_rx; >> > + bq->xdp_prog = xdp_prog; >> bp->xdp_prog is assigned here and could be used later in bq_xmit_all(). >> How is bq->xdp_prog protected? Are they all under one rcu_read_lock()? >> It is not very obvious after taking a quick look at xdp_do_flush[_map]. >> >> e.g. what if the devmap elem gets deleted. > > Jesper knows better than me. From my veiw, based on the description of > __dev_flush(): > > On devmap tear down we ensure the flush list is empty before completing to > ensure all flush operations have completed. When drivers update the bpf > program they may need to ensure any flush ops are also complete. Yeah, drivers call xdp_do_flush() before exiting their NAPI poll loop, which also runs under one big rcu_read_lock(). So the storage in the bulk queue is quite temporary, it's just used for bulking to increase performance :) -Toke
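In other words, the bulk queue is just a short-lived accumulation buffer within one NAPI cycle. A condensed sketch of that lifecycle, with stand-in names and approximate sizes (the real code is bq_enqueue()/bq_xmit_all() in kernel/bpf/devmap.c):

    #define BULK_SIZE 16                       /* roughly DEV_MAP_BULK_SIZE */

    struct xdp_frame;
    struct dev_bulk_queue_sketch {
            struct xdp_frame *q[BULK_SIZE];
            unsigned int count;
    };

    /* Stand-in for bq_xmit_all(): sends the batch via ndo_xdp_xmit() and
     * resets count to zero.
     */
    static void flush_one_queue(struct dev_bulk_queue_sketch *bq);

    /* Per redirected frame, during the driver's RX processing. */
    static void enqueue_frame_sketch(struct dev_bulk_queue_sketch *bq,
                                     struct xdp_frame *xdpf)
    {
            if (bq->count == BULK_SIZE)        /* batch full: send it now */
                    flush_one_queue(bq);
            bq->q[bq->count++] = xdpf;         /* otherwise just accumulate */
    }

    /* Before returning from its poll function the driver calls
     * xdp_do_flush(), which drains whatever is still queued on this CPU's
     * flush list -- so nothing in the bulk queue outlives the current
     * softirq/NAPI cycle.
     */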
Re: [PATCHv7 bpf-next 1/4] bpf: run devmap xdp_prog on flush instead of bulk enqueue
On Wed, Apr 14, 2021 at 05:17:11PM -0700, Martin KaFai Lau wrote: > > static void bq_xmit_all(struct xdp_dev_bulk_queue *bq, u32 flags) > > { > > struct net_device *dev = bq->dev; > > - int sent = 0, err = 0; > > + int sent = 0, drops = 0, err = 0; > > + unsigned int cnt = bq->count; > > + int to_send = cnt; > > int i; > > > > - if (unlikely(!bq->count)) > > + if (unlikely(!cnt)) > > return; > > > > - for (i = 0; i < bq->count; i++) { > > + for (i = 0; i < cnt; i++) { > > struct xdp_frame *xdpf = bq->q[i]; > > > > prefetch(xdpf); > > } > > > > - sent = dev->netdev_ops->ndo_xdp_xmit(dev, bq->count, bq->q, flags); > > + if (bq->xdp_prog) { > bq->xdp_prog is used here > > > + to_send = dev_map_bpf_prog_run(bq->xdp_prog, bq->q, cnt, dev); > > + if (!to_send) > > + goto out; > > + > > + drops = cnt - to_send; > > + } > > + > > [ ... ] > > > static void bq_enqueue(struct net_device *dev, struct xdp_frame *xdpf, > > - struct net_device *dev_rx) > > + struct net_device *dev_rx, struct bpf_prog *xdp_prog) > > { > > struct list_head *flush_list = this_cpu_ptr(&dev_flush_list); > > struct xdp_dev_bulk_queue *bq = this_cpu_ptr(dev->xdp_bulkq); > > @@ -412,18 +466,22 @@ static void bq_enqueue(struct net_device *dev, struct > > xdp_frame *xdpf, > > /* Ingress dev_rx will be the same for all xdp_frame's in > > * bulk_queue, because bq stored per-CPU and must be flushed > > * from net_device drivers NAPI func end. > > +* > > +* Do the same with xdp_prog and flush_list since these fields > > +* are only ever modified together. > > */ > > - if (!bq->dev_rx) > > + if (!bq->dev_rx) { > > bq->dev_rx = dev_rx; > > + bq->xdp_prog = xdp_prog; > bp->xdp_prog is assigned here and could be used later in bq_xmit_all(). > How is bq->xdp_prog protected? Are they all under one rcu_read_lock()? > It is not very obvious after taking a quick look at xdp_do_flush[_map]. > > e.g. what if the devmap elem gets deleted. Jesper knows better than me. From my veiw, based on the description of __dev_flush(): On devmap tear down we ensure the flush list is empty before completing to ensure all flush operations have completed. When drivers update the bpf program they may need to ensure any flush ops are also complete. Thanks Hangbin
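The guarantee that comment points at can be sketched as a teardown ordering; the real dev_map_free() differs in its details, so the names below (bpf_map_sketch, unpublish_map(), free_map_elements()) are stand-ins meant only to show the intent.

    struct bpf_map_sketch;

    static void dev_map_free_sketch(struct bpf_map_sketch *map)
    {
            /* By the time we get here the programs using the map have been
             * detached, so no new frames can be redirected through it.
             */
            unpublish_map(map);

            /* Wait for every CPU to leave its current read-side section /
             * softirq.  Any bulk queue that was filled from this map has
             * been drained by the xdp_do_flush() at the end of that NAPI
             * cycle, so the flush list no longer references it.
             */
            synchronize_rcu();

            free_map_elements(map);
    }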
Re: [PATCHv7 bpf-next 1/4] bpf: run devmap xdp_prog on flush instead of bulk enqueue
On Wed, Apr 14, 2021 at 08:26:07PM +0800, Hangbin Liu wrote: [ ... ] > diff --git a/kernel/bpf/devmap.c b/kernel/bpf/devmap.c > index aa516472ce46..3980fb3bfb09 100644 > --- a/kernel/bpf/devmap.c > +++ b/kernel/bpf/devmap.c > @@ -57,6 +57,7 @@ struct xdp_dev_bulk_queue { > struct list_head flush_node; > struct net_device *dev; > struct net_device *dev_rx; > + struct bpf_prog *xdp_prog; > unsigned int count; > }; > > @@ -326,22 +327,71 @@ bool dev_map_can_have_prog(struct bpf_map *map) > return false; > } > > +static int dev_map_bpf_prog_run(struct bpf_prog *xdp_prog, > + struct xdp_frame **frames, int n, > + struct net_device *dev) > +{ > + struct xdp_txq_info txq = { .dev = dev }; > + struct xdp_buff xdp; > + int i, nframes = 0; > + > + for (i = 0; i < n; i++) { > + struct xdp_frame *xdpf = frames[i]; > + u32 act; > + int err; > + > + xdp_convert_frame_to_buff(xdpf, &xdp); > + xdp.txq = &txq; > + > + act = bpf_prog_run_xdp(xdp_prog, &xdp); > + switch (act) { > + case XDP_PASS: > + err = xdp_update_frame_from_buff(&xdp, xdpf); > + if (unlikely(err < 0)) > + xdp_return_frame_rx_napi(xdpf); > + else > + frames[nframes++] = xdpf; > + break; > + default: > + bpf_warn_invalid_xdp_action(act); > + fallthrough; > + case XDP_ABORTED: > + trace_xdp_exception(dev, xdp_prog, act); > + fallthrough; > + case XDP_DROP: > + xdp_return_frame_rx_napi(xdpf); > + break; > + } > + } > + return nframes; /* sent frames count */ > +} > + > static void bq_xmit_all(struct xdp_dev_bulk_queue *bq, u32 flags) > { > struct net_device *dev = bq->dev; > - int sent = 0, err = 0; > + int sent = 0, drops = 0, err = 0; > + unsigned int cnt = bq->count; > + int to_send = cnt; > int i; > > - if (unlikely(!bq->count)) > + if (unlikely(!cnt)) > return; > > - for (i = 0; i < bq->count; i++) { > + for (i = 0; i < cnt; i++) { > struct xdp_frame *xdpf = bq->q[i]; > > prefetch(xdpf); > } > > - sent = dev->netdev_ops->ndo_xdp_xmit(dev, bq->count, bq->q, flags); > + if (bq->xdp_prog) { bq->xdp_prog is used here > + to_send = dev_map_bpf_prog_run(bq->xdp_prog, bq->q, cnt, dev); > + if (!to_send) > + goto out; > + > + drops = cnt - to_send; > + } > + [ ... ] > static void bq_enqueue(struct net_device *dev, struct xdp_frame *xdpf, > -struct net_device *dev_rx) > +struct net_device *dev_rx, struct bpf_prog *xdp_prog) > { > struct list_head *flush_list = this_cpu_ptr(&dev_flush_list); > struct xdp_dev_bulk_queue *bq = this_cpu_ptr(dev->xdp_bulkq); > @@ -412,18 +466,22 @@ static void bq_enqueue(struct net_device *dev, struct > xdp_frame *xdpf, > /* Ingress dev_rx will be the same for all xdp_frame's in >* bulk_queue, because bq stored per-CPU and must be flushed >* from net_device drivers NAPI func end. > + * > + * Do the same with xdp_prog and flush_list since these fields > + * are only ever modified together. >*/ > - if (!bq->dev_rx) > + if (!bq->dev_rx) { > bq->dev_rx = dev_rx; > + bq->xdp_prog = xdp_prog; bp->xdp_prog is assigned here and could be used later in bq_xmit_all(). How is bq->xdp_prog protected? Are they all under one rcu_read_lock()? It is not very obvious after taking a quick look at xdp_do_flush[_map]. e.g. what if the devmap elem gets deleted. [ ... 
] > static inline int __xdp_enqueue(struct net_device *dev, struct xdp_buff *xdp, > -struct net_device *dev_rx) > + struct net_device *dev_rx, > + struct bpf_prog *xdp_prog) > { > struct xdp_frame *xdpf; > int err; > @@ -439,42 +497,14 @@ static inline int __xdp_enqueue(struct net_device *dev, > struct xdp_buff *xdp, > if (unlikely(!xdpf)) > return -EOVERFLOW; > > - bq_enqueue(dev, xdpf, dev_rx); > + bq_enqueue(dev, xdpf, dev_rx, xdp_prog); > return 0; > } > [ ... ] > @@ -482,12 +512,7 @@ int dev_map_enqueue(struct bpf_dtab_netdev *dst, struct > xdp_buff *xdp, > { > struct net_device *dev = dst->dev; > > - if (dst->xdp_prog) { > - xdp = dev_map_run_prog(dev, xdp, dst->xdp_prog); > - if (!xdp) > - return 0; > - } > - return __xdp_enqueue(dev, xdp, dev_rx); > + return __xdp_