On Fri, Jun 14, 2024 at 07:11:50AM -0700, Paul E. McKenney wrote:
> On Thu, Jun 13, 2024 at 08:22:19PM -0700, Paul E. McKenney wrote:
> > On Thu, Jun 13, 2024 at 02:32:27PM -0400, Kent Overstreet wrote:
>
> [ . . . ]
>
> After sleeping on it, backing up a level of abstraction, and thus perhaps
> responding a bit more constructively...
>
> And also, thank you very much for looking into this!
>
> > > commit 3bb5d34d6d6e2d5433cce5d248fd81211afedec0
> > > Author: Kent Overstreet <[email protected]>
> > > Date: Mon Jun 10 20:47:03 2024 -0400
> > >
> > > bcachefs: pending_rcu_items
> > >
> > > Generic (?) data structure for explicitly tracking pending SRCU items
> > >
> > > - Can we make this generic across other RCU flavors?
>
> It should be possible, at least for current RCU and SRCU. I think I owe
> you SRCU variants of a couple of existing RCU functions. Later today!

I was wondering how much commonality there is between the core data
structures of RCU and SRCU. If RCU internally has something analogous to
srcu_struct, and we could refer to that object without caring whether it's
for RCU or SRCU, that would be cool... otherwise, we can just do it a
level up with switch statements.
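
To make sure we're talking about the same thing, here's roughly the shape
I have in mind for "a level up with switch statements" - a userspace toy
model, so every name here (gp_ref, srcu_model, the counters) is
hypothetical and just stands in for the real grace-period machinery:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical userspace model: rcu_completed stands in for RCU's
 * global grace-period state, struct srcu_model for a per-object
 * srcu_struct, and the helpers model the polled grace-period API
 * ({start_poll,poll_state}_synchronize_{rcu,srcu}()).
 */
enum gp_flavor { GP_RCU, GP_SRCU };

static unsigned long rcu_completed;	/* models RCU's global GP counter */

struct srcu_model {			/* models struct srcu_struct */
	unsigned long	completed;
};

struct gp_ref {
	enum gp_flavor		flavor;
	struct srcu_model	*srcu;	/* only used when flavor == GP_SRCU */
};

static unsigned long gp_completed(struct gp_ref *ref)
{
	switch (ref->flavor) {
	case GP_RCU:	return rcu_completed;
	case GP_SRCU:	return ref->srcu->completed;
	}
	return 0;
}

/* Models start_poll_synchronize_{rcu,srcu}(): cookie for a fresh GP. */
static unsigned long gp_start_poll(struct gp_ref *ref)
{
	return gp_completed(ref) + 1;
}

/* Models poll_state_synchronize_{rcu,srcu}(): has the cookie expired? */
static bool gp_poll_state(struct gp_ref *ref, unsigned long seq)
{
	return gp_completed(ref) >= seq;
}
```

The point being that the caller holds one tagged reference and all the
flavor-specific knowledge lives behind the switch.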
> > > - Do we need to add a fallback to linked lists when darray allocation
> > > fails?
>
> In general, as in kfree_rcu(), yes. In your specific case here, maybe not.
>
> > > - We wish to avoid the overhead of start_poll_synchronize_srcu() if we
> > > already have items for a given sequence number, but it appears
> > > get_state_synchronize_srcu(); start_poll_synchronize_srcu() can
> > > actually return different sequence numbers within the same read-side
> > > critical section. Is this the correct way of doing this?
>
> You can always buffer up items that have not yet been assigned a sequence
> number, then after you have enough or after enough time has passed, do a
> single get_state_synchronize_srcu() for the group.

I don't think we'd want to do that, since it would delay the point at
which objects get reclaimed. Have a look at my latest version and let me
know whether you think it works? It seems to work in my testing.
> > > - Do we need to ensure the rearming RCU callback is correctly flushed
> > > when exiting? (Probably)
>
> I believe so, and setting a flag then doing an srcu_barrier() should suffice.

Is the only difference between srcu_barrier() and synchronize_srcu()
that srcu_barrier() is a no-op if no callbacks were enqueued?
> > > +static void pending_rcu_items_process(struct pending_rcu_items *pending)
> > > {
> > > - struct bkey_cached *ck = container_of(rcu, struct bkey_cached, rcu);
> > > + while (pending_rcu_has_items(pending) &&
> > > + poll_state_synchronize_srcu(pending->srcu, pending->seq)) {
> > > + darray_for_each(pending->objs[0], i)
> > > + pending->process(pending, *i);
>
> Given that you have interrupts and/or bh disabled, there needs to be a
> third set of callbacks, those whose grace periods are already elapsed.

Not following? The pending->process() is exactly that callback.

Also, the new version does the processing in process context, without
holding the lock.
> Unioning one of the ->objs[] sets onto this third set, leaving the source
> ->objs[] set empty, should strongly be O(1) without the need for memory
> allocation. I might be missing something, but at first glance darray
> does not support this.
>
> The pages-of-pointers structure named kvfree_rcu_bulk_data, along with
> the associated structures in kernel/rcu/tree.c does this work, but (1)
> is open-coded and (2) has to deal with the full range of what the kernel
> can throw at it. It falls back to an embedded rcu_head structure, in
> response to your second question in the commit log.
>
> But perhaps this code could be generalized? Or maybe darray can be
> appropriately extended?

Not sure what you mean by "unioning onto a third set"?
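
My best guess is that you mean an O(1) pointer splice, a la
list_splice_init(): move an entire set onto another by relinking, with no
per-element work and no allocation, which a darray indeed can't do. Is it
something like this (userspace sketch, all names hypothetical)?

```c
#include <assert.h>
#include <stddef.h>

/*
 * Userspace sketch of an O(1) "union onto a third set": with linked
 * lists the whole source set moves in constant time, unlike a darray,
 * where moving n elements costs O(n) or a buffer reallocation.  In the
 * kernel this is what list_splice_init() provides.
 */
struct node {
	struct node *next;
};

struct set {
	struct node	*head;
	struct node	**tail;	/* points at the terminating NULL link */
};

static void set_init(struct set *s)
{
	s->head = NULL;
	s->tail = &s->head;
}

static void set_add(struct set *s, struct node *n)
{
	n->next = NULL;
	*s->tail = n;
	s->tail = &n->next;
}

/* Move every element of @src onto the end of @dst in O(1). */
static void set_splice(struct set *dst, struct set *src)
{
	if (!src->head)
		return;
	*dst->tail = src->head;
	dst->tail = src->tail;
	set_init(src);
}

static int set_count(struct set *s)
{
	int n = 0;

	for (struct node *p = s->head; p; p = p->next)
		n++;
	return n;
}
```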
> And one thing that I missed the first time around is that you are using
> a single ->seq for two sets of objects (->objs[0] and ->objs[1]).
> Each set needs its own sequence number.

I've got this changed in my latest version, but I did see one BUG_ON()
trip where we seem to have three sequence numbers in flight. Any idea
what might be up with that?