On Sun, Jun 09, 2024 at 07:10:53PM -0700, Paul E. McKenney wrote:
> On Sun, Jun 09, 2024 at 08:52:14PM -0400, Kent Overstreet wrote:
> > On Sun, Jun 09, 2024 at 09:55:30AM -0700, Paul E. McKenney wrote:
> > > On Sun, Jun 09, 2024 at 11:37:45AM -0400, Kent Overstreet wrote:
> > > > On Sat, Jun 08, 2024 at 08:25:53PM -0700, Paul E. McKenney wrote:
> > > > > Could you please try something like this just before the call to
> > > > > cleanup_srcu_struct()?
> > > > >
> > > > >
> > > > > 	WARN_ON_ONCE(!poll_state_synchronize_srcu(&c->btree_trans_barrier,
> > > > > 				  ck->btree_trans_barrier_seq));
> > > >
> > > > Which seq was this supposed to be? All keys have been freed by this
> > > > point...
> > >
> > > Or, alternatively, where in the code is this supposed to be?
> > >
> > > If there is no convenient point in the code to grab the most recent
> > > return value from start_poll_synchronize_srcu(), another thing to do
> > > is to invoke either synchronize_srcu() or synchronize_srcu_expedited()
> > > just before the call to cleanup_srcu_struct().
> > >
> > > Another approach is to use get_state_synchronize_srcu() instead of
> > > start_poll_synchronize_srcu(), and have a self-reposting SRCU callback
> > > to keep the grace periods going. Then you would set a flag that
> > > stopped it from self-posting, then do srcu_barrier(). With careful
> > > memory ordering.
> > >
> > > There are quite a few techniques to shut down the self-reposting SRCU
> > > callback when there is nothing for it to do and to restart it if need be.
> > >
> > > But just doing a synchronize_srcu() or synchronize_srcu_expedited() is
> > > a lot simpler and probably does the job.
> >
> > synchronize_srcu_expedited() seems like the simplest solution, yeah.
> >
> > Thanks, I think I'm starting (hazily) to get an idea of how the RCU code
> > is structured, but I'll have to dig more when I have more time, this is
> > interesting :)
> >
> > I am wondering why you couldn't just have cleanup_srcu_struct() do the
> > appropriate cleanup (synchronize_srcu_expedited?) in this instance; if
> > the caller is tearing down the srcu struct they don't need srcu
> > synchronization anymore, I would think the only safety issue that would
> > need a warning would be leaked read locks.
>
> Starting a grace period and then invoking cleanup_srcu_struct() before
> it has had a chance to finish seems worth a warning. And preferable to
> having something like poll_state_synchronize_rcu() segfault later on,
> for example.
>
> > Another question for you: is there a limit to the number of pending
> > sequence numbers from start_poll_synchronize_srcu()? (e.g. 2?)
> >
> > That affects the data structure I use for redoing this "track pending
> > frees" code.
>
> Yes, there is, and you are right, the number is two. Would something
> like the patch shown below help?
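
(For the teardown itself, the plan is roughly just an expedited
synchronize right before the existing cleanup_srcu_struct() call -
exact placement TBD:

	synchronize_srcu_expedited(&c->btree_trans_barrier);
	cleanup_srcu_struct(&c->btree_trans_barrier);

so that any grace period still pending from start_poll_synchronize_srcu()
has completed before the srcu_struct goes away.)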

If I look at the sequence number from start_poll_synchronize_srcu() vs.
the last uncompleted one, I see a difference of 4 (so far?).

I could use a doubly linked list for tracking all uncompleted objects,
but I'm hoping to use a darray per sequence number. Is it a max of two
sequence numbers that can have objects/callbacks/poll requests
outstanding at a time, or...?

Also, I'm wondering if this code might be worth turning into something
generic...

What I'm sketching out is:

struct btree_key_cache_freelist {
	spinlock_t		lock;
	bool			rcu_armed;
	unsigned long		seq;
	struct bkey_cached	*free[2];	/* singly linked lists for now,
						 * perhaps darray/vector later */
	struct rcu_head		rcu;		/* rearming callback to free objects */
};

and these are percpu. Thoughts?
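
Roughly how I'm imagining the free path and the rearming callback
working, to make the above concrete - this is purely a sketch: the
function names, the ->free_next linkage, the pcpu_freed pointer and the
f->c back-pointer are all made up, and I'm ignoring preemption/migration
and the memory ordering on rcu_armed for now:

static void bkey_cached_free_fn(struct rcu_head *rcu)
{
	struct btree_key_cache_freelist *f =
		container_of(rcu, struct btree_key_cache_freelist, rcu);
	struct bch_fs *c = f->c;	/* assumed back-pointer, not in the struct above */
	struct bkey_cached *ck, *list;

	spin_lock(&f->lock);
	/*
	 * A full grace period has elapsed since this callback was armed,
	 * so everything on free[0] is safe to free; free[1] holds objects
	 * queued while the callback was in flight.
	 */
	list		= f->free[0];
	f->free[0]	= f->free[1];
	f->free[1]	= NULL;

	if (f->free[0]) {
		/* More objects pending under the next sequence number: rearm */
		f->seq = start_poll_synchronize_srcu(&c->btree_trans_barrier);
		call_srcu(&c->btree_trans_barrier, &f->rcu, bkey_cached_free_fn);
	} else {
		f->rcu_armed = false;
	}
	spin_unlock(&f->lock);

	while (list) {
		ck	= list;
		list	= ck->free_next;	/* hypothetical singly linked list link */
		kmem_cache_free(bch2_key_cache, ck);
	}
}

static void bkey_cached_queue_free(struct bch_fs *c, struct bkey_cached *ck)
{
	struct btree_key_cache_freelist *f =
		get_cpu_ptr(c->btree_key_cache.pcpu_freed);

	spin_lock(&f->lock);
	/* free[1] while a callback is in flight, free[0] otherwise: */
	ck->free_next		= f->free[f->rcu_armed];
	f->free[f->rcu_armed]	= ck;

	if (!f->rcu_armed) {
		f->rcu_armed	= true;
		f->seq		= start_poll_synchronize_srcu(&c->btree_trans_barrier);
		call_srcu(&c->btree_trans_barrier, &f->rcu, bkey_cached_free_fn);
	}
	spin_unlock(&f->lock);
	put_cpu_ptr(c->btree_key_cache.pcpu_freed);
}

i.e. the callback frees everything whose grace period has elapsed and
rearms itself only if more objects were queued in the meantime, so it
shuts itself off when idle - basically the self-rearming scheme you
described above. And with the callback in the picture, I take it
teardown turns into the "set a flag to stop rearming, then
srcu_barrier()" dance you mentioned, rather than just the synchronize.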