On Mon, Oct 02, 2023 at 11:09:39PM +0200, Frederic Weisbecker wrote:
> > > spin_unlock_rcu_node(sdp); /* Interrupts remain disabled. */
> > > WRITE_ONCE(ssp->srcu_sup->srcu_gp_start, jiffies);
> > > WRITE_ONCE(ssp->srcu_sup->srcu_n_exp_nodelay, 0);
> > > @@ -1245,7 +1243,18 @@ static unsigned long
> > > srcu_gp_start_if_needed(struct srcu_struct *ssp,
> > > rcu_segcblist_advance(&sdp->srcu_cblist,
> > > rcu_seq_current(&ssp->srcu_sup->srcu_gp_seq));
> > > s = rcu_seq_snap(&ssp->srcu_sup->srcu_gp_seq);
> > > - (void)rcu_segcblist_accelerate(&sdp->srcu_cblist, s);
> > > + /*
> > > + * Acceleration might fail if the preceding call to
> > > + * rcu_segcblist_advance() also failed due to a prior grace
> > > + * period seen incomplete before rcu_seq_snap(). If so then a new
> > > + * call to advance will see the completed grace period and fix
> > > + * the situation.
> > > + */
> > > + if (!rcu_segcblist_accelerate(&sdp->srcu_cblist, s)) {
> >
> > We can add below also? Here old and new are rcu_seq_current() values used in
> > the 2 calls to rcu_segcblist_advance().
> >
> > WARN_ON_ONCE(!(rcu_seq_completed_gp(old, new) && rcu_seq_new_gp(old, new)));
>
> Very good point! "new" should be exactly one and a half grace period away from
> "old", will add that.
>
> Cooking proper patches now.

Actually, here is a simpler fix, below. rcu_seq_snap() can be called before
rcu_segcblist_advance() after all. The only side effect is that callback
advancing then happens _after_ the full barrier in rcu_seq_snap(). I don't see
an obvious problem with that, as that barrier only cares about:
1) Ordering accesses of the update side before call_srcu() so they don't bleed
2) Seeing all the accesses prior to the grace period of the current gp_num
The only things callback advancing needs to be ordered against are carried by
snp locking.
I will still remove the accelerations elsewhere, and the advancing in
srcu_gp_start(), in further patches. I'll also avoid advancing and acceleration
in srcu_gp_start_if_needed() if there is no callback to queue.
The point is also that this simple fix alone can be easily backported and
the rest can come as cleanups.
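
For illustration only, here is a rough userspace sketch of why the ordering
matters. It is not kernel code: the model_* names are hypothetical, the
callback list is reduced to per-segment counters, counter wrap is ignored, and
the gp_seq values 1, 5, 6 and 9 are assumed for the scenario. The snapshot
arithmetic follows rcu_seq_snap() with a 2-bit state field.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified model of a segmented callback list: counters, not pointers. */
enum { DONE, WAIT, NEXT_READY, NEXT, NSEGS };

struct model_cblist {
	int len[NSEGS];              /* callbacks per segment */
	unsigned long gp_seq[NSEGS]; /* used for WAIT and NEXT_READY only */
};

/* Same arithmetic as rcu_seq_snap() with 2 state bits, minus the barrier. */
static unsigned long model_seq_snap(unsigned long gp_seq)
{
	return (gp_seq + 7) & ~3UL;
}

/* Move segments whose grace period completed to DONE, then compact. */
static void model_advance(struct model_cblist *cl, unsigned long seq)
{
	int i, j;

	for (i = WAIT; i < NEXT; i++) {
		if (seq < cl->gp_seq[i]) /* wrap ignored in this model */
			break;
		cl->len[DONE] += cl->len[i];
		cl->len[i] = 0;
	}
	if (i == WAIT)
		return; /* nothing advanced */
	for (j = WAIT; i < NEXT; i++, j++) {
		cl->len[j] = cl->len[i];
		cl->gp_seq[j] = cl->gp_seq[i];
		cl->len[i] = 0;
	}
}

/* Tag not-yet-tagged callbacks with seq; false if no segment is free. */
static bool model_accelerate(struct model_cblist *cl, unsigned long seq)
{
	int i, j, rest = 0;

	/* Last non-empty segment waiting for an earlier grace period. */
	for (i = NEXT_READY; i > DONE; i--)
		if (cl->len[i] && cl->gp_seq[i] < seq)
			break;
	for (j = i + 1; j < NSEGS; j++)
		rest += cl->len[j];
	if (!rest || ++i >= NEXT)
		return false;
	/* Merge everything from segment i onwards into segment i. */
	cl->len[i] = rest;
	for (j = i + 1; j < NSEGS; j++)
		cl->len[j] = 0;
	for (; i < NEXT; i++)
		cl->gp_seq[i] = seq;
	return true;
}

/* Queue one callback; gp_seq moves from seq_before to seq_after in between. */
static bool model_enqueue(struct model_cblist *cl, bool snap_first,
			  unsigned long seq_before, unsigned long seq_after)
{
	unsigned long s;

	cl->len[NEXT]++;
	if (snap_first) {
		s = model_seq_snap(seq_before);
		model_advance(cl, seq_after);
	} else {
		model_advance(cl, seq_before);
		s = model_seq_snap(seq_after);
	}
	return model_accelerate(cl, s);
}

/* Two quiet enqueues fill WAIT and NEXT_READY, then one racy enqueue. */
static bool model_race(bool snap_first)
{
	struct model_cblist cl = { {0}, {0} };
	bool ok;

	ok = model_enqueue(&cl, snap_first, 1, 1); /* tagged 8, in WAIT */
	assert(ok);
	ok = model_enqueue(&cl, snap_first, 5, 5); /* tagged 12, in NEXT_READY */
	assert(ok);
	/* A grace period ends and the next one starts between the two reads. */
	return model_enqueue(&cl, snap_first, 6, 9);
}
```

In this model, model_race(false) (advance first, then snapshot) fails to
accelerate: the advance still sees 6, so the callbacks tagged 8 stay in WAIT,
while the later snapshot asks for 16 and finds both segments taken. With
model_race(true) the snapshot is 12 and the advance that follows sees at least
what the snapshot saw, so a segment is always free.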
diff --git a/kernel/rcu/srcutree.c b/kernel/rcu/srcutree.c
index 5602042856b1..8b09fb37dbf3 100644
--- a/kernel/rcu/srcutree.c
+++ b/kernel/rcu/srcutree.c
@@ -1244,10 +1244,10 @@ static unsigned long srcu_gp_start_if_needed(struct srcu_struct *ssp,
spin_lock_irqsave_sdp_contention(sdp, &flags);
if (rhp)
rcu_segcblist_enqueue(&sdp->srcu_cblist, rhp);
+ s = rcu_seq_snap(&ssp->srcu_sup->srcu_gp_seq);
rcu_segcblist_advance(&sdp->srcu_cblist,
rcu_seq_current(&ssp->srcu_sup->srcu_gp_seq));
- s = rcu_seq_snap(&ssp->srcu_sup->srcu_gp_seq);
- (void)rcu_segcblist_accelerate(&sdp->srcu_cblist, s);
+ WARN_ON_ONCE(!rcu_segcblist_accelerate(&sdp->srcu_cblist, s) && rhp);
if (ULONG_CMP_LT(sdp->srcu_gp_seq_needed, s)) {
sdp->srcu_gp_seq_needed = s;
needgp = true;