On 2/15/2025 5:28 AM, Paul E. McKenney wrote:
> On Fri, Feb 14, 2025 at 01:03:40PM -0500, Joel Fernandes wrote:
>>>> Following up about our gpwrap discussions, I did some tracing of
>>>> rdp->gpwrap
>>>> with just boot testing.
>>>>
>>>> I actually never see it set because rdp->gp_seq and rnp->gp_seq are usually
>>>> very close.
>>>>
>>>> Example trace:
>>>>
>>>> # rnp->gp_seq starts with -1200 on boot and then right around the wrap:
>>>>
>>>> 178.096329: rcu_gpnum_ovf: CPU: 2, rnp->gp_seq: -3 (0xfffffffffffffffd), rdp->gp_seq: -3 (0xfffffffffffffffd), wrap before: 0
>>>> 178.096330: rcu_gpnum_ovf: CPU: 2, rnp->gp_seq: -3 (0xfffffffffffffffd), rdp->gp_seq: -3 (0xfffffffffffffffd), wrap after: 0
>>>> 178.100327: rcu_gpnum_ovf: CPU: 2, rnp->gp_seq: 1 (0x1), rdp->gp_seq: 1 (0x1), wrap before: 0
>>>> 178.100328: rcu_gpnum_ovf: CPU: 2, rnp->gp_seq: 1 (0x1), rdp->gp_seq: 1 (0x1), wrap after: 0
>>>>
>>>> The wrap "before" and "after" are the value of gpwrap before and after
>>>> calling rcu_gpnum_ovf().
>>>>
>>>> A closer look at _ovf() shows it should only be set if rdp->gp_seq and
>>>> rnp->gp_seq are quite a distance apart (say, if the CPU was idle for too
>>>> long and many GPs have passed). This can happen both with and without a
>>>> wrap. So imminent wrapping seems not necessary for ->gpwrap to even be
>>>> set, AFAICS.
>>>>
>>>> I think the problem is that ULONG_CMP_LT is not really "<", so even
>>>> after a wrap it can return false. I.e., if rdp->gp_seq + ULONG_MAX/4
>>>> wraps, it could still return false.
>>>
>>> Exactly, and all of that is in fact the intended behavior.
>>
>> Ah, I think I get it now: the wrap isn't really about the counter wrapping
>> causing an issue, but about the design of the ULONG_CMP APIs, which are
>> designed for comparisons that wrap around. Due to this design, there is a
>> side effect that the numbers being compared have to be at a distance of
>> less than half the number wheel; otherwise, comparing the numbers yields
>> incorrect results. Is that accurate to say? IOW, ->gpwrap == true means:
>> don't compare the numbers anymore until we sort these huge deltas out.
>
> But "incorrect" is such a harsh and judgmental word. ;-)
>
> Somewhat more seriously, ULONG_CMP_{GE,LT}() are defined to return the
> sign (or its complement) of differences between a pair of 32-bit or 64-bit
> numbers, as sizeof(unsigned long) dictates. This is, mathematically,
> how you do signed differences. The result coming out negative isn't
> incorrect, but instead the expected result given the numbers in question.
>
> Current RCU use cases might interpret a negative result as "no one did
> it for us yet", as in rcu_barrier(), "this CPU has not completed the
> current grace period", as in __note_gp_changes(), and so on.
>
> Now, these can get a false-positive/-negative result. In the case of
> rcu_barrier(), we get a needless rcu_barrier() computation. In the case
> of __note_gp_changes(), the false negative is OK, but a false positive,
> from wrapping all the way around, could be fatal for a CPU that was idle
> for a long time (hence ->gpwrap). And so on.
Understood and agreed! I guess what I was getting at was that ->gpwrap can be
set, and false negatives start getting reported, well before a full-wrap /
false-positive situation arises. And a false negative may not be desirable in
all scenarios. In my other patch, I posted a scenario where I was concerned
about missed wakeups in a false-negative situation.
In srcutree.c:

	swait_event_interruptible_exclusive(
		rnp->nocb_gp_wq[rcu_seq_ctr(wait_gp_seq) & 0x1],
		rcu_seq_done(&rnp->gp_seq, wait_gp_seq) ||
		!READ_ONCE(my_rdp->nocb_gp_sleep));
But I shamelessly admit I don't understand this code well enough to say
whether this is an issue :/.
I agree that, depending on the use case, a false negative is a non-issue. But
I am not sure that false negatives are non-fatal in all use cases.
>
>>>> From a testing POV, the example shows it is not set even when a wrap
>>>> actually happened. So maybe we are not testing gpwrap much, or at all,
>>>> with rcutorture?
>>>
>>> True enough, and that would be well worth fixing.
>>>
>>> One simple (and maybe too simple) fix is to add a Kconfig option to
>>> kernel/rcu/Kconfig.debug that makes rcu_gpnum_ovf() use something like
>>> (32 << RCU_SEQ_CTR_SHIFT). This, or some similar value, should cause
>>> rcutorture's stuttering and/or shuffling to trigger setting of ->gpwrap
>>> within a few minutes.
Since minutes was too long for me :-D, I am trying just a count of 4, and I
do see 15 total wraps in just 2 minutes on a 36-CPU system.

I am now trying this same patch for a 10-minute run on all scenarios:
https://git.kernel.org/pub/scm/linux/kernel/git/jfern/linux.git/commit/?h=rcu/next&id=984c47eeb7728829901b33462baae2771af80ea0

Will try longer runs after.
Are you interested in an rcutorture patch where we force this? Would this be
a default setting where we always run it, or do we want to set it only under
torture command-line options? Do we default to a high value like 32, but
still provide an option to lower it to, say, 4, like I am doing?
>>>
>>> And I would not be surprised if doing this located bugs.
>>
>> Ok cool! So I will work on this next then. I am curious to see how often it
>> triggers both with/without such a change.
>
> If you can make it trigger without this change, you are far more patient
> than I am. ;-)
>
>>>> But before that, I am feeling it is not behaving correctly. I'd expect
>>>> it to be set even if the rnp and rdp values are close but we are
>>>> actually about to wrap. So most likely I am missing something.
>>>
>>> If I understand correctly, you would like ->gpwrap to be set
>>> unconditionally at various points in the cycle, for example, at zero,
>>> ULONG_MAX/4, ULONG_MAX/2, and 3*ULONG_MAX/4. But this could easily
>>> have the effect of discarding each CPU's quiescent states that showed
>>> up before the first call to rcu_watching_snap_recheck(), even if that
>>> CPU was active. Our users might not consider this to be a friendly act.
Got it, thanks.
>>
>> Got it.
>>
>>> In contrast, the current setup only discards quiescent states for CPUs
>>> that just came back from idle prior to call to rcu_watching_snap_save().
>>
>> Do you mean the check of ->gpwrap in rcu_report_qs_rdp() which returns early?
>
> Well, it also resets the CPU to collect quiescent states for the current
> grace period, but yes, it does return early.
Got it, makes sense!
thanks,
- Joel