On 2024-07-05 10:39:25 [-0700], Paul E. McKenney wrote:
> As a workaround, the following commit in -rcu that is slated for
> the upcoming merge window addresses a similar case involving KVM and
> nohz_full:
>
> 68d124b09999 ("rcu: Add rcutree.nohz_full_patience_delay to reduce
> nohz_full OS jitter")
>
> The KVM guys found that setting rcutree.nohz_full_patience_delay to 1000
> (AKA one second) made things work better for them. Does this help your
> use case?
My problem is that I have a task stuck in percpu_down_write()/
__wait_rcu_gp() and I think this is because the RCU machinery is stuck
and no grace period completes.
I have seen an rcuc/ thread with a pending wakeup, but it won't be
scheduled because its priority is lower than that of the thread that is
currently on the CPU, and that thread uses 100% of the CPU.
I *think* this explains it, because the rcuc/ thread is what moves the
grace period forward.
Looking at the patch, there would be a delay of up to 5 secs, which
means that if the task consumes 100% of the CPU, the patch doesn't
change a thing.
Thank you Paul for the pointers.
> This is again a workaround. Clearly, it would be better if we could
> eliminate that second rcuc wakeup. I tried something similar some time
> back, and there was a problem with it. I will see if I can reconstitute
> the corresponding brain cells.
Is my assumption correct that the rcuc/ thread has to run in order to
push the grace period forward, and that otherwise the whole thing is
stuck?
> But in the meantime, one advantage of the workaround is that in the
> common case, it would reduce the number of rcuc wakeups to zero, rather
> than to just one.
>
> Thoughts?
I *think*, if what I just wrote is correct, I will either have to raise
the priority of rcuc/ or make the thread that consumes 100% of the CPU
lose its RT priority. Then, with the limited number of wakeups, it
should be doable.
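For illustration, a minimal userspace sketch of the first option
(raising rcuc/ above the CPU hog), assuming both threads run under
SCHED_FIFO on Linux. The helper names priority_above() and boost_rcuc()
and the PIDs passed to them are hypothetical, and changing the policy
of another thread needs CAP_SYS_NICE.

```python
import os


def priority_above(current_prio, policy=os.SCHED_FIFO):
    """Return an RT priority one step above current_prio.

    The result is clamped to the valid range of the policy, so a hog
    already at the maximum cannot be out-prioritised this way.
    """
    lo = os.sched_get_priority_min(policy)
    hi = os.sched_get_priority_max(policy)
    return max(lo, min(current_prio + 1, hi))


def boost_rcuc(rcuc_pid, hog_pid):
    """Give the rcuc/ thread a SCHED_FIFO priority above the hog's.

    Both PIDs are assumptions supplied by the caller (e.g. looked up
    with ps); nothing here discovers them.
    """
    hog_prio = os.sched_getparam(hog_pid).sched_priority
    new_prio = priority_above(hog_prio)
    os.sched_setscheduler(rcuc_pid, os.SCHED_FIFO, os.sched_param(new_prio))
    return new_prio
```

If the hog already sits at the maximum priority, the clamp leaves rcuc/
at the same level, and an equal-priority SCHED_FIFO thread does not
preempt, so in that case only the second option (dropping the hog's RT
priority) helps. The rcutree.kthread_prio= boot parameter is another
way to set the rcuc/ priority.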
PS: I do remember the RCU-task thread we had. I did have an idea but I
need to check whether it is feasible first. So I did not forget, just
slow…
> Thanx, Paul
Sebastian