Jan Kiszka wrote:
> It's still unclear what goes on precisely, we are still digging, but the
> test system that can produce this is highly contended.
Short update: Further instrumentation revealed that cr3 differs from
active_mm->pgd while we are looping over that fault, i.e. the kernel
tries to fix up the wrong mm. That means we have an open race window
somewhere between updating cr3 and active_mm (isn't switch_mm run in a
preemptible manner now?).
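
To make the suspected window concrete, here is a minimal user-space sketch.
It is a model only, not kernel or I-pipe code: struct mm_model, struct
cpu_model, switch_mm_model() and the preemption hook are hypothetical names
made up for illustration, and the order of the two updates (cr3 first, then
active_mm) is an assumption, since we have not yet confirmed which side of
the switch is exposed.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space stand-ins, NOT the kernel's types. */
struct mm_model {
    unsigned long pgd;                /* page-table root of this mm */
};

struct cpu_model {
    unsigned long cr3;                /* root the "hardware" currently uses */
    const struct mm_model *active_mm; /* mm the software believes is active */
};

/* Callback standing in for whatever preempts the switch. */
typedef void (*preempt_hook)(struct cpu_model *cpu, void *arg);

/* True if a fault handler looking at active_mm would fix up the
 * address space that cr3 actually points to. */
static bool mm_consistent(const struct cpu_model *cpu)
{
    return cpu->active_mm != NULL && cpu->cr3 == cpu->active_mm->pgd;
}

/* Two-step switch. The update order is an assumption of this sketch. */
static void switch_mm_model(struct cpu_model *cpu,
                            const struct mm_model *next,
                            preempt_hook hook, void *arg)
{
    cpu->cr3 = next->pgd;   /* step 1: load the new page-table root */
    if (hook)
        hook(cpu, arg);     /* window: cr3 is new, active_mm is still old */
    cpu->active_mm = next;  /* step 2: publish the new mm */
}

/* Example hook: record whether the state looked consistent in the window. */
static void record_consistency(struct cpu_model *cpu, void *arg)
{
    *(bool *)arg = mm_consistent(cpu);
}
```

Calling switch_mm_model() with record_consistency as the hook reports an
inconsistent state inside the window and a consistent one after the switch
completes. A fault taken in that window would be resolved against the old
mm, which would match the endless fault loop we are seeing.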
As a first shot I disabled CONFIG_IPIPE_DELAYED_ATOMICSW, and we are now
checking if it makes a difference. Digging deeper into the code in the
Siemens AG, Corporate Technology, CT SE 2
Corporate Competence Center Embedded Linux
Xenomai-core mailing list