Jan Kiszka wrote:
> Gilles Chanteperdrix wrote:
>> Jan Kiszka wrote:
>>> Jan Kiszka wrote:
>>>> It's still unclear what goes on precisely, we are still digging, but the
>>>> test system that can produce this is highly contended.
>>> Short update: Further instrumentation revealed that cr3 differs from
>>> active_mm->pgd while we are looping over that fault, ie. the kernel
>>> tries to fixup the wrong mm. And that means we have some open race
>>> window between updating cr3 and active_mm somewhere (isn't switch_mm run
>>> in a preemptible manner now?).
>> Maybe the rsp is wrong and leads you to the wrong active_mm ?
>>
>>> As a first shot I disabled CONFIG_IPIPE_DELAYED_ATOMICSW, and we are now
>>> checking if it makes a difference. Digging deeper into the code in the
>>> meanwhile...
>> As you have found out in the mean time, we do not use unlocked context
>> switches on x86.
>>
>
> Yes.
>
> The last question I asked myself (but couldn't answer yet due to other
> activity) was: Where are the local_irq_disable/enable_hw around
> switch_mm for its Linux callers?

Ha, that's the point: only activate_mm is protected, but we have more
spots in 2.6.29 and maybe other kernels, too!

Jan

--
Siemens AG, Corporate Technology, CT SE 2
Corporate Competence Center Embedded Linux