Jan Kiszka wrote:
> Gilles Chanteperdrix wrote:
>> Jan Kiszka wrote:
>>> Jan Kiszka wrote:
>>>> Gilles Chanteperdrix wrote:
>>>>> Jan Kiszka wrote:
>>>>>> Jan Kiszka wrote:
>>>>>>> It's still unclear what goes on precisely, we are still digging, but the
>>>>>>> test system that can produce this is highly contended.
>>>>>> Short update: Further instrumentation revealed that cr3 differs from
>>>>>> active_mm->pgd while we are looping over that fault, i.e. the kernel
>>>>>> tries to fix up the wrong mm. And that means we have some open race
>>>>>> window between updating cr3 and active_mm somewhere (isn't switch_mm run
>>>>>> in a preemptible manner now?).
>>>>> Maybe the rsp is wrong and leads you to the wrong active_mm?
>>>>>
>>>>>> As a first shot I disabled CONFIG_IPIPE_DELAYED_ATOMICSW, and we are now
>>>>>> checking if it makes a difference. Digging deeper into the code in the
>>>>>> meanwhile...
>>>>> As you have found out in the meantime, we do not use unlocked context
>>>>> switches on x86.
>>>>>
>>>> Yes.
>>>>
>>>> The last question I asked myself (but couldn't answer yet due to other
>>>> activity) was: Where are the local_irq_disable/enable_hw around
>>>> switch_mm for its Linux callers?
>>> Ha, that's the point: only activate_mm is protected, but we have more
>>> spots in 2.6.29 and maybe other kernels, too!
>> Ok, I do not see where switch_mm is called with IRQs off. What I found,
>
> We have two direct callers of switch_mm in sched.c and one in fs/aio.c.
> Both need protection (I pushed IRQ disabling into switch_mm), but that
> is not enough according to current tests. It seems to reduce the
> probability of corruption, though.
>
>> however, is that leave_mm sets the cr3 and just clears
>> active_mm->cpu_vm_mask. So, at this point, we have a discrepancy between
>> cr3 and active_mm. I do not know what could happen if Xenomai could
>> interrupt leave_mm between the cpu_clear and the write_cr3. From what I
>> understand, switch_mm called by Xenomai upon return to root would re-set
>> the bit, and re-set cr3, which would be set to the kernel cr3 right
>> after that, but this would result in the active_mm.cpu_vm_mask bit being
>> set instead of cleared as expected. So, maybe an irqs off section is
>> missing in leave_mm.
>
> leave_mm is already protected by its caller smp_invalidate_interrupt -
> but now I'm parsing context_switch w.r.t. lazy tlb.
>
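
To make the interleaving quoted above concrete, here is a rough
user-space model of it. This is not kernel code: the toy_* names, the
two small structs and the pgd constants are made up for illustration,
and only the ordering of the steps follows Gilles' description of
leave_mm and switch_mm.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy per-CPU state. All names are hypothetical, not kernel symbols. */
struct toy_mm {
    int pgd;        /* stands in for mm->pgd */
    bool cpu_bit;   /* stands in for this CPU's bit in mm->cpu_vm_mask */
};

struct toy_cpu {
    int cr3;                    /* page directory currently loaded */
    struct toy_mm *active_mm;   /* mm the kernel believes is active */
};

enum { KERNEL_PGD = 1, USER_PGD = 2 };

/* Models switch_mm() as run on return to root: re-set the bit and cr3. */
static void toy_switch_mm(struct toy_cpu *cpu, struct toy_mm *next)
{
    next->cpu_bit = true;
    cpu->cr3 = next->pgd;
    cpu->active_mm = next;
}

/* Models leave_mm(); 'preempt' injects the interruption between its steps. */
static void toy_leave_mm(struct toy_cpu *cpu, bool preempt)
{
    assert(cpu->active_mm);
    cpu->active_mm->cpu_bit = false;         /* the cpu_clear step */
    if (preempt)
        toy_switch_mm(cpu, cpu->active_mm);  /* assumed preemption point */
    cpu->cr3 = KERNEL_PGD;                   /* the write_cr3 step */
}

/* True if the mask bit is set although cr3 no longer points at that mm. */
static bool toy_inconsistent(bool preempt)
{
    struct toy_mm mm = { USER_PGD, true };
    struct toy_cpu cpu = { USER_PGD, &mm };

    toy_leave_mm(&cpu, preempt);
    return mm.cpu_bit && cpu.cr3 != mm.pgd;
}
```

In this model toy_inconsistent(false) is false and
toy_inconsistent(true) is true, i.e. only the preempted run ends with
the bit set while cr3 holds the kernel page directory. Whether the real
code can be preempted at that point is exactly what is in question.
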
Hmm... lazy tlb: This means a new task is switched in and has active_mm
!= mm. But do_page_fault reads task->mm... Just thoughts, no clear
picture yet.

Jan

--
Siemens AG, Corporate Technology, CT SE 2
Corporate Competence Center Embedded Linux

_______________________________________________
Xenomai-core mailing list
Xenomai-core@gna.org
https://mail.gna.org/listinfo/xenomai-core