Gilles Chanteperdrix wrote:
> Jan Kiszka wrote:
>> Jan Kiszka wrote:
>>> Gilles Chanteperdrix wrote:
>>>> Jan Kiszka wrote:
>>>>> Jan Kiszka wrote:
>>>>>> It's still unclear what goes on precisely, we are still digging, but the
>>>>>> test system that can produce this is highly contended.
>>>>> Short update: Further instrumentation revealed that cr3 differs from
>>>>> active_mm->pgd while we are looping over that fault, i.e. the kernel
>>>>> tries to fixup the wrong mm. And that means we have some open race
>>>>> window between updating cr3 and active_mm somewhere (isn't switch_mm run
>>>>> in a preemptible manner now?).
>>>> Maybe the rsp is wrong and leads you to the wrong active_mm?
>>>>> As a first shot I disabled CONFIG_IPIPE_DELAYED_ATOMICSW, and we are now
>>>>> checking if it makes a difference. Digging deeper into the code in the
>>>> As you have found out in the meantime, we do not use unlocked context
>>>> switches on x86.
>>> The last question I asked myself (but couldn't answer yet due to other
>>> activity) was: Where are the local_irq_disable/enable_hw around
>>> switch_mm for its Linux callers?
>> Ha, that's the point: only activate_mm is protected, but we have more
>> spots in 2.6.29 and maybe other kernels, too!
> Ok, I do not see where switch_mm is called with IRQs off. What I found,
We have two direct callers of switch_mm in sched.c and one in fs/aio.c.
All of them need protection (I pushed IRQ disabling into switch_mm), but that
is not enough according to current tests. It seems to reduce the
probability of corruption, though.
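
To illustrate the kind of window I mean, here is a minimal userspace
model, not kernel code. All names (cpu_model, switch_mm_model,
preemption_point and so on) are made up for the sketch, and the order of
the two updates is an assumption of the model rather than a statement
about the real switch_mm. It only shows that a preemption landing between
the cr3 update and the active_mm update sees the two disagree, and that
masking hardware IRQs around the whole switch hides that window.

```c
#include <stdbool.h>
#include <stddef.h>

/* Userspace model only: none of these are the real kernel symbols. */
struct mm_model { unsigned long pgd; };

struct cpu_model {
    unsigned long cr3;           /* stands in for the hardware cr3 */
    struct mm_model *active_mm;  /* stands in for the per-CPU active_mm */
    bool hw_irqs_on;             /* stands in for the hardware IRQ flag */
    bool mismatch_seen;          /* a preemption saw cr3 != active_mm->pgd */
};

/* Hypothetical preemption point: fires only when hardware IRQs are on. */
static void preemption_point(struct cpu_model *cpu)
{
    if (cpu->hw_irqs_on && cpu->cr3 != cpu->active_mm->pgd)
        cpu->mismatch_seen = true;
}

/* Two-step switch with a window between the two updates (assumed order). */
static void switch_mm_model(struct cpu_model *cpu, struct mm_model *next)
{
    cpu->cr3 = next->pgd;
    preemption_point(cpu);
    cpu->active_mm = next;
}

/* Same switch, with the model's hardware IRQ flag cleared around it. */
static void switch_mm_protected(struct cpu_model *cpu, struct mm_model *next)
{
    bool saved = cpu->hw_irqs_on;

    cpu->hw_irqs_on = false;
    switch_mm_model(cpu, next);
    cpu->hw_irqs_on = saved;
}

/* Returns true if the window was observable for the chosen variant. */
static bool window_observed(bool protect)
{
    struct mm_model a = { 0x1000UL }, b = { 0x2000UL };
    struct cpu_model cpu = { a.pgd, &a, true, false };

    if (protect)
        switch_mm_protected(&cpu, &b);
    else
        switch_mm_model(&cpu, &b);
    return cpu.mismatch_seen;
}
```

In this model, window_observed(false) reports the mismatch and
window_observed(true) does not. It says nothing about other paths that
touch cr3 or active_mm, which is consistent with the protection alone not
being enough in our tests.
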
> however, is that leave_mm sets the cr3 and just clears
> active_mm->cpu_vm_mask. So, at this point, we have a discrepancy between
> cr3 and active_mm. I do not know what could happen if Xenomai could
> interrupt leave_mm between the cpu_clear and the write_cr3. From what I
> understand, switch_mm called by Xenomai upon return to root would re-set
> the bit, and re-set cr3, which would be set to the kernel cr3 right
> after that, but this would result in the active_mm.cpu_vm_mask bit being
> set instead of cleared as expected. So, maybe an irqs off section is
> missing in leave_mm.
leave_mm is already protected by its caller smp_invalidate_interrupt -
but now I'm parsing context_switch wrt lazy TLB.
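
For reference, the interleaving Gilles describes above can be written
down as a small userspace model. This is only a sketch of the
hypothetical case in which leave_mm could be interrupted. The names
(lazy_mm_model, reswitch_model, leave_mm_model) and the KERNEL_PGD value
are invented for illustration, and the single boolean stands in for this
CPU's bit in cpu_vm_mask.

```c
#include <stdbool.h>

#define KERNEL_PGD 0xC000UL  /* arbitrary stand-in for the kernel page tables */

struct lazy_mm_model {
    unsigned long pgd;
    bool cpu_bit;            /* this CPU's bit in the mm's cpu_vm_mask */
};

struct lazy_cpu_model {
    unsigned long cr3;
    struct lazy_mm_model *active_mm;
};

/* What the quoted text expects switch_mm to do upon return to root. */
static void reswitch_model(struct lazy_cpu_model *cpu)
{
    cpu->active_mm->cpu_bit = true;
    cpu->cr3 = cpu->active_mm->pgd;
}

/* leave_mm in two steps; 'interrupted' injects the re-switch in between. */
static void leave_mm_model(struct lazy_cpu_model *cpu, bool interrupted)
{
    cpu->active_mm->cpu_bit = false;  /* models cpu_clear() */
    if (interrupted)
        reswitch_model(cpu);
    cpu->cr3 = KERNEL_PGD;            /* models write_cr3() */
}

/* Returns the final state of the mask bit after leave_mm_model(). */
static bool bit_after_leave(bool interrupted)
{
    struct lazy_mm_model mm = { 0x1000UL, true };
    struct lazy_cpu_model cpu = { mm.pgd, &mm };

    leave_mm_model(&cpu, interrupted);
    return mm.cpu_bit;
}
```

Without the injected re-switch the bit ends up cleared. With it, the bit
stays set while cr3 already points at the kernel page tables, which is
the inconsistent state described in the quoted paragraph.
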
Siemens AG, Corporate Technology, CT SE 2
Corporate Competence Center Embedded Linux
Xenomai-core mailing list