Gilles Chanteperdrix wrote:
> Jan Kiszka wrote:
>> Gilles Chanteperdrix wrote:
>>> Jan Kiszka wrote:
>>>> Gilles Chanteperdrix wrote:
>>>>> Jan Kiszka wrote:
>>>>>> Gilles Chanteperdrix wrote:
>>>>>>> Jan Kiszka wrote:
>>>>>>>> Hi again,
>>>>>>>>
>>>>>>>> this is now basically the patch which seems to have stabilized x86 w.r.t.
>>>>>>>> MMU switches again:
>>>>>>>>
>>>>>>>> There were 3 race windows between setting active_mm of the current task
>>>>>>>> and actually switching it (that's a noarch issue), several calls into
>>>>>>>> switch_mm without proper hard interrupt protection (on archs that have
>>>>>>>> no preemptible switch_mm, like x86), and a race in
>>>>>>>> x86's leave_mm (as Gilles already remarked earlier in this thread -
>>>>>>>> while I was looking at an old tree where smp_invalidate_interrupt took
>>>>>>>> care of this).
>>>>>>>>
>>>>>>>> The patch is meant as a basis for further discussion about
>>>>>>>>
>>>>>>>>  o how to solve all the issues for all archs technically (ideally
>>>>>>>>    without the need to patch noarch parts in an arch-specific way...)
>>>>>>>>
>>>>>>>>  o whether anyone thinks there could be more spots like these (I've checked
>>>>>>>>    the code only for x86 so far)
>>>>>>>> diff --git a/fs/aio.c b/fs/aio.c
>>>>>>>> index 76da125..0286f0f 100644
>>>>>>>> --- a/fs/aio.c
>>>>>>>> +++ b/fs/aio.c
>>>>>>>> @@ -618,13 +618,16 @@ static void use_mm(struct mm_struct *mm)
>>>>>>>>  {
>>>>>>>>        struct mm_struct *active_mm;
>>>>>>>>        struct task_struct *tsk = current;
>>>>>>>> +      unsigned long flags;
>>>>>>>>  
>>>>>>>>        task_lock(tsk);
>>>>>>>>        active_mm = tsk->active_mm;
>>>>>>>>        atomic_inc(&mm->mm_count);
>>>>>>>>        tsk->mm = mm;
>>>>>>>> +      local_irq_save_hw_cond(flags);
>>>>>>>>        tsk->active_mm = mm;
>>>>>>>> -      switch_mm(active_mm, mm, tsk);
>>>>>>>> +      __switch_mm(active_mm, mm, tsk);
>>>>>>>> +      local_irq_restore_hw_cond(flags);
>>>>>>>>        task_unlock(tsk);
>>>>>>>>  
>>>>>>>>        mmdrop(active_mm);
>>>>>>>> diff --git a/fs/exec.c b/fs/exec.c
>>>>>>>> index 3b36c69..06591ac 100644
>>>>>>>> --- a/fs/exec.c
>>>>>>>> +++ b/fs/exec.c
>>>>>>>> @@ -718,6 +718,7 @@ static int exec_mmap(struct mm_struct *mm)
>>>>>>>>  {
>>>>>>>>        struct task_struct *tsk;
>>>>>>>>        struct mm_struct * old_mm, *active_mm;
>>>>>>>> +      unsigned long flags;
>>>>>>>>  
>>>>>>>>        /* Notify parent that we're no longer interested in the old VM */
>>>>>>>>        tsk = current;
>>>>>>>> @@ -740,8 +741,10 @@ static int exec_mmap(struct mm_struct *mm)
>>>>>>>>        task_lock(tsk);
>>>>>>>>        active_mm = tsk->active_mm;
>>>>>>>>        tsk->mm = mm;
>>>>>>>> +      local_irq_save_hw_cond(flags);
>>>>>>>>        tsk->active_mm = mm;
>>>>>>>>        activate_mm(active_mm, mm);
>>>>>>>> +      local_irq_restore_hw_cond(flags);
>>>>>>>>        task_unlock(tsk);
>>>>>>>>        arch_pick_mmap_layout(mm);
>>>>>>>>        if (old_mm) {
>>>>>>>> diff --git a/kernel/fork.c b/kernel/fork.c
>>>>>>>> index 01a836b..cf3b68a 100644
>>>>>>>> --- a/kernel/fork.c
>>>>>>>> +++ b/kernel/fork.c
>>>>>>>> @@ -1665,11 +1665,14 @@ SYSCALL_DEFINE1(unshare, unsigned long, unshare_flags)
>>>>>>>>                }
>>>>>>>>  
>>>>>>>>                if (new_mm) {
>>>>>>>> +                      unsigned long flags;
>>>>>>>>                        mm = current->mm;
>>>>>>>>                        active_mm = current->active_mm;
>>>>>>>>                        current->mm = new_mm;
>>>>>>>> +                      local_irq_save_hw_cond(flags);
>>>>>>>>                        current->active_mm = new_mm;
>>>>>>>>                        activate_mm(active_mm, new_mm);
>>>>>>>> +                      local_irq_restore_hw_cond(flags);
>>>>>>>>                        new_mm = mm;
>>>>>>>>                }
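
[ To make the first of those windows concrete: this is the pre-patch
use_mm() path from the hunk above, with the critical gap marked. The
annotations are purely illustrative. ]

    task_lock(tsk);
    active_mm = tsk->active_mm;
    atomic_inc(&mm->mm_count);
    tsk->mm = mm;
    tsk->active_mm = mm;            /* (A) active_mm already updated...    */
                                    /* <-- hard IRQ / Xenomai domain
                                     *     preemption possible here: the
                                     *     MMU still runs on the old mm    */
    switch_mm(active_mm, mm, tsk);  /* (B) ...hardware actually switched   */
    task_unlock(tsk);
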
>>>>>>> Ok. These are patches to the noarch code which depend on the x86
>>>>>>> implementation.
>>>>>>>
>>>>>>> I think we need something like
>>>>>>>
>>>>>>> mm_change_enter(flags);
>>>>>>> mm_change_leave(flags);
>>>>>>>
>>>>>>> which would resolve to ipipe_active_mm = NULL on architectures with
>>>>>>> unlocked context switches, and to local_irq_save_hw/local_irq_restore_hw
>>>>>>> on x86?
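
[ For illustration, such a pair could expand roughly as follows. The
helper names are the ones proposed above; the config symbol and the
bodies are my assumption, not existing I-pipe code. ]

    #ifdef ARCH_HAS_UNLOCKED_MM_SWITCH  /* hypothetical symbol */
    /* Unlocked context switches: flag "mm switch in progress" so that a
     * preempting domain knows the previous mm is undefined; switch_mm
     * itself restores ipipe_active_mm once the switch has completed.
     */
    #define mm_change_enter(flags) \
            do { (void)(flags); ipipe_active_mm = NULL; barrier(); } while (0)
    #define mm_change_leave(flags) \
            do { (void)(flags); } while (0)
    #else
    /* Atomic switch_mm (e.g. x86): keep hard IRQs off across the
     * active_mm update and the hardware switch.
     */
    #define mm_change_enter(flags)  local_irq_save_hw(flags)
    #define mm_change_leave(flags)  local_irq_restore_hw(flags)
    #endif
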
>>>>>> + we have to change switch_mm in the lockless case, and maybe also its
>>>>>> caller (if I look at the arm code), to use ipipe_active_mm in order to
>>>>>> decide whether to switch or not - see my explanation of the possible race in
>>>>>> aio.c.
>>>>> This is already the way it currently works. The "prev" passed to
>>>>> switch_mm is always ipipe_active_mm.
>>>>>
>>>> Ah, I missed the difference in arm's xnarch_leave_root. Now I got it.
>>>>
>>>> OK, but this needs to be thought through in detail again, especially
>>>> the enter-root scenarios. I don't have a safe feeling about it yet,
>>>> partly because switch_mm fiddles with ipipe_active_mm as well, but more
>>>> generally because we used to have such critical races in this area for a
>>>> "fairly" long time.
>>> When reentering root with ipipe_active_mm NULL, the TIF_MMSWITCH_INT bit
>>> is set, in which case the switch_mm preempted by Xenomai is restarted.
>>>
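[ For reference, a much simplified sketch of that restart mechanism as I
understand it, assuming "next" is the mm being switched to; this is not
the actual ARM code, and cpumask handling is omitted. ]

    do {
            clear_thread_flag(TIF_MMSWITCH_INT);
            ipipe_active_mm = NULL;         /* "switch in progress" */
            barrier();
            cpu_switch_mm(next->pgd, next); /* preemptible, costly part */
            barrier();
            ipipe_active_mm = next;
            /* If a preemption reentered root and switched the mm while
             * ipipe_active_mm was NULL, TIF_MMSWITCH_INT got set and the
             * whole switch is redone.
             */
    } while (test_thread_flag(TIF_MMSWITCH_INT));
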
>> Right. But now we are also dealing with preemptions outside that loop.
>> And we are dealing with archs that have atomic switch_mm, thus no such
>> loops.
>>
>> The goal should be to find a solution for the races outside
>> switch_mm that is applicable to all archs. But it is still unclear to me
>> whether we can simply reuse the existing ipipe_active_mm without
>> disturbing its current use in the non-atomic switch_mm.
>>
>> Moreover, part of the problem on x86 was the non-atomic update of
>> cpu_vm_mask vs. actually switching the hardware. How do you deal with
>> cpu_vm_mask on ARM? Why can you call switch_mm without IRQ protection?
>> My impression is that only the (costly) cpu_switch_mm should be allowed
>> to be preemptible, so that there is a safe state around it. I'm also asking
>> because we would have to address this on x86 & co. once switch_mm can be
>> called with an undefined prev_mm.
> 
> When ipipe_active_mm is set to NULL, switch_mm does not touch the
> current mm's cpu_vm_mask, since it does not know what the current mm is.
> However, we have a problem when switching back to root: we do not clear
> the previous mm's cpu_vm_mask since we do not call switch_mm in
> xnarch_switch_to; we should fix this. So, I think we can extend the zone
> where ipipe_active_mm is set to NULL, to protect this area from the
> effects of preemption by Xenomai.
> 
> So, IMO, what we need is something like ipipe_mm_switch_protect which
> sets ipipe_active_mm to NULL on ARM, and results in local_irq_save_hw on
> x86.

That sounds like a good step forward. I will split up my patch into a
non-arch part that instruments the code and an x86 part that installs
the required protection for that arch.
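
The noarch instrumentation could then look roughly like this in use_mm()
(ipipe_mm_switch_protect as proposed above; the _unprotect counterpart is
just a placeholder name):

    unsigned long flags;

    task_lock(tsk);
    active_mm = tsk->active_mm;
    atomic_inc(&mm->mm_count);
    tsk->mm = mm;
    ipipe_mm_switch_protect(flags);   /* ARM: ipipe_active_mm = NULL,
                                       * x86: local_irq_save_hw(flags) */
    tsk->active_mm = mm;
    switch_mm(active_mm, mm, tsk);
    ipipe_mm_switch_unprotect(flags); /* placeholder counterpart */
    task_unlock(tsk);

    mmdrop(active_mm);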

Jan
