Jan Kiszka wrote:
> Gilles Chanteperdrix wrote:
>> Jan Kiszka wrote:
>>> Gilles Chanteperdrix wrote:
>>>> Jan Kiszka wrote:
>>>>> Gilles Chanteperdrix wrote:
>>>>>> Jan Kiszka wrote:
>>>>>>> Hi again,
>>>>>>>
>>>>>>> this is now basically the patch which seems to stabilize x86 wrt MMU
>>>>>>> switches again:
>>>>>>>
>>>>>>> There were three race windows between setting active_mm of the current
>>>>>>> task and actually switching it (that's a noarch issue), there were
>>>>>>> several calls into switch_mm without proper hard interrupt protection
>>>>>>> (on archs whose switch_mm is not preemptible, like x86), and there was
>>>>>>> a race in x86's leave_mm (as Gilles already remarked earlier in this
>>>>>>> thread - while I was looking at an old tree where
>>>>>>> smp_invalidate_interrupt took care of this).
>>>>>>>
>>>>>>> The patch is thought as a basis for further discussions about
>>>>>>>
>>>>>>>  o how to solve all the issues for all archs technically (ideally
>>>>>>>    without the need to patch noarch parts in an arch-specific way...)
>>>>>>>
>>>>>>>  o if anyone thinks there could be more spots like these (I've checked
>>>>>>>    the code only for x86 so far)
>>>>>>> diff --git a/fs/aio.c b/fs/aio.c
>>>>>>> index 76da125..0286f0f 100644
>>>>>>> --- a/fs/aio.c
>>>>>>> +++ b/fs/aio.c
>>>>>>> @@ -618,13 +618,16 @@ static void use_mm(struct mm_struct *mm)
>>>>>>>  {
>>>>>>>         struct mm_struct *active_mm;
>>>>>>>         struct task_struct *tsk = current;
>>>>>>> +       unsigned long flags;
>>>>>>>  
>>>>>>>         task_lock(tsk);
>>>>>>>         active_mm = tsk->active_mm;
>>>>>>>         atomic_inc(&mm->mm_count);
>>>>>>>         tsk->mm = mm;
>>>>>>> +       local_irq_save_hw_cond(flags);
>>>>>>>         tsk->active_mm = mm;
>>>>>>> -       switch_mm(active_mm, mm, tsk);
>>>>>>> +       __switch_mm(active_mm, mm, tsk);
>>>>>>> +       local_irq_restore_hw_cond(flags);
>>>>>>>         task_unlock(tsk);
>>>>>>>  
>>>>>>>         mmdrop(active_mm);
>>>>>>> diff --git a/fs/exec.c b/fs/exec.c
>>>>>>> index 3b36c69..06591ac 100644
>>>>>>> --- a/fs/exec.c
>>>>>>> +++ b/fs/exec.c
>>>>>>> @@ -718,6 +718,7 @@ static int exec_mmap(struct mm_struct *mm)
>>>>>>>  {
>>>>>>>         struct task_struct *tsk;
>>>>>>>         struct mm_struct * old_mm, *active_mm;
>>>>>>> +       unsigned long flags;
>>>>>>>  
>>>>>>>         /* Notify parent that we're no longer interested in the old VM */
>>>>>>>         tsk = current;
>>>>>>> @@ -740,8 +741,10 @@ static int exec_mmap(struct mm_struct *mm)
>>>>>>>         task_lock(tsk);
>>>>>>>         active_mm = tsk->active_mm;
>>>>>>>         tsk->mm = mm;
>>>>>>> +       local_irq_save_hw_cond(flags);
>>>>>>>         tsk->active_mm = mm;
>>>>>>>         activate_mm(active_mm, mm);
>>>>>>> +       local_irq_restore_hw_cond(flags);
>>>>>>>         task_unlock(tsk);
>>>>>>>         arch_pick_mmap_layout(mm);
>>>>>>>         if (old_mm) {
>>>>>>> diff --git a/kernel/fork.c b/kernel/fork.c
>>>>>>> index 01a836b..cf3b68a 100644
>>>>>>> --- a/kernel/fork.c
>>>>>>> +++ b/kernel/fork.c
>>>>>>> @@ -1665,11 +1665,14 @@ SYSCALL_DEFINE1(unshare, unsigned long, unshare_flags)
>>>>>>>                 }
>>>>>>>  
>>>>>>>                 if (new_mm) {
>>>>>>> +                       unsigned long flags;
>>>>>>>                         mm = current->mm;
>>>>>>>                         active_mm = current->active_mm;
>>>>>>>                         current->mm = new_mm;
>>>>>>> +                       local_irq_save_hw_cond(flags);
>>>>>>>                         current->active_mm = new_mm;
>>>>>>>                         activate_mm(active_mm, new_mm);
>>>>>>> +                       local_irq_restore_hw_cond(flags);
>>>>>>>                         new_mm = mm;
>>>>>>>                 }
>>>>>> OK. These are patches to the noarch code which depend on the x86
>>>>>> implementation.
>>>>>>
>>>>>> I think we need something like
>>>>>>
>>>>>> mm_change_enter(flags);
>>>>>> mm_change_leave(flags);
>>>>>>
>>>>>> which would resolve to ipipe_active_mm = NULL on architectures with
>>>>>> unlocked context switches, and to local_irq_save_hw/local_irq_restore_hw
>>>>>> on x86?
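>>>>>>
>>>>>> Just to sketch the idea (hypothetical names throughout, including the
>>>>>> config symbol; assuming ipipe_active_mm is the per-CPU tracking
>>>>>> variable):
>>>>>>
>>>>>> #ifdef CONFIG_IPIPE_UNLOCKED_MM_SWITCH  /* e.g. ARM */
>>>>>> /* publish "switch in progress" instead of masking hard irqs */
>>>>>> #define mm_change_enter(flags) \
>>>>>>         do { \
>>>>>>                 (void)(flags); \
>>>>>>                 per_cpu(ipipe_active_mm, smp_processor_id()) = NULL; \
>>>>>>                 barrier(); \
>>>>>>         } while (0)
>>>>>> #define mm_change_leave(flags)  do { (void)(flags); } while (0)
>>>>>> #else                                   /* atomic switch_mm, e.g. x86 */
>>>>>> #define mm_change_enter(flags)  local_irq_save_hw(flags)
>>>>>> #define mm_change_leave(flags)  local_irq_restore_hw(flags)
>>>>>> #endif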
>>>>> + we have to change switch_mm in the lockless case, and maybe also its
>>>>> caller (judging by the ARM code), to use ipipe_active_mm in order to
>>>>> decide whether to switch or not - see my explanation of the possible
>>>>> race in aio.c.
>>>> This is already how it currently works: the "prev" passed to
>>>> switch_mm is always ipipe_active_mm.
>>>>
>>> Ah, I missed the difference in ARM's xnarch_leave_root. Now I get it.
>>>
>>> OK, but this needs to be thought through in detail again, specifically
>>> also the enter-root scenarios. I don't have a safe feeling about it yet,
>>> partly because switch_mm fiddles with ipipe_active_mm as well, but more
>>> generally because we used to have such critical races in this area for a
>>> "fairly" long time.
>> When reentering root with ipipe_active_mm NULL, the TIF_MMSWITCH_INT bit
>> is set, in which case the switch_mm preempted by Xenomai is restarted.
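>>
>> For reference, the restart works roughly like this (simplified sketch,
>> not the literal I-pipe ARM code):
>>
>> do {
>>         /* mark the switch as in progress; if Xenomai preempts us here
>>          * and root is reentered, TIF_MMSWITCH_INT gets set and we redo
>>          * the hardware switch */
>>         per_cpu(ipipe_active_mm, cpu) = NULL;
>>         barrier();
>>         cpu_switch_mm(next->pgd, next); /* the preemptible, costly part */
>>         barrier();
>>         per_cpu(ipipe_active_mm, cpu) = next;
>> } while (test_and_clear_thread_flag(TIF_MMSWITCH_INT));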
>>
> 
> Right. But now we are also dealing with preemptions outside that loop.
> And we are dealing with archs that have atomic switch_mm, thus no such
> loops.
> 
> The goal should be to find a solution against the races outside
> switch_mm that is applicable to all archs. But it is still an open
> question to me whether we can simply reuse the existing ipipe_active_mm
> without disturbing its current use in the non-atomic switch_mm.
> 
> Moreover, part of the problem on x86 was the non-atomic update of
> cpu_vm_mask vs. actually switching the hardware. How do you deal with
> cpu_vm_mask on ARM? Why can you call switch_mm without IRQ protection?
> My impression is that only the (costly) cpu_switch_mm should be allowed
> to be preemptible in order to have a safe state around it. I'm also
> asking because we would have to address this on x86 & co. once switch_mm
> could be called with an undefined prev_mm.

When ipipe_active_mm is set to NULL, switch_mm does not touch the
current mm's cpu_vm_mask, since it does not know what the current mm is.
However, we have a problem when switching back to root: we do not clear
the previous mm's cpu_vm_mask, since we do not call switch_mm in
xnarch_switch_to; we should fix this. So, I think we can extend the zone
where ipipe_active_mm is set to NULL, to protect this area from the
effects of preemption by Xenomai.

So, IMO, what we need is something like ipipe_mm_switch_protect, which
sets ipipe_active_mm to NULL on ARM, and results in local_irq_save_hw on
x86.
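
At the noarch callsites this would replace the open-coded hard-irq
protection from the patch above; e.g. in exec_mmap (a sketch with
hypothetical names, ipipe_mm_switch_unprotect assumed as the
counterpart):

        task_lock(tsk);
        active_mm = tsk->active_mm;
        tsk->mm = mm;
        ipipe_mm_switch_protect(flags); /* ipipe_active_mm = NULL on ARM,
                                           local_irq_save_hw() on x86 */
        tsk->active_mm = mm;
        activate_mm(active_mm, mm);
        ipipe_mm_switch_unprotect(flags);
        task_unlock(tsk);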

> 
> Jan
> 


-- 
                                          Gilles

