Jan Kiszka wrote:
> Gilles Chanteperdrix wrote:
>> Jan Kiszka wrote:
>>> Gilles Chanteperdrix wrote:
>>>> Jan Kiszka wrote:
>>>>> Hi again,
>>>>>
>>>>> this is now basically the patch which seems to stabilize x86 wrt mmu
>>>>> switches again:
>>>>>
>>>>> There were 3 race windows between setting active_mm of the current
>>>>> task and actually performing the switch (that's a noarch issue).
>>>>> There were also several calls into switch_mm without proper hard
>>>>> interrupt protection (on archs whose switch_mm is not preemptible,
>>>>> like x86), and there was a race in x86's leave_mm (as Gilles already
>>>>> remarked earlier in this thread - I had been looking at an old tree
>>>>> where smp_invalidate_interrupt took care of this).
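>>>>>
>>>>> To make the first kind of window concrete, here is a minimal sketch of
>>>>> the unpatched pattern from fs/aio.c (comments added for illustration
>>>>> only, they are not in the tree):
>>>>>
>>>>>  tsk->active_mm = mm;            /* bookkeeping now says "mm is live" */
>>>>>  /* <-- an I-pipe/Xenomai preemption hitting this window sees
>>>>>   *     active_mm == mm while the MMU is still loaded with the
>>>>>   *     old mm's page tables */
>>>>>  switch_mm(active_mm, mm, tsk);  /* hardware switch happens here */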
>>>>>
>>>>> The patch is meant as a basis for further discussion about
>>>>>
>>>>>  o how to solve all the issues for all archs technically (ideally
>>>>>    without the need to patch noarch parts in an arch-specific way...)
>>>>>
>>>>>  o if anyone thinks there could be more spots like these (I've checked
>>>>>    the code only for x86 so far)
>>>>>
>>>>> diff --git a/fs/aio.c b/fs/aio.c
>>>>> index 76da125..0286f0f 100644
>>>>> --- a/fs/aio.c
>>>>> +++ b/fs/aio.c
>>>>> @@ -618,13 +618,16 @@ static void use_mm(struct mm_struct *mm)
>>>>>  {
>>>>>   struct mm_struct *active_mm;
>>>>>   struct task_struct *tsk = current;
>>>>> + unsigned long flags;
>>>>>  
>>>>>   task_lock(tsk);
>>>>>   active_mm = tsk->active_mm;
>>>>>   atomic_inc(&mm->mm_count);
>>>>>   tsk->mm = mm;
>>>>> + local_irq_save_hw_cond(flags);
>>>>>   tsk->active_mm = mm;
>>>>> - switch_mm(active_mm, mm, tsk);
>>>>> + __switch_mm(active_mm, mm, tsk);
>>>>> + local_irq_restore_hw_cond(flags);
>>>>>   task_unlock(tsk);
>>>>>  
>>>>>   mmdrop(active_mm);
>>>>> diff --git a/fs/exec.c b/fs/exec.c
>>>>> index 3b36c69..06591ac 100644
>>>>> --- a/fs/exec.c
>>>>> +++ b/fs/exec.c
>>>>> @@ -718,6 +718,7 @@ static int exec_mmap(struct mm_struct *mm)
>>>>>  {
>>>>>   struct task_struct *tsk;
>>>>>   struct mm_struct * old_mm, *active_mm;
>>>>> + unsigned long flags;
>>>>>  
>>>>>   /* Notify parent that we're no longer interested in the old VM */
>>>>>   tsk = current;
>>>>> @@ -740,8 +741,10 @@ static int exec_mmap(struct mm_struct *mm)
>>>>>   task_lock(tsk);
>>>>>   active_mm = tsk->active_mm;
>>>>>   tsk->mm = mm;
>>>>> + local_irq_save_hw_cond(flags);
>>>>>   tsk->active_mm = mm;
>>>>>   activate_mm(active_mm, mm);
>>>>> + local_irq_restore_hw_cond(flags);
>>>>>   task_unlock(tsk);
>>>>>   arch_pick_mmap_layout(mm);
>>>>>   if (old_mm) {
>>>>> diff --git a/kernel/fork.c b/kernel/fork.c
>>>>> index 01a836b..cf3b68a 100644
>>>>> --- a/kernel/fork.c
>>>>> +++ b/kernel/fork.c
>>>>> @@ -1665,11 +1665,14 @@ SYSCALL_DEFINE1(unshare, unsigned long, unshare_flags)
>>>>>           }
>>>>>  
>>>>>           if (new_mm) {
>>>>> +                 unsigned long flags;
>>>>>                   mm = current->mm;
>>>>>                   active_mm = current->active_mm;
>>>>>                   current->mm = new_mm;
>>>>> +                 local_irq_save_hw_cond(flags);
>>>>>                   current->active_mm = new_mm;
>>>>>                   activate_mm(active_mm, new_mm);
>>>>> +                 local_irq_restore_hw_cond(flags);
>>>>>                   new_mm = mm;
>>>>>           }
>>>> Ok. These are patches to the noarch code which depend on the x86
>>>> implementation.
>>>>
>>>> I think we need something like
>>>>
>>>> mm_change_enter(flags);
>>>> mm_change_leave(flags);
>>>>
>>>> which would resolve to ipipe_active_mm = NULL on architectures with
>>>> unlocked context switches, and to
>>>> local_irq_save_hw/local_irq_restore_hw on x86?
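>>>>
>>>> Roughly along these lines (just a sketch of the idea; the config
>>>> symbol is made up here, and the exact shape of ipipe_active_mm is
>>>> arch-dependent):
>>>>
>>>> #ifdef CONFIG_ARCH_UNLOCKED_CTXSW	/* hypothetical symbol */
>>>> /* lockless archs: flag the switch in progress instead of masking */
>>>> #define mm_change_enter(flags)	\
>>>> 	do { (void)(flags); ipipe_active_mm = NULL; } while (0)
>>>> #define mm_change_leave(flags)	do { (void)(flags); } while (0)
>>>> #else	/* e.g. x86: switch_mm must not be preempted at all */
>>>> #define mm_change_enter(flags)	local_irq_save_hw(flags)
>>>> #define mm_change_leave(flags)	local_irq_restore_hw(flags)
>>>> #endif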
>>> + we have to change switch_mm in the lockless case, and maybe also its
>>> caller (judging by the arm code), to use ipipe_active_mm in order to
>>> decide whether to switch or not - see my explanation of the possible
>>> race in aio.c.
>> This is already the way it currently works: the "prev" passed to
>> switch_mm is always ipipe_active_mm.
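>>
>> I.e., on the Xenomai side the switch boils down to something like this
>> (a simplified sketch, not the literal arm code):
>>
>> 	prev = ipipe_active_mm;	/* what the MMU really runs on;       */
>> 				/* NULL if a switch_mm was preempted  */
>> 	switch_mm(prev, next, tsk);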
>>
> 
> Ah, I missed the difference in arm's xnarch_leave_root. Now I got it.
> 
> OK, but this needs to be thought through in detail again, specifically
> also the enter-root scenarios. I don't have a safe feeling about it yet,
> partly because switch_mm fiddles with ipipe_active_mm as well, but more
> generally because we used to have such critical races in this area for a
> "fairly" long time.

When root is reentered with ipipe_active_mm still NULL, the
TIF_MMSWITCH_INT bit is set, in which case the switch_mm preempted by
Xenomai is restarted.
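
In code terms, the restart amounts to something like this (a condensed
sketch of the idea, not the literal ipipe code):

	do {
		clear_thread_flag(TIF_MMSWITCH_INT);
		ipipe_active_mm = NULL;	/* switch in progress */
		/* ... load the new page tables ... */
		ipipe_active_mm = next;
		/* if Xenomai preempted us mid-switch, root reentry
		 * found ipipe_active_mm == NULL and set the flag,
		 * so we redo the switch from scratch */
	} while (test_thread_flag(TIF_MMSWITCH_INT));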

-- 
                                            Gilles.
