Re: [Xenomai-core] [Xenomai-git] Jan Kiszka : nucleus: Fix interrupt handler tails

2011-06-20 Thread Jan Kiszka
On 2011-06-19 17:41, Gilles Chanteperdrix wrote:
 Merged your whole branch, but took the liberty to change it a bit
 (replacing the commit concerning unlocked context switches with comments
 changes only, and changing the commit about xntbase_tick).

What makes splmax() redundant for the unlocked context switch case? IMO
that bug is still present.

We can clean up xnintr_clock_handler a bit after the changes; I will
follow up with a patch.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux

___
Xenomai-core mailing list
Xenomai-core@gna.org
https://mail.gna.org/listinfo/xenomai-core


Re: [Xenomai-core] [PULL] native: Fix msendq fastlock leakage

2011-06-20 Thread Jan Kiszka
On 2011-06-19 15:00, Gilles Chanteperdrix wrote:
 On 06/19/2011 01:17 PM, Gilles Chanteperdrix wrote:
 On 06/19/2011 12:14 PM, Gilles Chanteperdrix wrote:
 I am working on this ppd cleanup issue again, and I am asking for help to
 find a fix in -head for all cases where the sys_ppd is needed during
 some cleanup.

 The problem is that when the ppd cleanup is invoked:
 - we have no guarantee that current is a thread from the Xenomai
 application;
 - if it is, current->mm is NULL.

 So, associating the sys_ppd to either current or current->mm does not
 work. What we could do is pass the sys_ppd to all the other ppds cleanup
 handlers, this would fix cases such as freeing mutexes fastlock, but
 that does not help when the sys_ppd is needed during a thread deletion hook.

 I would like to find a solution where simply calling xnsys_ppd_get()
 will work, rather than having an xnsys_ppd_get for each context (such
 as xnsys_ppd_get_by_mm/xnsys_ppd_get_by_task_struct), because that
 would be too error-prone.

 Any idea anyone?

 The best I could come up with: use a ptd to store the mm currently 
 being cleaned up, so that xnshadow_ppd_get continues to work, even
 in the middle of a cleanup.
 
 In order to also get xnshadow_ppd_get to work in task deletion hooks 
 (which is needed to avoid the issue at the origin of this thread), we 
 also need to set this ptd upon shadow mapping, so it is still there 
 when reaching the task deletion hook (where current->mm may be NULL). 
 Hence the patch:
 
 diff --git a/ksrc/nucleus/shadow.c b/ksrc/nucleus/shadow.c
 index b243600..6bc4210 100644
 --- a/ksrc/nucleus/shadow.c
 +++ b/ksrc/nucleus/shadow.c
 @@ -65,6 +65,11 @@ int nkthrptd;
  EXPORT_SYMBOL_GPL(nkthrptd);
  int nkerrptd;
  EXPORT_SYMBOL_GPL(nkerrptd);
 +int nkmmptd;
 +EXPORT_SYMBOL_GPL(nkmmptd);
 +
 +#define xnshadow_mmptd(t) ((t)->ptd[nkmmptd])
 +#define xnshadow_mm(t) ((struct mm_struct *)xnshadow_mmptd(t))

xnshadow_mm() can now return a no longer existing mm. So no user of
xnshadow_mm() should ever dereference that pointer. Thus we had better
change all its users to treat the return value as a void pointer, e.g.:

  
  struct xnskin_slot {
   struct xnskin_props *props;
 @@ -1304,6 +1309,8 @@ int xnshadow_map(xnthread_t *thread, xncompletion_t 
 __user *u_completion,
* friends.
*/
   xnshadow_thrptd(current) = thread;
 + xnshadow_mmptd(current) = current->mm;
 +
   rthal_enable_notifier(current);
  
   if (xnthread_base_priority(thread) == 0 &&
 @@ -2759,7 +2766,15 @@ static void detach_ppd(xnshadow_ppd_t * ppd)
  
  static inline void do_cleanup_event(struct mm_struct *mm)
  {
 + struct task_struct *p = current;
 + struct mm_struct *old;
 +
 + old = xnshadow_mm(p);
 + xnshadow_mmptd(p) = mm;
 +
   ppd_remove_mm(mm, detach_ppd);
 +
 + xnshadow_mmptd(p) = old;

I don't have the full picture yet, but that feels racy: If the context
over which we clean up that foreign mm is also using xnshadow_mmptd,
other threads in that process may dislike this temporary change.

  }
  
  RTHAL_DECLARE_CLEANUP_EVENT(cleanup_event);
 @@ -2925,7 +2940,7 @@ EXPORT_SYMBOL_GPL(xnshadow_unregister_interface);
  xnshadow_ppd_t *xnshadow_ppd_get(unsigned muxid)
  {
   if (xnpod_userspace_p())
 - return ppd_lookup(muxid, current->mm);
 + return ppd_lookup(muxid, xnshadow_mm(current) ?: current->mm);
  
   return NULL;
  }
 @@ -2960,8 +2975,9 @@ int xnshadow_mount(void)
   sema_init(&completion_mutex, 1);
   nkthrptd = rthal_alloc_ptdkey();
   nkerrptd = rthal_alloc_ptdkey();
 + nkmmptd = rthal_alloc_ptdkey();
  
 - if (nkthrptd < 0 || nkerrptd < 0) {
 + if (nkthrptd < 0 || nkerrptd < 0 || nkmmptd < 0) {
   printk(KERN_ERR "Xenomai: cannot allocate PTD slots\n");
   return -ENOMEM;
   }
 diff --git a/ksrc/skins/posix/mutex.c b/ksrc/skins/posix/mutex.c
 index 6ce75e5..cc86852 100644
 --- a/ksrc/skins/posix/mutex.c
 +++ b/ksrc/skins/posix/mutex.c
 @@ -219,10 +219,6 @@ void pse51_mutex_destroy_internal(pse51_mutex_t *mutex,
   xnlock_put_irqrestore(nklock, s);
  
  #ifdef CONFIG_XENO_FASTSYNCH
 - /* We call xnheap_free even if the mutex is not pshared; when
 -this function is called from pse51_mutexq_cleanup, the
 -sem_heap is destroyed, or not the one to which the fastlock
 -belongs, xnheap will simply return an error. */

I think this comment is not completely obsolete. It still applies with
respect to shared vs. non-shared.

   xnheap_free(&xnsys_ppd_get(mutex->attr.pshared)->sem_heap,
   mutex->synchbase.fastlock);
  #endif /* CONFIG_XENO_FASTSYNCH */
 
 

If we can resolve that potential race, this looks like a nice solution.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux



Re: [Xenomai-core] [Xenomai-git] Jan Kiszka : nucleus: Fix interrupt handler tails

2011-06-20 Thread Gilles Chanteperdrix
On 06/20/2011 06:43 PM, Jan Kiszka wrote:
 On 2011-06-19 17:41, Gilles Chanteperdrix wrote:
 Merged your whole branch, but took the liberty to change it a bit
 (replacing the commit concerning unlocked context switches with comments
 changes only, and changing the commit about xntbase_tick).
 
 What makes splmax() redundant for the unlocked context switch case? IMO
 that bug is still present.

No, the bug is between my keyboard and chair. On architectures with
unlocked context switches, the Linux task switch still happens with irqs
off, only the mm switch happens with irqs on.

-- 
Gilles.



Re: [Xenomai-core] [PULL] native: Fix msendq fastlock leakage

2011-06-20 Thread Gilles Chanteperdrix
On 06/20/2011 07:07 PM, Jan Kiszka wrote:
  static inline void do_cleanup_event(struct mm_struct *mm)
  {
 +struct task_struct *p = current;
 +struct mm_struct *old;
 +
 +old = xnshadow_mm(p);
 +xnshadow_mmptd(p) = mm;
 +
  ppd_remove_mm(mm, detach_ppd);
 +
 +xnshadow_mmptd(p) = old;
 
 I don't have the full picture yet, but that feels racy: If the context
 over which we clean up that foreign mm is also using xnshadow_mmptd,
 other threads in that process may dislike this temporary change.

This mmptd is only used by xnshadow_ppd_get (which never dereferences
it, by the way). And if the current thread is running the cleanup event,
it is not running any other service at the same time. So, this is safe.

An alternative implementation would be to use some global per-cpu
variable and disable the preemption during cleanup. But that would not
fix the case of the shadow cleanups where current->mm is already NULL.

The remaining problem is an irq interrupting the cleanup, and using
xnshadow_ppd_get. But I would not expect that to happen. I mean,
xnshadow_ppd_get is more of a service to be used in the implementation
of system calls.

 diff --git a/ksrc/skins/posix/mutex.c b/ksrc/skins/posix/mutex.c
 index 6ce75e5..cc86852 100644
 --- a/ksrc/skins/posix/mutex.c
 +++ b/ksrc/skins/posix/mutex.c
 @@ -219,10 +219,6 @@ void pse51_mutex_destroy_internal(pse51_mutex_t *mutex,
  xnlock_put_irqrestore(nklock, s);
  
  #ifdef CONFIG_XENO_FASTSYNCH
 -/* We call xnheap_free even if the mutex is not pshared; when
 -   this function is called from pse51_mutexq_cleanup, the
 -   sem_heap is destroyed, or not the one to which the fastlock
 -   belongs, xnheap will simply return an error. */
 
 I think this comment is not completely obsolete. It still applies /wrt
 shared/non-shared.

If we apply this patch after the one changing the ppd cleanup order,
everything falls into place (I can even put a warning in xnheap_destroy
if the heap still has some bytes in use when destroyed).

 If we can resolve that potential race, this looks like a nice solution.

It may look like overkill, since it uses an Adeos ptd. On the other
hand, who else uses them anyway ;-)

-- 
Gilles.



Re: [Xenomai-core] [Xenomai-git] Jan Kiszka : nucleus: Fix interrupt handler tails

2011-06-20 Thread Jan Kiszka
On 2011-06-20 19:33, Gilles Chanteperdrix wrote:
 On 06/20/2011 06:43 PM, Jan Kiszka wrote:
 On 2011-06-19 17:41, Gilles Chanteperdrix wrote:
 Merged your whole branch, but took the liberty to change it a bit
 (replacing the commit concerning unlocked context switches with comments
 changes only, and changing the commit about xntbase_tick).

 What makes splmax() redundant for the unlocked context switch case? IMO
 that bug is still present.
 
 No, the bug is between my keyboard and chair. On architectures with
 unlocked context switches, the Linux task switch still happens with irqs
 off, only the mm switch happens with irqs on.

Then why do we call xnlock_get_irqsave in
xnsched_finish_unlocked_switch? Why not simply xnlock_get if irqs are
off anyway?

Jan





Re: [Xenomai-core] [Xenomai-git] Jan Kiszka : nucleus: Fix interrupt handler tails

2011-06-20 Thread Gilles Chanteperdrix
On 06/20/2011 09:38 PM, Jan Kiszka wrote:
 On 2011-06-20 19:33, Gilles Chanteperdrix wrote:
 On 06/20/2011 06:43 PM, Jan Kiszka wrote:
 On 2011-06-19 17:41, Gilles Chanteperdrix wrote:
 Merged your whole branch, but took the liberty to change it a bit
 (replacing the commit concerning unlocked context switches with comments
 changes only, and changing the commit about xntbase_tick).

 What makes splmax() redundant for the unlocked context switch case? IMO
 that bug is still present.

 No, the bug is between my keyboard and chair. On architectures with
 unlocked context switches, the Linux task switch still happens with irqs
 off, only the mm switch happens with irqs on.
 
 Then why do we call xnlock_get_irqsave in
 xnsched_finish_unlocked_switch? Why not simply xnlock_get if irqs are
 off anyway?

Because of the Xenomai task switch, not the Linux task switch.

-- 
Gilles.



Re: [Xenomai-core] [Xenomai-git] Jan Kiszka : nucleus: Fix interrupt handler tails

2011-06-20 Thread Jan Kiszka
On 2011-06-20 21:41, Gilles Chanteperdrix wrote:
 On 06/20/2011 09:38 PM, Jan Kiszka wrote:
 On 2011-06-20 19:33, Gilles Chanteperdrix wrote:
 On 06/20/2011 06:43 PM, Jan Kiszka wrote:
 On 2011-06-19 17:41, Gilles Chanteperdrix wrote:
 Merged your whole branch, but took the liberty to change it a bit
 (replacing the commit concerning unlocked context switches with comments
 changes only, and changing the commit about xntbase_tick).

 What makes splmax() redundant for the unlocked context switch case? IMO
 that bug is still present.

 No, the bug is between my keyboard and chair. On architectures with
 unlocked context switches, the Linux task switch still happens with irqs
 off, only the mm switch happens with irqs on.

 Then why do we call xnlock_get_irqsave in
 xnsched_finish_unlocked_switch? Why not simply xnlock_get if irqs are
 off anyway?
 
 Because of the Xenomai task switch, not the Linux task switch.

--verbose please.

Jan





Re: [Xenomai-core] [Xenomai-git] Jan Kiszka : nucleus: Fix interrupt handler tails

2011-06-20 Thread Gilles Chanteperdrix
On 06/20/2011 09:41 PM, Jan Kiszka wrote:
 On 2011-06-20 21:41, Gilles Chanteperdrix wrote:
 On 06/20/2011 09:38 PM, Jan Kiszka wrote:
 On 2011-06-20 19:33, Gilles Chanteperdrix wrote:
 On 06/20/2011 06:43 PM, Jan Kiszka wrote:
 On 2011-06-19 17:41, Gilles Chanteperdrix wrote:
 Merged your whole branch, but took the liberty to change it a bit
 (replacing the commit concerning unlocked context switches with comments
 changes only, and changing the commit about xntbase_tick).

 What makes splmax() redundant for the unlocked context switch case? IMO
 that bug is still present.

 No, the bug is between my keyboard and chair. On architectures with
 unlocked context switches, the Linux task switch still happens with irqs
 off, only the mm switch happens with irqs on.

 Then why do we call xnlock_get_irqsave in
 xnsched_finish_unlocked_switch? Why not simply xnlock_get if irqs are
 off anyway?

 Because of the Xenomai task switch, not the Linux task switch.
 
 --verbose please.

There are two kinds of task switches: switches between Linux tasks,
handled by the Linux kernel function/macro/inline asm switch_to(), and
those between Xenomai tasks, handled by the function/macro/inline asm
xnarch_switch_to().

Since a Linux kernel context switch may still be interrupted by a
(primary mode) interrupt which could decide to switch context, it
cannot happen with interrupts enabled, due to the way it works (the
registers are spilled at a location relative to the SP, then the SP is
changed, which is not atomic).

The Xenomai context switches run no such risk, so they may happen
entirely with irqs on.

In the case of relax, the two halves of the context switch are not of
the same kind. The first half is a Xenomai switch, but the second half
is the epilogue of a Linux context switch (which, by the way, is why we
need to skip all the housekeeping in __xnpod_schedule in that case, and
also why we go to such pains to keep the two kinds of context switch
compatible); hence, even on machines with unlocked context switches,
irqs are off at this point.

Hope this is clearer.

-- 
Gilles.



Re: [Xenomai-core] [Xenomai-git] Jan Kiszka : nucleus: Fix interrupt handler tails

2011-06-20 Thread Jan Kiszka
On 2011-06-20 21:51, Gilles Chanteperdrix wrote:
 On 06/20/2011 09:41 PM, Jan Kiszka wrote:
 On 2011-06-20 21:41, Gilles Chanteperdrix wrote:
 On 06/20/2011 09:38 PM, Jan Kiszka wrote:
 On 2011-06-20 19:33, Gilles Chanteperdrix wrote:
 On 06/20/2011 06:43 PM, Jan Kiszka wrote:
 On 2011-06-19 17:41, Gilles Chanteperdrix wrote:
 Merged your whole branch, but took the liberty to change it a bit
 (replacing the commit concerning unlocked context switches with comments
 changes only, and changing the commit about xntbase_tick).

 What makes splmax() redundant for the unlocked context switch case? IMO
 that bug is still present.

 No, the bug is between my keyboard and chair. On architectures with
 unlocked context switches, the Linux task switch still happens with irqs
 off, only the mm switch happens with irqs on.

 Then why do we call xnlock_get_irqsave in
 xnsched_finish_unlocked_switch? Why not simply xnlock_get if irqs are
 off anyway?

 Because of the Xenomai task switch, not the Linux task switch.

 --verbose please.
 
 There are two kind of task switches, switches between Linux tasks,
 handled by Linux kernel function/macro/inline asm switch_to(). And those
 between Xenomai tasks, handled by function/macro/inline asm
 xnarch_switch_to().

xnarch_switch_to is the central entry point for everyone. It may decide
to branch to switch_to or __switch_to, or it may simply handle
everything on its own, depending on the arch.

 
 Since a Linux kernel context switch may still be interrupted by a
 (primary mode) interrupt which could decide to switch context, it can
 not happen with interrupts enabled, due to the way it works (spill the
 registers in a place relative to the SP, then change SP not atomically).
 
 The Xenomai context switches have no such risk, so, they may happen with
 irqs on, completely.
 
 In case of relax, the two halves of context switches are not of the same
 kind. The first half of the context switch is a Xenomai switch, but the
 second half is the epilogue of a Linux context switch (which, by the
 way, is why we need skipping all the house keeping in __xnpod_schedule
 in that case, and also why we go to all the pain for keeping the two
 context switches compatible), hence, even on machines with unlocked
 context switches, irqs are off at this point.
 
 Hope this is more clear.

That's all clear. But it's unclear how this maps to our key question.

Can you point out where in those paths irqs are disabled again (after
entering xnarch_switch_to) and left off for each of the unlocked
switching archs? I'm still skeptical that the need to disable irqs
during the thread switch on some archs also leads to unconditionally
disabled hard irqs when returning from xnarch_switch_to.

But even if that's all the case today, we had better set this
requirement in stone:

diff --git a/ksrc/nucleus/pod.c b/ksrc/nucleus/pod.c
index f2fc2ab..c4c5807 100644
--- a/ksrc/nucleus/pod.c
+++ b/ksrc/nucleus/pod.c
@@ -2273,6 +2273,8 @@ reschedule:

xnpod_switch_to(sched, prev, next);

+   XENO_BUGON(NUCLEUS, !irqs_disabled_hw());
+
 #ifdef CONFIG_XENO_OPT_PERVASIVE
/*
 * Test whether we transitioned from primary mode to secondary

[ just demonstrating, would require some cleanup ]

Jan





Re: [Xenomai-core] [Xenomai-git] Jan Kiszka : nucleus: Fix interrupt handler tails

2011-06-20 Thread Gilles Chanteperdrix
On 06/20/2011 10:41 PM, Jan Kiszka wrote:
 xnarch_switch_to is the central entry point for everyone. It may decide
 to branch to switch_to or __switch_to, or it simply handles all on its
 own - that's depending on the arch.

No, the Linux kernel does not know anything about xnarch_switch_to, so
the schedule() function happily continues to use switch_to.
xnarch_switch_to is only used to switch from xnthread_t to xnthread_t,
by __xnpod_schedule(). That some architectures (namely x86) decide
that xnarch_switch_to should use switch_to (or more likely an inner
__switch_to) when the xnthread_t has a non-NULL user_task member is an
implementation detail.

 Can you point out where in those paths irqs are disabled again (after
 entering xnarch_switch_to)

They are not disabled again after xnarch_switch_to; they are disabled
when starting switch_to.

 and left off for each of the unlocked
 switching archs? I'm still skeptical that the need for disable irqs
 during thread switch on some archs also leads to unconditionally
 disabled hard irqs when returning from xnarch_switch_to.
 
 But even if that's all the case today, we would better set this
 requirement in stone:
 
 diff --git a/ksrc/nucleus/pod.c b/ksrc/nucleus/pod.c
 index f2fc2ab..c4c5807 100644
 --- a/ksrc/nucleus/pod.c
 +++ b/ksrc/nucleus/pod.c
 @@ -2273,6 +2273,8 @@ reschedule:
 
   xnpod_switch_to(sched, prev, next);
 
 + XENO_BUGON(NUCLEUS, !irqs_disabled_hw());
 +

You misunderstand me: only after the second half of the context switch,
in the case of xnshadow_relax, are the interrupts disabled, because this
second half-switch started as a switch_to and not an xnarch_switch_to,
i.e. it started as:

#define switch_to(prev,next,last)				\
do {								\
	local_irq_disable_hw_cond();				\
	last = __switch_to(prev, task_thread_info(prev),	\
			   task_thread_info(next));		\
	local_irq_enable_hw_cond();				\
} while (0)

(On ARM for instance).

That is true, though: we could assert this in the shadow epilogue case.

-- 
Gilles.



Re: [Xenomai-core] [PULL] native: Fix msendq fastlock leakage

2011-06-20 Thread Jan Kiszka
On 2011-06-20 19:46, Gilles Chanteperdrix wrote:
 On 06/20/2011 07:07 PM, Jan Kiszka wrote:
  static inline void do_cleanup_event(struct mm_struct *mm)
  {
 +   struct task_struct *p = current;
 +   struct mm_struct *old;
 +
 +   old = xnshadow_mm(p);
 +   xnshadow_mmptd(p) = mm;
 +
 ppd_remove_mm(mm, detach_ppd);
 +
 +   xnshadow_mmptd(p) = old;

 I don't have the full picture yet, but that feels racy: If the context
 over which we clean up that foreign mm is also using xnshadow_mmptd,
 other threads in that process may dislike this temporary change.
 
 This mmptd is only used by xnshadow_ppd_get (which never dereferences it
 by the way

I know that the current code is safe. Avoiding mm_struct types is about
keeping it safe in the future. Why track the type if it's just an opaque
key that should be handled as such?

). And if the current thread is running the cleanup event, it
 is not running any other service at the same time. So, this is safe.

Ah, xnshadow_mmptd is per-thread, not per-process as I incorrectly
assumed. Then it's safe.

 
 An alternative implementation would be to use some global per-cpu
 variable and disable the preemption during cleanup. But that would not
 fix the case of the shadow cleanups where current->mm is already NULL.
 
 The remaining problem is an irq interrupting the cleanup, and using
 xnshadow_ppd_get. But I would not expect that to happen. I mean,
 xnshadow_ppd_get is more a service to be used for implementation of
 system calls.

Yes, that kind of usage would be a bug in the first place.

 
 diff --git a/ksrc/skins/posix/mutex.c b/ksrc/skins/posix/mutex.c
 index 6ce75e5..cc86852 100644
 --- a/ksrc/skins/posix/mutex.c
 +++ b/ksrc/skins/posix/mutex.c
 @@ -219,10 +219,6 @@ void pse51_mutex_destroy_internal(pse51_mutex_t *mutex,
 xnlock_put_irqrestore(nklock, s);
  
  #ifdef CONFIG_XENO_FASTSYNCH
 -   /* We call xnheap_free even if the mutex is not pshared; when
 -  this function is called from pse51_mutexq_cleanup, the
 -  sem_heap is destroyed, or not the one to which the fastlock
 -  belongs, xnheap will simply return an error. */

 I think this comment is not completely obsolete. It still applies /wrt
 shared/non-shared.
 
 If we apply this patch after the one changing the ppd cleanup order,
 everything falls in place (I can even put a warning in xnheap_destroy if
 the heap has some bytes still in use when destroyed).

"We call xnheap_free even if the mutex is not pshared." This still
applies and is unrelated to the ppd changes.

Jan





Re: [Xenomai-core] [Xenomai-git] Jan Kiszka : nucleus: Fix interrupt handler tails

2011-06-20 Thread Jan Kiszka
On 2011-06-20 22:52, Gilles Chanteperdrix wrote:
 On 06/20/2011 10:41 PM, Jan Kiszka wrote:
 xnarch_switch_to is the central entry point for everyone. It may decide
 to branch to switch_to or __switch_to, or it simply handles all on its
 own - that's depending on the arch.
 
 No, the Linux kernel does not know anything about xnarch_switch_to, so
 the schedule() function continues to use switch_to happily.
 xnarch_switch_to is only used to switch from xnthread_t to xnthread_t,
 by __xnpod_schedule(). Now, that some architecture (namely x86) decide
 that xnarch_switch_to should use switch_to (or more likely an inner
 __switch_to) when the xnthread_t has a non NULL user_task member is an
 implementation detail.
 
 Can you point out where in those paths irqs are disabled again (after
 entering xnarch_switch_to)
 
 They are not disabled again after xnarch_switch_to, they are disabled
 when starting switch_to.
 
  and left off for each of the unlocked
 switching archs? I'm still skeptical that the need for disable irqs
 during thread switch on some archs also leads to unconditionally
 disabled hard irqs when returning from xnarch_switch_to.

 But even if that's all the case today, we would better set this
 requirement in stone:

 diff --git a/ksrc/nucleus/pod.c b/ksrc/nucleus/pod.c
 index f2fc2ab..c4c5807 100644
 --- a/ksrc/nucleus/pod.c
 +++ b/ksrc/nucleus/pod.c
 @@ -2273,6 +2273,8 @@ reschedule:

  xnpod_switch_to(sched, prev, next);

 +XENO_BUGON(NUCLEUS, !irqs_disabled_hw());
 +
 
 You misunderstand me: only after the second half context switch in the
 case of xnshadow_relax are the interrupts disabled. Because this second
 half-switch started as a switch_to and not an xnarch_switch_to, so,
 started as:
 
 #define switch_to(prev,next,last)				\
 do {								\
 	local_irq_disable_hw_cond();				\
 	last = __switch_to(prev, task_thread_info(prev),	\
 			   task_thread_info(next));		\
 	local_irq_enable_hw_cond();				\
 } while (0)
 
 (On ARM for instance).

OK, that's now clear, thanks.

 
 But that is true, we could assert this in the shadow epilogue case.
 

I've queued a patch.

Jan


