Re: [Xenomai-core] False positive XENO_BUGON(NUCLEUS, need_resched == 0)?

2010-09-01 Thread Gilles Chanteperdrix
Philippe Gerum wrote:
 On Mon, 2010-08-30 at 17:39 +0200, Jan Kiszka wrote:
 Philippe Gerum wrote:
 Ok, Gilles did not grumble at you, so I'm daring the following patch,
 since I agree with you here. Totally untested, not even compiled, just
 for the fun of getting lockups and/or threads in limbos. Nah, just
 kidding, your shiny SMP box should be bricked even before that:

 diff --git a/include/nucleus/sched.h b/include/nucleus/sched.h
 index f75c6f6..6ad66ba 100644
 --- a/include/nucleus/sched.h
 +++ b/include/nucleus/sched.h
 @@ -184,10 +184,9 @@ static inline int xnsched_self_resched_p(struct xnsched *sched)
  #define xnsched_set_resched(__sched__) do { \
          xnsched_t *current_sched = xnpod_current_sched(); \
          xnarch_cpu_set(xnsched_cpu(__sched__), current_sched->resched); \

 To increase the probability of regressions: What about moving the above
 line...

 -        if (unlikely(current_sched != (__sched__))) \
 -                xnarch_cpu_set(xnsched_cpu(__sched__), (__sched__)->resched); \
          setbits(current_sched->status, XNRESCHED); \
 -        /* remote will set XNRESCHED locally in the IPI handler */ \
 +        if (current_sched != (__sched__)) \
 +                setbits((__sched__)->status, XNRESCHED); \

 ...into this conditional block? Then you should be able to...

  } while (0)

  void xnsched_zombie_hooks(struct xnthread *thread);
 diff --git a/ksrc/nucleus/pod.c b/ksrc/nucleus/pod.c
 index 623bdff..cff76c2 100644
 --- a/ksrc/nucleus/pod.c
 +++ b/ksrc/nucleus/pod.c
 @@ -285,13 +285,6 @@ void xnpod_schedule_handler(void) /* Called with hw interrupts off. */
                  xnshadow_rpi_check();
          }
  #endif /* CONFIG_SMP && CONFIG_XENO_OPT_PRIOCPL */
 -        /*
 -         * xnsched_set_resched() did set the resched mask remotely. We
 -         * just need to make sure that our rescheduling request won't
 -         * be filtered out locally when testing for XNRESCHED
 -         * presence.
 -         */
 -        setbits(sched->status, XNRESCHED);
          xnpod_schedule();
  }

 @@ -2167,10 +2160,10 @@ static inline int __xnpod_test_resched(struct xnsched *sched)
  {
          int cpu = xnsched_cpu(sched), resched;

 -        resched = xnarch_cpu_isset(cpu, sched->resched);
 -        xnarch_cpu_clear(cpu, sched->resched);
 +        resched = testbits(sched->status, XNRESCHED);
  #ifdef CONFIG_SMP
          /* Send resched IPI to remote CPU(s). */
 +        xnarch_cpu_clear(cpu, sched->resched);

 ...drop the line above as well.

          if (unlikely(xnsched_resched_p(sched))) {
                  xnarch_send_ipi(sched->resched);
                  xnarch_cpus_clear(sched->resched);

 
 Yes, I do think that we are way too stable on SMP boxes these days.
 Let's merge this as well to bring the fun back.
 

The current cpu bit in the resched cpu mask allowed us to know whether
the local cpu actually needed rescheduling. At least on SMP. It may
happen that only remote cpus were set, so, in that case, we were only
sending the IPI, then exiting __xnpod_schedule.
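In other words, the pre-patch test boiled down to something like the sketch
below (reconstructed from the hunks quoted above for illustration, not copied
verbatim from the tree):

#include <nucleus/pod.h>

/*
 * Sketch of the old logic: the local CPU's bit in sched->resched tells
 * us whether *this* CPU actually has anything to reschedule; a request
 * that only flags remote CPUs is satisfied by sending the IPI and
 * returning 0, so __xnpod_schedule() exits without touching the local
 * runqueue.
 */
static inline int old_test_resched(struct xnsched *sched)
{
        int cpu = xnsched_cpu(sched), resched;

        resched = xnarch_cpu_isset(cpu, sched->resched); /* local need? */
        xnarch_cpu_clear(cpu, sched->resched);
#ifdef CONFIG_SMP
        /* Send resched IPI to remote CPU(s). */
        if (unlikely(xnsched_resched_p(sched))) {
                xnarch_send_ipi(sched->resched);
                xnarch_cpus_clear(sched->resched);
        }
#endif
        return resched; /* 0 => remote-only request, nothing to do here */
}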


-- 
Gilles.

___
Xenomai-core mailing list
Xenomai-core@gna.org
https://mail.gna.org/listinfo/xenomai-core


Re: [Xenomai-core] False positive XENO_BUGON(NUCLEUS, need_resched == 0)?

2010-09-01 Thread Philippe Gerum
On Wed, 2010-09-01 at 10:39 +0200, Gilles Chanteperdrix wrote:
 Philippe Gerum wrote:
  On Mon, 2010-08-30 at 17:39 +0200, Jan Kiszka wrote:
  Philippe Gerum wrote:
  Ok, Gilles did not grumble at you, so I'm daring the following patch,
  since I agree with you here. Totally untested, not even compiled, just
  for the fun of getting lockups and/or threads in limbos. Nah, just
  kidding, your shiny SMP box should be bricked even before that:
 
  diff --git a/include/nucleus/sched.h b/include/nucleus/sched.h
  index f75c6f6..6ad66ba 100644
  --- a/include/nucleus/sched.h
  +++ b/include/nucleus/sched.h
  @@ -184,10 +184,9 @@ static inline int xnsched_self_resched_p(struct xnsched *sched)
   #define xnsched_set_resched(__sched__) do { \
           xnsched_t *current_sched = xnpod_current_sched(); \
           xnarch_cpu_set(xnsched_cpu(__sched__), current_sched->resched); \

  To increase the probability of regressions: What about moving the above
  line...

  -        if (unlikely(current_sched != (__sched__))) \
  -                xnarch_cpu_set(xnsched_cpu(__sched__), (__sched__)->resched); \
           setbits(current_sched->status, XNRESCHED); \
  -        /* remote will set XNRESCHED locally in the IPI handler */ \
  +        if (current_sched != (__sched__)) \
  +                setbits((__sched__)->status, XNRESCHED); \

  ...into this conditional block? Then you should be able to...

   } while (0)

   void xnsched_zombie_hooks(struct xnthread *thread);
  diff --git a/ksrc/nucleus/pod.c b/ksrc/nucleus/pod.c
  index 623bdff..cff76c2 100644
  --- a/ksrc/nucleus/pod.c
  +++ b/ksrc/nucleus/pod.c
  @@ -285,13 +285,6 @@ void xnpod_schedule_handler(void) /* Called with hw interrupts off. */
                   xnshadow_rpi_check();
           }
   #endif /* CONFIG_SMP && CONFIG_XENO_OPT_PRIOCPL */
  -        /*
  -         * xnsched_set_resched() did set the resched mask remotely. We
  -         * just need to make sure that our rescheduling request won't
  -         * be filtered out locally when testing for XNRESCHED
  -         * presence.
  -         */
  -        setbits(sched->status, XNRESCHED);
           xnpod_schedule();
   }

  @@ -2167,10 +2160,10 @@ static inline int __xnpod_test_resched(struct xnsched *sched)
   {
           int cpu = xnsched_cpu(sched), resched;

  -        resched = xnarch_cpu_isset(cpu, sched->resched);
  -        xnarch_cpu_clear(cpu, sched->resched);
  +        resched = testbits(sched->status, XNRESCHED);
   #ifdef CONFIG_SMP
           /* Send resched IPI to remote CPU(s). */
  +        xnarch_cpu_clear(cpu, sched->resched);

  ...drop the line above as well.

          if (unlikely(xnsched_resched_p(sched))) {
                  xnarch_send_ipi(sched->resched);
                  xnarch_cpus_clear(sched->resched);
 
  
  Yes, I do think that we are way too stable on SMP boxes these days.
  Let's merge this as well to bring the fun back.
  
 
 The current cpu bit in the resched cpu mask allowed us to know whether
 the local cpu actually needed rescheduling. At least on SMP. It may
 happen that only remote cpus were set, so, in that case, we were only
 sending the IPI, then exiting __xnpod_schedule.
 

So the choice here is, in SMP non-debug mode only, between:

- setting and clearing a bit at each local rescheduling unconditionally
- peeking at the runqueue uselessly at each rescheduling only involving
remote threads

The answer does not seem obvious.
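To put the two costs side by side, here is a sketch of the local-need test in
each variant (pieced together from the hunks quoted in this thread, IPI
handling omitted; illustrative only, not committed code):

#include <nucleus/pod.h>

/*
 * Variant 1 (cpu mask, the old scheme): a wakeup that only concerns
 * remote CPUs never sets the local bit, so we can skip the local
 * runqueue entirely, at the price of setting and then clearing this
 * bit on every local rescheduling request.
 */
static inline int test_resched_cpumask(struct xnsched *sched)
{
        int cpu = xnsched_cpu(sched);
        int resched = xnarch_cpu_isset(cpu, sched->resched);

        xnarch_cpu_clear(cpu, sched->resched);
        return resched;
}

/*
 * Variant 2 (status bit, the new scheme): XNRESCHED is also set on the
 * waker's scheduler by wakeups that only concern remote CPUs, so
 * __xnpod_schedule() sometimes peeks at its runqueue for nothing.
 */
static inline int test_resched_status(struct xnsched *sched)
{
        int resched = testbits(sched->status, XNRESCHED);

        clrbits(sched->status, XNRESCHED);
        return resched;
}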

 

-- 
Philippe.



___
Xenomai-core mailing list
Xenomai-core@gna.org
https://mail.gna.org/listinfo/xenomai-core


Re: [Xenomai-core] False positive XENO_BUGON(NUCLEUS, need_resched == 0)?

2010-09-01 Thread Gilles Chanteperdrix
On Wed, Sep 01, 2010 at 12:52:15PM +0200, Philippe Gerum wrote:
 On Wed, 2010-09-01 at 10:39 +0200, Gilles Chanteperdrix wrote:
  Philippe Gerum wrote:
   On Mon, 2010-08-30 at 17:39 +0200, Jan Kiszka wrote:
   Philippe Gerum wrote:
   Ok, Gilles did not grumble at you, so I'm daring the following patch,
   since I agree with you here. Totally untested, not even compiled, just
   for the fun of getting lockups and/or threads in limbos. Nah, just
   kidding, your shiny SMP box should be bricked even before that:
  
   diff --git a/include/nucleus/sched.h b/include/nucleus/sched.h
   index f75c6f6..6ad66ba 100644
   --- a/include/nucleus/sched.h
   +++ b/include/nucleus/sched.h
   @@ -184,10 +184,9 @@ static inline int xnsched_self_resched_p(struct xnsched *sched)
    #define xnsched_set_resched(__sched__) do { \
            xnsched_t *current_sched = xnpod_current_sched(); \
            xnarch_cpu_set(xnsched_cpu(__sched__), current_sched->resched); \

   To increase the probability of regressions: What about moving the above
   line...

   -        if (unlikely(current_sched != (__sched__))) \
   -                xnarch_cpu_set(xnsched_cpu(__sched__), (__sched__)->resched); \
            setbits(current_sched->status, XNRESCHED); \
   -        /* remote will set XNRESCHED locally in the IPI handler */ \
   +        if (current_sched != (__sched__)) \
   +                setbits((__sched__)->status, XNRESCHED); \

   ...into this conditional block? Then you should be able to...

    } while (0)

    void xnsched_zombie_hooks(struct xnthread *thread);
   diff --git a/ksrc/nucleus/pod.c b/ksrc/nucleus/pod.c
   index 623bdff..cff76c2 100644
   --- a/ksrc/nucleus/pod.c
   +++ b/ksrc/nucleus/pod.c
   @@ -285,13 +285,6 @@ void xnpod_schedule_handler(void) /* Called with hw interrupts off. */
                    xnshadow_rpi_check();
            }
    #endif /* CONFIG_SMP && CONFIG_XENO_OPT_PRIOCPL */
   -        /*
   -         * xnsched_set_resched() did set the resched mask remotely. We
   -         * just need to make sure that our rescheduling request won't
   -         * be filtered out locally when testing for XNRESCHED
   -         * presence.
   -         */
   -        setbits(sched->status, XNRESCHED);
            xnpod_schedule();
    }

   @@ -2167,10 +2160,10 @@ static inline int __xnpod_test_resched(struct xnsched *sched)
    {
            int cpu = xnsched_cpu(sched), resched;

   -        resched = xnarch_cpu_isset(cpu, sched->resched);
   -        xnarch_cpu_clear(cpu, sched->resched);
   +        resched = testbits(sched->status, XNRESCHED);
    #ifdef CONFIG_SMP
            /* Send resched IPI to remote CPU(s). */
   +        xnarch_cpu_clear(cpu, sched->resched);

   ...drop the line above as well.

            if (unlikely(xnsched_resched_p(sched))) {
                    xnarch_send_ipi(sched->resched);
                    xnarch_cpus_clear(sched->resched);
  
   
   Yes, I do think that we are way too stable on SMP boxes these days.
   Let's merge this as well to bring the fun back.
   
  
  The current cpu bit in the resched cpu mask allowed us to know whether
  the local cpu actually needed rescheduling. At least on SMP. It may
  happen that only remote cpus were set, so, in that case, we were only
  sending the IPI, then exiting __xnpod_schedule.
  
 
 So the choice here is, in SMP non-debug mode only, between:
 
 - setting and clearing a bit at each local rescheduling unconditionally
 - peeking at the runqueue uselessly at each rescheduling only involving
 remote threads
 
 The answer does not seem obvious.

Well... It will probably even be hard to come up with a benchmark which 
shows a difference between the two solutions. So, if the current one is 
the simplest one, then so be it.

-- 
Gilles.

___
Xenomai-core mailing list
Xenomai-core@gna.org
https://mail.gna.org/listinfo/xenomai-core


Re: [Xenomai-core] False positive XENO_BUGON(NUCLEUS, need_resched == 0)?

2010-08-31 Thread Philippe Gerum
On Mon, 2010-08-30 at 17:39 +0200, Jan Kiszka wrote:
 Philippe Gerum wrote:
  Ok, Gilles did not grumble at you, so I'm daring the following patch,
  since I agree with you here. Totally untested, not even compiled, just
  for the fun of getting lockups and/or threads in limbos. Nah, just
  kidding, your shiny SMP box should be bricked even before that:
  
  diff --git a/include/nucleus/sched.h b/include/nucleus/sched.h
  index f75c6f6..6ad66ba 100644
  --- a/include/nucleus/sched.h
  +++ b/include/nucleus/sched.h
  @@ -184,10 +184,9 @@ static inline int xnsched_self_resched_p(struct xnsched *sched)
   #define xnsched_set_resched(__sched__) do { \
           xnsched_t *current_sched = xnpod_current_sched(); \
           xnarch_cpu_set(xnsched_cpu(__sched__), current_sched->resched); \

 To increase the probability of regressions: What about moving the above
 line...

  -        if (unlikely(current_sched != (__sched__))) \
  -                xnarch_cpu_set(xnsched_cpu(__sched__), (__sched__)->resched); \
           setbits(current_sched->status, XNRESCHED); \
  -        /* remote will set XNRESCHED locally in the IPI handler */ \
  +        if (current_sched != (__sched__)) \
  +                setbits((__sched__)->status, XNRESCHED); \

 ...into this conditional block? Then you should be able to...

   } while (0)

   void xnsched_zombie_hooks(struct xnthread *thread);
  diff --git a/ksrc/nucleus/pod.c b/ksrc/nucleus/pod.c
  index 623bdff..cff76c2 100644
  --- a/ksrc/nucleus/pod.c
  +++ b/ksrc/nucleus/pod.c
  @@ -285,13 +285,6 @@ void xnpod_schedule_handler(void) /* Called with hw interrupts off. */
                   xnshadow_rpi_check();
           }
   #endif /* CONFIG_SMP && CONFIG_XENO_OPT_PRIOCPL */
  -        /*
  -         * xnsched_set_resched() did set the resched mask remotely. We
  -         * just need to make sure that our rescheduling request won't
  -         * be filtered out locally when testing for XNRESCHED
  -         * presence.
  -         */
  -        setbits(sched->status, XNRESCHED);
           xnpod_schedule();
   }

  @@ -2167,10 +2160,10 @@ static inline int __xnpod_test_resched(struct xnsched *sched)
   {
           int cpu = xnsched_cpu(sched), resched;

  -        resched = xnarch_cpu_isset(cpu, sched->resched);
  -        xnarch_cpu_clear(cpu, sched->resched);
  +        resched = testbits(sched->status, XNRESCHED);
   #ifdef CONFIG_SMP
           /* Send resched IPI to remote CPU(s). */
  +        xnarch_cpu_clear(cpu, sched->resched);

 ...drop the line above as well.

          if (unlikely(xnsched_resched_p(sched))) {
                  xnarch_send_ipi(sched->resched);
                  xnarch_cpus_clear(sched->resched);
  
 

Yes, I do think that we are way too stable on SMP boxes these days.
Let's merge this as well to bring the fun back.

-- 
Philippe.



___
Xenomai-core mailing list
Xenomai-core@gna.org
https://mail.gna.org/listinfo/xenomai-core


Re: [Xenomai-core] False positive XENO_BUGON(NUCLEUS, need_resched == 0)?

2010-08-31 Thread Philippe Gerum
On Tue, 2010-08-31 at 09:09 +0200, Philippe Gerum wrote:
 On Mon, 2010-08-30 at 17:39 +0200, Jan Kiszka wrote:
  Philippe Gerum wrote:
   Ok, Gilles did not grumble at you, so I'm daring the following patch,
   since I agree with you here. Totally untested, not even compiled, just
   for the fun of getting lockups and/or threads in limbos. Nah, just
   kidding, your shiny SMP box should be bricked even before that:
   
   diff --git a/include/nucleus/sched.h b/include/nucleus/sched.h
   index f75c6f6..6ad66ba 100644
   --- a/include/nucleus/sched.h
   +++ b/include/nucleus/sched.h
   @@ -184,10 +184,9 @@ static inline int xnsched_self_resched_p(struct xnsched *sched)
    #define xnsched_set_resched(__sched__) do { \
            xnsched_t *current_sched = xnpod_current_sched(); \
            xnarch_cpu_set(xnsched_cpu(__sched__), current_sched->resched); \

  To increase the probability of regressions: What about moving the above
  line...

   -        if (unlikely(current_sched != (__sched__))) \
   -                xnarch_cpu_set(xnsched_cpu(__sched__), (__sched__)->resched); \
            setbits(current_sched->status, XNRESCHED); \
   -        /* remote will set XNRESCHED locally in the IPI handler */ \
   +        if (current_sched != (__sched__)) \
   +                setbits((__sched__)->status, XNRESCHED); \

  ...into this conditional block? Then you should be able to...

    } while (0)

    void xnsched_zombie_hooks(struct xnthread *thread);
   diff --git a/ksrc/nucleus/pod.c b/ksrc/nucleus/pod.c
   index 623bdff..cff76c2 100644
   --- a/ksrc/nucleus/pod.c
   +++ b/ksrc/nucleus/pod.c
   @@ -285,13 +285,6 @@ void xnpod_schedule_handler(void) /* Called with hw interrupts off. */
                    xnshadow_rpi_check();
            }
    #endif /* CONFIG_SMP && CONFIG_XENO_OPT_PRIOCPL */
   -        /*
   -         * xnsched_set_resched() did set the resched mask remotely. We
   -         * just need to make sure that our rescheduling request won't
   -         * be filtered out locally when testing for XNRESCHED
   -         * presence.
   -         */
   -        setbits(sched->status, XNRESCHED);
            xnpod_schedule();
    }

   @@ -2167,10 +2160,10 @@ static inline int __xnpod_test_resched(struct xnsched *sched)
    {
            int cpu = xnsched_cpu(sched), resched;

   -        resched = xnarch_cpu_isset(cpu, sched->resched);
   -        xnarch_cpu_clear(cpu, sched->resched);
   +        resched = testbits(sched->status, XNRESCHED);
    #ifdef CONFIG_SMP
            /* Send resched IPI to remote CPU(s). */
   +        xnarch_cpu_clear(cpu, sched->resched);

  ...drop the line above as well.

            if (unlikely(xnsched_resched_p(sched))) {
                    xnarch_send_ipi(sched->resched);
                    xnarch_cpus_clear(sched->resched);
   
  
 
 Yes, I do think that we are way too stable on SMP boxes these days.
 Let's merge this as well to bring the fun back.
 

All worked according to plan: this introduced a nice lockup under
switchtest load. Unfortunately, a solution exists to fix it:

--- a/include/nucleus/sched.h
+++ b/include/nucleus/sched.h
@@ -176,17 +176,17 @@ static inline int xnsched_self_resched_p(struct xnsched *sched)
 
 /* Set self resched flag for the given scheduler. */
 #define xnsched_set_self_resched(__sched__) do { \
-        xnarch_cpu_set(xnsched_cpu(__sched__), (__sched__)->resched); \
         setbits((__sched__)->status, XNRESCHED); \
 } while (0)
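
For reference, with this fix on top, the two helpers end up looking roughly as
below, assuming Jan's suggestion of moving the cpu mask update into the
remote-only branch is folded in as well (a sketch, not the committed code):

/* Self resched: local-only request, the status bit is all we need. */
#define xnsched_set_self_resched(__sched__) do { \
        setbits((__sched__)->status, XNRESCHED); \
} while (0)

/*
 * Possibly remote resched: always flag the caller's scheduler; when the
 * target is not the local one, also mark its CPU in our resched mask
 * and flag its scheduler directly, so the IPI handler has nothing left
 * to set.
 */
#define xnsched_set_resched(__sched__) do { \
        xnsched_t *current_sched = xnpod_current_sched(); \
        setbits(current_sched->status, XNRESCHED); \
        if (current_sched != (__sched__)) { \
                xnarch_cpu_set(xnsched_cpu(__sched__), current_sched->resched); \
                setbits((__sched__)->status, XNRESCHED); \
        } \
} while (0)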


-- 
Philippe.



___
Xenomai-core mailing list
Xenomai-core@gna.org
https://mail.gna.org/listinfo/xenomai-core


Re: [Xenomai-core] False positive XENO_BUGON(NUCLEUS, need_resched == 0)?

2010-08-30 Thread Jan Kiszka
Jan Kiszka wrote:
 Philippe Gerum wrote:
 This makes sense. I'm currently testing the patch below which implements
 a close variant of Gilles's proposal. Could you try it as well, to see
 if things improve?
 http://git.xenomai.org/?p=xenomai-rpm.git;a=commit;h=3200660065146915976c193387bf0851be10d0cc
 
 Will test ASAP.

Works fine here so far.

Jan

 
 The logic makes sure that we can keep calling xnsched_set_resched() then
 xnpod_schedule() outside of the same critical section, which is
 something we need. Otherwise this requirement would extend to
 xnpod_suspend/resume_thread(), which is not acceptable.
 
 I still wonder if things can't be even simpler. What is the purpose of
 xnsched_t::resched? I first thought it's just there to coalesce multiple
 remote reschedule requests, thus IPIs triggered by one CPU over
 successive wakeups etc. If that is true, why going through resched for
 local changes, why not setting XNRESCHED directly? And why not setting
 the remote XNRESCHED instead of remote's xnsched_t::resched?
 
 Jan
 

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux

___
Xenomai-core mailing list
Xenomai-core@gna.org
https://mail.gna.org/listinfo/xenomai-core


Re: [Xenomai-core] False positive XENO_BUGON(NUCLEUS, need_resched == 0)?

2010-08-30 Thread Philippe Gerum
On Mon, 2010-08-30 at 10:51 +0200, Jan Kiszka wrote:
 Philippe Gerum wrote:
  On Fri, 2010-08-27 at 20:09 +0200, Jan Kiszka wrote:
  Gilles Chanteperdrix wrote:
  Jan Kiszka wrote:
  Gilles Chanteperdrix wrote:
  Gilles Chanteperdrix wrote:
  Jan Kiszka wrote:
  Gilles Chanteperdrix wrote:
  Jan Kiszka wrote:
  Gilles Chanteperdrix wrote:
  Jan Kiszka wrote:
  Hi,
 
  I'm hitting that bug check in __xnpod_schedule after
  xnintr_clock_handler issued a xnpod_schedule like this:
 
        if (--sched->inesting == 0) {
                __clrbits(sched->status, XNINIRQ);
                xnpod_schedule();
        }
 
  Either the assumption behind the bug check is no longer correct 
  (no call
  to xnpod_schedule() without a real need), or we should check for
  __xnpod_test_resched(sched) in xnintr_clock_handler (but under 
  nklock then).
 
  Comments?
  You probably have a real bug. This BUG_ON means that the scheduler 
  is
  about to switch context for real, whereas the resched bit is not 
  set,
  which is wrong.
  This happened over my 2.6.35 port - maybe some spurious IRQ 
  enabling.
  Debugging further...
  You should look for something which changes the scheduler state 
  without
  setting the resched bit, or for something which clears the bit 
  without
  taking the scheduler changes into account.
  It looks like a generic Xenomai issue on SMP boxes, though a mostly
  harmless one:
 
  The task that was scheduled in without XNRESCHED set locally has been
  woken up by a remote CPU. The waker requeued the task and set the
  resched condition for itself and in the resched proxy mask for the
  remote CPU. But there is at least one place in the Xenomai code where 
  we
  drop the nklock between xnsched_set_resched and xnpod_schedule:
  do_taskexit_event (I bet there are even more). Now the resched target
  CPU runs into a timer handler, issues xnpod_schedule unconditionally,
  and happens to find the woken-up task before it is actually informed 
  via
  an IPI.
 
  I think this is a harmless race, but it ruins the debug assertion
  need_resched != 0.
  Not that harmless, since without the debugging code, we would miss the
  reschedule too...
  Ok. But we would finally reschedule when handling the IPI. So, the
  effect we see is a useless delay in the rescheduling.
 
  Depends on the POV: The interrupt or context switch between set_resched
  and xnpod_reschedule that may defer rescheduling may also hit us before
  we were able to wake up the thread at all. The worst case should not
  differ significantly.
  Yes, and whether we set the bit and call xnpod_schedule atomically does
  not really matter either: the IPI takes time to propagate, and since
  xnarch_send_ipi does not wait for the IPI to have been received on the
  remote CPU, there is no guarantee that xnpod_schedule could not have
  been called in the mean time.
  Indeed.
 
  More importantly, since in order to do an action on a remote xnsched_t,
  we need to hold the nklock, is there any point in not setting the
  XNRESCHED bit on that distant structure, at the same time as when we set
  the cpu bit on the local sched structure mask and send the IPI? This
  way, setting the XNRESCHED bit in the IPI handler would no longer be
  necessary, and we would avoid the race.
 
  I guess so. The IPI isn't more than a hint that something /may/ have
  changed in the schedule anyway.
 
  
  This makes sense. I'm currently testing the patch below which implements
  a close variant of Gilles's proposal. Could you try it as well, to see
  if things improve?
  http://git.xenomai.org/?p=xenomai-rpm.git;a=commit;h=3200660065146915976c193387bf0851be10d0cc
 
 Will test ASAP.
 
  
  The logic makes sure that we can keep calling xnsched_set_resched() then
  xnpod_schedule() outside of the same critical section, which is
  something we need. Otherwise this requirement would extend to
  xnpod_suspend/resume_thread(), which is not acceptable.
 
 I still wonder if things can't be even simpler. What is the purpose of
 xnsched_t::resched? I first thought it's just there to coalesce multiple
 remote reschedule requests, thus IPIs triggered by one CPU over
 successive wakeups etc. If that is true, why going through resched for
 local changes, why not setting XNRESCHED directly? And why not setting
 the remote XNRESCHED instead of remote's xnsched_t::resched?
 

Ok, Gilles did not grumble at you, so I'm daring the following patch,
since I agree with you here. Totally untested, not even compiled, just
for the fun of getting lockups and/or threads in limbos. Nah, just
kidding, your shiny SMP box should be bricked even before that:

diff --git a/include/nucleus/sched.h b/include/nucleus/sched.h
index f75c6f6..6ad66ba 100644
--- a/include/nucleus/sched.h
+++ b/include/nucleus/sched.h
@@ -184,10 +184,9 @@ static inline int xnsched_self_resched_p(struct xnsched *sched)
 #define xnsched_set_resched(__sched__) do { \
        xnsched_t *current_sched = xnpod_current_sched(); \

Re: [Xenomai-core] False positive XENO_BUGON(NUCLEUS, need_resched == 0)?

2010-08-30 Thread Jan Kiszka
Philippe Gerum wrote:
 Ok, Gilles did not grumble at you, so I'm daring the following patch,
 since I agree with you here. Totally untested, not even compiled, just
 for the fun of getting lockups and/or threads in limbos. Nah, just
 kidding, your shiny SMP box should be bricked even before that:
 
 diff --git a/include/nucleus/sched.h b/include/nucleus/sched.h
 index f75c6f6..6ad66ba 100644
 --- a/include/nucleus/sched.h
 +++ b/include/nucleus/sched.h
 @@ -184,10 +184,9 @@ static inline int xnsched_self_resched_p(struct xnsched *sched)
  #define xnsched_set_resched(__sched__) do { \
          xnsched_t *current_sched = xnpod_current_sched(); \
          xnarch_cpu_set(xnsched_cpu(__sched__), current_sched->resched); \

To increase the probability of regressions: What about moving the above
line...

 -        if (unlikely(current_sched != (__sched__))) \
 -                xnarch_cpu_set(xnsched_cpu(__sched__), (__sched__)->resched); \
          setbits(current_sched->status, XNRESCHED); \
 -        /* remote will set XNRESCHED locally in the IPI handler */ \
 +        if (current_sched != (__sched__)) \
 +                setbits((__sched__)->status, XNRESCHED); \

...into this conditional block? Then you should be able to...

  } while (0)

  void xnsched_zombie_hooks(struct xnthread *thread);
 diff --git a/ksrc/nucleus/pod.c b/ksrc/nucleus/pod.c
 index 623bdff..cff76c2 100644
 --- a/ksrc/nucleus/pod.c
 +++ b/ksrc/nucleus/pod.c
 @@ -285,13 +285,6 @@ void xnpod_schedule_handler(void) /* Called with hw interrupts off. */
                  xnshadow_rpi_check();
          }
  #endif /* CONFIG_SMP && CONFIG_XENO_OPT_PRIOCPL */
 -        /*
 -         * xnsched_set_resched() did set the resched mask remotely. We
 -         * just need to make sure that our rescheduling request won't
 -         * be filtered out locally when testing for XNRESCHED
 -         * presence.
 -         */
 -        setbits(sched->status, XNRESCHED);
          xnpod_schedule();
  }

 @@ -2167,10 +2160,10 @@ static inline int __xnpod_test_resched(struct xnsched *sched)
  {
          int cpu = xnsched_cpu(sched), resched;

 -        resched = xnarch_cpu_isset(cpu, sched->resched);
 -        xnarch_cpu_clear(cpu, sched->resched);
 +        resched = testbits(sched->status, XNRESCHED);
  #ifdef CONFIG_SMP
          /* Send resched IPI to remote CPU(s). */
 +        xnarch_cpu_clear(cpu, sched->resched);

...drop the line above as well.

          if (unlikely(xnsched_resched_p(sched))) {
                  xnarch_send_ipi(sched->resched);
                  xnarch_cpus_clear(sched->resched);
 

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux

___
Xenomai-core mailing list
Xenomai-core@gna.org
https://mail.gna.org/listinfo/xenomai-core


Re: [Xenomai-core] False positive XENO_BUGON(NUCLEUS, need_resched == 0)?

2010-08-30 Thread Gilles Chanteperdrix
Philippe Gerum wrote:
 Ok, Gilles did not grumble at you, so I'm daring the following patch,
 since I agree with you here. Totally untested, not even compiled, just
 for the fun of getting lockups and/or threads in limbos. Nah, just
 kidding, your shiny SMP box should be bricked even before that:

We wrote that code many years ago. I have no idea what we had in mind. I
guess we wanted the resched bit never to be set from a distant cpu. The
debug check had been disabled the whole time since we introduced
XENO_DEBUG, and started working again only recently.

Anyway, I have nothing against a bit of cleanup.

-- 
Gilles.

___
Xenomai-core mailing list
Xenomai-core@gna.org
https://mail.gna.org/listinfo/xenomai-core


Re: [Xenomai-core] False positive XENO_BUGON(NUCLEUS, need_resched == 0)?

2010-08-27 Thread Jan Kiszka
Gilles Chanteperdrix wrote:
 Jan Kiszka wrote:
 Hi,

 I'm hitting that bug check in __xnpod_schedule after
 xnintr_clock_handler issued a xnpod_schedule like this:

        if (--sched->inesting == 0) {
                __clrbits(sched->status, XNINIRQ);
                xnpod_schedule();
        }

 Either the assumption behind the bug check is no longer correct (no call
 to xnpod_schedule() without a real need), or we should check for
 __xnpod_test_resched(sched) in xnintr_clock_handler (but under nklock then).

 Comments?
 
 You probably have a real bug. This BUG_ON means that the scheduler is
 about to switch context for real, whereas the resched bit is not set,
 which is wrong.

This happened over my 2.6.35 port - maybe some spurious IRQ enabling.
Debugging further...

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux

___
Xenomai-core mailing list
Xenomai-core@gna.org
https://mail.gna.org/listinfo/xenomai-core


Re: [Xenomai-core] False positive XENO_BUGON(NUCLEUS, need_resched == 0)?

2010-08-27 Thread Gilles Chanteperdrix
Jan Kiszka wrote:
 Gilles Chanteperdrix wrote:
 Jan Kiszka wrote:
 Hi,

 I'm hitting that bug check in __xnpod_schedule after
 xnintr_clock_handler issued a xnpod_schedule like this:

        if (--sched->inesting == 0) {
                __clrbits(sched->status, XNINIRQ);
                xnpod_schedule();
        }

 Either the assumption behind the bug check is no longer correct (no call
 to xnpod_schedule() without a real need), or we should check for
 __xnpod_test_resched(sched) in xnintr_clock_handler (but under nklock then).

 Comments?
 You probably have a real bug. This BUG_ON means that the scheduler is
 about to switch context for real, whereas the resched bit is not set,
 which is wrong.
 
 This happened over my 2.6.35 port - maybe some spurious IRQ enabling.
 Debugging further...

You should look for something which changes the scheduler state without
setting the resched bit, or for something which clears the bit without
taking the scheduler changes into account.

-- 
Gilles.

___
Xenomai-core mailing list
Xenomai-core@gna.org
https://mail.gna.org/listinfo/xenomai-core


Re: [Xenomai-core] False positive XENO_BUGON(NUCLEUS, need_resched == 0)?

2010-08-27 Thread Gilles Chanteperdrix
Jan Kiszka wrote:
 Gilles Chanteperdrix wrote:
 Jan Kiszka wrote:
 Gilles Chanteperdrix wrote:
 Jan Kiszka wrote:
 Hi,

 I'm hitting that bug check in __xnpod_schedule after
 xnintr_clock_handler issued a xnpod_schedule like this:

        if (--sched->inesting == 0) {
                __clrbits(sched->status, XNINIRQ);
                xnpod_schedule();
        }

 Either the assumption behind the bug check is no longer correct (no call
 to xnpod_schedule() without a real need), or we should check for
 __xnpod_test_resched(sched) in xnintr_clock_handler (but under nklock 
 then).

 Comments?
 You probably have a real bug. This BUG_ON means that the scheduler is
 about to switch context for real, whereas the resched bit is not set,
 which is wrong.
 This happened over my 2.6.35 port - maybe some spurious IRQ enabling.
 Debugging further...
 You should look for something which changes the scheduler state without
 setting the resched bit, or for something which clears the bit without
 taking the scheduler changes into account.
 
 It looks like a generic Xenomai issue on SMP boxes, though a mostly
 harmless one:
 
 The task that was scheduled in without XNRESCHED set locally has been
 woken up by a remote CPU. The waker requeued the task and set the
 resched condition for itself and in the resched proxy mask for the
 remote CPU. But there is at least one place in the Xenomai code where we
 drop the nklock between xnsched_set_resched and xnpod_schedule:
 do_taskexit_event (I bet there are even more). Now the resched target
 CPU runs into a timer handler, issues xnpod_schedule unconditionally,
 and happens to find the woken-up task before it is actually informed via
 an IPI.
 
 I think this is a harmless race, but it ruins the debug assertion
 need_resched != 0.

Not that harmless, since without the debugging code, we would miss the
reschedule too...

-- 
Gilles.

___
Xenomai-core mailing list
Xenomai-core@gna.org
https://mail.gna.org/listinfo/xenomai-core


Re: [Xenomai-core] False positive XENO_BUGON(NUCLEUS, need_resched == 0)?

2010-08-27 Thread Gilles Chanteperdrix
Gilles Chanteperdrix wrote:
 Jan Kiszka wrote:
 Gilles Chanteperdrix wrote:
 Jan Kiszka wrote:
 Gilles Chanteperdrix wrote:
 Jan Kiszka wrote:
 Hi,

 I'm hitting that bug check in __xnpod_schedule after
 xnintr_clock_handler issued a xnpod_schedule like this:

        if (--sched->inesting == 0) {
                __clrbits(sched->status, XNINIRQ);
                xnpod_schedule();
        }

 Either the assumption behind the bug check is no longer correct (no call
 to xnpod_schedule() without a real need), or we should check for
 __xnpod_test_resched(sched) in xnintr_clock_handler (but under nklock 
 then).

 Comments?
 You probably have a real bug. This BUG_ON means that the scheduler is
 about to switch context for real, whereas the resched bit is not set,
 which is wrong.
 This happened over my 2.6.35 port - maybe some spurious IRQ enabling.
 Debugging further...
 You should look for something which changes the scheduler state without
 setting the resched bit, or for something which clears the bit without
 taking the scheduler changes into account.
 It looks like a generic Xenomai issue on SMP boxes, though a mostly
 harmless one:

 The task that was scheduled in without XNRESCHED set locally has been
 woken up by a remote CPU. The waker requeued the task and set the
 resched condition for itself and in the resched proxy mask for the
 remote CPU. But there is at least one place in the Xenomai code where we
 drop the nklock between xnsched_set_resched and xnpod_schedule:
 do_taskexit_event (I bet there are even more). Now the resched target
 CPU runs into a timer handler, issues xnpod_schedule unconditionally,
 and happens to find the woken-up task before it is actually informed via
 an IPI.

 I think this is a harmless race, but it ruins the debug assertion
 need_resched != 0.
 
 Not that harmless, since without the debugging code, we would miss the
 reschedule too...

Ok. But we would finally reschedule when handling the IPI. So, the
effect we see is a useless delay in the rescheduling.

-- 
Gilles.


___
Xenomai-core mailing list
Xenomai-core@gna.org
https://mail.gna.org/listinfo/xenomai-core


Re: [Xenomai-core] False positive XENO_BUGON(NUCLEUS, need_resched == 0)?

2010-08-27 Thread Jan Kiszka
Gilles Chanteperdrix wrote:
 Gilles Chanteperdrix wrote:
 Jan Kiszka wrote:
 Gilles Chanteperdrix wrote:
 Jan Kiszka wrote:
 Gilles Chanteperdrix wrote:
 Jan Kiszka wrote:
 Hi,

 I'm hitting that bug check in __xnpod_schedule after
 xnintr_clock_handler issued a xnpod_schedule like this:

        if (--sched->inesting == 0) {
                __clrbits(sched->status, XNINIRQ);
                xnpod_schedule();
        }

 Either the assumption behind the bug check is no longer correct (no call
 to xnpod_schedule() without a real need), or we should check for
 __xnpod_test_resched(sched) in xnintr_clock_handler (but under nklock 
 then).

 Comments?
 You probably have a real bug. This BUG_ON means that the scheduler is
 about to switch context for real, whereas the resched bit is not set,
 which is wrong.
 This happened over my 2.6.35 port - maybe some spurious IRQ enabling.
 Debugging further...
 You should look for something which changes the scheduler state without
 setting the resched bit, or for something which clears the bit without
 taking the scheduler changes into account.
 It looks like a generic Xenomai issue on SMP boxes, though a mostly
 harmless one:

 The task that was scheduled in without XNRESCHED set locally has been
 woken up by a remote CPU. The waker requeued the task and set the
 resched condition for itself and in the resched proxy mask for the
 remote CPU. But there is at least one place in the Xenomai code where we
 drop the nklock between xnsched_set_resched and xnpod_schedule:
 do_taskexit_event (I bet there are even more). Now the resched target
 CPU runs into a timer handler, issues xnpod_schedule unconditionally,
 and happens to find the woken-up task before it is actually informed via
 an IPI.

 I think this is a harmless race, but it ruins the debug assertion
 need_resched != 0.
 Not that harmless, since without the debugging code, we would miss the
 reschedule too...
 
 Ok. But we would finally reschedule when handling the IPI. So, the
 effect we see is a useless delay in the rescheduling.
 

Depends on the POV: The interrupt or context switch between set_resched
and xnpod_reschedule that may defer rescheduling may also hit us before
we were able to wake up the thread at all. The worst case should not
differ significantly.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux

___
Xenomai-core mailing list
Xenomai-core@gna.org
https://mail.gna.org/listinfo/xenomai-core


Re: [Xenomai-core] False positive XENO_BUGON(NUCLEUS, need_resched == 0)?

2010-08-27 Thread Gilles Chanteperdrix
Jan Kiszka wrote:
 Gilles Chanteperdrix wrote:
 Gilles Chanteperdrix wrote:
 Jan Kiszka wrote:
 Gilles Chanteperdrix wrote:
 Jan Kiszka wrote:
 Gilles Chanteperdrix wrote:
 Jan Kiszka wrote:
 Hi,

 I'm hitting that bug check in __xnpod_schedule after
 xnintr_clock_handler issued a xnpod_schedule like this:

        if (--sched->inesting == 0) {
                __clrbits(sched->status, XNINIRQ);
                xnpod_schedule();
        }

 Either the assumption behind the bug check is no longer correct (no 
 call
 to xnpod_schedule() without a real need), or we should check for
 __xnpod_test_resched(sched) in xnintr_clock_handler (but under nklock 
 then).

 Comments?
 You probably have a real bug. This BUG_ON means that the scheduler is
 about to switch context for real, whereas the resched bit is not set,
 which is wrong.
 This happened over my 2.6.35 port - maybe some spurious IRQ enabling.
 Debugging further...
 You should look for something which changes the scheduler state without
 setting the resched bit, or for something which clears the bit without
 taking the scheduler changes into account.
 It looks like a generic Xenomai issue on SMP boxes, though a mostly
 harmless one:

 The task that was scheduled in without XNRESCHED set locally has been
 woken up by a remote CPU. The waker requeued the task and set the
 resched condition for itself and in the resched proxy mask for the
 remote CPU. But there is at least one place in the Xenomai code where we
 drop the nklock between xnsched_set_resched and xnpod_schedule:
 do_taskexit_event (I bet there are even more). Now the resched target
 CPU runs into a timer handler, issues xnpod_schedule unconditionally,
 and happens to find the woken-up task before it is actually informed via
 an IPI.

 I think this is a harmless race, but it ruins the debug assertion
 need_resched != 0.
 Not that harmless, since without the debugging code, we would miss the
 reschedule too...
 Ok. But we would finally reschedule when handling the IPI. So, the
 effect we see is a useless delay in the rescheduling.

 
 Depends on the POV: The interrupt or context switch between set_resched
 and xnpod_reschedule that may defer rescheduling may also hit us before
 we were able to wake up the thread at all. The worst case should not
 differ significantly.

Yes, and whether we set the bit and call xnpod_schedule atomically does
not really matter either: the IPI takes time to propagate, and since
xnarch_send_ipi does not wait for the IPI to have been received on the
remote CPU, there is no guarantee that xnpod_schedule could not have
been called in the mean time.

More importantly, since in order to do an action on a remote xnsched_t,
we need to hold the nklock, is there any point in not setting the
XNRESCHED bit on that distant structure, at the same time as when we set
the cpu bit on the local sched structure mask and send the IPI? This
way, setting the XNRESCHED bit in the IPI handler would no longer be
necessary, and we would avoid the race.
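
A sketch of the window this would close, modeled on the do_taskexit_event
pattern mentioned earlier (the function and its body are illustrative, not
actual nucleus code):

#include <nucleus/pod.h>

/*
 * Racy pattern: the nklock is dropped between flagging the remote
 * scheduler and calling xnpod_schedule(), so the target CPU may run
 * xnpod_schedule() from a timer tick and find the requeued thread
 * before the IPI has set XNRESCHED over there.
 */
void waker_side(struct xnsched *remote_sched)
{
        spl_t s;

        xnlock_get_irqsave(&nklock, s);
        /* ... requeue a thread on remote_sched's runqueue here ... */
        xnsched_set_resched(remote_sched); /* pre-patch: remote cpu bit only */
        xnlock_put_irqrestore(&nklock, s);

        /*
         * Window: a timer interrupt on the remote CPU calls
         * xnpod_schedule(), sees the newly runnable thread, but its
         * XNRESCHED bit is still clear, so the XENO_BUGON fires (or,
         * without debug, the switch is delayed until the IPI arrives).
         */

        xnpod_schedule(); /* sends the IPI from __xnpod_test_resched() */
}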

-- 
Gilles.

___
Xenomai-core mailing list
Xenomai-core@gna.org
https://mail.gna.org/listinfo/xenomai-core


Re: [Xenomai-core] False positive XENO_BUGON(NUCLEUS, need_resched == 0)?

2010-08-27 Thread Jan Kiszka
Gilles Chanteperdrix wrote:
 Jan Kiszka wrote:
 Gilles Chanteperdrix wrote:
 Gilles Chanteperdrix wrote:
 Jan Kiszka wrote:
 Gilles Chanteperdrix wrote:
 Jan Kiszka wrote:
 Gilles Chanteperdrix wrote:
 Jan Kiszka wrote:
 Hi,

 I'm hitting that bug check in __xnpod_schedule after
 xnintr_clock_handler issued a xnpod_schedule like this:

        if (--sched->inesting == 0) {
                __clrbits(sched->status, XNINIRQ);
                xnpod_schedule();
        }

 Either the assumption behind the bug check is no longer correct (no 
 call
 to xnpod_schedule() without a real need), or we should check for
 __xnpod_test_resched(sched) in xnintr_clock_handler (but under nklock 
 then).

 Comments?
 You probably have a real bug. This BUG_ON means that the scheduler is
 about to switch context for real, whereas the resched bit is not set,
 which is wrong.
 This happened over my 2.6.35 port - maybe some spurious IRQ enabling.
 Debugging further...
 You should look for something which changes the scheduler state without
 setting the resched bit, or for something which clears the bit without
 taking the scheduler changes into account.
 It looks like a generic Xenomai issue on SMP boxes, though a mostly
 harmless one:

 The task that was scheduled in without XNRESCHED set locally has been
 woken up by a remote CPU. The waker requeued the task and set the
 resched condition for itself and in the resched proxy mask for the
 remote CPU. But there is at least one place in the Xenomai code where we
 drop the nklock between xnsched_set_resched and xnpod_schedule:
 do_taskexit_event (I bet there are even more). Now the resched target
 CPU runs into a timer handler, issues xnpod_schedule unconditionally,
 and happens to find the woken-up task before it is actually informed via
 an IPI.

 I think this is a harmless race, but it ruins the debug assertion
 need_resched != 0.
 Not that harmless, since without the debugging code, we would miss the
 reschedule too...
 Ok. But we would finally reschedule when handling the IPI. So, the
 effect we see is a useless delay in the rescheduling.

 Depends on the POV: The interrupt or context switch between set_resched
 and xnpod_reschedule that may defer rescheduling may also hit us before
 we were able to wake up the thread at all. The worst case should not
 differ significantly.
 
 Yes, and whether we set the bit and call xnpod_schedule atomically does
 not really matter either: the IPI takes time to propagate, and since
 xnarch_send_ipi does not wait for the IPI to have been received on the
 remote CPU, there is no guarantee that xnpod_schedule could not have
 been called in the mean time.

Indeed.

 
 More importantly, since in order to do an action on a remote xnsched_t,
 we need to hold the nklock, is there any point in not setting the
 XNRESCHED bit on that distant structure, at the same time as when we set
 the cpu bit on the local sched structure mask and send the IPI? This
 way, setting the XNRESCHED bit in the IPI handler would no longer be
 necessary, and we would avoid the race.
 

I guess so. The IPI isn't more than a hint that something /may/ have
changed in the schedule anyway.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux

___
Xenomai-core mailing list
Xenomai-core@gna.org
https://mail.gna.org/listinfo/xenomai-core