Re: [Xenomai-core] False positive XENO_BUGON(NUCLEUS, need_resched == 0)?

2010-08-27 Thread Jan Kiszka
Gilles Chanteperdrix wrote:
> Jan Kiszka wrote:
>> Gilles Chanteperdrix wrote:
>>> Gilles Chanteperdrix wrote:
 Jan Kiszka wrote:
> Gilles Chanteperdrix wrote:
>> Jan Kiszka wrote:
>>> Gilles Chanteperdrix wrote:
 Jan Kiszka wrote:
> Hi,
>
> I'm hitting that bug check in __xnpod_schedule after
> xnintr_clock_handler issued a xnpod_schedule like this:
>
>   if (--sched->inesting == 0) {
>   __clrbits(sched->status, XNINIRQ);
>   xnpod_schedule();
>   }
>
> Either the assumption behind the bug check is no longer correct (no 
> call
> to xnpod_schedule() without a real need), or we should check for
> __xnpod_test_resched(sched) in xnintr_clock_handler (but under nklock 
> then).
>
> Comments?
 You probably have a real bug. This BUG_ON means that the scheduler is
 about to switch context for real, whereas the resched bit is not set,
 which is wrong.
>>> This happened over my 2.6.35 port - maybe some spurious IRQ enabling.
>>> Debugging further...
>> You should look for something which changes the scheduler state without
>> setting the resched bit, or for something which clears the bit without
>> taking the scheduler changes into account.
> It looks like a generic Xenomai issue on SMP boxes, though a mostly
> harmless one:
>
> The task that was scheduled in without XNRESCHED set locally has been
> woken up by a remote CPU. The waker requeued the task and set the
> resched condition for itself and in the resched proxy mask for the
> remote CPU. But there is at least one place in the Xenomai code where we
> drop the nklock between xnsched_set_resched and xnpod_schedule:
> do_taskexit_event (I bet there are even more). Now the resched target
> CPU runs into a timer handler, issues xnpod_schedule unconditionally,
> and happens to find the woken-up task before it is actually informed via
> an IPI.
>
> I think this is a harmless race, but it ruins the debug assertion
> "need_resched != 0".
 Not that harmless, since without the debugging code, we would miss the
 reschedule too...
>>> Ok. But we would finally reschedule when handling the IPI. So, the
>>> effect we see is a useless delay in the rescheduling.
>>>
>> Depends on the POV: The interrupt or context switch between set_resched
>> and xnpod_reschedule that may defer rescheduling may also hit us before
>> we were able to wake up the thread at all. The worst case should not
>> differ significantly.
> 
> Yes, and whether we set the bit and call xnpod_schedule atomically does
> not really matter either: the IPI takes time to propagate, and since
> xnarch_send_ipi does not wait for the IPI to have been received on the
> remote CPU, there is no guarantee that xnpod_schedule could not have
> been called in the mean time.

Indeed.

> 
> More importantly, since in order to do an action on a remote xnsched_t,
> we need to hold the nklock, is there any point in not setting the
> XNRESCHED bit on that distant structure, at the same time as when we set
> the cpu bit on the local sched structure mask and send the IPI? This
> way, setting the XNRESCHED bit in the IPI handler would no longer be
> necessary, and we would avoid the race.
> 

I guess so. The IPI isn't more than a hint that something /may/ have
changed in the schedule anyway.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux

___
Xenomai-core mailing list
Xenomai-core@gna.org
https://mail.gna.org/listinfo/xenomai-core


Re: [Xenomai-core] False positive XENO_BUGON(NUCLEUS, need_resched == 0)?

2010-08-27 Thread Gilles Chanteperdrix
Jan Kiszka wrote:
> Gilles Chanteperdrix wrote:
>> Gilles Chanteperdrix wrote:
>>> Jan Kiszka wrote:
 Gilles Chanteperdrix wrote:
> Jan Kiszka wrote:
>> Gilles Chanteperdrix wrote:
>>> Jan Kiszka wrote:
 Hi,

 I'm hitting that bug check in __xnpod_schedule after
 xnintr_clock_handler issued a xnpod_schedule like this:

if (--sched->inesting == 0) {
__clrbits(sched->status, XNINIRQ);
xnpod_schedule();
}

 Either the assumption behind the bug check is no longer correct (no 
 call
 to xnpod_schedule() without a real need), or we should check for
 __xnpod_test_resched(sched) in xnintr_clock_handler (but under nklock 
 then).

 Comments?
>>> You probably have a real bug. This BUG_ON means that the scheduler is
>>> about to switch context for real, whereas the resched bit is not set,
>>> which is wrong.
>> This happened over my 2.6.35 port - maybe some spurious IRQ enabling.
>> Debugging further...
> You should look for something which changes the scheduler state without
> setting the resched bit, or for something which clears the bit without
> taking the scheduler changes into account.
 It looks like a generic Xenomai issue on SMP boxes, though a mostly
 harmless one:

 The task that was scheduled in without XNRESCHED set locally has been
 woken up by a remote CPU. The waker requeued the task and set the
 resched condition for itself and in the resched proxy mask for the
 remote CPU. But there is at least one place in the Xenomai code where we
 drop the nklock between xnsched_set_resched and xnpod_schedule:
 do_taskexit_event (I bet there are even more). Now the resched target
 CPU runs into a timer handler, issues xnpod_schedule unconditionally,
 and happens to find the woken-up task before it is actually informed via
 an IPI.

 I think this is a harmless race, but it ruins the debug assertion
 "need_resched != 0".
>>> Not that harmless, since without the debugging code, we would miss the
>>> reschedule too...
>> Ok. But we would finally reschedule when handling the IPI. So, the
>> effect we see is a useless delay in the rescheduling.
>>
> 
> Depends on the POV: The interrupt or context switch between set_resched
> and xnpod_reschedule that may defer rescheduling may also hit us before
> we were able to wake up the thread at all. The worst case should not
> differ significantly.

Yes, and whether we set the bit and call xnpod_schedule atomically does
not really matter either: the IPI takes time to propagate, and since
xnarch_send_ipi does not wait for the IPI to have been received on the
remote CPU, there is no guarantee that xnpod_schedule could not have
been called in the mean time.

More importantly, since in order to do an action on a remote xnsched_t,
we need to hold the nklock, is there any point in not setting the
XNRESCHED bit on that distant structure, at the same time as when we set
the cpu bit on the local sched structure mask and send the IPI? This
way, setting the XNRESCHED bit in the IPI handler would no longer be
necessary, and we would avoid the race.

-- 
Gilles.

___
Xenomai-core mailing list
Xenomai-core@gna.org
https://mail.gna.org/listinfo/xenomai-core


Re: [Xenomai-core] False positive XENO_BUGON(NUCLEUS, need_resched == 0)?

2010-08-27 Thread Jan Kiszka
Gilles Chanteperdrix wrote:
> Gilles Chanteperdrix wrote:
>> Jan Kiszka wrote:
>>> Gilles Chanteperdrix wrote:
 Jan Kiszka wrote:
> Gilles Chanteperdrix wrote:
>> Jan Kiszka wrote:
>>> Hi,
>>>
>>> I'm hitting that bug check in __xnpod_schedule after
>>> xnintr_clock_handler issued a xnpod_schedule like this:
>>>
>>> if (--sched->inesting == 0) {
>>> __clrbits(sched->status, XNINIRQ);
>>> xnpod_schedule();
>>> }
>>>
>>> Either the assumption behind the bug check is no longer correct (no call
>>> to xnpod_schedule() without a real need), or we should check for
>>> __xnpod_test_resched(sched) in xnintr_clock_handler (but under nklock 
>>> then).
>>>
>>> Comments?
>> You probably have a real bug. This BUG_ON means that the scheduler is
>> about to switch context for real, whereas the resched bit is not set,
>> which is wrong.
> This happened over my 2.6.35 port - maybe some spurious IRQ enabling.
> Debugging further...
 You should look for something which changes the scheduler state without
 setting the resched bit, or for something which clears the bit without
 taking the scheduler changes into account.
>>> It looks like a generic Xenomai issue on SMP boxes, though a mostly
>>> harmless one:
>>>
>>> The task that was scheduled in without XNRESCHED set locally has been
>>> woken up by a remote CPU. The waker requeued the task and set the
>>> resched condition for itself and in the resched proxy mask for the
>>> remote CPU. But there is at least one place in the Xenomai code where we
>>> drop the nklock between xnsched_set_resched and xnpod_schedule:
>>> do_taskexit_event (I bet there are even more). Now the resched target
>>> CPU runs into a timer handler, issues xnpod_schedule unconditionally,
>>> and happens to find the woken-up task before it is actually informed via
>>> an IPI.
>>>
>>> I think this is a harmless race, but it ruins the debug assertion
>>> "need_resched != 0".
>> Not that harmless, since without the debugging code, we would miss the
>> reschedule too...
> 
> Ok. But we would finally reschedule when handling the IPI. So, the
> effect we see is a useless delay in the rescheduling.
> 

Depends on the POV: The interrupt or context switch between set_resched
and xnpod_reschedule that may defer rescheduling may also hit us before
we were able to wake up the thread at all. The worst case should not
differ significantly.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux

___
Xenomai-core mailing list
Xenomai-core@gna.org
https://mail.gna.org/listinfo/xenomai-core


Re: [Xenomai-core] False positive XENO_BUGON(NUCLEUS, need_resched == 0)?

2010-08-27 Thread Gilles Chanteperdrix
Gilles Chanteperdrix wrote:
> Jan Kiszka wrote:
>> Gilles Chanteperdrix wrote:
>>> Jan Kiszka wrote:
 Gilles Chanteperdrix wrote:
> Jan Kiszka wrote:
>> Hi,
>>
>> I'm hitting that bug check in __xnpod_schedule after
>> xnintr_clock_handler issued a xnpod_schedule like this:
>>
>>  if (--sched->inesting == 0) {
>>  __clrbits(sched->status, XNINIRQ);
>>  xnpod_schedule();
>>  }
>>
>> Either the assumption behind the bug check is no longer correct (no call
>> to xnpod_schedule() without a real need), or we should check for
>> __xnpod_test_resched(sched) in xnintr_clock_handler (but under nklock 
>> then).
>>
>> Comments?
> You probably have a real bug. This BUG_ON means that the scheduler is
> about to switch context for real, whereas the resched bit is not set,
> which is wrong.
 This happened over my 2.6.35 port - maybe some spurious IRQ enabling.
 Debugging further...
>>> You should look for something which changes the scheduler state without
>>> setting the resched bit, or for something which clears the bit without
>>> taking the scheduler changes into account.
>> It looks like a generic Xenomai issue on SMP boxes, though a mostly
>> harmless one:
>>
>> The task that was scheduled in without XNRESCHED set locally has been
>> woken up by a remote CPU. The waker requeued the task and set the
>> resched condition for itself and in the resched proxy mask for the
>> remote CPU. But there is at least one place in the Xenomai code where we
>> drop the nklock between xnsched_set_resched and xnpod_schedule:
>> do_taskexit_event (I bet there are even more). Now the resched target
>> CPU runs into a timer handler, issues xnpod_schedule unconditionally,
>> and happens to find the woken-up task before it is actually informed via
>> an IPI.
>>
>> I think this is a harmless race, but it ruins the debug assertion
>> "need_resched != 0".
> 
> Not that harmless, since without the debugging code, we would miss the
> reschedule too...

Ok. But we would finally reschedule when handling the IPI. So, the
effect we see is a useless delay in the rescheduling.

-- 
Gilles.


___
Xenomai-core mailing list
Xenomai-core@gna.org
https://mail.gna.org/listinfo/xenomai-core


Re: [Xenomai-core] False positive XENO_BUGON(NUCLEUS, need_resched == 0)?

2010-08-27 Thread Gilles Chanteperdrix
Jan Kiszka wrote:
> Gilles Chanteperdrix wrote:
>> Jan Kiszka wrote:
>>> Gilles Chanteperdrix wrote:
 Jan Kiszka wrote:
> Hi,
>
> I'm hitting that bug check in __xnpod_schedule after
> xnintr_clock_handler issued a xnpod_schedule like this:
>
>   if (--sched->inesting == 0) {
>   __clrbits(sched->status, XNINIRQ);
>   xnpod_schedule();
>   }
>
> Either the assumption behind the bug check is no longer correct (no call
> to xnpod_schedule() without a real need), or we should check for
> __xnpod_test_resched(sched) in xnintr_clock_handler (but under nklock 
> then).
>
> Comments?
 You probably have a real bug. This BUG_ON means that the scheduler is
 about to switch context for real, whereas the resched bit is not set,
 which is wrong.
>>> This happened over my 2.6.35 port - maybe some spurious IRQ enabling.
>>> Debugging further...
>> You should look for something which changes the scheduler state without
>> setting the resched bit, or for something which clears the bit without
>> taking the scheduler changes into account.
> 
> It looks like a generic Xenomai issue on SMP boxes, though a mostly
> harmless one:
> 
> The task that was scheduled in without XNRESCHED set locally has been
> woken up by a remote CPU. The waker requeued the task and set the
> resched condition for itself and in the resched proxy mask for the
> remote CPU. But there is at least one place in the Xenomai code where we
> drop the nklock between xnsched_set_resched and xnpod_schedule:
> do_taskexit_event (I bet there are even more). Now the resched target
> CPU runs into a timer handler, issues xnpod_schedule unconditionally,
> and happens to find the woken-up task before it is actually informed via
> an IPI.
> 
> I think this is a harmless race, but it ruins the debug assertion
> "need_resched != 0".

Not that harmless, since without the debugging code, we would miss the
reschedule too...

-- 
Gilles.

___
Xenomai-core mailing list
Xenomai-core@gna.org
https://mail.gna.org/listinfo/xenomai-core


Re: [Xenomai-core] False positive XENO_BUGON(NUCLEUS, need_resched == 0)?

2010-08-27 Thread Jan Kiszka
Gilles Chanteperdrix wrote:
> Jan Kiszka wrote:
>> Gilles Chanteperdrix wrote:
>>> Jan Kiszka wrote:
 Hi,

 I'm hitting that bug check in __xnpod_schedule after
 xnintr_clock_handler issued a xnpod_schedule like this:

if (--sched->inesting == 0) {
__clrbits(sched->status, XNINIRQ);
xnpod_schedule();
}

 Either the assumption behind the bug check is no longer correct (no call
 to xnpod_schedule() without a real need), or we should check for
 __xnpod_test_resched(sched) in xnintr_clock_handler (but under nklock 
 then).

 Comments?
>>> You probably have a real bug. This BUG_ON means that the scheduler is
>>> about to switch context for real, whereas the resched bit is not set,
>>> which is wrong.
>> This happened over my 2.6.35 port - maybe some spurious IRQ enabling.
>> Debugging further...
> 
> You should look for something which changes the scheduler state without
> setting the resched bit, or for something which clears the bit without
> taking the scheduler changes into account.

It looks like a generic Xenomai issue on SMP boxes, though a mostly
harmless one:

The task that was scheduled in without XNRESCHED set locally has been
woken up by a remote CPU. The waker requeued the task and set the
resched condition for itself and in the resched proxy mask for the
remote CPU. But there is at least one place in the Xenomai code where we
drop the nklock between xnsched_set_resched and xnpod_schedule:
do_taskexit_event (I bet there are even more). Now the resched target
CPU runs into a timer handler, issues xnpod_schedule unconditionally,
and happens to find the woken-up task before it is actually informed via
an IPI.

I think this is a harmless race, but it ruins the debug assertion
"need_resched != 0".

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux

___
Xenomai-core mailing list
Xenomai-core@gna.org
https://mail.gna.org/listinfo/xenomai-core


Re: [Xenomai-core] False positive XENO_BUGON(NUCLEUS, need_resched == 0)?

2010-08-27 Thread Gilles Chanteperdrix
Jan Kiszka wrote:
> Gilles Chanteperdrix wrote:
>> Jan Kiszka wrote:
>>> Hi,
>>>
>>> I'm hitting that bug check in __xnpod_schedule after
>>> xnintr_clock_handler issued a xnpod_schedule like this:
>>>
>>> if (--sched->inesting == 0) {
>>> __clrbits(sched->status, XNINIRQ);
>>> xnpod_schedule();
>>> }
>>>
>>> Either the assumption behind the bug check is no longer correct (no call
>>> to xnpod_schedule() without a real need), or we should check for
>>> __xnpod_test_resched(sched) in xnintr_clock_handler (but under nklock then).
>>>
>>> Comments?
>> You probably have a real bug. This BUG_ON means that the scheduler is
>> about to switch context for real, whereas the resched bit is not set,
>> which is wrong.
> 
> This happened over my 2.6.35 port - maybe some spurious IRQ enabling.
> Debugging further...

You should look for something which changes the scheduler state without
setting the resched bit, or for something which clears the bit without
taking the scheduler changes into account.

-- 
Gilles.

___
Xenomai-core mailing list
Xenomai-core@gna.org
https://mail.gna.org/listinfo/xenomai-core


Re: [Xenomai-core] False positive XENO_BUGON(NUCLEUS, need_resched == 0)?

2010-08-27 Thread Jan Kiszka
Gilles Chanteperdrix wrote:
> Jan Kiszka wrote:
>> Hi,
>>
>> I'm hitting that bug check in __xnpod_schedule after
>> xnintr_clock_handler issued a xnpod_schedule like this:
>>
>>  if (--sched->inesting == 0) {
>>  __clrbits(sched->status, XNINIRQ);
>>  xnpod_schedule();
>>  }
>>
>> Either the assumption behind the bug check is no longer correct (no call
>> to xnpod_schedule() without a real need), or we should check for
>> __xnpod_test_resched(sched) in xnintr_clock_handler (but under nklock then).
>>
>> Comments?
> 
> You probably have a real bug. This BUG_ON means that the scheduler is
> about to switch context for real, whereas the resched bit is not set,
> which is wrong.

This happened over my 2.6.35 port - maybe some spurious IRQ enabling.
Debugging further...

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux

___
Xenomai-core mailing list
Xenomai-core@gna.org
https://mail.gna.org/listinfo/xenomai-core


Re: [Xenomai-core] False positive XENO_BUGON(NUCLEUS, need_resched == 0)?

2010-08-27 Thread Gilles Chanteperdrix
Jan Kiszka wrote:
> Hi,
> 
> I'm hitting that bug check in __xnpod_schedule after
> xnintr_clock_handler issued a xnpod_schedule like this:
> 
>   if (--sched->inesting == 0) {
>   __clrbits(sched->status, XNINIRQ);
>   xnpod_schedule();
>   }
> 
> Either the assumption behind the bug check is no longer correct (no call
> to xnpod_schedule() without a real need), or we should check for
> __xnpod_test_resched(sched) in xnintr_clock_handler (but under nklock then).
> 
> Comments?

You probably have a real bug. This BUG_ON means that the scheduler is
about to switch context for real, whereas the resched bit is not set,
which is wrong.

-- 
Gilles.

___
Xenomai-core mailing list
Xenomai-core@gna.org
https://mail.gna.org/listinfo/xenomai-core


[Xenomai-core] False positive XENO_BUGON(NUCLEUS, need_resched == 0)?

2010-08-27 Thread Jan Kiszka
Hi,

I'm hitting that bug check in __xnpod_schedule after
xnintr_clock_handler issued a xnpod_schedule like this:

if (--sched->inesting == 0) {
__clrbits(sched->status, XNINIRQ);
xnpod_schedule();
}

Either the assumption behind the bug check is no longer correct (no call
to xnpod_schedule() without a real need), or we should check for
__xnpod_test_resched(sched) in xnintr_clock_handler (but under nklock then).

Comments?

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux

___
Xenomai-core mailing list
Xenomai-core@gna.org
https://mail.gna.org/listinfo/xenomai-core