Re: [Xenomai-core] Missing IRQ end function on PowerPC

2006-01-30 Thread Jan Kiszka
Philippe Gerum wrote:
> Gilles Chanteperdrix wrote:
>> Wolfgang Grandegger wrote:
>>  > Therefore we need a dedicated function to re-enable interrupts in
>> the  > ISR. We could name it *_end_irq, but maybe *_enable_isr_irq is
>> more  > obvious. On non-PPC archs it would translate to *_irq_enable.
>> I  > realized, that *_irq_enable is used in various place/skins and
>> therefore  > I have not yet provided a patch.
>>
>> The function xnarch_irq_enable seems to be called in only two functions,
>> xintr_enable and xnintr_irq_handler when the flag XN_ISR_ENABLE is set.
>>
>> In any case, since I am not sure if this has to be done at the Adeos
>> level or in Xenomai, we will wait for Philippe to come back and decide.
>>
> 
> ->enable() and ->end() all mixed up illustrates a silly x86 bias I once
> had. We do need to differentiate the mere enabling from the IRQ epilogue
> at PIC level since Linux does it - i.e. we don't want to change the
> semantics here.
> 
> I would go for adding xnarch_end_irq -> rthal_irq_end to stick with the
> Linux naming scheme, and have the proper epilogue done from there on a
> per-arch basis.
> 
> Current uses of xnarch_enable_irq() should be reserved to the
> non-epilogue case, like xnintr_enable() i.e. forcibly unmasking the IRQ
> source at PIC level outside of any ISR context for such interrupt (*).
> XN_ISR_ENABLE would trigger a call to xnarch_end_irq, instead of
> xnarch_enable_irq. I see no reason for this fix to leak to the Adeos
> layer, since the HAL already controls the way interrupts are ended
> actually; it just does it improperly on some platforms.
> 
> (*) Jan, does rtdm_irq_enable() have the same meaning, or is it intended
> to be used from the ISR too in order to revalidate the source at PIC level?
> 

Nope, rtdm_irq_enable() was never intended to re-enable an IRQ line
after an interrupt, and the documentation does not suggest this either.
I see no problem here.

Jan
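
For illustration, the enable/end split proposed above might end up looking
roughly like this in the nucleus IRQ trampoline. This is only a sketch: the
xnintr_t layout and the trampoline signature are assumptions, only
xnarch_end_irq()/xnarch_enable_irq() and XN_ISR_ENABLE come from the
discussion itself.

static void xnintr_irq_handler(unsigned irq, void *cookie)
{
	xnintr_t *intr = (xnintr_t *)cookie;
	int s = intr->isr(intr);

	if (s & XN_ISR_ENABLE)
		/* IRQ epilogue at PIC level, i.e. rthal_irq_end() underneath,
		   doing whatever "end" means on the given architecture. */
		xnarch_end_irq(irq);

	/* xnarch_enable_irq() stays reserved for forcibly unmasking the
	   line outside of any ISR context, e.g. from xnintr_enable(). */
}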





Re: [Xenomai-core] Missing IRQ end function on PowerPC

2006-01-30 Thread Wolfgang Grandegger


> Philippe Gerum wrote:
> > Gilles Chanteperdrix wrote:
> >> Wolfgang Grandegger wrote:
> >>  > Therefore we need a dedicated function to re-enable interrupts in
> >> the  > ISR. We could name it *_end_irq, but maybe *_enable_isr_irq is
> >> more  > obvious. On non-PPC archs it would translate to *_irq_enable.
> >> I  > realized, that *_irq_enable is used in various place/skins and
> >> therefore  > I have not yet provided a patch.
> >>
> >> The function xnarch_irq_enable seems to be called in only two functions,
> >> xintr_enable and xnintr_irq_handler when the flag XN_ISR_ENABLE is set.
> >>
> >> In any case, since I am not sure if this has to be done at the Adeos
> >> level or in Xenomai, we will wait for Philippe to come back and decide.
> >>
> > 
> > ->enable() and ->end() all mixed up illustrates a silly x86 bias I once
> > had. We do need to differentiate the mere enabling from the IRQ epilogue
> > at PIC level since Linux does it - i.e. we don't want to change the
> > semantics here.
> > 
> > I would go for adding xnarch_end_irq -> rthal_irq_end to stick with the
> > Linux naming scheme, and have the proper epilogue done from there on a
> > per-arch basis.
> > 
> > Current uses of xnarch_enable_irq() should be reserved to the
> > non-epilogue case, like xnintr_enable() i.e. forcibly unmasking the IRQ
> > source at PIC level outside of any ISR context for such interrupt (*).
> > XN_ISR_ENABLE would trigger a call to xnarch_end_irq, instead of
> > xnarch_enable_irq. I see no reason for this fix to leak to the Adeos
> > layer, since the HAL already controls the way interrupts are ended
> > actually; it just does it improperly on some platforms.
> > 
> > (*) Jan, does rtdm_irq_enable() have the same meaning, or is it intended
> > to be used from the ISR too in order to revalidate the source at PIC level?
> > 
> 
> Nope, rtdm_irq_enable() was never intended to re-enable an IRQ line
> after an interrupt, and the documentation does not suggest this either.
> I see no problem here.

But RTDM needs an rtdm_irq_end() function as well, in case the
user wants to re-enable the interrupt outside the ISR, I think.

Wolfgang.




Re: [Xenomai-core] Missing IRQ end function on PowerPC

2006-01-30 Thread Jan Kiszka
Wolfgang Grandegger wrote:
> 
>> Philippe Gerum wrote:
>>> Gilles Chanteperdrix wrote:
 Wolfgang Grandegger wrote:
  > Therefore we need a dedicated function to re-enable interrupts in
 the  > ISR. We could name it *_end_irq, but maybe *_enable_isr_irq is
 more  > obvious. On non-PPC archs it would translate to *_irq_enable.
 I  > realized, that *_irq_enable is used in various place/skins and
 therefore  > I have not yet provided a patch.

 The function xnarch_irq_enable seems to be called in only two
> functions,
 xintr_enable and xnintr_irq_handler when the flag XN_ISR_ENABLE is set.

 In any case, since I am not sure if this has to be done at the Adeos
 level or in Xenomai, we will wait for Philippe to come back and decide.

>>> ->enable() and ->end() all mixed up illustrates a silly x86 bias I once
>>> had. We do need to differentiate the mere enabling from the IRQ epilogue
>>> at PIC level since Linux does it - i.e. we don't want to change the
>>> semantics here.
>>>
>>> I would go for adding xnarch_end_irq -> rthal_irq_end to stick with the
>>> Linux naming scheme, and have the proper epilogue done from there on a
>>> per-arch basis.
>>>
>>> Current uses of xnarch_enable_irq() should be reserved to the
>>> non-epilogue case, like xnintr_enable() i.e. forcibly unmasking the IRQ
>>> source at PIC level outside of any ISR context for such interrupt (*).
>>> XN_ISR_ENABLE would trigger a call to xnarch_end_irq, instead of
>>> xnarch_enable_irq. I see no reason for this fix to leak to the Adeos
>>> layer, since the HAL already controls the way interrupts are ended
>>> actually; it just does it improperly on some platforms.
>>>
>>> (*) Jan, does rtdm_irq_enable() have the same meaning, or is it intended
>>> to be used from the ISR too in order to revalidate the source at PIC
> level?
>> Nope, rtdm_irq_enable() was never intended to re-enable an IRQ line
>> after an interrupt, and the documentation does not suggest this either.
>> I see no problem here.
> 
> But RTDM needs a rtdm_irq_end() functions as well in case the
> user wants to reenable the interrupt outside the ISR, I think.

If this is a valid use-case, it should be really straightforward to add
this abstraction to RTDM. We should just document that rtdm_irq_end()
shall not be invoked from IRQ context - to avoid breaking the chain in
the shared-IRQ scenario. RTDM_IRQ_ENABLE must remain the way to
re-enable the line from the handler.

Jan
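
Should such a call be added, a minimal sketch of its shape could look like
the following. rtdm_irq_end() and the nucleus-level xnintr_end() it delegates
to are both hypothetical here; the point is only that it would mirror the
existing enable/disable wrappers while being documented as a task-context
service.

int rtdm_irq_end(rtdm_irq_t *irq_handle)
{
	/* Not to be called from the handler in the shared-IRQ case;
	   there, returning RTDM_IRQ_ENABLE remains the way to re-enable
	   the line. */
	return xnintr_end(irq_handle);
}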






Re: [Xenomai-core] Missing IRQ end function on PowerPC

2006-01-30 Thread Philippe Gerum

Jan Kiszka wrote:

Wolfgang Grandegger wrote:



Philippe Gerum wrote:


Gilles Chanteperdrix wrote:


Wolfgang Grandegger wrote:
> Therefore we need a dedicated function to re-enable interrupts in
the  > ISR. We could name it *_end_irq, but maybe *_enable_isr_irq is
more  > obvious. On non-PPC archs it would translate to *_irq_enable.
I  > realized, that *_irq_enable is used in various place/skins and
therefore  > I have not yet provided a patch.

The function xnarch_irq_enable seems to be called in only two


functions,


xintr_enable and xnintr_irq_handler when the flag XN_ISR_ENABLE is set.

In any case, since I am not sure if this has to be done at the Adeos
level or in Xenomai, we will wait for Philippe to come back and decide.



->enable() and ->end() all mixed up illustrates a silly x86 bias I once
had. We do need to differentiate the mere enabling from the IRQ epilogue
at PIC level since Linux does it - i.e. we don't want to change the
semantics here.

I would go for adding xnarch_end_irq -> rthal_irq_end to stick with the
Linux naming scheme, and have the proper epilogue done from there on a
per-arch basis.

Current uses of xnarch_enable_irq() should be reserved to the
non-epilogue case, like xnintr_enable() i.e. forcibly unmasking the IRQ
source at PIC level outside of any ISR context for such interrupt (*).
XN_ISR_ENABLE would trigger a call to xnarch_end_irq, instead of
xnarch_enable_irq. I see no reason for this fix to leak to the Adeos
layer, since the HAL already controls the way interrupts are ended
actually; it just does it improperly on some platforms.

(*) Jan, does rtdm_irq_enable() have the same meaning, or is it intended
to be used from the ISR too in order to revalidate the source at PIC


level?


Nope, rtdm_irq_enable() was never intended to re-enable an IRQ line
after an interrupt, and the documentation does not suggest this either.
I see no problem here.


But RTDM needs a rtdm_irq_end() functions as well in case the
user wants to reenable the interrupt outside the ISR, I think.



If this is a valid use-case, it should be really straightforward to add
this abstraction to RTDM. We should just document that rtdm_irq_end()
shall not be invoked from IRQ context -


It's the other way around: ->end() would indeed be called from the ISR epilogue, 
and ->enable() would not.


 to avoid breaking the chain in

the shared-IRQ scenario. RTDM_IRQ_ENABLE must remain the way to
re-enable the line from the handler.

Jan





--

Philippe.



Re: [Xenomai-core] Missing IRQ end function on PowerPC

2006-01-30 Thread Philippe Gerum

Jan Kiszka wrote:

Philippe Gerum wrote:


Gilles Chanteperdrix wrote:


Wolfgang Grandegger wrote:
> Therefore we need a dedicated function to re-enable interrupts in
the  > ISR. We could name it *_end_irq, but maybe *_enable_isr_irq is
more  > obvious. On non-PPC archs it would translate to *_irq_enable.
I  > realized, that *_irq_enable is used in various place/skins and
therefore  > I have not yet provided a patch.

The function xnarch_irq_enable seems to be called in only two functions,
xintr_enable and xnintr_irq_handler when the flag XN_ISR_ENABLE is set.

In any case, since I am not sure if this has to be done at the Adeos
level or in Xenomai, we will wait for Philippe to come back and decide.



->enable() and ->end() all mixed up illustrates a silly x86 bias I once
had. We do need to differentiate the mere enabling from the IRQ epilogue
at PIC level since Linux does it - i.e. we don't want to change the
semantics here.

I would go for adding xnarch_end_irq -> rthal_irq_end to stick with the
Linux naming scheme, and have the proper epilogue done from there on a
per-arch basis.

Current uses of xnarch_enable_irq() should be reserved to the
non-epilogue case, like xnintr_enable() i.e. forcibly unmasking the IRQ
source at PIC level outside of any ISR context for such interrupt (*).
XN_ISR_ENABLE would trigger a call to xnarch_end_irq, instead of
xnarch_enable_irq. I see no reason for this fix to leak to the Adeos
layer, since the HAL already controls the way interrupts are ended
actually; it just does it improperly on some platforms.

(*) Jan, does rtdm_irq_enable() have the same meaning, or is it intended
to be used from the ISR too in order to revalidate the source at PIC level?




Nope, rtdm_irq_enable() was never intended to re-enable an IRQ line
after an interrupt, and the documentation does not suggest this either.
I see no problem here.



Ok, so no change would be needed here to implement what's described above.


Jan




--

Philippe.



Re: [Xenomai-core] Missing IRQ end function on PowerPC

2006-01-30 Thread Jan Kiszka
Philippe Gerum wrote:
> Jan Kiszka wrote:
>> Wolfgang Grandegger wrote:
>>

 Philippe Gerum wrote:

> Gilles Chanteperdrix wrote:
>
>> Wolfgang Grandegger wrote:
>> > Therefore we need a dedicated function to re-enable interrupts in
>> the  > ISR. We could name it *_end_irq, but maybe *_enable_isr_irq is
>> more  > obvious. On non-PPC archs it would translate to *_irq_enable.
>> I  > realized, that *_irq_enable is used in various place/skins and
>> therefore  > I have not yet provided a patch.
>>
>> The function xnarch_irq_enable seems to be called in only two
>>>
>>> functions,
>>>
>> xintr_enable and xnintr_irq_handler when the flag XN_ISR_ENABLE is
>> set.
>>
>> In any case, since I am not sure if this has to be done at the Adeos
>> level or in Xenomai, we will wait for Philippe to come back and
>> decide.
>>
>
> ->enable() and ->end() all mixed up illustrates a silly x86 bias I
> once
> had. We do need to differentiate the mere enabling from the IRQ
> epilogue
> at PIC level since Linux does it - i.e. we don't want to change the
> semantics here.
>
> I would go for adding xnarch_end_irq -> rthal_irq_end to stick with
> the
> Linux naming scheme, and have the proper epilogue done from there on a
> per-arch basis.
>
> Current uses of xnarch_enable_irq() should be reserved to the
> non-epilogue case, like xnintr_enable() i.e. forcibly unmasking the
> IRQ
> source at PIC level outside of any ISR context for such interrupt (*).
> XN_ISR_ENABLE would trigger a call to xnarch_end_irq, instead of
> xnarch_enable_irq. I see no reason for this fix to leak to the Adeos
> layer, since the HAL already controls the way interrupts are ended
> actually; it just does it improperly on some platforms.
>
> (*) Jan, does rtdm_irq_enable() have the same meaning, or is it
> intended
> to be used from the ISR too in order to revalidate the source at PIC
>>>
>>> level?
>>>
 Nope, rtdm_irq_enable() was never intended to re-enable an IRQ line
 after an interrupt, and the documentation does not suggest this either.
 I see no problem here.
>>>
>>> But RTDM needs a rtdm_irq_end() functions as well in case the
>>> user wants to reenable the interrupt outside the ISR, I think.
>>
>>
>> If this is a valid use-case, it should be really straightforward to add
>> this abstraction to RTDM. We should just document that rtdm_irq_end()
>> shall not be invoked from IRQ context -
> 
> It's the other way around: ->end() would indeed be called from the ISR
> epilogue, and ->enable() would not.

I think we are talking about different things: Wolfgang was asking for
an equivalent of RTDM_IRQ_ENABLE outside the IRQ handler. That's the
user API. You are now referring to the back-end, which would have to provide
some end mechanism to be used both by the nucleus' ISR epilogue and by that
rtdm_irq_end(), and that mechanism shall be kept apart from the IRQ
enable/disable API. Well, that's at least how I think I got it...

> 
>  to avoid breaking the chain in
>> the shared-IRQ scenario. RTDM_IRQ_ENABLE must remain the way to
>> re-enable the line from the handler.
>>
>> Jan
>>
>>
> 
> 

Jan





Re: [Xenomai-core] [BUG] racy xnshadow_harden under CONFIG_PREEMPT

2006-01-30 Thread Philippe Gerum

Philippe Gerum wrote:

Jan Kiszka wrote:


Hi,

well, if I'm not totally wrong, we have a design problem in the
RT-thread hardening path. I dug into the crash Jeroen reported and I'm
quite sure that this is the reason.

So that's the bad news. The good one is that we can at least work around
it by switching off CONFIG_PREEMPT for Linux (this implicitly means that
it's a 2.6-only issue).

@Jeroen: Did you verify that your setup also works fine without
CONFIG_PREEMPT?

But let's start with two assumptions my further analysis is based on:

[Xenomai]
 o Shadow threads have only one stack, i.e. one context. If the
   real-time part is active (this includes it is blocked on some xnsynch
   object or delayed), the original Linux task must NEVER EVER be
   executed, even if it will immediately fall asleep again. That's
   because the stack is in use by the real-time part at that time. And
   this condition is checked in do_schedule_event() [1].

[Linux]
 o A Linux task which has called set_current_state() will
   remain in the run-queue as long as it calls schedule() on its own.
   This means that it can be preempted (if CONFIG_PREEMPT is set)
   between set_current_state() and schedule() and then even be resumed
   again. Only the explicit call of schedule() will trigger
   deactivate_task() which will in turn remove current from the
   run-queue.

Ok, if this is true, let's have a look at xnshadow_harden(): After
grabbing the gatekeeper sem and putting itself in gk->thread, a task
going for RT then marks itself TASK_INTERRUPTIBLE and wakes up the
gatekeeper [2]. This does not include a Linux reschedule due to the
_sync version of wake_up_interruptible. What can happen now?

1) No interruption until we have called schedule() [3]. All fine as we
will not be removed from the run-queue before the gatekeeper starts
kicking our RT part, thus no conflict in using the thread's stack.

2) Interruption by an RT IRQ. This would just delay the path described
above, even if some RT threads get executed. Once they are finished, we
continue in xnshadow_harden() - given that the RT part does not trigger
the following case:

3) Interruption by some Linux IRQ. This may cause other threads to
become runnable as well, but the gatekeeper has the highest prio and
will therefore be the next. The problem is that the rescheduling on
Linux IRQ exit will PREEMPT our task in xnshadow_harden(), it will NOT
remove it from the Linux run-queue. And now we are in real troubles: The
gatekeeper will kick off our RT part which will take over the thread's
stack. As soon as the RT domain falls asleep and Linux takes over again,
it will continue our non-RT part as well! Actually, this seems to be the
reason for the panic in do_schedule_event(). Without
CONFIG_XENO_OPT_DEBUG and this check, we will run both parts AT THE SAME
TIME now, thus violating my first assumption. The system gets fatally
corrupted.
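
For reference, a condensed view of the hardening path just described (not the
literal source, see [2] and [3] below for that), with the preemption window
marked:

void xnshadow_harden(void)
{
	xnthread_t *thread = xnshadow_thread(current);

	gk->thread = thread;
	set_current_state(TASK_INTERRUPTIBLE);
	wake_up_interruptible_sync(&gk->waitq);

	/* <-- With CONFIG_PREEMPT, a Linux IRQ hitting here may end in
	   preempt_schedule_irq(): the task is preempted but stays on the
	   run-queue, the gatekeeper runs and resumes the shadow, and Linux
	   later resumes this context as well - both parts then execute on
	   the same stack (case 3 above). */

	schedule();	/* only this removes current from the run-queue */
}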



Yep, that's it. And we may not lock out the interrupts before calling 
schedule to prevent that.



Well, I would be happy if someone can prove me wrong here.

The problem is that I don't see a solution because Linux does not
provide an atomic wake-up + schedule-out under CONFIG_PREEMPT. I'm
currently considering a hack to remove the migrating Linux thread
manually from the run-queue, but this could easily break the Linux
scheduler.



Maybe the best way would be to provide atomic wakeup-and-schedule 
support into the Adeos patch for Linux tasks; previous attempts to fix 
this by circumventing the potential for preemption from outside of the 
scheduler code have all failed, and this bug is uselessly lingering for 
that reason.


Having slept on this, I'm going to add a simple extension to the Linux scheduler 
available from Adeos, in order to get an atomic/unpreemptable path from the 
statement when the current task's state is changed for suspension (e.g. 
TASK_INTERRUPTIBLE), to the point where schedule() normally enters its atomic 
section, which looks like the sanest way to solve this issue, i.e. without gory 
hackery all over the place. Patch will follow later for testing this approach.
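
As a rough sketch of what the calling side might then look like - the
TASK_ATOMICSWITCH bit is taken from the patches posted later in this thread,
while the exact handshake with the patched schedule() is an assumption:

	preempt_disable();	/* no preemption point up to schedule() */
	set_current_state(TASK_INTERRUPTIBLE | TASK_ATOMICSWITCH);
	wake_up_interruptible_sync(&gk->waitq);
	schedule();		/* the patched scheduler tolerates and drops
				   the atomic state when TASK_ATOMICSWITCH
				   is set */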





Jan


PS: Out of curiosity I also checked RTAI's migration mechanism in this
regard. It's similar except for the fact that it does the gatekeeper's
work in the Linux scheduler's tail (i.e. after the next context switch).
And RTAI seems it suffers from the very same race. So this is either a
fundamental issue - or I'm fundamentally wrong.


[1]http://www.rts.uni-hannover.de/xenomai/lxr/source/ksrc/nucleus/shadow.c?v=SVN-trunk#L1573 

[2]http://www.rts.uni-hannover.de/xenomai/lxr/source/ksrc/nucleus/shadow.c?v=SVN-trunk#L461 

[3]http://www.rts.uni-hannover.de/xenomai/lxr/source/ksrc/nucleus/shadow.c?v=SVN-trunk#L481 














--

Philippe.


RE: [Xenomai-core] broken docs

2006-01-30 Thread ROSSIER Daniel
Dear Xenomai workers,

Would it be possible to have an updated API documentation for Xenomai 2.0.x ? 
(I mean with formal parameters in function prototypes)

I tried to regenerate it with make generate-doc, but it seems that a SVN 
working dir is required.

It would be great.

Thanks a lot

Daniel

> -Original Message-
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On
> behalf of Jan Kiszka
> Sent: Wednesday, 18 January 2006 20:03
> To: xenomai-core
> Subject: [Xenomai-core] broken docs
> 
> Hi,
> 
> I noticed that the doxygen API output is partly broken. Under the
> nucleus and native skin modules most functions became variables. I
> haven't looked at the source yet, but I guess it should be resolvable,
> especially as the RTDM docs are fine. This mail is to file the issue,
> maybe I will have a look later - /maybe/.
> 
> Moreover, I was lacking a reference to RT_MUTEX_INFO. Does this
> structure just needs to be added to the correct doxygen group? I guess
> there are other undocumented structures out there as well (RT_TASK_INFO,
> ...?).
> 
> Jan




Re: [Xenomai-core] Missing IRQ end function on PowerPC

2006-01-30 Thread Philippe Gerum

Jan Kiszka wrote:

Philippe Gerum wrote:


Jan Kiszka wrote:


Wolfgang Grandegger wrote:




Philippe Gerum wrote:



Gilles Chanteperdrix wrote:



Wolfgang Grandegger wrote:


Therefore we need a dedicated function to re-enable interrupts in


the  > ISR. We could name it *_end_irq, but maybe *_enable_isr_irq is
more  > obvious. On non-PPC archs it would translate to *_irq_enable.
I  > realized, that *_irq_enable is used in various place/skins and
therefore  > I have not yet provided a patch.

The function xnarch_irq_enable seems to be called in only two


functions,



xintr_enable and xnintr_irq_handler when the flag XN_ISR_ENABLE is
set.

In any case, since I am not sure if this has to be done at the Adeos
level or in Xenomai, we will wait for Philippe to come back and
decide.



->enable() and ->end() all mixed up illustrates a silly x86 bias I
once
had. We do need to differentiate the mere enabling from the IRQ
epilogue
at PIC level since Linux does it - i.e. we don't want to change the
semantics here.

I would go for adding xnarch_end_irq -> rthal_irq_end to stick with
the
Linux naming scheme, and have the proper epilogue done from there on a
per-arch basis.

Current uses of xnarch_enable_irq() should be reserved to the
non-epilogue case, like xnintr_enable() i.e. forcibly unmasking the
IRQ
source at PIC level outside of any ISR context for such interrupt (*).
XN_ISR_ENABLE would trigger a call to xnarch_end_irq, instead of
xnarch_enable_irq. I see no reason for this fix to leak to the Adeos
layer, since the HAL already controls the way interrupts are ended
actually; it just does it improperly on some platforms.

(*) Jan, does rtdm_irq_enable() have the same meaning, or is it
intended
to be used from the ISR too in order to revalidate the source at PIC


level?



Nope, rtdm_irq_enable() was never intended to re-enable an IRQ line
after an interrupt, and the documentation does not suggest this either.
I see no problem here.


But RTDM needs a rtdm_irq_end() functions as well in case the
user wants to reenable the interrupt outside the ISR, I think.



If this is a valid use-case, it should be really straightforward to add
this abstraction to RTDM. We should just document that rtdm_irq_end()
shall not be invoked from IRQ context -


It's the other way around: ->end() would indeed be called from the ISR
epilogue, and ->enable() would not.



I think we are talking about different things: Wolfgang was asking for
an equivalent of RTDM_IRQ_ENABLE outside the IRQ handler. That's the
user API. You are now referring to the back-end which had to provide
some end-mechanism to be used both by the nucleus' ISR epilogue and that
rtdm_irq_end(), and that mechanism shall be told apart from the IRQ
enable/disable API. Well, that's at least how I think to have got it...



My point was only about naming here: *_end() should be reserved for the epilogue
stuff, hence be callable from ISR context. Aside from that, I'm OK with the
abstraction you described.





to avoid breaking the chain in


the shared-IRQ scenario. RTDM_IRQ_ENABLE must remain the way to
re-enable the line from the handler.

Jan







Jan




--

Philippe.



Re: [Xenomai-core] Missing IRQ end function on PowerPC

2006-01-30 Thread Wolfgang Grandegger


> Philippe Gerum wrote:
> > Jan Kiszka wrote:
> >> Wolfgang Grandegger wrote:
> >>
>  Philippe Gerum wrote:
> 
> > Gilles Chanteperdrix wrote:
> >
> >> Wolfgang Grandegger wrote:
> >> > Therefore we need a dedicated function to re-enable interrupts in
> >> the  > ISR. We could name it *_end_irq, but maybe
*_enable_isr_irq is
> >> more  > obvious. On non-PPC archs it would translate to
*_irq_enable.
> >> I  > realized, that *_irq_enable is used in various place/skins and
> >> therefore  > I have not yet provided a patch.
> >>
> >> The function xnarch_irq_enable seems to be called in only two
> >>>
> >>> functions,
> >>>
> >> xintr_enable and xnintr_irq_handler when the flag XN_ISR_ENABLE is
> >> set.
> >>
> >> In any case, since I am not sure if this has to be done at the
Adeos
> >> level or in Xenomai, we will wait for Philippe to come back and
> >> decide.
> >>
> >
> > ->enable() and ->end() all mixed up illustrates a silly x86 bias I
> > once
> > had. We do need to differentiate the mere enabling from the IRQ
> > epilogue
> > at PIC level since Linux does it - i.e. we don't want to change the
> > semantics here.
> >
> > I would go for adding xnarch_end_irq -> rthal_irq_end to stick with
> > the
> > Linux naming scheme, and have the proper epilogue done from
there on a
> > per-arch basis.
> >
> > Current uses of xnarch_enable_irq() should be reserved to the
> > non-epilogue case, like xnintr_enable() i.e. forcibly unmasking the
> > IRQ
> > source at PIC level outside of any ISR context for such
interrupt (*).
> > XN_ISR_ENABLE would trigger a call to xnarch_end_irq, instead of
> > xnarch_enable_irq. I see no reason for this fix to leak to the Adeos
> > layer, since the HAL already controls the way interrupts are ended
> > actually; it just does it improperly on some platforms.
> >
> > (*) Jan, does rtdm_irq_enable() have the same meaning, or is it
> > intended
> > to be used from the ISR too in order to revalidate the source at PIC
> >>>
> >>> level?
> >>>
>  Nope, rtdm_irq_enable() was never intended to re-enable an IRQ line
>  after an interrupt, and the documentation does not suggest this
either.
>  I see no problem here.
> >>>
> >>> But RTDM needs a rtdm_irq_end() functions as well in case the
> >>> user wants to reenable the interrupt outside the ISR, I think.
> >>
> >>
> >> If this is a valid use-case, it should be really straightforward to add
> >> this abstraction to RTDM. We should just document that rtdm_irq_end()
> >> shall not be invoked from IRQ context -
> > 
> > It's the other way around: ->end() would indeed be called from the ISR
> > epilogue, and ->enable() would not.
> 
> I think we are talking about different things: Wolfgang was asking for
> an equivalent of RTDM_IRQ_ENABLE outside the IRQ handler. That's the
> user API. You are now referring to the back-end which had to provide
> some end-mechanism to be used both by the nucleus' ISR epilogue and that
> rtdm_irq_end(), and that mechanism shall be told apart from the IRQ
> enable/disable API. Well, that's at least how I think to have got it...

Yep, I was thinking of deferred interrupt handling in a real-time task.
Then rtdm_irq_end() would be required.
 
> > 
> >  to avoid breaking the chain in
> >> the shared-IRQ scenario. RTDM_IRQ_ENABLE must remain the way to
> >> re-enable the line from the handler.
> >>
> >> Jan
> >>
> >>
> > 
> > 
> 
> Jan
> 
> 
> 

Wolfgang.
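
A sketch of the deferred-handling pattern discussed above; rtdm_irq_end() is
still hypothetical, and the event/handle names are placeholders:

#include <rtdm/rtdm_driver.h>

static rtdm_irq_t demo_irq_handle;
static rtdm_event_t demo_event;

/* The ISR only acknowledges the device and leaves the line masked... */
static int demo_isr(rtdm_irq_t *irq_context)
{
	/* ... minimal acknowledge work here ... */
	rtdm_event_signal(&demo_event);	/* kick the handler task */
	return 0;	/* deliberately without RTDM_IRQ_ENABLE */
}

/* ...while a real-time task completes the handling and re-opens it. */
static void demo_irq_task(void *arg)
{
	for (;;) {
		rtdm_event_wait(&demo_event);
		/* ... heavy device handling here ... */
		rtdm_irq_end(&demo_irq_handle);	/* hypothetical call */
	}
}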



Re: [Xenomai-core] [PATCH] fix pthread cancellation in native skin

2006-01-30 Thread Gilles Chanteperdrix
Jan Kiszka wrote:
 > Hi,
 > 
 > Gilles' work on cancellation for the posix skin reminded me of this
 > issue I once discovered in the native skin:
 > 
 > https://mail.gna.org/public/xenomai-core/2005-12/msg00014.html
 > 
 > I found out that this can easily be fixed by switching the pthread of a
 > native task to PTHREAD_CANCEL_ASYNCHRONOUS. See attached patch.
 > 
 > 
 > At this chance I discovered that calling rt_task_delete for a task that
 > was created and started with T_SUSP mode but was not yet resumed, locks
 > up the system. More precisely: it raises a fatal warning when
 > XENO_OPT_DEBUG is on. Might be the case that it just works on system
 > without this switched on. Either this is a real bug, or the warning
 > needs to be fixed. (Deleting a task after rt_task_suspend works.)

Actually, the fatal warning happens when starting, with rt_task_start, a
task which was created with the T_SUSP bit. The task needs to wake up in
xnshadow_wait_barrier before it gets really suspended in primary mode by
the final xnshadow_harden. This situation triggers the fatal warning,
because the thread has the nucleus XNSUSP bit set while still running in
secondary mode.

This is not the only situation where a thread with a nucleus suspension
bit needs to run briefly in secondary mode: it also occurs when
suspending, with xnpod_suspend_thread(), a thread running in secondary
mode; the thread receives the SIGCHLD signal and needs to execute briefly
with the suspension bit set in order to cause a migration to primary
mode.

So, the only case in which we can be sure that a user-space thread must
not be scheduled by Linux seems to be when the thread does not have the
XNRELAX bit.

-- 


Gilles Chanteperdrix.
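
If the debug check were to follow that reasoning, it might boil down to
something like the following inside do_schedule_event(); this is only a
sketch, and the accessor names are assumptions:

	xnthread_t *shadow = xnshadow_thread(next);

	/* Only a shadow that is NOT relaxed (XNRELAX cleared) must never be
	   picked by the Linux scheduler; other nucleus suspension bits may
	   legitimately coexist with short secondary-mode execution. */
	if (shadow && !xnthread_test_flags(shadow, XNRELAX))
		xnpod_fatal("non-relaxed shadow scheduled by Linux");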



Re: [Xenomai-core] [BUG] racy xnshadow_harden under CONFIG_PREEMPT

2006-01-30 Thread Dmitry Adamushko
>> ...
> I have not checked it yet but my presupposition that something as easy as :
>> preempt_disable()
>>
>> wake_up_interruptible_sync();
>> schedule();
>>
>> preempt_enable();
> It's a no-go: "scheduling while atomic". One of my first attempts to
> solve it.

My fault. I meant the way preempt_schedule() and preempt_irq_schedule() call schedule() while being non-preemptible.
To this end, ACTIVE_PREEMPT is set up.
The use of preempt_enable/disable() here is wrong.

> The only way to enter schedule() without being preemptible is via
> ACTIVE_PREEMPT. But the effect of that flag should be well-known now.
> Kind of Gordian knot. :(

Maybe I have missed something so just for my curiosity : what does prevent the use of PREEMPT_ACTIVE here?
We don't have a "preempted while atomic" message here as it seems to be
a legal way to call schedule() with that flag being set up.

>> could work... err.. and don't blame me if no, it's some one else who has
>> written that nonsense :o)
>>
>> --
>> Best regards,
>> Dmitry Adamushko
>
> Jan

--
Best regards,
Dmitry Adamushko


Re: [Xenomai-core] [BUG] racy xnshadow_harden under CONFIG_PREEMPT

2006-01-30 Thread Jan Kiszka
Dmitry Adamushko wrote:
>>> ...
> 
>> I have not checked it yet but my presupposition that something as easy as
>> :
>>> preempt_disable()
>>>
>>> wake_up_interruptible_sync();
>>> schedule();
>>>
>>> preempt_enable();
>> It's a no-go: "scheduling while atomic". One of my first attempts to
>> solve it.
> 
> 
> My fault. I meant the way preempt_schedule() and preempt_irq_schedule() call
> schedule() while being non-preemptible.
> To this end, ACTIVE_PREEMPT is set up.
> The use of preempt_enable/disable() here is wrong.
> 
> 
> The only way to enter schedule() without being preemptible is via
>> ACTIVE_PREEMPT. But the effect of that flag should be well-known now.
>> Kind of Gordian knot. :(
> 
> 
> Maybe I have missed something so just for my curiosity : what does prevent
> the use of PREEMPT_ACTIVE here?
> We don't have a "preempted while atomic" message here as it seems to be a
> legal way to call schedule() with that flag being set up.

When PREEMPT_ACTIVE is set, task gets /preempted/ but not removed from
the run queue - independent of its current status.
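
The test Jan refers to sits in kernel/sched.c:schedule() (the same lines are
quoted further down in this thread):

	switch_count = &prev->nivcsw;
	/* with PREEMPT_ACTIVE set this whole block is skipped: the task is
	   never deactivated, whatever ->state says */
	if (prev->state && !(preempt_count() & PREEMPT_ACTIVE)) {
		switch_count = &prev->nvcsw;
		if (unlikely((prev->state & TASK_INTERRUPTIBLE) &&
				unlikely(signal_pending(prev))))
			prev->state = TASK_RUNNING;
		else {
			if (prev->state == TASK_UNINTERRUPTIBLE)
				rq->nr_uninterruptible++;
			deactivate_task(prev, rq);
		}
	}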

> 
> 
>>> could work... err.. and don't blame me if no, it's some one else who has
>>> written that nonsense :o)
>>>
>>> --
>>> Best regards,
>>> Dmitry Adamushko
>>>
>> Jan
>>
>>
>>
>>
> 
> 
> --
> Best regards,
> Dmitry Adamushko
> 
> 
> 
> 
> 






Re: [Xenomai-core] broken docs

2006-01-30 Thread Jan Kiszka
ROSSIER Daniel wrote:
> Dear Xenomai workers,
> 
> Would it be possible to have an updated API documentation for Xenomai
> 2.0.x ? (I mean with formal parameters in function prototypes)
> 
> I tried to regenerate it with make generate-doc, but it seems that a
> SVN working dir is required.
> 
> It would be great.

I just had a "quick" look at the status of the documentation in
SVN-trunk (2.1). Unfortunately, doxygen is a terrible tool (to express
it politely) when it comes to tracking down bugs in your formatting.
Something is broken in all modules except RTDM, and although I spent *a
lot* of time in getting RTDM correctly formatted, I cannot tell what's
wrong with the rest. This will require some long evenings of
continuous patching the docs, recompiling, and checking the result. Any
volunteers - I'm lacking the time? :-/

Jan





Re: [Xenomai-core] broken docs

2006-01-30 Thread Gilles Chanteperdrix
Jan Kiszka wrote:
 > ROSSIER Daniel wrote:
 > > Dear Xenomai workers,
 > > 
 > > Would it be possible to have an updated API documentation for Xenomai
 > > 2.0.x ? (I mean with formal parameters in function prototypes)
 > > 
 > > I tried to regenerate it with make generate-doc, but it seems that a
 > > SVN working dir is required.

make generate-doc is needed for maintenance only. If you want to
generate doxygen documentation, simply add --enable-dox-doc to Xenomai
configure command line.

 > > 
 > > It would be great.
 > 
 > I just had a "quick" look at the status of the documentation in
 > SVN-trunk (2.1). Unfortunately, doxygen is a terrible tool (to express
 > it politely) when it comes to tracking down bugs in your formatting.
 > Something is broken in all modules except RTDM, and although I spent *a
 > lot* of time in getting RTDM correctly formatted, I cannot tell what's
 > wrong with the rest. This will require some long evenings of
 > continuous patching the docs, recompiling, and checking the result. Any
 > volunteers - I'm lacking the time? :-/

Looking at the differences between the RTDM documentation blocks and the
other modules, it appears that the other modules use the "fn" tag.
Removing the "fn" tag from the other modules' documentation blocks seems
to solve the issue.

-- 


Gilles Chanteperdrix.
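
As an illustration of the difference (rt_task_start() is only used as a
stand-in example here):

/* Broken variant: the @fn line duplicates the prototype, and doxygen then
   files the entry as a variable rather than a function. */
/**
 * @fn int rt_task_start(RT_TASK *task, void (*entry)(void *cookie), void *cookie)
 * @brief Start a real-time task.
 */

/* Working variant: drop @fn and let doxygen pick the prototype up from the
   definition that follows the comment block. */
/**
 * @brief Start a real-time task.
 */
int rt_task_start(RT_TASK *task, void (*entry)(void *cookie), void *cookie)
{
	/* ... */
}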



Re: [Xenomai-core] broken docs

2006-01-30 Thread Jan Kiszka
Gilles Chanteperdrix wrote:
> Jan Kiszka wrote:
>  > ROSSIER Daniel wrote:
>  > > Dear Xenomai workers,
>  > > 
>  > > Would it be possible to have an updated API documentation for Xenomai
>  > > 2.0.x ? (I mean with formal parameters in function prototypes)
>  > > 
>  > > I tried to regenerate it with make generate-doc, but it seems that a
>  > > SVN working dir is required.
> 
> make generate-doc is needed for maintenance only. If you want to
> generate doxygen documentation, simply add --enable-dox-doc to Xenomai
> configure command line.
> 
>  > > 
>  > > It would be great.
>  > 
>  > I just had a "quick" look at the status of the documentation in
>  > SVN-trunk (2.1). Unfortunately, doxygen is a terrible tool (to express
>  > it politely) when it comes to tracking down bugs in your formatting.
>  > Something is broken in all modules except RTDM, and although I spent *a
>  > lot* of time in getting RTDM correctly formatted, I cannot tell what's
>  > wrong with the rest. This will require some long evenings of
>  > continuous patching the docs, recompiling, and checking the result. Any
>  > volunteers - I'm lacking the time? :-/
> 
> Looking at the difference between RTDM documentation blocks and the
> other modules is that the other modules use the "fn" tag. Removing the
> "fn" tag from other modules documentation blocks seems to solve the
> issue.
> 

Indeed, works. Amazingly blind I was.

Anyway, it still needs some work to remove that stuff (I wonder what the
"correct" usage of @fn is...) and to wrap functions without bodies via
"#ifdef DOXYGEN_CPP" like RTDM does. While at it, I would also
suggest replacing all \tag by @tag for the sake of a unified style (and
who knows what side effects mixing the two may have).

Jan
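
The DOXYGEN_CPP trick mentioned above amounts to giving doxygen a plain
prototype to parse where the real implementation is not a regular C body;
the names below are only examples:

#ifdef DOXYGEN_CPP	/* never compiled, only parsed by doxygen */
/**
 * @brief Inquire about a task.
 *
 * Documentation entry for a call that doxygen would otherwise not see as
 * a proper function (e.g. one implemented as a macro or wrapper).
 */
int rt_task_inquire(RT_TASK *task, RT_TASK_INFO *info);
#endif /* DOXYGEN_CPP */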





Re: [Xenomai-core] [BUG] racy xnshadow_harden under CONFIG_PREEMPT

2006-01-30 Thread Dmitry Adamushko
On 30/01/06, Jan Kiszka <[EMAIL PROTECTED]> wrote:
> Dmitry Adamushko wrote:
> >>> ...
> >
> >> I have not checked it yet but my presupposition that something as easy as
> >> :
> >>> preempt_disable()
> >>>
> >>> wake_up_interruptible_sync();
> >>> schedule();
> >>>
> >>> preempt_enable();
> >> It's a no-go: "scheduling while atomic". One of my first attempts to
> >> solve it.
> >
> > My fault. I meant the way preempt_schedule() and preempt_irq_schedule() call
> > schedule() while being non-preemptible.
> > To this end, ACTIVE_PREEMPT is set up.
> > The use of preempt_enable/disable() here is wrong.
> >
> >> The only way to enter schedule() without being preemptible is via
> >> ACTIVE_PREEMPT. But the effect of that flag should be well-known now.
> >> Kind of Gordian knot. :(
> >
> > Maybe I have missed something so just for my curiosity : what does prevent
> > the use of PREEMPT_ACTIVE here?
> > We don't have a "preempted while atomic" message here as it seems to be a
> > legal way to call schedule() with that flag being set up.
>
> When PREEMPT_ACTIVE is set, task gets /preempted/ but not removed from
> the run queue - independent of its current status.
Err...  that's exactly the reason I have explained in my first
mail for this thread :) Blah.. I wish I was smoking something special
before so I would point that as the reason of my forgetfulness.

Actually, we could use PREEMPT_ACTIVE indeed + something else (probably
another flag) to distinguish between a case when PREEMPT_ACTIVE is set
by Linux and another case when it's set by xnshadow_harden().

xnshadow_harden()
{
struct task_struct *this_task = current;
...
xnthread_t *thread = xnshadow_thread(this_task);

if (!thread)
    return;

...
gk->thread = thread;

+ add_preempt_count(PREEMPT_ACTIVE);

// should be checked in schedule()
+ xnthread_set_flags(thread, XNATOMIC_TRANSIT);

set_current_state(TASK_INTERRUPTIBLE);
wake_up_interruptible_sync(&gk->waitq);
+ schedule();

+ sub_preempt_count(PREEMPT_ACTIVE);
...
}

Then, something like the following code should be called from schedule() : 

void ipipe_transit_cleanup(struct task_struct *task, runqueue_t *rq)
{
xnthread_t *thread = xnshadow_thread(task);

if (!thread)
    return;

if (xnthread_test_flags(thread, XNATOMIC_TRANSIT))
    {
    xnthread_clear_flags(thread, XNATOMIC_TRANSIT);
    deactivate_task(task, rq);
    }
}

-

schedule.c : 
...

    switch_count = &prev->nivcsw;
    if (prev->state && !(preempt_count() & PREEMPT_ACTIVE)) {
        switch_count = &prev->nvcsw;
        if (unlikely((prev->state & TASK_INTERRUPTIBLE) &&
                unlikely(signal_pending(prev))))
            prev->state = TASK_RUNNING;
        else {
            if (prev->state == TASK_UNINTERRUPTIBLE)
                rq->nr_uninterruptible++;
            deactivate_task(prev, rq);
        }
    }

    // removes a task from the active queue if PREEMPT_ACTIVE + XNATOMIC_TRANSIT
+   #ifdef CONFIG_IPIPE
+   ipipe_transit_cleanup(prev, rq);
+   #endif /* CONFIG_IPIPE */
...

Not very graceful maybe, but it could work - or am I missing something important?

-- 
Best regards,
Dmitry Adamushko




[Xenomai-core] [BUG] Interrupt problem on powerpc

2006-01-30 Thread Anders Blomdell
On a PrPMC800 (PPC 7410 processor) with Xenomai-2.1-rc2, I get the following if
the interrupt handler takes too long (i.e. next interrupt gets generated before 
the previous one has finished)


[   42.543765]  [c00c2008] spin_bug+0xa8/0xc4
[   42.597617]  [c00c22d4] _raw_spin_lock+0x180/0x184
[   42.660637]  [c000f388] __ipipe_ack_irq+0x88/0x130
[   42.723657]  [c000efe4] __ipipe_handle_irq+0x140/0x268
[   42.791259]  [c000f144] __ipipe_grab_irq+0x38/0xa4
[   42.854279]  [c0005058] __ipipe_ret_from_except+0x0/0xc
[   42.923029]  [] 0x0
[   42.959695]  [c0038348] __do_IRQ+0x134/0x164
[   43.015839]  [c000ed04] __ipipe_do_IRQ+0x2c/0x44
[   43.076567]  [c000eb08] __ipipe_sync_stage+0x1ec/0x228
[   43.144170]  [c0039420] ipipe_suspend_domain+0x7c/0xc4
[   43.211774]  [c000f0b0] __ipipe_handle_irq+0x20c/0x268
[   43.279377]  [c000f144] __ipipe_grab_irq+0x38/0xa4
[   43.342396]  [c0005058] __ipipe_ret_from_except+0x0/0xc
[   43.411145]  [c0006524] default_idle+0x10/0x60


Any ideas of where to look?

Regards

Anders Blomdell





Re: [Xenomai-core] [BUG] racy xnshadow_harden under CONFIG_PREEMPT

2006-01-30 Thread Philippe Gerum

Jan Kiszka wrote:

Gilles Chanteperdrix wrote:


Jeroen Van den Keybus wrote:
> Hello,
> 
> 
> I'm currently not at a level to participate in your discussion. Although I'm

> willing to supply you with stresstests, I would nevertheless like to learn
> more from task migration as this debugging session proceeds. In order to do
> so, please confirm the following statements or indicate where I went wrong.
> I hope others may learn from this as well.
> 
> xn_shadow_harden(): This is called whenever a Xenomai thread performs a
> Linux (root domain) system call (notified by Adeos ?). 


xnshadow_harden() is called whenever a thread running in secondary
mode (that is, running as a regular Linux thread, handled by Linux
scheduler) is switching to primary mode (where it will run as a Xenomai
thread, handled by Xenomai scheduler). Migrations occur for some system
calls. More precisely, Xenomai skin system calls tables associates a few
flags with each system call, and some of these flags cause migration of
the caller when it issues the system call.

Each Xenomai user-space thread has two contexts, a regular Linux
thread context, and a Xenomai thread called "shadow" thread. Both
contexts share the same stack and program counter, so that at any time,
at least one of the two contexts is seen as suspended by the scheduler
which handles it.

Before xnshadow_harden is called, the Linux thread is running, and its
shadow is seen in suspended state with XNRELAX bit by Xenomai
scheduler. After xnshadow_harden, the Linux context is seen suspended
with INTERRUPTIBLE state by Linux scheduler, and its shadow is seen as
running by Xenomai scheduler.

The migrating thread
> (nRT) is marked INTERRUPTIBLE and run by the Linux kernel
> wake_up_interruptible_sync() call. Is this thread actually run or does it
> merely put the thread in some Linux to-do list (I assumed the first case) ?

Here, I am not sure, but it seems that when calling
wake_up_interruptible_sync the woken up task is put in the current CPU
runqueue, and this task (i.e. the gatekeeper), will not run until the
current thread (i.e. the thread running xnshadow_harden) marks itself as
suspended and calls schedule(). Maybe, marking the running thread as



Depends on CONFIG_PREEMPT. If set, we get a preempt_schedule already
here - and a switch if the prio of the woken up task is higher.

BTW, an easy way to enforce the current trouble is to remove the "_sync"
from wake_up_interruptible. As I understand it this _sync is just an
optimisation hint for Linux to avoid needless scheduler runs.



You could not guarantee the following execution sequence doing so either, i.e.

1- current wakes up the gatekeeper
2- current goes sleeping to exit the Linux runqueue in schedule()
3- the gatekeeper resumes the shadow-side of the old current

The point is all about making 100% sure that current is going to be unlinked from 
the Linux runqueue before the gatekeeper processes the resumption request, 
whatever event the kernel is processing asynchronously in the meantime. This is 
the reason why, as you already noticed, preempt_schedule_irq() nicely breaks our 
toy by stealing the CPU from the hardening thread whilst keeping it linked to the 
runqueue: upon return from such preemption, the gatekeeper might have run already, 
 hence the newly hardened thread ends up being seen as runnable by both the Linux 
and Xeno schedulers. Rainy day indeed.


We could rely on giving "current" the highest SCHED_FIFO priority in 
xnshadow_harden() before waking up the gk, until the gk eventually promotes it to 
the Xenomai scheduling mode and downgrades this priority back to normal, but we 
would pay additional latencies induced by each aborted rescheduling attempt that 
may occur during the atomic path we want to enforce.


The other way is to make sure that no in-kernel preemption of the hardening task 
could occur after step 1) and until step 2) is performed, given that we cannot 
currently call schedule() with interrupts or preemption off. I'm on it.





suspended is not needed, since the gatekeeper may have a high priority,
and calling schedule() is enough. In any case, the waken up thread does
not seem to be run immediately, so this rather look like the second
case.

Since in xnshadow_harden, the running thread marks itself as suspended
before running wake_up_interruptible_sync, the gatekeeper will run when
schedule() get called, which in turn, depend on the CONFIG_PREEMPT*
configuration. In the non-preempt case, the current thread will be
suspended and the gatekeeper will run when schedule() is explicitely
called in xnshadow_harden(). In the preempt case, schedule gets called
when the outermost spinlock is unlocked in wake_up_interruptible_sync().

> And how does it terminate: is only the system call migrated or is the thread
> allowed to continue run (at a priority level equal to the Xenomai
> priority level) until it hits something of the Xenomai API (or trivially:
> explicitly go to RT using th

Re: [Xenomai-core] [BUG] Interrupt problem on powerpc

2006-01-30 Thread Jan Kiszka
Anders Blomdell wrote:
> On a PrPMC800 (PPC 7410 processor) withe Xenomai-2.1-rc2, I get the
> following if the interrupt handler takes too long (i.e. next interrupt
> gets generated before the previous one has finished)
> 
> [   42.543765]  [c00c2008] spin_bug+0xa8/0xc4
> [   42.597617]  [c00c22d4] _raw_spin_lock+0x180/0x184
> [   42.660637]  [c000f388] __ipipe_ack_irq+0x88/0x130
> [   42.723657]  [c000efe4] __ipipe_handle_irq+0x140/0x268
> [   42.791259]  [c000f144] __ipipe_grab_irq+0x38/0xa4
> [   42.854279]  [c0005058] __ipipe_ret_from_except+0x0/0xc
> [   42.923029]  [] 0x0
> [   42.959695]  [c0038348] __do_IRQ+0x134/0x164
> [   43.015839]  [c000ed04] __ipipe_do_IRQ+0x2c/0x44
> [   43.076567]  [c000eb08] __ipipe_sync_stage+0x1ec/0x228
> [   43.144170]  [c0039420] ipipe_suspend_domain+0x7c/0xc4
> [   43.211774]  [c000f0b0] __ipipe_handle_irq+0x20c/0x268
> [   43.279377]  [c000f144] __ipipe_grab_irq+0x38/0xa4
> [   43.342396]  [c0005058] __ipipe_ret_from_except+0x0/0xc
> [   43.411145]  [c0006524] default_idle+0x10/0x60
> 

I think some probably important information is missing above this
back-trace. What does the kernel state before these lines?

Jan





[Xenomai-core] [BUG?] dead code in ipipe_grab_irq

2006-01-30 Thread Anders Blomdell
In the following code (ppc), shouldn't first be either declared static or 
deleted? To me it looks like first is always equal to one when the else clause 
is evaluated.


asmlinkage int __ipipe_grab_irq(struct pt_regs *regs)
{
extern int ppc_spurious_interrupts;
ipipe_declare_cpuid;
int irq, first = 1;

if ((irq = ppc_md.get_irq(regs)) >= 0) {
__ipipe_handle_irq(irq, regs);
first = 0;
} else if (irq != -2 && first)
ppc_spurious_interrupts++;

ipipe_load_cpuid();

return (ipipe_percpu_domain[cpuid] == ipipe_root_domain &&
!test_bit(IPIPE_STALL_FLAG,
  &ipipe_root_domain->cpudata[cpuid].status));
}


Regards

Anders Blomdell
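
For comparison, the same function with the dead "first" logic dropped, which
should be behaviour-neutral given that "first" is always 1 when the else
branch is reached:

asmlinkage int __ipipe_grab_irq(struct pt_regs *regs)
{
	extern int ppc_spurious_interrupts;
	ipipe_declare_cpuid;
	int irq;

	if ((irq = ppc_md.get_irq(regs)) >= 0)
		__ipipe_handle_irq(irq, regs);
	else if (irq != -2)
		ppc_spurious_interrupts++;

	ipipe_load_cpuid();

	return (ipipe_percpu_domain[cpuid] == ipipe_root_domain &&
		!test_bit(IPIPE_STALL_FLAG,
			  &ipipe_root_domain->cpudata[cpuid].status));
}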





Re: [Xenomai-core] [BUG] racy xnshadow_harden under CONFIG_PREEMPT

2006-01-30 Thread Philippe Gerum

Philippe Gerum wrote:

Jan Kiszka wrote:


Gilles Chanteperdrix wrote:


Jeroen Van den Keybus wrote:
> Hello,
> > > I'm currently not at a level to participate in your discussion. 
Although I'm
> willing to supply you with stresstests, I would nevertheless like 
to learn
> more from task migration as this debugging session proceeds. In 
order to do
> so, please confirm the following statements or indicate where I 
went wrong.

> I hope others may learn from this as well.
> > xn_shadow_harden(): This is called whenever a Xenomai thread 
performs a

> Linux (root domain) system call (notified by Adeos ?).
xnshadow_harden() is called whenever a thread running in secondary
mode (that is, running as a regular Linux thread, handled by Linux
scheduler) is switching to primary mode (where it will run as a Xenomai
thread, handled by Xenomai scheduler). Migrations occur for some system
calls. More precisely, Xenomai skin system calls tables associates a few
flags with each system call, and some of these flags cause migration of
the caller when it issues the system call.

Each Xenomai user-space thread has two contexts, a regular Linux
thread context, and a Xenomai thread called "shadow" thread. Both
contexts share the same stack and program counter, so that at any time,
at least one of the two contexts is seen as suspended by the scheduler
which handles it.

Before xnshadow_harden is called, the Linux thread is running, and its
shadow is seen in suspended state with XNRELAX bit by Xenomai
scheduler. After xnshadow_harden, the Linux context is seen suspended
with INTERRUPTIBLE state by Linux scheduler, and its shadow is seen as
running by Xenomai scheduler.

The migrating thread
> (nRT) is marked INTERRUPTIBLE and run by the Linux kernel
> wake_up_interruptible_sync() call. Is this thread actually run or 
does it
> merely put the thread in some Linux to-do list (I assumed the first 
case) ?


Here, I am not sure, but it seems that when calling
wake_up_interruptible_sync the woken up task is put in the current CPU
runqueue, and this task (i.e. the gatekeeper), will not run until the
current thread (i.e. the thread running xnshadow_harden) marks itself as
suspended and calls schedule(). Maybe, marking the running thread as




Depends on CONFIG_PREEMPT. If set, we get a preempt_schedule already
here - and a switch if the prio of the woken up task is higher.

BTW, an easy way to enforce the current trouble is to remove the "_sync"
from wake_up_interruptible. As I understand it this _sync is just an
optimisation hint for Linux to avoid needless scheduler runs.



You could not guarantee the following execution sequence doing so 
either, i.e.


1- current wakes up the gatekeeper
2- current goes sleeping to exit the Linux runqueue in schedule()
3- the gatekeeper resumes the shadow-side of the old current

The point is all about making 100% sure that current is going to be 
unlinked from the Linux runqueue before the gatekeeper processes the 
resumption request, whatever event the kernel is processing 
asynchronously in the meantime. This is the reason why, as you already 
noticed, preempt_schedule_irq() nicely breaks our toy by stealing the 
CPU from the hardening thread whilst keeping it linked to the runqueue: 
upon return from such preemption, the gatekeeper might have run already, 
 hence the newly hardened thread ends up being seen as runnable by both 
the Linux and Xeno schedulers. Rainy day indeed.


We could rely on giving "current" the highest SCHED_FIFO priority in 
xnshadow_harden() before waking up the gk, until the gk eventually 
promotes it to the Xenomai scheduling mode and downgrades this priority 
back to normal, but we would pay additional latencies induced by each 
aborted rescheduling attempt that may occur during the atomic path we 
want to enforce.


The other way is to make sure that no in-kernel preemption of the 
hardening task could occur after step 1) and until step 2) is performed, 
given that we cannot currently call schedule() with interrupts or 
preemption off. I'm on it.




> Could anyone interested in this issue test the following couple of patches?

> atomic-switch-state.patch is to be applied against Adeos-1.1-03/x86 for 2.6.15
> atomic-wakeup-and-schedule.patch is to be applied against Xeno 2.1-rc2

> Both patches are needed to fix the issue.

> TIA,

And now, Ladies and Gentlemen, with the patches attached.

--

Philippe.

--- 2.6.15-x86/kernel/sched.c	2006-01-07 15:18:31.0 +0100
+++ 2.6.15-ipipe/kernel/sched.c	2006-01-30 15:15:27.0 +0100
@@ -2963,7 +2963,7 @@
 	 * Otherwise, whine if we are scheduling when we should not be.
 	 */
 	if (likely(!current->exit_state)) {
-		if (unlikely(in_atomic())) {
+		if (unlikely(!(current->state & TASK_ATOMICSWITCH) && in_atomic())) {
 			printk(KERN_ERR "scheduling while atomic: "
 "%s/0x%08x/%d\n",
 current->comm, preempt_count(), current->pid);
@@ -2972,8 +2972,13 @@
 	}
 	profile_hit(SCHED_PROFILING, __builtin_return_ad

Re: [Xenomai-core] [BUG] racy xnshadow_harden under CONFIG_PREEMPT

2006-01-30 Thread Philippe Gerum

Philippe Gerum wrote:

Jan Kiszka wrote:


Gilles Chanteperdrix wrote:


Jeroen Van den Keybus wrote:
> Hello,
> > > I'm currently not at a level to participate in your discussion. 
Although I'm
> willing to supply you with stresstests, I would nevertheless like 
to learn
> more from task migration as this debugging session proceeds. In 
order to do
> so, please confirm the following statements or indicate where I 
went wrong.

> I hope others may learn from this as well.
> > xn_shadow_harden(): This is called whenever a Xenomai thread 
performs a

> Linux (root domain) system call (notified by Adeos ?).
xnshadow_harden() is called whenever a thread running in secondary
mode (that is, running as a regular Linux thread, handled by Linux
scheduler) is switching to primary mode (where it will run as a Xenomai
thread, handled by Xenomai scheduler). Migrations occur for some system
calls. More precisely, Xenomai skin system calls tables associates a few
flags with each system call, and some of these flags cause migration of
the caller when it issues the system call.

Each Xenomai user-space thread has two contexts, a regular Linux
thread context, and a Xenomai thread called "shadow" thread. Both
contexts share the same stack and program counter, so that at any time,
at least one of the two contexts is seen as suspended by the scheduler
which handles it.

Before xnshadow_harden is called, the Linux thread is running, and its
shadow is seen in suspended state with XNRELAX bit by Xenomai
scheduler. After xnshadow_harden, the Linux context is seen suspended
with INTERRUPTIBLE state by Linux scheduler, and its shadow is seen as
running by Xenomai scheduler.

> The migrating thread (nRT) is marked INTERRUPTIBLE and run by the Linux
> kernel wake_up_interruptible_sync() call. Is this thread actually run or
> does it merely put the thread in some Linux to-do list (I assumed the
> first case) ?


Here, I am not sure, but it seems that when calling
wake_up_interruptible_sync the woken up task is put in the current CPU
runqueue, and this task (i.e. the gatekeeper), will not run until the
current thread (i.e. the thread running xnshadow_harden) marks itself as
suspended and calls schedule(). Maybe, marking the running thread as




Depends on CONFIG_PREEMPT. If set, we get a preempt_schedule already
here - and a switch if the prio of the woken up task is higher.

BTW, an easy way to enforce the current trouble is to remove the "_sync"
from wake_up_interruptible. As I understand it this _sync is just an
optimisation hint for Linux to avoid needless scheduler runs.
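
For reference, the difference sketched in generic Linux terms (illustrative only,
not Xenomai code):

wake_up_interruptible(&q);       /* may preempt the waker right away        */
wake_up_interruptible_sync(&q);  /* hints that the waker is about to sleep,
                                    so the decision is deferred to its own
                                    upcoming schedule() call                */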



You could not guarantee the following execution sequence doing so 
either, i.e.


1- current wakes up the gatekeeper
2- current goes sleeping to exit the Linux runqueue in schedule()
3- the gatekeeper resumes the shadow-side of the old current

The point is all about making 100% sure that current is going to be 
unlinked from the Linux runqueue before the gatekeeper processes the 
resumption request, whatever event the kernel is processing 
asynchronously in the meantime. This is the reason why, as you already 
noticed, preempt_schedule_irq() nicely breaks our toy by stealing the 
CPU from the hardening thread whilst keeping it linked to the runqueue: 
upon return from such preemption, the gatekeeper might have run already, 
 hence the newly hardened thread ends up being seen as runnable by both 
the Linux and Xeno schedulers. Rainy day indeed.


We could rely on giving "current" the highest SCHED_FIFO priority in 
xnshadow_harden() before waking up the gk, until the gk eventually 
promotes it to the Xenomai scheduling mode and downgrades this priority 
back to normal, but we would pay additional latencies induced by each 
aborted rescheduling attempt that may occur during the atomic path we 
want to enforce.


The other way is to make sure that no in-kernel preemption of the 
hardening task could occur after step 1) and until step 2) is performed, 
given that we cannot currently call schedule() with interrupts or 
preemption off. I'm on it.




Could anyone interested in this issue test the following couple of patches?

atomic-switch-state.patch is to be applied against Adeos-1.1-03/x86 for 2.6.15
atomic-wakeup-and-schedule.patch is to be applied against Xeno 2.1-rc2

Both patches are needed to fix the issue.

TIA,

--

Philippe.



Re: [Xenomai-core] [BUG?] dead code in ipipe_grab_irq

2006-01-30 Thread Heikki Lindholm

Anders Blomdell wrote:
In the following code (ppc), shouldn't first be either declared static 
or deleted? To me it looks like first is always equal to one when the 
else clause is evaluated.


You're right. "first" doesn't need to be there at all, it's probably an 
old copy of something in the kernel.



asmlinkage int __ipipe_grab_irq(struct pt_regs *regs)
{
extern int ppc_spurious_interrupts;
ipipe_declare_cpuid;
int irq, first = 1;

if ((irq = ppc_md.get_irq(regs)) >= 0) {
__ipipe_handle_irq(irq, regs);
first = 0;
} else if (irq != -2 && first)
ppc_spurious_interrupts++;

ipipe_load_cpuid();

return (ipipe_percpu_domain[cpuid] == ipipe_root_domain &&
!test_bit(IPIPE_STALL_FLAG,
  &ipipe_root_domain->cpudata[cpuid].status));
}
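
For reference, this is what the function would look like with the dead flag
dropped, as suggested (sketch, untested; behaviour is unchanged since "first"
is always 1 when the else branch runs):

asmlinkage int __ipipe_grab_irq(struct pt_regs *regs)
{
	extern int ppc_spurious_interrupts;
	ipipe_declare_cpuid;
	int irq;

	if ((irq = ppc_md.get_irq(regs)) >= 0)
		__ipipe_handle_irq(irq, regs);
	else if (irq != -2)
		ppc_spurious_interrupts++;

	ipipe_load_cpuid();

	return (ipipe_percpu_domain[cpuid] == ipipe_root_domain &&
		!test_bit(IPIPE_STALL_FLAG,
			  &ipipe_root_domain->cpudata[cpuid].status));
}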


Regards

Anders Blomdell





Re: [Xenomai-core] [BUG?] dead code in ipipe_grab_irq

2006-01-30 Thread Philippe Gerum

Heikki Lindholm wrote:

Anders Blomdell wrote:

In the following code (ppc), shouldn't first be either declared static 
or deleted? To me it looks like first is always equal to one when the 
else clause is evaluated.



You're right. "first" doesn't need to be there at all, it's probably an 
old copy of something in the kernel.




Yep; there used to be a while() loop in the original implementation, which we do
not perform here.


asmlinkage int __ipipe_grab_irq(struct pt_regs *regs)
{
extern int ppc_spurious_interrupts;
ipipe_declare_cpuid;
int irq, first = 1;

if ((irq = ppc_md.get_irq(regs)) >= 0) {
__ipipe_handle_irq(irq, regs);
first = 0;
} else if (irq != -2 && first)
ppc_spurious_interrupts++;

ipipe_load_cpuid();

return (ipipe_percpu_domain[cpuid] == ipipe_root_domain &&
!test_bit(IPIPE_STALL_FLAG,
  &ipipe_root_domain->cpudata[cpuid].status));
}


Regards

Anders Blomdell







--

Philippe.



Re: [Xenomai-core] [BUG] racy xnshadow_harden under CONFIG_PREEMPT

2006-01-30 Thread Jan Kiszka
Philippe Gerum wrote:
> Philippe Gerum wrote:
>> Jan Kiszka wrote:
>>
>>> Gilles Chanteperdrix wrote:
>>>
 Jeroen Van den Keybus wrote:
 > Hello,
 > > > I'm currently not at a level to participate in your
 discussion. Although I'm
 > willing to supply you with stresstests, I would nevertheless like
 to learn
 > more from task migration as this debugging session proceeds. In
 order to do
 > so, please confirm the following statements or indicate where I
 went wrong.
 > I hope others may learn from this as well.
 > > xn_shadow_harden(): This is called whenever a Xenomai thread
 performs a
 > Linux (root domain) system call (notified by Adeos ?).
 xnshadow_harden() is called whenever a thread running in secondary
 mode (that is, running as a regular Linux thread, handled by Linux
 scheduler) is switching to primary mode (where it will run as a Xenomai
 thread, handled by Xenomai scheduler). Migrations occur for some system
 calls. More precisely, Xenomai skin system calls tables associates a
 few
 flags with each system call, and some of these flags cause migration of
 the caller when it issues the system call.

 Each Xenomai user-space thread has two contexts, a regular Linux
 thread context, and a Xenomai thread called "shadow" thread. Both
 contexts share the same stack and program counter, so that at any time,
 at least one of the two contexts is seen as suspended by the scheduler
 which handles it.

 Before xnshadow_harden is called, the Linux thread is running, and its
 shadow is seen in suspended state with XNRELAX bit by Xenomai
 scheduler. After xnshadow_harden, the Linux context is seen suspended
 with INTERRUPTIBLE state by Linux scheduler, and its shadow is seen as
 running by Xenomai scheduler.

 The migrating thread
 > (nRT) is marked INTERRUPTIBLE and run by the Linux kernel
 > wake_up_interruptible_sync() call. Is this thread actually run or
 does it
 > merely put the thread in some Linux to-do list (I assumed the
 first case) ?

 Here, I am not sure, but it seems that when calling
 wake_up_interruptible_sync the woken up task is put in the current CPU
 runqueue, and this task (i.e. the gatekeeper), will not run until the
 current thread (i.e. the thread running xnshadow_harden) marks
 itself as
 suspended and calls schedule(). Maybe, marking the running thread as
>>>
>>>
>>>
>>> Depends on CONFIG_PREEMPT. If set, we get a preempt_schedule already
>>> here - and a switch if the prio of the woken up task is higher.
>>>
>>> BTW, an easy way to enforce the current trouble is to remove the "_sync"
>>> from wake_up_interruptible. As I understand it this _sync is just an
>>> optimisation hint for Linux to avoid needless scheduler runs.
>>>
>>
>> You could not guarantee the following execution sequence doing so
>> either, i.e.
>>
>> 1- current wakes up the gatekeeper
>> 2- current goes sleeping to exit the Linux runqueue in schedule()
>> 3- the gatekeeper resumes the shadow-side of the old current
>>
>> The point is all about making 100% sure that current is going to be
>> unlinked from the Linux runqueue before the gatekeeper processes the
>> resumption request, whatever event the kernel is processing
>> asynchronously in the meantime. This is the reason why, as you already
>> noticed, preempt_schedule_irq() nicely breaks our toy by stealing the
>> CPU from the hardening thread whilst keeping it linked to the
>> runqueue: upon return from such preemption, the gatekeeper might have
>> run already,  hence the newly hardened thread ends up being seen as
>> runnable by both the Linux and Xeno schedulers. Rainy day indeed.
>>
>> We could rely on giving "current" the highest SCHED_FIFO priority in
>> xnshadow_harden() before waking up the gk, until the gk eventually
>> promotes it to the Xenomai scheduling mode and downgrades this
>> priority back to normal, but we would pay additional latencies induced
>> by each aborted rescheduling attempt that may occur during the atomic
>> path we want to enforce.
>>
>> The other way is to make sure that no in-kernel preemption of the
>> hardening task could occur after step 1) and until step 2) is
>> performed, given that we cannot currently call schedule() with
>> interrupts or preemption off. I'm on it.
>>
> 
> Could anyone interested in this issue test the following couple of patches?
> 
> atomic-switch-state.patch is to be applied against Adeos-1.1-03/x86 for
> 2.6.15
> atomic-wakeup-and-schedule.patch is to be applied against Xeno 2.1-rc2
> 
> Both patches are needed to fix the issue.
> 
> TIA,
> 

Looks good. I tried Jeroen's test-case and I was not able to reproduce
the crash anymore. I think it's time for a new ipipe-release. ;)

At this chance: any comments on the panic-freeze extension for the
tracer? I need to rework the Xenomai patch, but the ipipe side should be
ready for merge.

Re: [Xenomai-core] [BUG] Interrupt problem on powerpc

2006-01-30 Thread Anders Blomdell

Jan Kiszka wrote:

Anders Blomdell wrote:


On a PrPMC800 (PPC 7410 processor) with Xenomai-2.1-rc2, I get the
following if the interrupt handler takes too long (i.e. next interrupt
gets generated before the previous one has finished)

[   42.543765]  [c00c2008] spin_bug+0xa8/0xc4
[   42.597617]  [c00c22d4] _raw_spin_lock+0x180/0x184
[   42.660637]  [c000f388] __ipipe_ack_irq+0x88/0x130
[   42.723657]  [c000efe4] __ipipe_handle_irq+0x140/0x268
[   42.791259]  [c000f144] __ipipe_grab_irq+0x38/0xa4
[   42.854279]  [c0005058] __ipipe_ret_from_except+0x0/0xc
[   42.923029]  [] 0x0
[   42.959695]  [c0038348] __do_IRQ+0x134/0x164
[   43.015839]  [c000ed04] __ipipe_do_IRQ+0x2c/0x44
[   43.076567]  [c000eb08] __ipipe_sync_stage+0x1ec/0x228
[   43.144170]  [c0039420] ipipe_suspend_domain+0x7c/0xc4
[   43.211774]  [c000f0b0] __ipipe_handle_irq+0x20c/0x268
[   43.279377]  [c000f144] __ipipe_grab_irq+0x38/0xa4
[   43.342396]  [c0005058] __ipipe_ret_from_except+0x0/0xc
[   43.411145]  [c0006524] default_idle+0x10/0x60




I think some probably important information is missing above this
back-trace. 

You are so right!

> What does the kernel state before these lines?

[   42.346643] BUG: spinlock recursion on CPU#0, swapper/0
[   42.415438]  lock: c01c943c, .magic: dead4ead, .owner: swapper/0, 
.owner_cpu: 0
[   42.511681] Call trace:
[   42.543765]  [c00c2008] spin_bug+0xa8/0xc4
[   42.597617]  [c00c22d4] _raw_spin_lock+0x180/0x184
[   42.660637]  [c000f388] __ipipe_ack_irq+0x88/0x130
[   42.723657]  [c000efe4] __ipipe_handle_irq+0x140/0x268
[   42.791259]  [c000f144] __ipipe_grab_irq+0x38/0xa4
[   42.854279]  [c0005058] __ipipe_ret_from_except+0x0/0xc
[   42.923029]  [] 0x0
[   42.959695]  [c0038348] __do_IRQ+0x134/0x164
[   43.015839]  [c000ed04] __ipipe_do_IRQ+0x2c/0x44
[   43.076567]  [c000eb08] __ipipe_sync_stage+0x1ec/0x228
[   43.144170]  [c0039420] ipipe_suspend_domain+0x7c/0xc4
[   43.211774]  [c000f0b0] __ipipe_handle_irq+0x20c/0x268
[   43.279377]  [c000f144] __ipipe_grab_irq+0x38/0xa4
[   43.342396]  [c0005058] __ipipe_ret_from_except+0x0/0xc
[   43.411145]  [c0006524] default_idle+0x10/0x60


It might be that the problem is related to the fact that the interrupt is a 
shared one (Harrier chip, "Functional Exception") that is used for both 
message-passing (should be RT) and UART (Linux, i.e. non-RT). My current IRQ 
handler always pends the interrupt to the Linux domain (RTDM_IRQ_PROPAGATE), 
because all other attempts (RTDM_IRQ_ENABLE when it wasn't a UART interrupt) have 
left the interrupts turned off.


What I believe should be done, is

  1. When UART interrupt is received, disable further non-RT interrupts
 on this IRQ-line, pend interrupt to Linux.
  2. Handle RT interrupts on this IRQ line
  3. When Linux has finished the pended interrupt, reenable non-RT interrupts.

but I have neither been able to achieve this, nor to verify that it is the right 
thing to do...
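
For what it's worth, a rough sketch of what the handler side of steps 1 and 2
could look like under RTDM (the harrier_* and msgpassing_* helpers are purely
hypothetical placeholders for the device-specific register accesses, and step 3
- re-enabling the UART source once Linux is done - is exactly the part that
still lacks a good hook):

/* Hypothetical sketch only; struct my_dev and the helpers below stand
 * for whatever device-specific code drives the Harrier registers. */
static int shared_irq_handler(rtdm_irq_t *irq_handle)
{
	struct my_dev *dev = rtdm_irq_get_arg(irq_handle, struct my_dev);

	if (harrier_uart_irq_pending(dev)) {
		/* Step 1: mask the non-RT (UART) source at device level and
		 * hand the IRQ over to Linux; the Linux handler (or some hook
		 * running after it) must unmask the UART again - step 3. */
		harrier_mask_uart_irq(dev);
		return RTDM_IRQ_PROPAGATE;
	}

	/* Step 2: handle the RT (message-passing) source and re-enable
	 * the line from the handler. */
	msgpassing_handle_event(dev);
	return RTDM_IRQ_ENABLE;
}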


Regards

Anders Blomdell




Re: [Xenomai-core] [BUG] Interrupt problem on powerpc

2006-01-30 Thread Jan Kiszka
Anders Blomdell wrote:
> Jan Kiszka wrote:
>> Anders Blomdell wrote:
>>
>>> On a PrPMC800 (PPC 7410 processor) with Xenomai-2.1-rc2, I get the
>>> following if the interrupt handler takes too long (i.e. next interrupt
>>> gets generated before the previous one has finished)
>>>
>>> [   42.543765]  [c00c2008] spin_bug+0xa8/0xc4
>>> [   42.597617]  [c00c22d4] _raw_spin_lock+0x180/0x184
>>> [   42.660637]  [c000f388] __ipipe_ack_irq+0x88/0x130
>>> [   42.723657]  [c000efe4] __ipipe_handle_irq+0x140/0x268
>>> [   42.791259]  [c000f144] __ipipe_grab_irq+0x38/0xa4
>>> [   42.854279]  [c0005058] __ipipe_ret_from_except+0x0/0xc
>>> [   42.923029]  [] 0x0
>>> [   42.959695]  [c0038348] __do_IRQ+0x134/0x164
>>> [   43.015839]  [c000ed04] __ipipe_do_IRQ+0x2c/0x44
>>> [   43.076567]  [c000eb08] __ipipe_sync_stage+0x1ec/0x228
>>> [   43.144170]  [c0039420] ipipe_suspend_domain+0x7c/0xc4
>>> [   43.211774]  [c000f0b0] __ipipe_handle_irq+0x20c/0x268
>>> [   43.279377]  [c000f144] __ipipe_grab_irq+0x38/0xa4
>>> [   43.342396]  [c0005058] __ipipe_ret_from_except+0x0/0xc
>>> [   43.411145]  [c0006524] default_idle+0x10/0x60
>>>
>>
>>
>> I think some probably important information is missing above this
>> back-trace. 
> You are so right!
> 
>> What does the kernel state before these lines?
> 
> [   42.346643] BUG: spinlock recursion on CPU#0, swapper/0
> [   42.415438]  lock: c01c943c, .magic: dead4ead, .owner: swapper/0,
> .owner_cpu: 0
> [   42.511681] Call trace:
> [   42.543765]  [c00c2008] spin_bug+0xa8/0xc4
> [   42.597617]  [c00c22d4] _raw_spin_lock+0x180/0x184
> [   42.660637]  [c000f388] __ipipe_ack_irq+0x88/0x130
> [   42.723657]  [c000efe4] __ipipe_handle_irq+0x140/0x268
> [   42.791259]  [c000f144] __ipipe_grab_irq+0x38/0xa4
> [   42.854279]  [c0005058] __ipipe_ret_from_except+0x0/0xc
> [   42.923029]  [] 0x0
> [   42.959695]  [c0038348] __do_IRQ+0x134/0x164
> [   43.015839]  [c000ed04] __ipipe_do_IRQ+0x2c/0x44
> [   43.076567]  [c000eb08] __ipipe_sync_stage+0x1ec/0x228
> [   43.144170]  [c0039420] ipipe_suspend_domain+0x7c/0xc4
> [   43.211774]  [c000f0b0] __ipipe_handle_irq+0x20c/0x268
> [   43.279377]  [c000f144] __ipipe_grab_irq+0x38/0xa4
> [   43.342396]  [c0005058] __ipipe_ret_from_except+0x0/0xc
> [   43.411145]  [c0006524] default_idle+0x10/0x60
> 
> 
> It might be that the problem is related to the fact that the interrupt
> is a shared one (Harrier chip, "Functional Exception"), that is used for
> both message-passing (should be RT) and UART (Linux, i.e. non-RT), my
> current IRQ handler always pends the interrupt to the linux domain
> (RTDM_IRQ_PROPAGATE), because all other attempts (RTDM_IRQ_ENABLE when
> it wasn't a UART interrupt) has left the interrupts turned off.
> 
> What I believe should be done, is
> 
>   1. When UART interrupt is received, disable further non-RT interrupts
>  on this IRQ-line, pend interrupt to Linux.
>   2. Handle RT interrupts on this IRQ line
>   3. When Linux has finished the pended interrupt, reenable non-RT
> interrupts.
> 
> but I have neither been able to achieve this, nor to verify that it is
> the right thing to do...

Your approach is basically what I proposed some years back on rtai-dev
for handling unresolvable shared RT/NRT IRQs. I once successfully tested
such a setup with two network cards, one RT, the other Linux.

So when you are really doomed and cannot change the IRQ line of your RT
device, this is a kind of emergency workaround. Not nice and generic
(you have to write the stub for disabling the NRT IRQ source), but it
should work.


Anyway, I do not understand what made your spinlock recurse. This shared
IRQ scenario should only cause indeterminism to the RT driver (by
blocking the line until the Linux handler can release it), but it must
not trigger this bug.

Jan





Re: [Xenomai-core] [BUG] Interrupt problem on powerpc

2006-01-30 Thread Philippe Gerum

Anders Blomdell wrote:
On a PrPMC800 (PPC 7410 processor) with Xenomai-2.1-rc2, I get the
following if the interrupt handler takes too long (i.e. next interrupt 
gets generated before the previous one has finished)


[   42.543765]  [c00c2008] spin_bug+0xa8/0xc4
[   42.597617]  [c00c22d4] _raw_spin_lock+0x180/0x184


Someone (in arch/ppc64/kernel/*.c?) is spinlocking+irqsave desc->lock for any 
given IRQ without using the Adeos *_hw() spinlock variant that masks the interrupt 
at hw level. So we seem to have:


spin_lock_irqsave(&desc->lock)

__ipipe_grab_irq
__ipipe_handle_irq
__ipipe_ack_irq
spin_lock...(&desc->lock)
deadlock.

The point is about having spin_lock_irqsave() only _virtually_ masking the interrupts 
by preventing their associated Linux handler from being called, but despite this, 
Adeos still actually acquires and acknowledges the incoming hw events before 
logging them, even if their associated action happens to be postponed until 
spin_unlock_irqrestore() is called.


To solve this, all spinlocks potentially touched by the ipipe's primary IRQ 
handler and/or the code it calls indirectly, _must_ be operated using the _hw() 
call variant all over the kernel, so that no hw IRQ can be taken while those 
spinlocks are held by Linux. Usually, only the spinlock(s) protecting the 
interrupt descriptors or the PIC hardware are concerned.
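
Concretely, the conversion boils down to something like this wherever desc->lock
is taken on that path (sketch only, assuming the usual *_hw() naming from the
Adeos patch; the actual spots still have to be identified):

/* Before: interrupts are only virtually masked for the root domain, so a
 * hw IRQ may still be taken and recurse into __ipipe_ack_irq() on the
 * same lock. */
spin_lock_irqsave(&desc->lock, flags);
/* ... access the IRQ descriptor / PIC ... */
spin_unlock_irqrestore(&desc->lock, flags);

/* After: the _hw() variants really disable interrupts at CPU level, so
 * no hw IRQ can be delivered while the descriptor lock is held. */
spin_lock_irqsave_hw(&desc->lock, flags);
/* ... access the IRQ descriptor / PIC ... */
spin_unlock_irqrestore_hw(&desc->lock, flags);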



[   42.660637]  [c000f388] __ipipe_ack_irq+0x88/0x130
[   42.723657]  [c000efe4] __ipipe_handle_irq+0x140/0x268
[   42.791259]  [c000f144] __ipipe_grab_irq+0x38/0xa4
[   42.854279]  [c0005058] __ipipe_ret_from_except+0x0/0xc
[   42.923029]  [] 0x0
[   42.959695]  [c0038348] __do_IRQ+0x134/0x164
[   43.015839]  [c000ed04] __ipipe_do_IRQ+0x2c/0x44
[   43.076567]  [c000eb08] __ipipe_sync_stage+0x1ec/0x228
[   43.144170]  [c0039420] ipipe_suspend_domain+0x7c/0xc4
[   43.211774]  [c000f0b0] __ipipe_handle_irq+0x20c/0x268
[   43.279377]  [c000f144] __ipipe_grab_irq+0x38/0xa4
[   43.342396]  [c0005058] __ipipe_ret_from_except+0x0/0xc
[   43.411145]  [c0006524] default_idle+0x10/0x60


Any ideas of where to look?

Regards

Anders Blomdell







--

Philippe.



Re: [Xenomai-core] [BUG] Interrupt problem on powerpc

2006-01-30 Thread Anders Blomdell

Jan Kiszka wrote:

Anders Blomdell wrote:


Jan Kiszka wrote:


Anders Blomdell wrote:



On a PrPMC800 (PPC 7410 processor) with Xenomai-2.1-rc2, I get the
following if the interrupt handler takes too long (i.e. next interrupt
gets generated before the previous one has finished)

[   42.543765]  [c00c2008] spin_bug+0xa8/0xc4
[   42.597617]  [c00c22d4] _raw_spin_lock+0x180/0x184
[   42.660637]  [c000f388] __ipipe_ack_irq+0x88/0x130
[   42.723657]  [c000efe4] __ipipe_handle_irq+0x140/0x268
[   42.791259]  [c000f144] __ipipe_grab_irq+0x38/0xa4
[   42.854279]  [c0005058] __ipipe_ret_from_except+0x0/0xc
[   42.923029]  [] 0x0
[   42.959695]  [c0038348] __do_IRQ+0x134/0x164
[   43.015839]  [c000ed04] __ipipe_do_IRQ+0x2c/0x44
[   43.076567]  [c000eb08] __ipipe_sync_stage+0x1ec/0x228
[   43.144170]  [c0039420] ipipe_suspend_domain+0x7c/0xc4
[   43.211774]  [c000f0b0] __ipipe_handle_irq+0x20c/0x268
[   43.279377]  [c000f144] __ipipe_grab_irq+0x38/0xa4
[   43.342396]  [c0005058] __ipipe_ret_from_except+0x0/0xc
[   43.411145]  [c0006524] default_idle+0x10/0x60




I think some probably important information is missing above this
back-trace. 


You are so right!



What does the kernel state before these lines?


[   42.346643] BUG: spinlock recursion on CPU#0, swapper/0
[   42.415438]  lock: c01c943c, .magic: dead4ead, .owner: swapper/0,
.owner_cpu: 0
[   42.511681] Call trace:
[   42.543765]  [c00c2008] spin_bug+0xa8/0xc4
[   42.597617]  [c00c22d4] _raw_spin_lock+0x180/0x184
[   42.660637]  [c000f388] __ipipe_ack_irq+0x88/0x130
[   42.723657]  [c000efe4] __ipipe_handle_irq+0x140/0x268
[   42.791259]  [c000f144] __ipipe_grab_irq+0x38/0xa4
[   42.854279]  [c0005058] __ipipe_ret_from_except+0x0/0xc
[   42.923029]  [] 0x0
[   42.959695]  [c0038348] __do_IRQ+0x134/0x164
[   43.015839]  [c000ed04] __ipipe_do_IRQ+0x2c/0x44
[   43.076567]  [c000eb08] __ipipe_sync_stage+0x1ec/0x228
[   43.144170]  [c0039420] ipipe_suspend_domain+0x7c/0xc4
[   43.211774]  [c000f0b0] __ipipe_handle_irq+0x20c/0x268
[   43.279377]  [c000f144] __ipipe_grab_irq+0x38/0xa4
[   43.342396]  [c0005058] __ipipe_ret_from_except+0x0/0xc
[   43.411145]  [c0006524] default_idle+0x10/0x60


It might be that the problem is related to the fact that the interrupt
is a shared one (Harrier chip, "Functional Exception"), that is used for
both message-passing (should be RT) and UART (Linux, i.e. non-RT), my
current IRQ handler always pends the interrupt to the linux domain
(RTDM_IRQ_PROPAGATE), because all other attempts (RTDM_IRQ_ENABLE when
it wasn't a UART interrupt) has left the interrupts turned off.

What I believe should be done, is

 1. When UART interrupt is received, disable further non-RT interrupts
on this IRQ-line, pend interrupt to Linux.
 2. Handle RT interrupts on this IRQ line
 3. When Linux has finished the pended interrupt, reenable non-RT
interrupts.

but I have neither been able to achieve this, nor to verify that it is
the right thing to do...



Your approach is basically what I proposed some years back on rtai-dev
for handling unresolvable shared RT/NRT IRQs. I once successfully tested
such a setup with two network cards, one RT, the other Linux.

So when you are really doomed and cannot change the IRQ line of your RT
device, this is a kind of emergency workaround. Not nice and generic
(you have to write the stub for disabling the NRT IRQ source), but it
should work.

I'm doomed, the interrupts live in the same chip...
The problem is that I have not found any good place to reenable the non-RT 
interrupts.



Anyway, I do not understand what made your spinlock recurse. This shared
IRQ scenario should only cause indeterminism to the RT driver (by
blocking the line until the Linux handler can release it), but it must
not trigger this bug.

OK, seems like I have two problems then; I'll try to hunt it down


/Anders



Re: [Xenomai-core] [BUG] Interrupt problem on powerpc

2006-01-30 Thread Anders Blomdell

Philippe Gerum wrote:

Anders Blomdell wrote:

On a PrPMC800 (PPC 7410 processor) with Xenomai-2.1-rc2, I get the 
following if the interrupt handler takes too long (i.e. next interrupt 
gets generated before the previous one has finished)


[   42.543765]  [c00c2008] spin_bug+0xa8/0xc4
[   42.597617]  [c00c22d4] _raw_spin_lock+0x180/0x184



Someone (in arch/ppc64/kernel/*.c?) is spinlocking+irqsave desc->lock 

more likely arch/ppc/kernel/*.c :-)

for any given IRQ without using the Adeos *_hw() spinlock variant that 
masks the interrupt at hw level. So we seem to have:


spin_lock_irqsave(&desc->lock)

__ipipe_grab_irq
__ipipe_handle_irq
__ipipe_ack_irq
spin_lock...(&desc->lock)
deadlock.

The point is about having spinlock_irqsave only _virtually_ masking the 
interrupts by preventing their associated Linux handler from being 
called, but despite this, Adeos still actually acquires and acknowledges 
the incoming hw events before logging them, even if their associated 
action happen to be postponed until spinlock_irq_restore() is called.


To solve this, all spinlocks potentially touched by the ipipe's primary 
IRQ handler and/or the code it calls indirectly, _must_ be operated 
using the _hw() call variant all over the kernel, so that no hw IRQ can 
be taken while those spinlocks are held by Linux. Usually, only the 
spinlock(s) protecting the interrupt descriptors or the PIC hardware are 
concerned.

So you will expect an addition to the ipipe patch then?

/Anders



Re: [Xenomai-core] [BUG] Interrupt problem on powerpc

2006-01-30 Thread Philippe Gerum

Anders Blomdell wrote:

Philippe Gerum wrote:


Anders Blomdell wrote:

On a PrPMC800 (PPC 7410 processor) with Xenomai-2.1-rc2, I get the 
following if the interrupt handler takes too long (i.e. next 
interrupt gets generated before the previous one has finished)


[   42.543765]  [c00c2008] spin_bug+0xa8/0xc4
[   42.597617]  [c00c22d4] _raw_spin_lock+0x180/0x184




Someone (in arch/ppc64/kernel/*.c?) is spinlocking+irqsave desc->lock 


more likely arch/ppc/kernel/*.c :-)



Gah... looks like I'm still confused by ia64 issues I'm chasing right now. (Why on 
earth do we need so many bits on our CPUs that only serve the purpose of raising 
so many problems?)


for any given IRQ without using the Adeos *_hw() spinlock variant that 
masks the interrupt at hw level. So we seem to have:


spin_lock_irqsave(&desc->lock)

__ipipe_grab_irq
__ipipe_handle_irq
__ipipe_ack_irq
spin_lock...(&desc->lock)
deadlock.

The point is about having spinlock_irqsave only _virtually_ masking 
the interrupts by preventing their associated Linux handler from being 
called, but despite this, Adeos still actually acquires and 
acknowledges the incoming hw events before logging them, even if their 
associated action happen to be postponed until spinlock_irq_restore() 
is called.


To solve this, all spinlocks potentially touched by the ipipe's 
primary IRQ handler and/or the code it calls indirectly, _must_ be 
operated using the _hw() call variant all over the kernel, so that no 
hw IRQ can be taken while those spinlocks are held by Linux. Usually, 
only the spinlock(s) protecting the interrupt descriptors or the PIC 
hardware are concerned.


So you will expect an addition to the ipipe patch then?



Yep. We first need to find out who's grabbing the shared spinlock using the 
vanilla Linux primitives.



/Anders




--

Philippe.



[Xenomai-core] [PATCH] rt_heap reminder

2006-01-30 Thread Stefan Kisdaroczi
Hi all,

as a reminder (userspace, native skin, shared heap) [1]:
API documentation: "If the heap is shared, this value can be either zero, or 
the same value given to rt_heap_create()."
This is not true. As the heapsize gets altered in rt_heap_create for page size 
alignment, the following call to rt_heap_alloc with the same value will fail.

Ex:
rt_heap_create( ..., ..., 1, ... )
rt_heap_alloc( ..., 1, ...,  ) -> This call fails

I suggest only accepting zero as a valid size for shared heaps.
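
To make the failure mode concrete, a minimal user-space sequence (sketch, error
handling omitted; this assumes the usual native-skin signatures
rt_heap_create(heap, name, size, mode) and rt_heap_alloc(heap, size, timeout,
&block)):

#include <native/heap.h>

RT_HEAP heap;
void *block;

/* A 1-byte shared heap is silently rounded up to page size... */
rt_heap_create(&heap, "shm", 1, H_SHARED);

/* ...so passing the same 1 here no longer matches xnheap_size() and the
 * call fails with -EINVAL, contrary to the documentation. */
rt_heap_alloc(&heap, 1, TM_INFINITE, &block);

/* Passing 0 (or the proposed H_ALL) always maps the whole heap. */
rt_heap_alloc(&heap, 0, TM_INFINITE, &block);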

About the attached patch:
1) not tested
2) there are possibly better names than H_ALL
3) the comments could be in better English
4) I hope you get the idea

thx
kisda

[1] https://mail.gna.org/public/xenomai-core/2006-01/msg00177.html
Index: include/native/heap.h
===
--- include/native/heap.h	(Revision 465)
+++ include/native/heap.h	(Arbeitskopie)
@@ -32,6 +32,9 @@
 #define H_DMA0x100		/* Use memory suitable for DMA. */
 #define H_SHARED 0x200		/* Use mappable shared memory. */
 
+/* Operation flags. */
+#define H_ALL0x0	/* Entire heap space. */
+
 typedef struct rt_heap_info {
 
 int nwaiters;		/* !< Number of pending tasks. */
Index: ksrc/skins/native/heap.c
===
--- ksrc/skins/native/heap.c	(Revision 465)
+++ ksrc/skins/native/heap.c	(Arbeitskopie)
@@ -410,10 +410,9 @@
  * from.
  *
  * @param size The requested size in bytes of the block. If the heap
- * is shared, this value can be either zero, or the same value given
- * to rt_heap_create(). In any case, the same block covering the
- * entire heap space will always be returned to all callers of this
- * service.
+ * is shared, H_ALL should be passed, as always the same block
+ * covering the entire heap space will be returned to all callers of
+ * this service.
  *
  * @param timeout The number of clock ticks to wait for a block of
  * sufficient size to be available from a local heap (see
@@ -432,8 +431,7 @@
  * @return 0 is returned upon success. Otherwise:
  *
  * - -EINVAL is returned if @a heap is not a heap descriptor, or @a
- * heap is shared (i.e. H_SHARED mode) and @a size is non-zero but
- * does not match the actual heap size passed to rt_heap_create().
+ * heap is shared (i.e. H_SHARED mode) and @a size is not H_ALL.
  *
  * - -EIDRM is returned if @a heap is a deleted heap descriptor.
  *
@@ -503,12 +501,7 @@
 
 	if (!block)
 	{
-	/* It's ok to pass zero for size here, since the requested
-	   size is implicitely the whole heap space; but if
-	   non-zero is given, it must match the actual heap
-	   size. */
-
-	if (size > 0 && size != xnheap_size(&heap->heap_base))
+	if (size != H_ALL)
 		{
 		err = -EINVAL;
 		goto unlock_and_exit;




Re: [Xenomai-core] [BUG] racy xnshadow_harden under CONFIG_PREEMPT

2006-01-30 Thread Philippe Gerum

Jan Kiszka wrote:

Philippe Gerum wrote:


Philippe Gerum wrote:


Jan Kiszka wrote:



Gilles Chanteperdrix wrote:



Jeroen Van den Keybus wrote:


Hello,

I'm currently not at a level to participate in your discussion. Although
I'm willing to supply you with stresstests, I would nevertheless like to
learn more from task migration as this debugging session proceeds. In
order to do so, please confirm the following statements or indicate where
I went wrong. I hope others may learn from this as well.

xn_shadow_harden(): This is called whenever a Xenomai thread performs a
Linux (root domain) system call (notified by Adeos ?).


xnshadow_harden() is called whenever a thread running in secondary
mode (that is, running as a regular Linux thread, handled by Linux
scheduler) is switching to primary mode (where it will run as a Xenomai
thread, handled by Xenomai scheduler). Migrations occur for some system
calls. More precisely, Xenomai skin system calls tables associates a
few
flags with each system call, and some of these flags cause migration of
the caller when it issues the system call.

Each Xenomai user-space thread has two contexts, a regular Linux
thread context, and a Xenomai thread called "shadow" thread. Both
contexts share the same stack and program counter, so that at any time,
at least one of the two contexts is seen as suspended by the scheduler
which handles it.

Before xnshadow_harden is called, the Linux thread is running, and its
shadow is seen in suspended state with XNRELAX bit by Xenomai
scheduler. After xnshadow_harden, the Linux context is seen suspended
with INTERRUPTIBLE state by Linux scheduler, and its shadow is seen as
running by Xenomai scheduler.

The migrating thread (nRT) is marked INTERRUPTIBLE and run by the Linux
kernel wake_up_interruptible_sync() call. Is this thread actually run or
does it merely put the thread in some Linux to-do list (I assumed the
first case) ?

Here, I am not sure, but it seems that when calling
wake_up_interruptible_sync the woken up task is put in the current CPU
runqueue, and this task (i.e. the gatekeeper), will not run until the
current thread (i.e. the thread running xnshadow_harden) marks
itself as
suspended and calls schedule(). Maybe, marking the running thread as




Depends on CONFIG_PREEMPT. If set, we get a preempt_schedule already
here - and a switch if the prio of the woken up task is higher.

BTW, an easy way to enforce the current trouble is to remove the "_sync"
from wake_up_interruptible. As I understand it this _sync is just an
optimisation hint for Linux to avoid needless scheduler runs.



You could not guarantee the following execution sequence doing so
either, i.e.

1- current wakes up the gatekeeper
2- current goes sleeping to exit the Linux runqueue in schedule()
3- the gatekeeper resumes the shadow-side of the old current

The point is all about making 100% sure that current is going to be
unlinked from the Linux runqueue before the gatekeeper processes the
resumption request, whatever event the kernel is processing
asynchronously in the meantime. This is the reason why, as you already
noticed, preempt_schedule_irq() nicely breaks our toy by stealing the
CPU from the hardening thread whilst keeping it linked to the
runqueue: upon return from such preemption, the gatekeeper might have
run already,  hence the newly hardened thread ends up being seen as
runnable by both the Linux and Xeno schedulers. Rainy day indeed.

We could rely on giving "current" the highest SCHED_FIFO priority in
xnshadow_harden() before waking up the gk, until the gk eventually
promotes it to the Xenomai scheduling mode and downgrades this
priority back to normal, but we would pay additional latencies induced
by each aborted rescheduling attempt that may occur during the atomic
path we want to enforce.

The other way is to make sure that no in-kernel preemption of the
hardening task could occur after step 1) and until step 2) is
performed, given that we cannot currently call schedule() with
interrupts or preemption off. I'm on it.



Could anyone interested in this issue test the following couple of patches?

atomic-switch-state.patch is to be applied against Adeos-1.1-03/x86 for
2.6.15
atomic-wakeup-and-schedule.patch is to be applied against Xeno 2.1-rc2

Both patches are needed to fix the issue.

TIA,




Looks good. I tried Jeroen's test-case and I was not able to reproduce
the crash anymore. I think it's time for a new ipipe-release. ;)



Looks like, indeed.


At this chance: any comments on the panic-freeze extension for the
tracer? I need to rework the Xenomai patch, but the ipipe side should be
ready for merge.



No issue with the ipipe side since it only touches the tracer support code. No 
issue either at first sight with the Xeno side, aside from the trace being frozen 
twice in do_schedule_event? (once in this routine, twice in xnpod_fatal); but 
maybe it's wanted to freeze the situation before the stack is dumped

Re: [Xenomai-core] Initialization of a nucleus pod

2006-01-30 Thread Philippe Gerum

Germain Olivier wrote:

Thank you for your response

So rootcb isn't the "scheduler task".
I was thinking it was this task that determined what thread to run,
depending on its parameters (priority, periodicity, scheduling mode).

I will go back to the code to understand how it works ...



Use the simulator to understand the dynamics of this code: it brings you 
single-stepping of the entire Xenomai core over GDB, at source code level.




Germain



xnthread_init does part of the initialization. The low level part of
rootcb (its xnarchtcb_t member) is initialized twice, first by the call
to xnarch_init_tcb in xnthread_init, and then overriden by
xnarch_init_root_tcb in xnpod_init.

For any other thread than root, the thread would be given a stack and
entry point by the call to xnarch_init_thread in xnpod_start_thread. But
the root thread is Xenomai idle task, a placeholder for whatever task
Linux is currenty running. At the time where xnpod_init is called, the
root thread is the current context, so already has a stack and is
already running.

--


Gilles Chanteperdrix.









--

Philippe.



Re: [Xenomai-core] Scheduling while atomic

2006-01-30 Thread Philippe Gerum

Jan Kiszka wrote:

Jan Kiszka wrote:


...
[Update] While writing this mail and letting your test run for a while,
I *did* get a hard lock-up. Hold on, digging deeper...




And here are its last words, spoken via serial console:

c31dfab0 0086 c30d1a90 c02a2500 c482a360 0001 0001 0020
   c012e564 0022  0246 c30d1a90 c4866ce0 0033
c482
   c482a360 c4866ca0  c48293a4 c48524e1  
0002
Call Trace:
 [] __ipipe_dispatch_event+0x56/0xdd
 [] e100_hw_init+0x3ad/0xa81 [e100]
 [] xnpod_suspend_thread+0x714/0x76d [xeno_nucleus]
 [] xnsynch_sleep_on+0x76d/0x7a7 [xeno_nucleus]
 [] rt_sem_p+0xa6/0x10a [xeno_native]
 [] __rt_sem_p+0x5d/0x66 [xeno_native]
 [] hisyscall_event+0x1cb/0x2d3 [xeno_nucleus]
 [] __ipipe_dispatch_event+0x56/0xdd
 [] __ipipe_syscall_root+0x53/0xbe
 [] system_call+0x20/0x41
Xenomai: fatal: blocked thread main[863] rescheduled?! (status=0x300082,
sig=0, prev=gatekeeper/0[809])
 CPU  PID  PRI  TIMEOUT  STAT      NAME
   0    0   30         0  00500080  ROOT
   0  864   30         0  00300180  task0
   0  865   29         0  00300288  task1
   0  863    1         0  00300082  main
Timer: oneshot [tickval=1 ns, elapsed=175144731477]

c31e1f14 c4860572 c3188000 c31dfab0 00300082 c02a2500 0286 c02a2500
   c030cbec c012e564 0022 c02a2500 c30d1a90 c30d1a90 0022
0001
   c02a2500 c30d1a90 c08e4623 0028 c31e1fa0 c0266ed5 f610
c030cd80
Call Trace:
 [] __ipipe_dispatch_event+0x56/0xdd
 [] schedule+0x3ef/0x5ed
 [] gatekeeper_thread+0x0/0x179 [xeno_nucleus]
 [] gatekeeper_thread+0x9a/0x179 [xeno_nucleus]
 [] default_wake_function+0x0/0x12
 [] kthread+0x68/0x95
 [] kthread+0x0/0x95
 [] kernel_thread_helper+0x5/0xb

Any bells already ringing?


Yes; the bad news is that this looks like the same bug as the one you reported 
recently, which I only partially fixed, it seems. xnshadow_harden() is still not 
working properly under certain preemption situations induced by CONFIG_PREEMPT, and 
the hardening thread is likely unexpectedly moved back to the Linux runqueue while 
transitioning to Xenomai. The good news is that it's a well-identified issue, at 
least...




Will try Gilles' patch now...

Jan








--

Philippe.



Re: [Xenomai-core] [BUG] racy xnshadow_harden under CONFIG_PREEMPT

2006-01-30 Thread Philippe Gerum

Jan Kiszka wrote:

Hi,

well, if I'm not totally wrong, we have a design problem in the
RT-thread hardening path. I dug into the crash Jeroen reported and I'm
quite sure that this is the reason.

So that's the bad news. The good one is that we can at least work around
it by switching off CONFIG_PREEMPT for Linux (this implicitly means that
it's a 2.6-only issue).

@Jeroen: Did you verify that your setup also works fine without
CONFIG_PREEMPT?

But let's start with two assumptions my further analysis is based on:

[Xenomai]
 o Shadow threads have only one stack, i.e. one context. If the
   real-time part is active (this includes it is blocked on some xnsynch
   object or delayed), the original Linux task must NEVER EVER be
   executed, even if it will immediately fall asleep again. That's
   because the stack is in use by the real-time part at that time. And
   this condition is checked in do_schedule_event() [1].

[Linux]
 o A Linux task which has called set_current_state() will
   remain in the run-queue as long as it calls schedule() on its own.
   This means that it can be preempted (if CONFIG_PREEMPT is set)
   between set_current_state() and schedule() and then even be resumed
   again. Only the explicit call of schedule() will trigger
   deactivate_task() which will in turn remove current from the
   run-queue.
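
Spelled out as code, the second assumption is just the generic kernel pattern
(not Xenomai-specific):

set_current_state(TASK_INTERRUPTIBLE);
/*
 * With CONFIG_PREEMPT, an interrupt hitting here may end up in
 * preempt_schedule_irq(): the task is switched out but stays on the
 * runqueue, and may even run again before reaching schedule() below.
 */
schedule();	/* only this call does deactivate_task() and dequeues us */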

Ok, if this is true, let's have a look at xnshadow_harden(): After
grabbing the gatekeeper sem and putting itself in gk->thread, a task
going for RT then marks itself TASK_INTERRUPTIBLE and wakes up the
gatekeeper [2]. This does not include a Linux reschedule due to the
_sync version of wake_up_interruptible. What can happen now?

1) No interruption until we have called schedule() [3]. All fine as we
will not be removed from the run-queue before the gatekeeper starts
kicking our RT part, thus no conflict in using the thread's stack.

2) Interruption by an RT IRQ. This would just delay the path described
above, even if some RT threads get executed. Once they are finished, we
continue in xnshadow_harden() - given that the RT part does not trigger
the following case:

3) Interruption by some Linux IRQ. This may cause other threads to
become runnable as well, but the gatekeeper has the highest prio and
will therefore be the next. The problem is that the rescheduling on
Linux IRQ exit will PREEMPT our task in xnshadow_harden(), it will NOT
remove it from the Linux run-queue. And now we are in real troubles: The
gatekeeper will kick off our RT part which will take over the thread's
stack. As soon as the RT domain falls asleep and Linux takes over again,
it will continue our non-RT part as well! Actually, this seems to be the
reason for the panic in do_schedule_event(). Without
CONFIG_XENO_OPT_DEBUG and this check, we will run both parts AT THE SAME
TIME now, thus violating my first assumption. The system gets fatally
corrupted.



Yep, that's it. And we may not lock out the interrupts before calling schedule to 
prevent that.



Well, I would be happy if someone can prove me wrong here.

The problem is that I don't see a solution because Linux does not
provide an atomic wake-up + schedule-out under CONFIG_PREEMPT. I'm
currently considering a hack to remove the migrating Linux thread
manually from the run-queue, but this could easily break the Linux
scheduler.



Maybe the best way would be to provide atomic wakeup-and-schedule support into the 
Adeos patch for Linux tasks; previous attempts to fix this by circumventing the 
potential for preemption from outside of the scheduler code have all failed, and 
this bug is uselessly lingering for that reason.



Jan


PS: Out of curiosity I also checked RTAI's migration mechanism in this
regard. It's similar except for the fact that it does the gatekeeper's
work in the Linux scheduler's tail (i.e. after the next context switch).
And RTAI seems it suffers from the very same race. So this is either a
fundamental issue - or I'm fundamentally wrong.


[1]http://www.rts.uni-hannover.de/xenomai/lxr/source/ksrc/nucleus/shadow.c?v=SVN-trunk#L1573
[2]http://www.rts.uni-hannover.de/xenomai/lxr/source/ksrc/nucleus/shadow.c?v=SVN-trunk#L461
[3]http://www.rts.uni-hannover.de/xenomai/lxr/source/ksrc/nucleus/shadow.c?v=SVN-trunk#L481








--

Philippe.



Re: [Xenomai-core] Missing IRQ end function on PowerPC

2006-01-30 Thread Jan Kiszka
Philippe Gerum wrote:
> Gilles Chanteperdrix wrote:
>> Wolfgang Grandegger wrote:
>>  > Therefore we need a dedicated function to re-enable interrupts in
>> the  > ISR. We could name it *_end_irq, but maybe *_enable_isr_irq is
>> more  > obvious. On non-PPC archs it would translate to *_irq_enable.
>> I  > realized, that *_irq_enable is used in various place/skins and
>> therefore  > I have not yet provided a patch.
>>
>> The function xnarch_irq_enable seems to be called in only two functions,
>> xintr_enable and xnintr_irq_handler when the flag XN_ISR_ENABLE is set.
>>
>> In any case, since I am not sure if this has to be done at the Adeos
>> level or in Xenomai, we will wait for Philippe to come back and decide.
>>
> 
> ->enable() and ->end() all mixed up illustrates a silly x86 bias I once
> had. We do need to differentiate the mere enabling from the IRQ epilogue
> at PIC level since Linux does it - i.e. we don't want to change the
> semantics here.
> 
> I would go for adding xnarch_end_irq -> rthal_irq_end to stick with the
> Linux naming scheme, and have the proper epilogue done from there on a
> per-arch basis.
> 
> Current uses of xnarch_enable_irq() should be reserved to the
> non-epilogue case, like xnintr_enable() i.e. forcibly unmasking the IRQ
> source at PIC level outside of any ISR context for such interrupt (*).
> XN_ISR_ENABLE would trigger a call to xnarch_end_irq, instead of
> xnarch_enable_irq. I see no reason for this fix to leak to the Adeos
> layer, since the HAL already controls the way interrupts are ended
> actually; it just does it improperly on some platforms.
> 
> (*) Jan, does rtdm_irq_enable() have the same meaning, or is it intended
> to be used from the ISR too in order to revalidate the source at PIC level?
> 

Nope, rtdm_irq_enable() was never intended to re-enable an IRQ line
after an interrupt, and the documentation does not suggest this either.
I see no problem here.

Jan





Re: [Xenomai-core] Missing IRQ end function on PowerPC

2006-01-30 Thread Wolfgang Grandegger


> This is an OpenPGP/MIME signed message (RFC 2440 and 3156)
> 
> Philippe Gerum wrote:
> > Gilles Chanteperdrix wrote:
> >> Wolfgang Grandegger wrote:
> >>  > Therefore we need a dedicated function to re-enable interrupts in
> >> the  > ISR. We could name it *_end_irq, but maybe *_enable_isr_irq is
> >> more  > obvious. On non-PPC archs it would translate to *_irq_enable.
> >> I  > realized, that *_irq_enable is used in various place/skins and
> >> therefore  > I have not yet provided a patch.
> >>
> >> The function xnarch_irq_enable seems to be called in only two
functions,
> >> xintr_enable and xnintr_irq_handler when the flag XN_ISR_ENABLE is set.
> >>
> >> In any case, since I am not sure if this has to be done at the Adeos
> >> level or in Xenomai, we will wait for Philippe to come back and decide.
> >>
> > 
> > ->enable() and ->end() all mixed up illustrates a silly x86 bias I once
> > had. We do need to differentiate the mere enabling from the IRQ epilogue
> > at PIC level since Linux does it - i.e. we don't want to change the
> > semantics here.
> > 
> > I would go for adding xnarch_end_irq -> rthal_irq_end to stick with the
> > Linux naming scheme, and have the proper epilogue done from there on a
> > per-arch basis.
> > 
> > Current uses of xnarch_enable_irq() should be reserved to the
> > non-epilogue case, like xnintr_enable() i.e. forcibly unmasking the IRQ
> > source at PIC level outside of any ISR context for such interrupt (*).
> > XN_ISR_ENABLE would trigger a call to xnarch_end_irq, instead of
> > xnarch_enable_irq. I see no reason for this fix to leak to the Adeos
> > layer, since the HAL already controls the way interrupts are ended
> > actually; it just does it improperly on some platforms.
> > 
> > (*) Jan, does rtdm_irq_enable() have the same meaning, or is it intended
> > to be used from the ISR too in order to revalidate the source at PIC
level?
> > 
> 
> Nope, rtdm_irq_enable() was never intended to re-enable an IRQ line
> after an interrupt, and the documentation does not suggest this either.
> I see no problem here.

But RTDM needs an rtdm_irq_end() function as well in case the
user wants to re-enable the interrupt outside the ISR, I think.

Wolfgang.




Re: [Xenomai-core] Missing IRQ end function on PowerPC

2006-01-30 Thread Jan Kiszka
Wolfgang Grandegger wrote:
> 
>> This is an OpenPGP/MIME signed message (RFC 2440 and 3156)
>>
>> Philippe Gerum wrote:
>>> Gilles Chanteperdrix wrote:
 Wolfgang Grandegger wrote:
  > Therefore we need a dedicated function to re-enable interrupts in
 the  > ISR. We could name it *_end_irq, but maybe *_enable_isr_irq is
 more  > obvious. On non-PPC archs it would translate to *_irq_enable.
 I  > realized, that *_irq_enable is used in various place/skins and
 therefore  > I have not yet provided a patch.

 The function xnarch_irq_enable seems to be called in only two
> functions,
 xintr_enable and xnintr_irq_handler when the flag XN_ISR_ENABLE is set.

 In any case, since I am not sure if this has to be done at the Adeos
 level or in Xenomai, we will wait for Philippe to come back and decide.

>>> ->enable() and ->end() all mixed up illustrates a silly x86 bias I once
>>> had. We do need to differentiate the mere enabling from the IRQ epilogue
>>> at PIC level since Linux does it - i.e. we don't want to change the
>>> semantics here.
>>>
>>> I would go for adding xnarch_end_irq -> rthal_irq_end to stick with the
>>> Linux naming scheme, and have the proper epilogue done from there on a
>>> per-arch basis.
>>>
>>> Current uses of xnarch_enable_irq() should be reserved to the
>>> non-epilogue case, like xnintr_enable() i.e. forcibly unmasking the IRQ
>>> source at PIC level outside of any ISR context for such interrupt (*).
>>> XN_ISR_ENABLE would trigger a call to xnarch_end_irq, instead of
>>> xnarch_enable_irq. I see no reason for this fix to leak to the Adeos
>>> layer, since the HAL already controls the way interrupts are ended
>>> actually; it just does it improperly on some platforms.
>>>
>>> (*) Jan, does rtdm_irq_enable() have the same meaning, or is it intended
>>> to be used from the ISR too in order to revalidate the source at PIC
> level?
>> Nope, rtdm_irq_enable() was never intended to re-enable an IRQ line
>> after an interrupt, and the documentation does not suggest this either.
>> I see no problem here.
> 
> But RTDM needs a rtdm_irq_end() functions as well in case the
> user wants to reenable the interrupt outside the ISR, I think.

If this is a valid use-case, it should be really straightforward to add
this abstraction to RTDM. We should just document that rtdm_irq_end()
shall not be invoked from IRQ context - to avoid breaking the chain in
the shared-IRQ scenario. RTDM_IRQ_ENABLE must remain the way to
re-enable the line from the handler.

Jan






Re: [Xenomai-core] Missing IRQ end function on PowerPC

2006-01-30 Thread Philippe Gerum

Jan Kiszka wrote:

Wolfgang Grandegger wrote:


This is an OpenPGP/MIME signed message (RFC 2440 and 3156)

Philippe Gerum wrote:


Gilles Chanteperdrix wrote:


Wolfgang Grandegger wrote:
> Therefore we need a dedicated function to re-enable interrupts in
the  > ISR. We could name it *_end_irq, but maybe *_enable_isr_irq is
more  > obvious. On non-PPC archs it would translate to *_irq_enable.
I  > realized, that *_irq_enable is used in various place/skins and
therefore  > I have not yet provided a patch.

The function xnarch_irq_enable seems to be called in only two


functions,


xintr_enable and xnintr_irq_handler when the flag XN_ISR_ENABLE is set.

In any case, since I am not sure if this has to be done at the Adeos
level or in Xenomai, we will wait for Philippe to come back and decide.



->enable() and ->end() all mixed up illustrates a silly x86 bias I once
had. We do need to differentiate the mere enabling from the IRQ epilogue
at PIC level since Linux does it - i.e. we don't want to change the
semantics here.

I would go for adding xnarch_end_irq -> rthal_irq_end to stick with the
Linux naming scheme, and have the proper epilogue done from there on a
per-arch basis.

Current uses of xnarch_enable_irq() should be reserved to the
non-epilogue case, like xnintr_enable() i.e. forcibly unmasking the IRQ
source at PIC level outside of any ISR context for such interrupt (*).
XN_ISR_ENABLE would trigger a call to xnarch_end_irq, instead of
xnarch_enable_irq. I see no reason for this fix to leak to the Adeos
layer, since the HAL already controls the way interrupts are ended
actually; it just does it improperly on some platforms.

(*) Jan, does rtdm_irq_enable() have the same meaning, or is it intended
to be used from the ISR too in order to revalidate the source at PIC


level?


Nope, rtdm_irq_enable() was never intended to re-enable an IRQ line
after an interrupt, and the documentation does not suggest this either.
I see no problem here.


But RTDM needs a rtdm_irq_end() functions as well in case the
user wants to reenable the interrupt outside the ISR, I think.



If this is a valid use-case, it should be really straightforward to add
this abstraction to RTDM. We should just document that rtdm_irq_end()
shall not be invoked from IRQ context -


It's the other way around: ->end() would indeed be called from the ISR epilogue, 
and ->enable() would not.


 to avoid breaking the chain in

the shared-IRQ scenario. RTDM_IRQ_ENABLE must remain the way to
re-enable the line from the handler.

Jan





--

Philippe.



Re: [Xenomai-core] Missing IRQ end function on PowerPC

2006-01-30 Thread Philippe Gerum

Jan Kiszka wrote:

Philippe Gerum wrote:


Gilles Chanteperdrix wrote:


Wolfgang Grandegger wrote:
> Therefore we need a dedicated function to re-enable interrupts in
the  > ISR. We could name it *_end_irq, but maybe *_enable_isr_irq is
more  > obvious. On non-PPC archs it would translate to *_irq_enable.
I  > realized, that *_irq_enable is used in various place/skins and
therefore  > I have not yet provided a patch.

The function xnarch_irq_enable seems to be called in only two functions,
xintr_enable and xnintr_irq_handler when the flag XN_ISR_ENABLE is set.

In any case, since I am not sure if this has to be done at the Adeos
level or in Xenomai, we will wait for Philippe to come back and decide.



->enable() and ->end() all mixed up illustrates a silly x86 bias I once
had. We do need to differentiate the mere enabling from the IRQ epilogue
at PIC level since Linux does it - i.e. we don't want to change the
semantics here.

I would go for adding xnarch_end_irq -> rthal_irq_end to stick with the
Linux naming scheme, and have the proper epilogue done from there on a
per-arch basis.

Current uses of xnarch_enable_irq() should be reserved to the
non-epilogue case, like xnintr_enable() i.e. forcibly unmasking the IRQ
source at PIC level outside of any ISR context for such interrupt (*).
XN_ISR_ENABLE would trigger a call to xnarch_end_irq, instead of
xnarch_enable_irq. I see no reason for this fix to leak to the Adeos
layer, since the HAL already controls the way interrupts are ended
actually; it just does it improperly on some platforms.

(*) Jan, does rtdm_irq_enable() have the same meaning, or is it intended
to be used from the ISR too in order to revalidate the source at PIC level?




Nope, rtdm_irq_enable() was never intended to re-enable an IRQ line
after an interrupt, and the documentation does not suggest this either.
I see no problem here.



Ok, so no change would be needed here to implement what's described above.


Jan




--

Philippe.



Re: [Xenomai-core] Missing IRQ end function on PowerPC

2006-01-30 Thread Jan Kiszka
Philippe Gerum wrote:
> Jan Kiszka wrote:
>> Wolfgang Grandegger wrote:
>>
 This is an OpenPGP/MIME signed message (RFC 2440 and 3156)

 Philippe Gerum wrote:

> Gilles Chanteperdrix wrote:
>
>> Wolfgang Grandegger wrote:
>> > Therefore we need a dedicated function to re-enable interrupts in
>> the  > ISR. We could name it *_end_irq, but maybe *_enable_isr_irq is
>> more  > obvious. On non-PPC archs it would translate to *_irq_enable.
>> I  > realized, that *_irq_enable is used in various place/skins and
>> therefore  > I have not yet provided a patch.
>>
>> The function xnarch_irq_enable seems to be called in only two
>>>
>>> functions,
>>>
>> xintr_enable and xnintr_irq_handler when the flag XN_ISR_ENABLE is
>> set.
>>
>> In any case, since I am not sure if this has to be done at the Adeos
>> level or in Xenomai, we will wait for Philippe to come back and
>> decide.
>>
>
> ->enable() and ->end() all mixed up illustrates a silly x86 bias I
> once
> had. We do need to differentiate the mere enabling from the IRQ
> epilogue
> at PIC level since Linux does it - i.e. we don't want to change the
> semantics here.
>
> I would go for adding xnarch_end_irq -> rthal_irq_end to stick with
> the
> Linux naming scheme, and have the proper epilogue done from there on a
> per-arch basis.
>
> Current uses of xnarch_enable_irq() should be reserved to the
> non-epilogue case, like xnintr_enable() i.e. forcibly unmasking the
> IRQ
> source at PIC level outside of any ISR context for such interrupt (*).
> XN_ISR_ENABLE would trigger a call to xnarch_end_irq, instead of
> xnarch_enable_irq. I see no reason for this fix to leak to the Adeos
> layer, since the HAL already controls the way interrupts are ended
> actually; it just does it improperly on some platforms.
>
> (*) Jan, does rtdm_irq_enable() have the same meaning, or is it
> intended
> to be used from the ISR too in order to revalidate the source at PIC
>>>
>>> level?
>>>
 Nope, rtdm_irq_enable() was never intended to re-enable an IRQ line
 after an interrupt, and the documentation does not suggest this either.
 I see no problem here.
>>>
>>> But RTDM needs a rtdm_irq_end() functions as well in case the
>>> user wants to reenable the interrupt outside the ISR, I think.
>>
>>
>> If this is a valid use-case, it should be really straightforward to add
>> this abstraction to RTDM. We should just document that rtdm_irq_end()
>> shall not be invoked from IRQ context -
> 
> It's the other way around: ->end() would indeed be called from the ISR
> epilogue, and ->enable() would not.

I think we are talking about different things: Wolfgang was asking for
an equivalent of RTDM_IRQ_ENABLE outside the IRQ handler. That's the
user API. You are now referring to the back-end, which has to provide
some end mechanism to be used both by the nucleus' ISR epilogue and by
that rtdm_irq_end(), and that mechanism must be kept apart from the IRQ
enable/disable API. Well, that's at least how I think I got it...

> 
>  to avoid breaking the chain in
>> the shared-IRQ scenario. RTDM_IRQ_ENABLE must remain the way to
>> re-enable the line from the handler.
>>
>> Jan
>>
>>
> 
> 

Jan





Re: [Xenomai-core] [BUG] racy xnshadow_harden under CONFIG_PREEMPT

2006-01-30 Thread Philippe Gerum

Philippe Gerum wrote:

Jan Kiszka wrote:


Hi,

well, if I'm not totally wrong, we have a design problem in the
RT-thread hardening path. I dug into the crash Jeroen reported and I'm
quite sure that this is the reason.

So that's the bad news. The good one is that we can at least work around
it by switching off CONFIG_PREEMPT for Linux (this implicitly means that
it's a 2.6-only issue).

@Jeroen: Did you verify that your setup also works fine without
CONFIG_PREEMPT?

But let's start with two assumptions my further analysis is based on:

[Xenomai]
 o Shadow threads have only one stack, i.e. one context. If the
   real-time part is active (this includes it is blocked on some xnsynch
   object or delayed), the original Linux task must NEVER EVER be
   executed, even if it will immediately fall asleep again. That's
   because the stack is in use by the real-time part at that time. And
   this condition is checked in do_schedule_event() [1].

[Linux]
 o A Linux task which has called set_current_state() will
   remain in the run-queue as long as it calls schedule() on its own.
   This means that it can be preempted (if CONFIG_PREEMPT is set)
   between set_current_state() and schedule() and then even be resumed
   again. Only the explicit call of schedule() will trigger
   deactivate_task() which will in turn remove current from the
   run-queue (see the sketch below).
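
A minimal sketch of that window, as the standard prepare-to-wait pattern
(the function name is made up, only for illustration):

#include <linux/sched.h>

static void sketch_wait_for_event(void)
{
	set_current_state(TASK_INTERRUPTIBLE);
	/*
	 * The task is still linked to the run-queue here. Under
	 * CONFIG_PREEMPT an IRQ return may preempt us at this point via
	 * preempt_schedule_irq(), and we may even be resumed and run
	 * again, because only the explicit schedule() below ends up
	 * calling deactivate_task().
	 */
	schedule();
}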

Ok, if this is true, let's have a look at xnshadow_harden(): After
grabbing the gatekeeper sem and putting itself in gk->thread, a task
going for RT then marks itself TASK_INTERRUPTIBLE and wakes up the
gatekeeper [2]. This does not include a Linux reschedule due to the
_sync version of wake_up_interruptible. What can happen now?

1) No interruption until we have called schedule() [3]. All fine, as we
will not be removed from the run-queue before the gatekeeper starts
kicking our RT part, thus no conflict in using the thread's stack.

2) Interruption by an RT IRQ. This would just delay the path described
above, even if some RT threads get executed. Once they are finished, we
continue in xnshadow_harden() - given that the RT part does not trigger
the following case:

3) Interruption by some Linux IRQ. This may cause other threads to
become runnable as well, but the gatekeeper has the highest prio and
will therefore be the next. The problem is that the rescheduling on
Linux IRQ exit will PREEMPT our task in xnshadow_harden(), it will NOT
remove it from the Linux run-queue. And now we are in real trouble: the
gatekeeper will kick off our RT part which will take over the thread's
stack. As soon as the RT domain falls asleep and Linux takes over again,
it will continue our non-RT part as well! Actually, this seems to be the
reason for the panic in do_schedule_event(). Without
CONFIG_XENO_OPT_DEBUG and this check, we will run both parts AT THE SAME
TIME now, thus violating my first assumption. The system gets fatally
corrupted.



Yep, that's it. And we cannot lock out the interrupts before calling 
schedule() to prevent that.



Well, I would be happy if someone can prove me wrong here.

The problem is that I don't see a solution because Linux does not
provide an atomic wake-up + schedule-out under CONFIG_PREEMPT. I'm
currently considering a hack to remove the migrating Linux thread
manually from the run-queue, but this could easily break the Linux
scheduler.



Maybe the best way would be to provide atomic wakeup-and-schedule 
support into the Adeos patch for Linux tasks; previous attempts to fix 
this by circumventing the potential for preemption from outside of the 
scheduler code have all failed, and this bug is uselessly lingering for 
that reason.


Having slept on this, I'm going to add a simple extension to the Linux scheduler 
available from Adeos, in order to get an atomic/unpreemptable path from the point 
where the current task's state is changed for suspension (e.g. TASK_INTERRUPTIBLE) 
to the point where schedule() normally enters its atomic section. This looks like 
the sanest way to solve this issue, i.e. without gory hackery all over the place. 
A patch will follow later for testing this approach.





Jan


PS: Out of curiosity I also checked RTAI's migration mechanism in this
regard. It's similar except for the fact that it does the gatekeeper's
work in the Linux scheduler's tail (i.e. after the next context switch).
And RTAI seems to suffer from the very same race. So this is either a
fundamental issue - or I'm fundamentally wrong.


[1]http://www.rts.uni-hannover.de/xenomai/lxr/source/ksrc/nucleus/shadow.c?v=SVN-trunk#L1573 

[2]http://www.rts.uni-hannover.de/xenomai/lxr/source/ksrc/nucleus/shadow.c?v=SVN-trunk#L461 

[3]http://www.rts.uni-hannover.de/xenomai/lxr/source/ksrc/nucleus/shadow.c?v=SVN-trunk#L481 














--

Philippe.



RE: [Xenomai-core] broken docs

2006-01-30 Thread ROSSIER Daniel
Dear Xenomai workers,

Would it be possible to have an updated API documentation for Xenomai 2.0.x ? 
(I mean with formal parameters in function prototypes)

I tried to regenerate it with make generate-doc, but it seems that a SVN 
working dir is required.

It would be great.

Thanks a lot

Daniel

> -Original Message-
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On
> behalf of Jan Kiszka
> Sent: Wednesday, 18 January 2006 20:03
> To: xenomai-core
> Subject: [Xenomai-core] broken docs
> 
> Hi,
> 
> I noticed that the doxygen API output is partly broken. Under the
> nucleus and native skin modules most functions became variables. I
> haven't looked at the source yet, but I guess it should be resolvable,
> especially as the RTDM docs are fine. This mail is to file the issue,
> maybe I will have a look later - /maybe/.
> 
> Moreover, I was lacking a reference to RT_MUTEX_INFO. Does this
> structure just needs to be added to the correct doxygen group? I guess
> there are other undocumented structures out there as well (RT_TASK_INFO,
> ...?).
> 
> Jan




Re: [Xenomai-core] Missing IRQ end function on PowerPC

2006-01-30 Thread Wolfgang Grandegger


> This is an OpenPGP/MIME signed message (RFC 2440 and 3156)
> 
> Philippe Gerum wrote:
> > Jan Kiszka wrote:
> >> Wolfgang Grandegger wrote:
> >>
>  This is an OpenPGP/MIME signed message (RFC 2440 and 3156)
> 
>  Philippe Gerum wrote:
> 
> > Gilles Chanteperdrix wrote:
> >
> >> Wolfgang Grandegger wrote:
> >> > Therefore we need a dedicated function to re-enable interrupts in
> >> the  > ISR. We could name it *_end_irq, but maybe
*_enable_isr_irq is
> >> more  > obvious. On non-PPC archs it would translate to
*_irq_enable.
> >> I  > realized, that *_irq_enable is used in various place/skins and
> >> therefore  > I have not yet provided a patch.
> >>
> >> The function xnarch_irq_enable seems to be called in only two
> >>>
> >>> functions,
> >>>
> >> xintr_enable and xnintr_irq_handler when the flag XN_ISR_ENABLE is
> >> set.
> >>
> >> In any case, since I am not sure if this has to be done at the
Adeos
> >> level or in Xenomai, we will wait for Philippe to come back and
> >> decide.
> >>
> >
> > ->enable() and ->end() all mixed up illustrates a silly x86 bias I
> > once
> > had. We do need to differentiate the mere enabling from the IRQ
> > epilogue
> > at PIC level since Linux does it - i.e. we don't want to change the
> > semantics here.
> >
> > I would go for adding xnarch_end_irq -> rthal_irq_end to stick with
> > the
> > Linux naming scheme, and have the proper epilogue done from
there on a
> > per-arch basis.
> >
> > Current uses of xnarch_enable_irq() should be reserved to the
> > non-epilogue case, like xnintr_enable() i.e. forcibly unmasking the
> > IRQ
> > source at PIC level outside of any ISR context for such
interrupt (*).
> > XN_ISR_ENABLE would trigger a call to xnarch_end_irq, instead of
> > xnarch_enable_irq. I see no reason for this fix to leak to the Adeos
> > layer, since the HAL already controls the way interrupts are ended
> > actually; it just does it improperly on some platforms.
> >
> > (*) Jan, does rtdm_irq_enable() have the same meaning, or is it
> > intended
> > to be used from the ISR too in order to revalidate the source at PIC
> >>>
> >>> level?
> >>>
>  Nope, rtdm_irq_enable() was never intended to re-enable an IRQ line
>  after an interrupt, and the documentation does not suggest this
either.
>  I see no problem here.
> >>>
> >>> But RTDM needs a rtdm_irq_end() functions as well in case the
> >>> user wants to reenable the interrupt outside the ISR, I think.
> >>
> >>
> >> If this is a valid use-case, it should be really straightforward to add
> >> this abstraction to RTDM. We should just document that rtdm_irq_end()
> >> shall not be invoked from IRQ context -
> > 
> > It's the other way around: ->end() would indeed be called from the ISR
> > epilogue, and ->enable() would not.
> 
> I think we are talking about different things: Wolfgang was asking for
> an equivalent of RTDM_IRQ_ENABLE outside the IRQ handler. That's the
> user API. You are now referring to the back-end which had to provide
> some end-mechanism to be used both by the nucleus' ISR epilogue and that
> rtdm_irq_end(), and that mechanism shall be told apart from the IRQ
> enable/disable API. Well, that's at least how I think to have got it...

Yep, I was thinking of deferred interrupt handling in a real-time task.
Then rtdm_irq_end() would be required.
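
A rough sketch of that use case, assuming a hypothetical rtdm_irq_end()
service (it does not exist in RTDM at this point; my_isr, my_irq_task and
the prototype below are made up for illustration):

#include <rtdm/rtdm_driver.h>

int rtdm_irq_end(rtdm_irq_t *irq_handle);	/* hypothetical service */

static rtdm_irq_t irq_handle;
static rtdm_sem_t irq_sem;

static int my_isr(rtdm_irq_t *irq_context)
{
	/* Just notify the handling task; deliberately return neither
	 * RTDM_IRQ_ENABLE nor RTDM_IRQ_PROPAGATE, so the line stays
	 * masked until the deferred work is done. */
	rtdm_sem_up(&irq_sem);
	return 0;
}

static void my_irq_task(void *arg)
{
	for (;;) {
		if (rtdm_sem_down(&irq_sem))
			break;			/* semaphore destroyed */
		/* ... process the device ... */
		rtdm_irq_end(&irq_handle);	/* revalidate the source */
	}
}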
 
> > 
> >  to avoid breaking the chain in
> >> the shared-IRQ scenario. RTDM_IRQ_ENABLE must remain the way to
> >> re-enable the line from the handler.
> >>
> >> Jan
> >>
> >>
> > 
> > 
> 
> Jan
> 
> 
> 

Wolfgang.



Re: [Xenomai-core] Missing IRQ end function on PowerPC

2006-01-30 Thread Philippe Gerum

Jan Kiszka wrote:

Philippe Gerum wrote:


Jan Kiszka wrote:


Wolfgang Grandegger wrote:



This is an OpenPGP/MIME signed message (RFC 2440 and 3156)

Philippe Gerum wrote:



Gilles Chanteperdrix wrote:



Wolfgang Grandegger wrote:


Therefore we need a dedicated function to re-enable interrupts in


the  > ISR. We could name it *_end_irq, but maybe *_enable_isr_irq is
more  > obvious. On non-PPC archs it would translate to *_irq_enable.
I  > realized, that *_irq_enable is used in various place/skins and
therefore  > I have not yet provided a patch.

The function xnarch_irq_enable seems to be called in only two


functions,



xintr_enable and xnintr_irq_handler when the flag XN_ISR_ENABLE is
set.

In any case, since I am not sure if this has to be done at the Adeos
level or in Xenomai, we will wait for Philippe to come back and
decide.



->enable() and ->end() all mixed up illustrates a silly x86 bias I
once
had. We do need to differentiate the mere enabling from the IRQ
epilogue
at PIC level since Linux does it - i.e. we don't want to change the
semantics here.

I would go for adding xnarch_end_irq -> rthal_irq_end to stick with
the
Linux naming scheme, and have the proper epilogue done from there on a
per-arch basis.

Current uses of xnarch_enable_irq() should be reserved to the
non-epilogue case, like xnintr_enable() i.e. forcibly unmasking the
IRQ
source at PIC level outside of any ISR context for such interrupt (*).
XN_ISR_ENABLE would trigger a call to xnarch_end_irq, instead of
xnarch_enable_irq. I see no reason for this fix to leak to the Adeos
layer, since the HAL already controls the way interrupts are ended
actually; it just does it improperly on some platforms.

(*) Jan, does rtdm_irq_enable() have the same meaning, or is it
intended
to be used from the ISR too in order to revalidate the source at PIC


level?



Nope, rtdm_irq_enable() was never intended to re-enable an IRQ line
after an interrupt, and the documentation does not suggest this either.
I see no problem here.


But RTDM needs a rtdm_irq_end() functions as well in case the
user wants to reenable the interrupt outside the ISR, I think.



If this is a valid use-case, it should be really straightforward to add
this abstraction to RTDM. We should just document that rtdm_irq_end()
shall not be invoked from IRQ context -


It's the other way around: ->end() would indeed be called from the ISR
epilogue, and ->enable() would not.



I think we are talking about different things: Wolfgang was asking for
an equivalent of RTDM_IRQ_ENABLE outside the IRQ handler. That's the
user API. You are now referring to the back-end which had to provide
some end-mechanism to be used both by the nucleus' ISR epilogue and that
rtdm_irq_end(), and that mechanism shall be told apart from the IRQ
enable/disable API. Well, that's at least how I think to have got it...



My point was only about naming here: *_end() should be reserved for the epilogue 
stuff, hence be callable from ISR context. Aside from that, I'm OK with the 
abstraction you described.
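
In prototype form, the abstraction discussed here would then read (sketch
only; rtdm_irq_end() is the hypothetical addition):

/* Forcibly unmask the source at PIC level; to be called outside any ISR
 * context for that interrupt. */
int rtdm_irq_enable(rtdm_irq_t *irq_handle);

/* IRQ epilogue: revalidate the source the same way the nucleus does at
 * the end of an ISR; callable from ISR context or from a deferred task. */
int rtdm_irq_end(rtdm_irq_t *irq_handle);	/* hypothetical */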





to avoid breaking the chain in


the shared-IRQ scenario. RTDM_IRQ_ENABLE must remain the way to
re-enable the line from the handler.

Jan







Jan




--

Philippe.



Re: [Xenomai-core] [PATCH] fix pthread cancellation in native skin

2006-01-30 Thread Gilles Chanteperdrix
Jan Kiszka wrote:
 > Hi,
 > 
 > Gilles' work on cancellation for the posix skin reminded me of this
 > issue I once discovered in the native skin:
 > 
 > https://mail.gna.org/public/xenomai-core/2005-12/msg00014.html
 > 
 > I found out that this can easily be fixed by switching the pthread of a
 > native task to PTHREAD_CANCEL_ASYNCHRONOUS. See attached patch.
 > 
 > 
 > At this chance I discovered that calling rt_task_delete for a task that
 > was created and started with T_SUSP mode but was not yet resumed, locks
 > up the system. More precisely: it raises a fatal warning when
 > XENO_OPT_DEBUG is on. Might be the case that it just works on system
 > without this switched on. Either this is a real bug, or the warning
 > needs to be fixed. (Deleting a task after rt_task_suspend works.)

Actually, the fatal warning happens when starting, with rt_task_start, the
task which was created with the T_SUSP bit. The task needs to wake up in
xnshadow_wait_barrier until it gets really suspended in primary mode by
the final xnshadow_harden. This situation triggers the fatal warning, because
the thread has the nucleus XNSUSP bit set while running in secondary mode.

This is not the only situation where a thread with a nucleus suspension
bit needs to run briefly in secondary mode: it also occurs when
suspending with xnpod_suspend_thread() a thread running in secondary
mode; the thread receives the SIGCHLD signal and needs to execute briefly
with the suspension bit set in order to cause a migration to primary
mode.

So, the only case when we are sure that a user-space thread can not be
scheduled by Linux seems to be when this thread does not have the
XNRELAX bit.
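
A minimal, nucleus-internal sketch of the resulting check (the helper name
is made up; the flag accessors are the ones used elsewhere in this thread):

/* Linux may legitimately schedule a shadow only while it is relaxed;
 * suspension bits such as XNSUSP are not sufficient evidence of a bug,
 * since the thread may still have to run briefly in secondary mode. */
static inline int linux_may_schedule_shadow(xnthread_t *thread)
{
	return xnthread_test_flags(thread, XNRELAX);
}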

-- 


Gilles Chanteperdrix.



Re: [Xenomai-core] [BUG] racy xnshadow_harden under CONFIG_PREEMPT

2006-01-30 Thread Dmitry Adamushko
>> ...
> I have not checked it yet but my presupposition that something as easy as :
>> preempt_disable()
>>
>> wake_up_interruptible_sync();
>> schedule();
>>
>> preempt_enable();
> It's a no-go: "scheduling while atomic". One of my first attempts to
> solve it.

My fault. I meant the way preempt_schedule() and preempt_irq_schedule() call
schedule() while being non-preemptible.
To this end, ACTIVE_PREEMPT is set up.
The use of preempt_enable/disable() here is wrong.

> The only way to enter schedule() without being preemptible is via
> ACTIVE_PREEMPT. But the effect of that flag should be well-known now.
> Kind of Gordian knot. :(

Maybe I have missed something so just for my curiosity : what does prevent
the use of PREEMPT_ACTIVE here?
We don't have a "preempted while atomic" message here as it seems to be
a legal way to call schedule() with that flag being set up.

>> could work... err.. and don't blame me if no, it's some one else who has
>> written that nonsense :o)
>>
>> --
>> Best regards,
>> Dmitry Adamushko
>
> Jan

--
Best regards,
Dmitry Adamushko


Re: [Xenomai-core] [BUG] racy xnshadow_harden under CONFIG_PREEMPT

2006-01-30 Thread Jan Kiszka
Dmitry Adamushko wrote:
>>> ...
> 
>> I have not checked it yet but my presupposition that something as easy as
>> :
>>> preempt_disable()
>>>
>>> wake_up_interruptible_sync();
>>> schedule();
>>>
>>> preempt_enable();
>> It's a no-go: "scheduling while atomic". One of my first attempts to
>> solve it.
> 
> 
> My fault. I meant the way preempt_schedule() and preempt_irq_schedule() call
> schedule() while being non-preemptible.
> To this end, ACTIVE_PREEMPT is set up.
> The use of preempt_enable/disable() here is wrong.
> 
> 
> The only way to enter schedule() without being preemptible is via
>> ACTIVE_PREEMPT. But the effect of that flag should be well-known now.
>> Kind of Gordian knot. :(
> 
> 
> Maybe I have missed something so just for my curiosity : what does prevent
> the use of PREEMPT_ACTIVE here?
> We don't have a "preempted while atomic" message here as it seems to be a
> legal way to call schedule() with that flag being set up.

When PREEMPT_ACTIVE is set, task gets /preempted/ but not removed from
the run queue - independent of its current status.
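
For reference, the branch of schedule() this refers to looks roughly like
this in the 2.6 kernels of that time (abridged from the snippet quoted in
full later in this thread):

	if (prev->state && !(preempt_count() & PREEMPT_ACTIVE)) {
		if (unlikely((prev->state & TASK_INTERRUPTIBLE) &&
				unlikely(signal_pending(prev))))
			prev->state = TASK_RUNNING;
		else {
			if (prev->state == TASK_UNINTERRUPTIBLE)
				rq->nr_uninterruptible++;
			/* skipped entirely when PREEMPT_ACTIVE is set */
			deactivate_task(prev, rq);
		}
	}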

> 
> 
>>> could work... err.. and don't blame me if no, it's some one else who has
>>> written that nonsense :o)
>>>
>>> --
>>> Best regards,
>>> Dmitry Adamushko
>>>
>> Jan
>>
>>
>>
>>
> 
> 
> --
> Best regards,
> Dmitry Adamushko
> 
> 
> 
> 
> 






Re: [Xenomai-core] broken docs

2006-01-30 Thread Jan Kiszka
ROSSIER Daniel wrote:
> Dear Xenomai workers,
> 
> Would it be possible to have an updated API documentation for Xenomai
> 2.0.x ? (I mean with formal parameters in function prototypes)
> 
> I tried to regenerate it with make generate-doc, but it seems that a
> SVN working dir is required.
> 
> It would be great.

I just had a "quick" look at the status of the documentation in
SVN-trunk (2.1). Unfortunately, doxygen is a terrible tool (to express
it politely) when it comes to tracking down bugs in your formatting.
Something is broken in all modules except RTDM, and although I spent *a
lot* of time in getting RTDM correctly formatted, I cannot tell what's
wrong with the rest. This will require some long evenings of
continuous patching the docs, recompiling, and checking the result. Any
volunteers - I'm lacking the time? :-/

Jan





Re: [Xenomai-core] broken docs

2006-01-30 Thread Gilles Chanteperdrix
Jan Kiszka wrote:
 > ROSSIER Daniel wrote:
 > > Dear Xenomai workers,
 > > 
 > > Would it be possible to have an updated API documentation for Xenomai
 > > 2.0.x ? (I mean with formal parameters in function prototypes)
 > > 
 > > I tried to regenerate it with make generate-doc, but it seems that a
 > > SVN working dir is required.

make generate-doc is needed for maintenance only. If you want to
generate doxygen documentation, simply add --enable-dox-doc to Xenomai
configure command line.

 > > 
 > > It would be great.
 > 
 > I just had a "quick" look at the status of the documentation in
 > SVN-trunk (2.1). Unfortunately, doxygen is a terrible tool (to express
 > it politely) when it comes to tracking down bugs in your formatting.
 > Something is broken in all modules except RTDM, and although I spent *a
 > lot* of time in getting RTDM correctly formatted, I cannot tell what's
 > wrong with the rest. This will require some long evenings of
 > continuous patching the docs, recompiling, and checking the result. Any
 > volunteers - I'm lacking the time? :-/

Looking at the differences between the RTDM documentation blocks and the
other modules, it appears that the other modules use the "fn" tag. Removing the
"fn" tag from the other modules' documentation blocks seems to solve the
issue.

-- 


Gilles Chanteperdrix.



Re: [Xenomai-core] broken docs

2006-01-30 Thread Jan Kiszka
Gilles Chanteperdrix wrote:
> Jan Kiszka wrote:
>  > ROSSIER Daniel wrote:
>  > > Dear Xenomai workers,
>  > > 
>  > > Would it be possible to have an updated API documentation for Xenomai
>  > > 2.0.x ? (I mean with formal parameters in function prototypes)
>  > > 
>  > > I tried to regenerate it with make generate-doc, but it seems that a
>  > > SVN working dir is required.
> 
> make generate-doc is needed for maintenance only. If you want to
> generate doxygen documentation, simply add --enable-dox-doc to Xenomai
> configure command line.
> 
>  > > 
>  > > It would be great.
>  > 
>  > I just had a "quick" look at the status of the documentation in
>  > SVN-trunk (2.1). Unfortunately, doxygen is a terrible tool (to express
>  > it politely) when it comes to tracking down bugs in your formatting.
>  > Something is broken in all modules except RTDM, and although I spent *a
>  > lot* of time in getting RTDM correctly formatted, I cannot tell what's
>  > wrong with the rest. This will require some long evenings of
>  > continuous patching the docs, recompiling, and checking the result. Any
>  > volunteers - I'm lacking the time? :-/
> 
> Looking at the difference between RTDM documentation blocks and the
> other modules is that the other modules use the "fn" tag. Removing the
> "fn" tag from other modules documentation blocks seems to solve the
> issue.
> 

Indeed, works. Amazingly blind I was.

Anyway, it still needs some work to remove that stuff (I wonder what the
"correct" usage of @fn is...) and to wrap functions without bodies via
"#ifdef DOXYGEN_CPP" like RTDM does. At this chance, I would also
suggest to replace all \tag by @tag for the sake of a unified style (and
who knows what side effects mixing up both may have).

Jan





Re: [Xenomai-core] [BUG] racy xnshadow_harden under CONFIG_PREEMPT

2006-01-30 Thread Dmitry Adamushko
On 30/01/06, Jan Kiszka <[EMAIL PROTECTED]> wrote:
> Dmitry Adamushko wrote:
>>>> ...
>>> I have not checked it yet but my presupposition that something as easy as
>>> :
>>>> preempt_disable()
>>>> wake_up_interruptible_sync();
>>>> schedule();
>>>> preempt_enable();
>>> It's a no-go: "scheduling while atomic". One of my first attempts to
>>> solve it.
>>
>> My fault. I meant the way preempt_schedule() and preempt_irq_schedule() call
>> schedule() while being non-preemptible.
>> To this end, ACTIVE_PREEMPT is set up.
>> The use of preempt_enable/disable() here is wrong.
>>
>>> The only way to enter schedule() without being preemptible is via
>>> ACTIVE_PREEMPT. But the effect of that flag should be well-known now.
>>> Kind of Gordian knot. :(
>>
>> Maybe I have missed something so just for my curiosity : what does prevent
>> the use of PREEMPT_ACTIVE here?
>> We don't have a "preempted while atomic" message here as it seems to be a
>> legal way to call schedule() with that flag being set up.
>
> When PREEMPT_ACTIVE is set, task gets /preempted/ but not removed from
> the run queue - independent of its current status.
Err...  that's exactly the reason I have explained in my first
mail for this thread :) Blah.. I wish I was smoking something special
before so I would point that as the reason of my forgetfulness.

Actually, we could use PREEMPT_ACTIVE indeed + something else (probably
another flag) to distinguish between a case when PREEMPT_ACTIVE is set
by Linux and another case when it's set by xnshadow_harden().

xnshadow_harden()
{
struct task_struct *this_task = current;
...
xnthread_t *thread = xnshadow_thread(this_task);

if (!thread)
    return;

...
gk->thread = thread;

+ add_preempt_count(PREEMPT_ACTIVE);

// should be checked in schedule()
+ xnthread_set_flags(thread, XNATOMIC_TRANSIT);

set_current_state(TASK_INTERRUPTIBLE);
wake_up_interruptible_sync(&gk->waitq);
+ schedule();

+ sub_preempt_count(PREEMPT_ACTIVE);
...
}

Then, something like the following code should be called from schedule() : 

void ipipe_transit_cleanup(struct task_struct *task, runqueue_t *rq)
{
xnthread_t *thread = xnshadow_thread(task);

if (!thread)
    return;

if (xnthread_test_flags(thread, XNATOMIC_TRANSIT))
    {
    xnthread_clear_flags(thread, XNATOMIC_TRANSIT);
    deactivate_task(task, rq);
    }
}

-

schedule.c : 
...

    switch_count = &prev->nivcsw;
    if (prev->state && !(preempt_count() & PREEMPT_ACTIVE)) {
        switch_count = &prev->nvcsw;
        if (unlikely((prev->state & TASK_INTERRUPTIBLE) &&
                unlikely(signal_pending(prev))))
            prev->state = TASK_RUNNING;
        else {
            if (prev->state == TASK_UNINTERRUPTIBLE)
                rq->nr_uninterruptible++;
            deactivate_task(prev, rq);
        }
    }

// removes a task from the active queue if PREEMPT_ACTIVE + XNATOMIC_TRANSIT
+ #ifdef CONFIG_IPIPE
+ ipipe_transit_cleanup(prev, rq);
+ #endif /* CONFIG_IPIPE */
...

Not very graceful maybe, but it could work - or am I missing something important?

-- 
Best regards,
Dmitry Adamushko




[Xenomai-core] [BUG] Interrupt problem on powerpc

2006-01-30 Thread Anders Blomdell
On a PrPMC800 (PPC 7410 processor) with Xenomai-2.1-rc2, I get the following if 
the interrupt handler takes too long (i.e. next interrupt gets generated before 
the previous one has finished)


[   42.543765]  [c00c2008] spin_bug+0xa8/0xc4
[   42.597617]  [c00c22d4] _raw_spin_lock+0x180/0x184
[   42.660637]  [c000f388] __ipipe_ack_irq+0x88/0x130
[   42.723657]  [c000efe4] __ipipe_handle_irq+0x140/0x268
[   42.791259]  [c000f144] __ipipe_grab_irq+0x38/0xa4
[   42.854279]  [c0005058] __ipipe_ret_from_except+0x0/0xc
[   42.923029]  [] 0x0
[   42.959695]  [c0038348] __do_IRQ+0x134/0x164
[   43.015839]  [c000ed04] __ipipe_do_IRQ+0x2c/0x44
[   43.076567]  [c000eb08] __ipipe_sync_stage+0x1ec/0x228
[   43.144170]  [c0039420] ipipe_suspend_domain+0x7c/0xc4
[   43.211774]  [c000f0b0] __ipipe_handle_irq+0x20c/0x268
[   43.279377]  [c000f144] __ipipe_grab_irq+0x38/0xa4
[   43.342396]  [c0005058] __ipipe_ret_from_except+0x0/0xc
[   43.411145]  [c0006524] default_idle+0x10/0x60


Any ideas of where to look?

Regards

Anders Blomdell





Re: [Xenomai-core] [BUG] Interrupt problem on powerpc

2006-01-30 Thread Jan Kiszka
Anders Blomdell wrote:
> On a PrPMC800 (PPC 7410 processor) withe Xenomai-2.1-rc2, I get the
> following if the interrupt handler takes too long (i.e. next interrupt
> gets generated before the previous one has finished)
> 
> [   42.543765]  [c00c2008] spin_bug+0xa8/0xc4
> [   42.597617]  [c00c22d4] _raw_spin_lock+0x180/0x184
> [   42.660637]  [c000f388] __ipipe_ack_irq+0x88/0x130
> [   42.723657]  [c000efe4] __ipipe_handle_irq+0x140/0x268
> [   42.791259]  [c000f144] __ipipe_grab_irq+0x38/0xa4
> [   42.854279]  [c0005058] __ipipe_ret_from_except+0x0/0xc
> [   42.923029]  [] 0x0
> [   42.959695]  [c0038348] __do_IRQ+0x134/0x164
> [   43.015839]  [c000ed04] __ipipe_do_IRQ+0x2c/0x44
> [   43.076567]  [c000eb08] __ipipe_sync_stage+0x1ec/0x228
> [   43.144170]  [c0039420] ipipe_suspend_domain+0x7c/0xc4
> [   43.211774]  [c000f0b0] __ipipe_handle_irq+0x20c/0x268
> [   43.279377]  [c000f144] __ipipe_grab_irq+0x38/0xa4
> [   43.342396]  [c0005058] __ipipe_ret_from_except+0x0/0xc
> [   43.411145]  [c0006524] default_idle+0x10/0x60
> 

I think some probably important information is missing above this
back-trace. What does the kernel state before these lines?

Jan





Re: [Xenomai-core] [BUG] racy xnshadow_harden under CONFIG_PREEMPT

2006-01-30 Thread Philippe Gerum

Jan Kiszka wrote:

Gilles Chanteperdrix wrote:


Jeroen Van den Keybus wrote:
> Hello,
> 
> 
> I'm currently not at a level to participate in your discussion. Although I'm

> willing to supply you with stresstests, I would nevertheless like to learn
> more from task migration as this debugging session proceeds. In order to do
> so, please confirm the following statements or indicate where I went wrong.
> I hope others may learn from this as well.
> 
> xn_shadow_harden(): This is called whenever a Xenomai thread performs a
> Linux (root domain) system call (notified by Adeos ?). 


xnshadow_harden() is called whenever a thread running in secondary
mode (that is, running as a regular Linux thread, handled by Linux
scheduler) is switching to primary mode (where it will run as a Xenomai
thread, handled by Xenomai scheduler). Migrations occur for some system
calls. More precisely, Xenomai skin system calls tables associates a few
flags with each system call, and some of these flags cause migration of
the caller when it issues the system call.

Each Xenomai user-space thread has two contexts, a regular Linux
thread context, and a Xenomai thread called "shadow" thread. Both
contexts share the same stack and program counter, so that at any time,
at least one of the two contexts is seen as suspended by the scheduler
which handles it.

Before xnshadow_harden is called, the Linux thread is running, and its
shadow is seen in suspended state with XNRELAX bit by Xenomai
scheduler. After xnshadow_harden, the Linux context is seen suspended
with INTERRUPTIBLE state by Linux scheduler, and its shadow is seen as
running by Xenomai scheduler.

The migrating thread
> (nRT) is marked INTERRUPTIBLE and run by the Linux kernel
> wake_up_interruptible_sync() call. Is this thread actually run or does it
> merely put the thread in some Linux to-do list (I assumed the first case) ?

Here, I am not sure, but it seems that when calling
wake_up_interruptible_sync the woken up task is put in the current CPU
runqueue, and this task (i.e. the gatekeeper), will not run until the
current thread (i.e. the thread running xnshadow_harden) marks itself as
suspended and calls schedule(). Maybe, marking the running thread as



Depends on CONFIG_PREEMPT. If set, we get a preempt_schedule already
here - and a switch if the prio of the woken up task is higher.

BTW, an easy way to enforce the current trouble is to remove the "_sync"
from wake_up_interruptible. As I understand it this _sync is just an
optimisation hint for Linux to avoid needless scheduler runs.



You could not guarantee the following execution sequence doing so either, i.e.

1- current wakes up the gatekeeper
2- current goes sleeping to exit the Linux runqueue in schedule()
3- the gatekeeper resumes the shadow-side of the old current

The point is all about making 100% sure that current is going to be unlinked from 
the Linux runqueue before the gatekeeper processes the resumption request, 
whatever event the kernel is processing asynchronously in the meantime. This is 
the reason why, as you already noticed, preempt_schedule_irq() nicely breaks our 
toy by stealing the CPU from the hardening thread whilst keeping it linked to the 
runqueue: upon return from such preemption, the gatekeeper might have run already, 
 hence the newly hardened thread ends up being seen as runnable by both the Linux 
and Xeno schedulers. Rainy day indeed.


We could rely on giving "current" the highest SCHED_FIFO priority in 
xnshadow_harden() before waking up the gk, until the gk eventually promotes it to 
the Xenomai scheduling mode and downgrades this priority back to normal, but we 
would pay additional latencies induced by each aborted rescheduling attempt that 
may occur during the atomic path we want to enforce.


The other way is to make sure that no in-kernel preemption of the hardening task 
could occur after step 1) and until step 2) is performed, given that we cannot 
currently call schedule() with interrupts or preemption off. I'm on it.





suspended is not needed, since the gatekeeper may have a high priority,
and calling schedule() is enough. In any case, the waken up thread does
not seem to be run immediately, so this rather look like the second
case.

Since in xnshadow_harden, the running thread marks itself as suspended
before running wake_up_interruptible_sync, the gatekeeper will run when
schedule() get called, which in turn, depend on the CONFIG_PREEMPT*
configuration. In the non-preempt case, the current thread will be
suspended and the gatekeeper will run when schedule() is explicitely
called in xnshadow_harden(). In the preempt case, schedule gets called
when the outermost spinlock is unlocked in wake_up_interruptible_sync().

> And how does it terminate: is only the system call migrated or is the thread
> allowed to continue run (at a priority level equal to the Xenomai
> priority level) until it hits something of the Xenomai API (or trivially:
> explicitly go to RT using th

[Xenomai-core] [BUG?] dead code in ipipe_grab_irq

2006-01-30 Thread Anders Blomdell
In the following code (ppc), shouldn't first be either declared static or 
deleted? To me it looks like first is always equal to one when the else clause 
is evaluated.


asmlinkage int __ipipe_grab_irq(struct pt_regs *regs)
{
extern int ppc_spurious_interrupts;
ipipe_declare_cpuid;
int irq, first = 1;

if ((irq = ppc_md.get_irq(regs)) >= 0) {
__ipipe_handle_irq(irq, regs);
first = 0;
} else if (irq != -2 && first)
ppc_spurious_interrupts++;

ipipe_load_cpuid();

return (ipipe_percpu_domain[cpuid] == ipipe_root_domain &&
!test_bit(IPIPE_STALL_FLAG,
  &ipipe_root_domain->cpudata[cpuid].status));
}


Regards

Anders Blomdell





Re: [Xenomai-core] [BUG] racy xnshadow_harden under CONFIG_PREEMPT

2006-01-30 Thread Philippe Gerum

Philippe Gerum wrote:

Jan Kiszka wrote:


Gilles Chanteperdrix wrote:


Jeroen Van den Keybus wrote:
> Hello,
> > > I'm currently not at a level to participate in your discussion. 
Although I'm
> willing to supply you with stresstests, I would nevertheless like 
to learn
> more from task migration as this debugging session proceeds. In 
order to do
> so, please confirm the following statements or indicate where I 
went wrong.

> I hope others may learn from this as well.
> > xn_shadow_harden(): This is called whenever a Xenomai thread 
performs a

> Linux (root domain) system call (notified by Adeos ?).
xnshadow_harden() is called whenever a thread running in secondary
mode (that is, running as a regular Linux thread, handled by Linux
scheduler) is switching to primary mode (where it will run as a Xenomai
thread, handled by Xenomai scheduler). Migrations occur for some system
calls. More precisely, Xenomai skin system calls tables associates a few
flags with each system call, and some of these flags cause migration of
the caller when it issues the system call.

Each Xenomai user-space thread has two contexts, a regular Linux
thread context, and a Xenomai thread called "shadow" thread. Both
contexts share the same stack and program counter, so that at any time,
at least one of the two contexts is seen as suspended by the scheduler
which handles it.

Before xnshadow_harden is called, the Linux thread is running, and its
shadow is seen in suspended state with XNRELAX bit by Xenomai
scheduler. After xnshadow_harden, the Linux context is seen suspended
with INTERRUPTIBLE state by Linux scheduler, and its shadow is seen as
running by Xenomai scheduler.

The migrating thread
> (nRT) is marked INTERRUPTIBLE and run by the Linux kernel
> wake_up_interruptible_sync() call. Is this thread actually run or 
does it
> merely put the thread in some Linux to-do list (I assumed the first 
case) ?


Here, I am not sure, but it seems that when calling
wake_up_interruptible_sync the woken up task is put in the current CPU
runqueue, and this task (i.e. the gatekeeper), will not run until the
current thread (i.e. the thread running xnshadow_harden) marks itself as
suspended and calls schedule(). Maybe, marking the running thread as




Depends on CONFIG_PREEMPT. If set, we get a preempt_schedule already
here - and a switch if the prio of the woken up task is higher.

BTW, an easy way to enforce the current trouble is to remove the "_sync"
from wake_up_interruptible. As I understand it this _sync is just an
optimisation hint for Linux to avoid needless scheduler runs.



You could not guarantee the following execution sequence doing so 
either, i.e.


1- current wakes up the gatekeeper
2- current goes sleeping to exit the Linux runqueue in schedule()
3- the gatekeeper resumes the shadow-side of the old current

The point is all about making 100% sure that current is going to be 
unlinked from the Linux runqueue before the gatekeeper processes the 
resumption request, whatever event the kernel is processing 
asynchronously in the meantime. This is the reason why, as you already 
noticed, preempt_schedule_irq() nicely breaks our toy by stealing the 
CPU from the hardening thread whilst keeping it linked to the runqueue: 
upon return from such preemption, the gatekeeper might have run already, 
 hence the newly hardened thread ends up being seen as runnable by both 
the Linux and Xeno schedulers. Rainy day indeed.


We could rely on giving "current" the highest SCHED_FIFO priority in 
xnshadow_harden() before waking up the gk, until the gk eventually 
promotes it to the Xenomai scheduling mode and downgrades this priority 
back to normal, but we would pay additional latencies induced by each 
aborted rescheduling attempt that may occur during the atomic path we 
want to enforce.


The other way is to make sure that no in-kernel preemption of the 
hardening task could occur after step 1) and until step 2) is performed, 
given that we cannot currently call schedule() with interrupts or 
preemption off. I'm on it.




Could anyone interested in this issue test the following couple of patches?

atomic-switch-state.patch is to be applied against Adeos-1.1-03/x86 for 2.6.15
atomic-wakeup-and-schedule.patch is to be applied against Xeno 2.1-rc2

Both patches are needed to fix the issue.

TIA,

--

Philippe.



Re: [Xenomai-core] [BUG] racy xnshadow_harden under CONFIG_PREEMPT

2006-01-30 Thread Philippe Gerum

Philippe Gerum wrote:

Jan Kiszka wrote:


Gilles Chanteperdrix wrote:


Jeroen Van den Keybus wrote:
> Hello,
> > > I'm currently not at a level to participate in your discussion. 
Although I'm
> willing to supply you with stresstests, I would nevertheless like 
to learn
> more from task migration as this debugging session proceeds. In 
order to do
> so, please confirm the following statements or indicate where I 
went wrong.

> I hope others may learn from this as well.
> > xn_shadow_harden(): This is called whenever a Xenomai thread 
performs a

> Linux (root domain) system call (notified by Adeos ?).
xnshadow_harden() is called whenever a thread running in secondary
mode (that is, running as a regular Linux thread, handled by Linux
scheduler) is switching to primary mode (where it will run as a Xenomai
thread, handled by Xenomai scheduler). Migrations occur for some system
calls. More precisely, Xenomai skin system calls tables associates a few
flags with each system call, and some of these flags cause migration of
the caller when it issues the system call.

Each Xenomai user-space thread has two contexts, a regular Linux
thread context, and a Xenomai thread called "shadow" thread. Both
contexts share the same stack and program counter, so that at any time,
at least one of the two contexts is seen as suspended by the scheduler
which handles it.

Before xnshadow_harden is called, the Linux thread is running, and its
shadow is seen in suspended state with XNRELAX bit by Xenomai
scheduler. After xnshadow_harden, the Linux context is seen suspended
with INTERRUPTIBLE state by Linux scheduler, and its shadow is seen as
running by Xenomai scheduler.

The migrating thread
> (nRT) is marked INTERRUPTIBLE and run by the Linux kernel
> wake_up_interruptible_sync() call. Is this thread actually run or 
does it
> merely put the thread in some Linux to-do list (I assumed the first 
case) ?


Here, I am not sure, but it seems that when calling
wake_up_interruptible_sync the woken up task is put in the current CPU
runqueue, and this task (i.e. the gatekeeper), will not run until the
current thread (i.e. the thread running xnshadow_harden) marks itself as
suspended and calls schedule(). Maybe, marking the running thread as




Depends on CONFIG_PREEMPT. If set, we get a preempt_schedule already
here - and a switch if the prio of the woken up task is higher.

BTW, an easy way to enforce the current trouble is to remove the "_sync"
from wake_up_interruptible. As I understand it this _sync is just an
optimisation hint for Linux to avoid needless scheduler runs.



You could not guarantee the following execution sequence doing so 
either, i.e.


1- current wakes up the gatekeeper
2- current goes sleeping to exit the Linux runqueue in schedule()
3- the gatekeeper resumes the shadow-side of the old current

The point is all about making 100% sure that current is going to be 
unlinked from the Linux runqueue before the gatekeeper processes the 
resumption request, whatever event the kernel is processing 
asynchronously in the meantime. This is the reason why, as you already 
noticed, preempt_schedule_irq() nicely breaks our toy by stealing the 
CPU from the hardening thread whilst keeping it linked to the runqueue: 
upon return from such preemption, the gatekeeper might have run already, 
 hence the newly hardened thread ends up being seen as runnable by both 
the Linux and Xeno schedulers. Rainy day indeed.


We could rely on giving "current" the highest SCHED_FIFO priority in 
xnshadow_harden() before waking up the gk, until the gk eventually 
promotes it to the Xenomai scheduling mode and downgrades this priority 
back to normal, but we would pay additional latencies induced by each 
aborted rescheduling attempt that may occur during the atomic path we 
want to enforce.


The other way is to make sure that no in-kernel preemption of the 
hardening task could occur after step 1) and until step 2) is performed, 
given that we cannot currently call schedule() with interrupts or 
preemption off. I'm on it.




> Could anyone interested in this issue test the following couple of patches?

> atomic-switch-state.patch is to be applied against Adeos-1.1-03/x86 for 2.6.15
> atomic-wakeup-and-schedule.patch is to be applied against Xeno 2.1-rc2

> Both patches are needed to fix the issue.

> TIA,

And now, Ladies and Gentlemen, with the patches attached.

--

Philippe.

--- 2.6.15-x86/kernel/sched.c	2006-01-07 15:18:31.0 +0100
+++ 2.6.15-ipipe/kernel/sched.c	2006-01-30 15:15:27.0 +0100
@@ -2963,7 +2963,7 @@
 	 * Otherwise, whine if we are scheduling when we should not be.
 	 */
 	if (likely(!current->exit_state)) {
-		if (unlikely(in_atomic())) {
+		if (unlikely(!(current->state & TASK_ATOMICSWITCH) && in_atomic())) {
 			printk(KERN_ERR "scheduling while atomic: "
 "%s/0x%08x/%d\n",
 current->comm, preempt_count(), current->pid);
@@ -2972,8 +2972,13 @@
 	}
 	profile_hit(SCHED_PROFILING, __builtin_return_ad

Re: [Xenomai-core] [BUG?] dead code in ipipe_grab_irq

2006-01-30 Thread Heikki Lindholm

Anders Blomdell kirjoitti:
In the following code (ppc), shouldn't first be either declared static 
or deleted? To me it looks like first is always equal to one when the 
else clause is evaluated.


You're right. "first" doesn't need to be there at all, it's probably an 
old copy of something in the kernel.



asmlinkage int __ipipe_grab_irq(struct pt_regs *regs)
{
extern int ppc_spurious_interrupts;
ipipe_declare_cpuid;
int irq, first = 1;

if ((irq = ppc_md.get_irq(regs)) >= 0) {
__ipipe_handle_irq(irq, regs);
first = 0;
} else if (irq != -2 && first)
ppc_spurious_interrupts++;

ipipe_load_cpuid();

return (ipipe_percpu_domain[cpuid] == ipipe_root_domain &&
!test_bit(IPIPE_STALL_FLAG,
  &ipipe_root_domain->cpudata[cpuid].status));
}


Regards

Anders Blomdell








Re: [Xenomai-core] [BUG?] dead code in ipipe_grab_irq

2006-01-30 Thread Philippe Gerum

Heikki Lindholm wrote:

Anders Blomdell kirjoitti:

In the following code (ppc), shouldn't first be either declared static 
or deleted? To me it looks like first is always equal to one when the 
else clause is evaluated.



You're right. "first" doesn't need to be there at all, it's probably an 
old copy of something in the kernel.




Yep; it used to be a while() loop in the original implementation, which we do not 
perform here.
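
For illustration only (not a submitted patch), the routine with the
leftover flag dropped would boil down to:

asmlinkage int __ipipe_grab_irq(struct pt_regs *regs)
{
	extern int ppc_spurious_interrupts;
	ipipe_declare_cpuid;
	int irq;

	if ((irq = ppc_md.get_irq(regs)) >= 0)
		__ipipe_handle_irq(irq, regs);
	else if (irq != -2)
		ppc_spurious_interrupts++;

	ipipe_load_cpuid();

	return (ipipe_percpu_domain[cpuid] == ipipe_root_domain &&
		!test_bit(IPIPE_STALL_FLAG,
			  &ipipe_root_domain->cpudata[cpuid].status));
}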


asmlinkage int __ipipe_grab_irq(struct pt_regs *regs)
{
extern int ppc_spurious_interrupts;
ipipe_declare_cpuid;
int irq, first = 1;

if ((irq = ppc_md.get_irq(regs)) >= 0) {
__ipipe_handle_irq(irq, regs);
first = 0;
} else if (irq != -2 && first)
ppc_spurious_interrupts++;

ipipe_load_cpuid();

return (ipipe_percpu_domain[cpuid] == ipipe_root_domain &&
!test_bit(IPIPE_STALL_FLAG,
  &ipipe_root_domain->cpudata[cpuid].status));
}


Regards

Anders Blomdell











--

Philippe.



Re: [Xenomai-core] [BUG] racy xnshadow_harden under CONFIG_PREEMPT

2006-01-30 Thread Jan Kiszka
Philippe Gerum wrote:
> Philippe Gerum wrote:
>> Jan Kiszka wrote:
>>
>>> Gilles Chanteperdrix wrote:
>>>
 Jeroen Van den Keybus wrote:
 > Hello,
 > > > I'm currently not at a level to participate in your
 discussion. Although I'm
 > willing to supply you with stresstests, I would nevertheless like
 to learn
 > more from task migration as this debugging session proceeds. In
 order to do
 > so, please confirm the following statements or indicate where I
 went wrong.
 > I hope others may learn from this as well.
 > > xn_shadow_harden(): This is called whenever a Xenomai thread
 performs a
 > Linux (root domain) system call (notified by Adeos ?).
 xnshadow_harden() is called whenever a thread running in secondary
 mode (that is, running as a regular Linux thread, handled by Linux
 scheduler) is switching to primary mode (where it will run as a Xenomai
 thread, handled by Xenomai scheduler). Migrations occur for some system
 calls. More precisely, Xenomai skin system calls tables associates a
 few
 flags with each system call, and some of these flags cause migration of
 the caller when it issues the system call.

 Each Xenomai user-space thread has two contexts, a regular Linux
 thread context, and a Xenomai thread called "shadow" thread. Both
 contexts share the same stack and program counter, so that at any time,
 at least one of the two contexts is seen as suspended by the scheduler
 which handles it.

 Before xnshadow_harden is called, the Linux thread is running, and its
 shadow is seen in suspended state with XNRELAX bit by Xenomai
 scheduler. After xnshadow_harden, the Linux context is seen suspended
 with INTERRUPTIBLE state by Linux scheduler, and its shadow is seen as
 running by Xenomai scheduler.

 The migrating thread
 > (nRT) is marked INTERRUPTIBLE and run by the Linux kernel
 > wake_up_interruptible_sync() call. Is this thread actually run or
 does it
 > merely put the thread in some Linux to-do list (I assumed the
 first case) ?

 Here, I am not sure, but it seems that when calling
 wake_up_interruptible_sync the woken up task is put in the current CPU
 runqueue, and this task (i.e. the gatekeeper), will not run until the
 current thread (i.e. the thread running xnshadow_harden) marks
 itself as
 suspended and calls schedule(). Maybe, marking the running thread as
>>>
>>>
>>>
>>> Depends on CONFIG_PREEMPT. If set, we get a preempt_schedule already
>>> here - and a switch if the prio of the woken up task is higher.
>>>
>>> BTW, an easy way to enforce the current trouble is to remove the "_sync"
>>> from wake_up_interruptible. As I understand it this _sync is just an
>>> optimisation hint for Linux to avoid needless scheduler runs.
>>>
>>
>> You could not guarantee the following execution sequence doing so
>> either, i.e.
>>
>> 1- current wakes up the gatekeeper
>> 2- current goes sleeping to exit the Linux runqueue in schedule()
>> 3- the gatekeeper resumes the shadow-side of the old current
>>
>> The point is all about making 100% sure that current is going to be
>> unlinked from the Linux runqueue before the gatekeeper processes the
>> resumption request, whatever event the kernel is processing
>> asynchronously in the meantime. This is the reason why, as you already
>> noticed, preempt_schedule_irq() nicely breaks our toy by stealing the
>> CPU from the hardening thread whilst keeping it linked to the
>> runqueue: upon return from such preemption, the gatekeeper might have
>> run already,  hence the newly hardened thread ends up being seen as
>> runnable by both the Linux and Xeno schedulers. Rainy day indeed.
>>
>> We could rely on giving "current" the highest SCHED_FIFO priority in
>> xnshadow_harden() before waking up the gk, until the gk eventually
>> promotes it to the Xenomai scheduling mode and downgrades this
>> priority back to normal, but we would pay additional latencies induced
>> by each aborted rescheduling attempt that may occur during the atomic
>> path we want to enforce.
>>
>> The other way is to make sure that no in-kernel preemption of the
>> hardening task could occur after step 1) and until step 2) is
>> performed, given that we cannot currently call schedule() with
>> interrupts or preemption off. I'm on it.
>>
> 
> Could anyone interested in this issue test the following couple of patches?
> 
> atomic-switch-state.patch is to be applied against Adeos-1.1-03/x86 for
> 2.6.15
> atomic-wakeup-and-schedule.patch is to be applied against Xeno 2.1-rc2
> 
> Both patches are needed to fix the issue.
> 
> TIA,
> 

Looks good. I tried Jeroen's test-case and I was not able to reproduce
the crash anymore. I think it's time for a new ipipe-release. ;)

At this chance: any comments on the panic-freeze extension for the
tracer? I need to rework the Xenomai patch, but the ipipe side should be
re

Re: [Xenomai-core] [BUG] Interrupt problem on powerpc

2006-01-30 Thread Anders Blomdell

Jan Kiszka wrote:

Anders Blomdell wrote:


On a PrPMC800 (PPC 7410 processor) withe Xenomai-2.1-rc2, I get the
following if the interrupt handler takes too long (i.e. next interrupt
gets generated before the previous one has finished)

[   42.543765]  [c00c2008] spin_bug+0xa8/0xc4
[   42.597617]  [c00c22d4] _raw_spin_lock+0x180/0x184
[   42.660637]  [c000f388] __ipipe_ack_irq+0x88/0x130
[   42.723657]  [c000efe4] __ipipe_handle_irq+0x140/0x268
[   42.791259]  [c000f144] __ipipe_grab_irq+0x38/0xa4
[   42.854279]  [c0005058] __ipipe_ret_from_except+0x0/0xc
[   42.923029]  [] 0x0
[   42.959695]  [c0038348] __do_IRQ+0x134/0x164
[   43.015839]  [c000ed04] __ipipe_do_IRQ+0x2c/0x44
[   43.076567]  [c000eb08] __ipipe_sync_stage+0x1ec/0x228
[   43.144170]  [c0039420] ipipe_suspend_domain+0x7c/0xc4
[   43.211774]  [c000f0b0] __ipipe_handle_irq+0x20c/0x268
[   43.279377]  [c000f144] __ipipe_grab_irq+0x38/0xa4
[   43.342396]  [c0005058] __ipipe_ret_from_except+0x0/0xc
[   43.411145]  [c0006524] default_idle+0x10/0x60




I think some probably important information is missing above this
back-trace. 

You are so right!

> What does the kernel state before these lines?

[   42.346643] BUG: spinlock recursion on CPU#0, swapper/0
[   42.415438]  lock: c01c943c, .magic: dead4ead, .owner: swapper/0, 
.owner_cpu: 0
[   42.511681] Call trace:
[   42.543765]  [c00c2008] spin_bug+0xa8/0xc4
[   42.597617]  [c00c22d4] _raw_spin_lock+0x180/0x184
[   42.660637]  [c000f388] __ipipe_ack_irq+0x88/0x130
[   42.723657]  [c000efe4] __ipipe_handle_irq+0x140/0x268
[   42.791259]  [c000f144] __ipipe_grab_irq+0x38/0xa4
[   42.854279]  [c0005058] __ipipe_ret_from_except+0x0/0xc
[   42.923029]  [] 0x0
[   42.959695]  [c0038348] __do_IRQ+0x134/0x164
[   43.015839]  [c000ed04] __ipipe_do_IRQ+0x2c/0x44
[   43.076567]  [c000eb08] __ipipe_sync_stage+0x1ec/0x228
[   43.144170]  [c0039420] ipipe_suspend_domain+0x7c/0xc4
[   43.211774]  [c000f0b0] __ipipe_handle_irq+0x20c/0x268
[   43.279377]  [c000f144] __ipipe_grab_irq+0x38/0xa4
[   43.342396]  [c0005058] __ipipe_ret_from_except+0x0/0xc
[   43.411145]  [c0006524] default_idle+0x10/0x60


It might be that the problem is related to the fact that the interrupt is a 
shared one (Harrier chip, "Functional Exception") that is used for both 
message-passing (should be RT) and the UART (Linux, i.e. non-RT). My current IRQ 
handler always pends the interrupt to the Linux domain (RTDM_IRQ_PROPAGATE), 
because all other attempts (RTDM_IRQ_ENABLE when it wasn't a UART interrupt) have 
left the interrupts turned off.


What I believe should be done, is

  1. When UART interrupt is received, disable further non-RT interrupts
 on this IRQ-line, pend interrupt to Linux.
  2. Handle RT interrupts on this IRQ line
  3. When Linux has finished the pended interrupt, reenable non-RT interrupts.

but I have neither been able to achieve this, nor to verify that it is the right 
thing to do...
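
A rough sketch of how steps 1-3 could look in an RTDM handler, with the
hardware-specific parts stubbed out (all harrier_* names and
handle_rt_messages() are made up; the return flags are the ones mentioned
above):

#include <rtdm/rtdm_driver.h>

/* Made-up hardware stubs: */
static int harrier_uart_irq_pending(void);
static int harrier_msg_irq_pending(void);
static void harrier_mask_uart_irq(void);
static void handle_rt_messages(void);

static int fe_isr(rtdm_irq_t *irq_context)
{
	if (harrier_uart_irq_pending()) {
		/* Step 1: mask the non-RT source at device level so it
		 * cannot retrigger, then hand the interrupt to Linux. */
		harrier_mask_uart_irq();
		return RTDM_IRQ_PROPAGATE;
	}

	if (harrier_msg_irq_pending())
		handle_rt_messages();	/* step 2: real-time work */

	return RTDM_IRQ_ENABLE;
}

/* Step 3 has to happen on the Linux side: once the Linux UART handler has
 * run, a small stub must unmask the UART source again - that is the part
 * that still needs to be written. */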


Regards

Anders Blomdell




Re: [Xenomai-core] [BUG] Interrupt problem on powerpc

2006-01-30 Thread Philippe Gerum

Anders Blomdell wrote:
On a PrPMC800 (PPC 7410 processor) withe Xenomai-2.1-rc2, I get the 
following if the interrupt handler takes too long (i.e. next interrupt 
gets generated before the previous one has finished)


[   42.543765]  [c00c2008] spin_bug+0xa8/0xc4
[   42.597617]  [c00c22d4] _raw_spin_lock+0x180/0x184


Someone (in arch/ppc64/kernel/*.c?) is spinlocking+irqsave desc->lock for any 
given IRQ without using the Adeos *_hw() spinlock variant that masks the interrupt 
at hw level. So we seem to have:


spin_lock_irqsave(&desc->lock)

__ipipe_grab_irq
__ipipe_handle_irq
__ipipe_ack_irq
spin_lock...(&desc->lock)
deadlock.

The point is about having spinlock_irqsave only _virtually_ masking the interrupts 
by preventing their associated Linux handler from being called, but despite this, 
Adeos still actually acquires and acknowledges the incoming hw events before 
logging them, even if their associated action happens to be postponed until 
spinlock_irq_restore() is called.


To solve this, all spinlocks potentially touched by the ipipe's primary IRQ 
handler and/or the code it calls indirectly, _must_ be operated using the _hw() 
call variant all over the kernel, so that no hw IRQ can be taken while those 
spinlocks are held by Linux. Usually, only the spinlock(s) protecting the 
interrupt descriptors or the PIC hardware are concerned.
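
A minimal sketch of the required conversion, assuming the
spin_lock_irqsave_hw()/spin_unlock_irqrestore_hw() variants provided by the
Adeos patch:

	/* Not enough under Adeos: this only _virtually_ masks IRQs for the
	 * Linux domain, so the primary IRQ path may still run and try to
	 * take desc->lock again - the recursion seen above. */
	spin_lock_irqsave(&desc->lock, flags);
	/* ... touch the descriptor / PIC ... */
	spin_unlock_irqrestore(&desc->lock, flags);

	/* Required form for locks shared with the primary IRQ path: mask
	 * interrupts at hardware level while the lock is held. */
	spin_lock_irqsave_hw(&desc->lock, flags);
	/* ... touch the descriptor / PIC ... */
	spin_unlock_irqrestore_hw(&desc->lock, flags);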



[   42.660637]  [c000f388] __ipipe_ack_irq+0x88/0x130
[   42.723657]  [c000efe4] __ipipe_handle_irq+0x140/0x268
[   42.791259]  [c000f144] __ipipe_grab_irq+0x38/0xa4
[   42.854279]  [c0005058] __ipipe_ret_from_except+0x0/0xc
[   42.923029]  [] 0x0
[   42.959695]  [c0038348] __do_IRQ+0x134/0x164
[   43.015839]  [c000ed04] __ipipe_do_IRQ+0x2c/0x44
[   43.076567]  [c000eb08] __ipipe_sync_stage+0x1ec/0x228
[   43.144170]  [c0039420] ipipe_suspend_domain+0x7c/0xc4
[   43.211774]  [c000f0b0] __ipipe_handle_irq+0x20c/0x268
[   43.279377]  [c000f144] __ipipe_grab_irq+0x38/0xa4
[   43.342396]  [c0005058] __ipipe_ret_from_except+0x0/0xc
[   43.411145]  [c0006524] default_idle+0x10/0x60


Any ideas of where to look?

Regards

Anders Blomdell







--

Philippe.



Re: [Xenomai-core] [BUG] Interrupt problem on powerpc

2006-01-30 Thread Jan Kiszka
Anders Blomdell wrote:
> Jan Kiszka wrote:
>> Anders Blomdell wrote:
>>
>>> On a PrPMC800 (PPC 7410 processor) withe Xenomai-2.1-rc2, I get the
>>> following if the interrupt handler takes too long (i.e. next interrupt
>>> gets generated before the previous one has finished)
>>>
>>> [   42.543765]  [c00c2008] spin_bug+0xa8/0xc4
>>> [   42.597617]  [c00c22d4] _raw_spin_lock+0x180/0x184
>>> [   42.660637]  [c000f388] __ipipe_ack_irq+0x88/0x130
>>> [   42.723657]  [c000efe4] __ipipe_handle_irq+0x140/0x268
>>> [   42.791259]  [c000f144] __ipipe_grab_irq+0x38/0xa4
>>> [   42.854279]  [c0005058] __ipipe_ret_from_except+0x0/0xc
>>> [   42.923029]  [] 0x0
>>> [   42.959695]  [c0038348] __do_IRQ+0x134/0x164
>>> [   43.015839]  [c000ed04] __ipipe_do_IRQ+0x2c/0x44
>>> [   43.076567]  [c000eb08] __ipipe_sync_stage+0x1ec/0x228
>>> [   43.144170]  [c0039420] ipipe_suspend_domain+0x7c/0xc4
>>> [   43.211774]  [c000f0b0] __ipipe_handle_irq+0x20c/0x268
>>> [   43.279377]  [c000f144] __ipipe_grab_irq+0x38/0xa4
>>> [   43.342396]  [c0005058] __ipipe_ret_from_except+0x0/0xc
>>> [   43.411145]  [c0006524] default_idle+0x10/0x60
>>>
>>
>>
>> I think some probably important information is missing above this
>> back-trace. 
> You are so right!
> 
>> What does the kernel state before these lines?
> 
> [   42.346643] BUG: spinlock recursion on CPU#0, swapper/0
> [   42.415438]  lock: c01c943c, .magic: dead4ead, .owner: swapper/0,
> .owner_cpu: 0
> [   42.511681] Call trace:
> [   42.543765]  [c00c2008] spin_bug+0xa8/0xc4
> [   42.597617]  [c00c22d4] _raw_spin_lock+0x180/0x184
> [   42.660637]  [c000f388] __ipipe_ack_irq+0x88/0x130
> [   42.723657]  [c000efe4] __ipipe_handle_irq+0x140/0x268
> [   42.791259]  [c000f144] __ipipe_grab_irq+0x38/0xa4
> [   42.854279]  [c0005058] __ipipe_ret_from_except+0x0/0xc
> [   42.923029]  [] 0x0
> [   42.959695]  [c0038348] __do_IRQ+0x134/0x164
> [   43.015839]  [c000ed04] __ipipe_do_IRQ+0x2c/0x44
> [   43.076567]  [c000eb08] __ipipe_sync_stage+0x1ec/0x228
> [   43.144170]  [c0039420] ipipe_suspend_domain+0x7c/0xc4
> [   43.211774]  [c000f0b0] __ipipe_handle_irq+0x20c/0x268
> [   43.279377]  [c000f144] __ipipe_grab_irq+0x38/0xa4
> [   43.342396]  [c0005058] __ipipe_ret_from_except+0x0/0xc
> [   43.411145]  [c0006524] default_idle+0x10/0x60
> 
> 
> It might be that the problem is related to the fact that the interrupt
> is a shared one (Harrier chip, "Functional Exception") that is used for
> both message-passing (should be RT) and the UART (Linux, i.e. non-RT); my
> current IRQ handler always pends the interrupt to the Linux domain
> (RTDM_IRQ_PROPAGATE), because all other attempts (RTDM_IRQ_ENABLE when
> it wasn't a UART interrupt) have left the interrupts turned off.
> 
> What I believe should be done, is
> 
>   1. When UART interrupt is received, disable further non-RT interrupts
>  on this IRQ-line, pend interrupt to Linux.
>   2. Handle RT interrupts on this IRQ line
>   3. When Linux has finished the pended interrupt, reenable non-RT
> interrupts.
> 
> but I have neither been able to achieve this, nor to verify that it is
> the right thing to do...

Your approach is basically what I proposed some years back on rtai-dev
for handling unresolvable shared RT/NRT IRQs. I once successfully tested
such a setup with two network cards, one RT, the other Linux.

So when you are really doomed and cannot change the IRQ line of your RT
device, this is a kind of emergency workaround. Not nice and generic
(you have to write the stub for disabling the NRT IRQ source), but it
should work.
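
The missing glue is roughly a pair of stubs: the RT handler masks the non-RT
cause before propagating, and a thin Linux-side wrapper re-enables it once the
Linux work is done. A sketch of the latter, where real_uart_isr() and
harrier_uart_irq_enable() are hypothetical names standing in for the actual
UART handler and unmasking helper:

#include <linux/interrupt.h>

/* Sketch only: real_uart_isr() and harrier_uart_irq_enable() are placeholders. */
static irqreturn_t uart_wrapper_isr(int irq, void *dev_id, struct pt_regs *regs)
{
    irqreturn_t ret;

    /* Run the normal Linux UART handler first. */
    ret = real_uart_isr(irq, dev_id, regs);

    /* Re-enable the non-RT cause that the RT handler masked before
       propagating the interrupt to Linux. */
    harrier_uart_irq_enable();

    return ret;
}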


Anyway, I do not understand what made your spinlock recurse. This shared
IRQ scenario should only cause indeterminism to the RT driver (by
blocking the line until the Linux handler can release it), but it must
not trigger this bug.

Jan





Re: [Xenomai-core] [BUG] Interrupt problem on powerpc

2006-01-30 Thread Anders Blomdell

Jan Kiszka wrote:

Anders Blomdell wrote:


Jan Kiszka wrote:


Anders Blomdell wrote:



On a PrPMC800 (PPC 7410 processor) with Xenomai-2.1-rc2, I get the
following if the interrupt handler takes too long (i.e. next interrupt
gets generated before the previous one has finished)

[   42.543765]  [c00c2008] spin_bug+0xa8/0xc4
[   42.597617]  [c00c22d4] _raw_spin_lock+0x180/0x184
[   42.660637]  [c000f388] __ipipe_ack_irq+0x88/0x130
[   42.723657]  [c000efe4] __ipipe_handle_irq+0x140/0x268
[   42.791259]  [c000f144] __ipipe_grab_irq+0x38/0xa4
[   42.854279]  [c0005058] __ipipe_ret_from_except+0x0/0xc
[   42.923029]  [] 0x0
[   42.959695]  [c0038348] __do_IRQ+0x134/0x164
[   43.015839]  [c000ed04] __ipipe_do_IRQ+0x2c/0x44
[   43.076567]  [c000eb08] __ipipe_sync_stage+0x1ec/0x228
[   43.144170]  [c0039420] ipipe_suspend_domain+0x7c/0xc4
[   43.211774]  [c000f0b0] __ipipe_handle_irq+0x20c/0x268
[   43.279377]  [c000f144] __ipipe_grab_irq+0x38/0xa4
[   43.342396]  [c0005058] __ipipe_ret_from_except+0x0/0xc
[   43.411145]  [c0006524] default_idle+0x10/0x60




I think some probably important information is missing above this
back-trace. 


You are so right!



What does the kernel state before these lines?


[   42.346643] BUG: spinlock recursion on CPU#0, swapper/0
[   42.415438]  lock: c01c943c, .magic: dead4ead, .owner: swapper/0,
.owner_cpu: 0
[   42.511681] Call trace:
[   42.543765]  [c00c2008] spin_bug+0xa8/0xc4
[   42.597617]  [c00c22d4] _raw_spin_lock+0x180/0x184
[   42.660637]  [c000f388] __ipipe_ack_irq+0x88/0x130
[   42.723657]  [c000efe4] __ipipe_handle_irq+0x140/0x268
[   42.791259]  [c000f144] __ipipe_grab_irq+0x38/0xa4
[   42.854279]  [c0005058] __ipipe_ret_from_except+0x0/0xc
[   42.923029]  [] 0x0
[   42.959695]  [c0038348] __do_IRQ+0x134/0x164
[   43.015839]  [c000ed04] __ipipe_do_IRQ+0x2c/0x44
[   43.076567]  [c000eb08] __ipipe_sync_stage+0x1ec/0x228
[   43.144170]  [c0039420] ipipe_suspend_domain+0x7c/0xc4
[   43.211774]  [c000f0b0] __ipipe_handle_irq+0x20c/0x268
[   43.279377]  [c000f144] __ipipe_grab_irq+0x38/0xa4
[   43.342396]  [c0005058] __ipipe_ret_from_except+0x0/0xc
[   43.411145]  [c0006524] default_idle+0x10/0x60


It might be that the problem is related to the fact that the interrupt
is a shared one (Harrier chip, "Functional Exception") that is used for
both message-passing (should be RT) and the UART (Linux, i.e. non-RT); my
current IRQ handler always pends the interrupt to the Linux domain
(RTDM_IRQ_PROPAGATE), because all other attempts (RTDM_IRQ_ENABLE when
it wasn't a UART interrupt) have left the interrupts turned off.

What I believe should be done, is

 1. When UART interrupt is received, disable further non-RT interrupts
on this IRQ-line, pend interrupt to Linux.
 2. Handle RT interrupts on this IRQ line
 3. When Linux has finished the pended interrupt, reenable non-RT
interrupts.

but I have neither been able to achieve this, nor to verify that it is
the right thing to do...



Your approach is basically what I proposed some years back on rtai-dev
for handling unresolvable shared RT/NRT IRQs. I once successfully tested
such a setup with two network cards, one RT, the other Linux.

So when you are really doomed and cannot change the IRQ line of your RT
device, this is a kind of emergency workaround. Not nice and generic
(you have to write the stub for disabling the NRT IRQ source), but it
should work.

I'm doomed, the interrupts live in the same chip...
The problem is that I have not found any good place to reenable the non-RT 
interrupts.



Anyway, I do not understand what made your spinlock recurse. This shared
IRQ scenario should only cause indeterminism to the RT driver (by
blocking the line until the Linux handler can release it), but it must
not trigger this bug.

OK, seems like I have two problems then; I'll try to hunt it down


/Anders



Re: [Xenomai-core] [BUG] Interrupt problem on powerpc

2006-01-30 Thread Anders Blomdell

Philippe Gerum wrote:

Anders Blomdell wrote:

On a PrPMC800 (PPC 7410 processor) with Xenomai-2.1-rc2, I get the 
following if the interrupt handler takes too long (i.e. next interrupt 
gets generated before the previous one has finished)


[   42.543765]  [c00c2008] spin_bug+0xa8/0xc4
[   42.597617]  [c00c22d4] _raw_spin_lock+0x180/0x184



Someone (in arch/ppc64/kernel/*.c?) is spinlocking+irqsave desc->lock 

more likely arch/ppc/kernel/*.c :-)

for any given IRQ without using the Adeos *_hw() spinlock variant that 
masks the interrupt at hw level. So we seem to have:


spin_lock_irqsave(&desc->lock)

__ipipe_grab_irq
__ipipe_handle_irq
__ipipe_ack_irq
spin_lock...(&desc->lock)
deadlock.

The point is that spin_lock_irqsave() only _virtually_ masks the 
interrupts by preventing their associated Linux handlers from being 
called; despite this, Adeos still actually acquires and acknowledges 
the incoming hw events before logging them, even if their associated 
actions happen to be postponed until spin_unlock_irqrestore() is called.


To solve this, all spinlocks potentially touched by the ipipe's primary 
IRQ handler and/or the code it calls indirectly, _must_ be operated 
using the _hw() call variant all over the kernel, so that no hw IRQ can 
be taken while those spinlocks are held by Linux. Usually, only the 
spinlock(s) protecting the interrupt descriptors or the PIC hardware are 
concerned.

So you will expect an addition to the ipipe patch then?

/Anders



Re: [Xenomai-core] [BUG] Interrupt problem on powerpc

2006-01-30 Thread Philippe Gerum

Anders Blomdell wrote:

Philippe Gerum wrote:


Anders Blomdell wrote:

On a PrPMC800 (PPC 7410 processor) with Xenomai-2.1-rc2, I get the 
following if the interrupt handler takes too long (i.e. next 
interrupt gets generated before the previous one has finished)


[   42.543765]  [c00c2008] spin_bug+0xa8/0xc4
[   42.597617]  [c00c22d4] _raw_spin_lock+0x180/0x184




Someone (in arch/ppc64/kernel/*.c?) is spinlocking+irqsave desc->lock 


more likely arch/ppc/kernel/*.c :-)



Gah... looks like I'm still confused by ia64 issues I'm chasing right now. (Why on 
earth do we need so many bits on our CPUs that only serve the purpose of raising 
so many problems?)


for any given IRQ without using the Adeos *_hw() spinlock variant that 
masks the interrupt at hw level. So we seem to have:


spin_lock_irqsave(&desc->lock)

__ipipe_grab_irq
__ipipe_handle_irq
__ipipe_ack_irq
spin_lock...(&desc->lock)
deadlock.

The point is that spin_lock_irqsave() only _virtually_ masks 
the interrupts by preventing their associated Linux handlers from being 
called; despite this, Adeos still actually acquires and 
acknowledges the incoming hw events before logging them, even if their 
associated actions happen to be postponed until spin_unlock_irqrestore() 
is called.


To solve this, all spinlocks potentially touched by the ipipe's 
primary IRQ handler and/or the code it calls indirectly, _must_ be 
operated using the _hw() call variant all over the kernel, so that no 
hw IRQ can be taken while those spinlocks are held by Linux. Usually, 
only the spinlock(s) protecting the interrupt descriptors or the PIC 
hardware are concerned.


So you will expect an addition to the ipipe patch then?



Yep. We first need to find out who's grabbing the shared spinlock using the 
vanilla Linux primitives.



/Anders




--

Philippe.



[Xenomai-core] [PATCH] rt_heap reminder

2006-01-30 Thread Stefan Kisdaroczi
Hi all,

as a reminder (userspace, native skin, shared heap) [1]:
API documentation: "If the heap is shared, this value can be either zero, or 
the same value given to rt_heap_create()."
This is not true. As the heap size gets rounded up in rt_heap_create() for page-size 
alignment, a following call to rt_heap_alloc() with the same value will fail.

Ex:
rt_heap_create( ..., ..., 1, ... )
rt_heap_alloc( ..., 1, ...,  ) -> This call fails

I suggest only accepting zero as a valid size for shared heaps.
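
For comparison, a minimal (untested) usage sketch with the proposed
convention, i.e. passing 0 / H_ALL instead of the size given to
rt_heap_create():

#include <native/heap.h>
#include <native/timer.h>

/* Sketch only, using the native-skin calls discussed in this thread. */
static int map_whole_heap(RT_HEAP *heap, void **block)
{
    int err;

    /* The requested size (1 byte) is rounded up to a full page internally. */
    err = rt_heap_create(heap, "shm", 1, H_SHARED);
    if (err)
        return err;

    /* Passing the original size (1) fails after the rounding; 0 (the
       proposed H_ALL) always returns the single block covering the
       whole heap space. */
    return rt_heap_alloc(heap, 0, TM_INFINITE, block);
}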

About the attached patch:
1) not tested
2) there are possibly better names than H_ALL
3) the comments could be in better English
4) I hope you get the idea

thx
kisda

[1] https://mail.gna.org/public/xenomai-core/2006-01/msg00177.html
Index: include/native/heap.h
===
--- include/native/heap.h	(Revision 465)
+++ include/native/heap.h	(Arbeitskopie)
@@ -32,6 +32,9 @@
 #define H_DMA    0x100		/* Use memory suitable for DMA. */
 #define H_SHARED 0x200		/* Use mappable shared memory. */
 
+/* Operation flags. */
+#define H_ALL    0x0		/* Entire heap space. */
+
 typedef struct rt_heap_info {
 
 int nwaiters;		/* !< Number of pending tasks. */
Index: ksrc/skins/native/heap.c
===
--- ksrc/skins/native/heap.c	(Revision 465)
+++ ksrc/skins/native/heap.c	(Arbeitskopie)
@@ -410,10 +410,9 @@
  * from.
  *
  * @param size The requested size in bytes of the block. If the heap
- * is shared, this value can be either zero, or the same value given
- * to rt_heap_create(). In any case, the same block covering the
- * entire heap space will always be returned to all callers of this
- * service.
+ * is shared, H_ALL should be passed, as always the same block
+ * covering the entire heap space will be returned to all callers of
+ * this service.
  *
  * @param timeout The number of clock ticks to wait for a block of
  * sufficient size to be available from a local heap (see
@@ -432,8 +431,7 @@
  * @return 0 is returned upon success. Otherwise:
  *
  * - -EINVAL is returned if @a heap is not a heap descriptor, or @a
- * heap is shared (i.e. H_SHARED mode) and @a size is non-zero but
- * does not match the actual heap size passed to rt_heap_create().
+ * heap is shared (i.e. H_SHARED mode) and @a size is not H_ALL.
  *
  * - -EIDRM is returned if @a heap is a deleted heap descriptor.
  *
@@ -503,12 +501,7 @@
 
 	if (!block)
 	{
-	/* It's ok to pass zero for size here, since the requested
-	   size is implicitely the whole heap space; but if
-	   non-zero is given, it must match the actual heap
-	   size. */
-
-	if (size > 0 && size != xnheap_size(&heap->heap_base))
+	if (size != H_ALL)
 		{
 		err = -EINVAL;
 		goto unlock_and_exit;

