Re: [Xenomai-core] Missing IRQ end function on PowerPC
Philippe Gerum wrote:
> Gilles Chanteperdrix wrote:
>> Wolfgang Grandegger wrote:
>>> Therefore we need a dedicated function to re-enable interrupts in the
>>> ISR. We could name it *_end_irq, but maybe *_enable_isr_irq is more
>>> obvious. On non-PPC archs it would translate to *_irq_enable. I
>>> realized that *_irq_enable is used in various places/skins and
>>> therefore I have not yet provided a patch.
>>
>> The function xnarch_irq_enable seems to be called in only two
>> functions, xnintr_enable and xnintr_irq_handler when the flag
>> XN_ISR_ENABLE is set.
>>
>> In any case, since I am not sure if this has to be done at the Adeos
>> level or in Xenomai, we will wait for Philippe to come back and decide.
>
> ->enable() and ->end() all mixed up illustrates a silly x86 bias I once
> had. We do need to differentiate the mere enabling from the IRQ epilogue
> at PIC level, since Linux does it - i.e. we don't want to change the
> semantics here.
>
> I would go for adding xnarch_end_irq -> rthal_irq_end to stick with the
> Linux naming scheme, and have the proper epilogue done from there on a
> per-arch basis.
>
> Current uses of xnarch_enable_irq() should be reserved to the
> non-epilogue case, like xnintr_enable(), i.e. forcibly unmasking the IRQ
> source at PIC level outside of any ISR context for such an interrupt (*).
> XN_ISR_ENABLE would trigger a call to xnarch_end_irq instead of
> xnarch_enable_irq. I see no reason for this fix to leak to the Adeos
> layer, since the HAL already controls the way interrupts are ended;
> it just does it improperly on some platforms.
>
> (*) Jan, does rtdm_irq_enable() have the same meaning, or is it intended
> to be used from the ISR too in order to revalidate the source at PIC
> level?

Nope, rtdm_irq_enable() was never intended to re-enable an IRQ line
after an interrupt, and the documentation does not suggest this either.
I see no problem here.
Jan

___
Xenomai-core mailing list
Xenomai-core@gna.org
https://mail.gna.org/listinfo/xenomai-core
Re: [Xenomai-core] Missing IRQ end function on PowerPC
Philippe Gerum wrote:
> [...]
> I would go for adding xnarch_end_irq -> rthal_irq_end to stick with the
> Linux naming scheme, and have the proper epilogue done from there on a
> per-arch basis.
>
> Current uses of xnarch_enable_irq() should be reserved to the
> non-epilogue case, like xnintr_enable(), i.e. forcibly unmasking the IRQ
> source at PIC level outside of any ISR context for such an interrupt (*).
> XN_ISR_ENABLE would trigger a call to xnarch_end_irq instead of
> xnarch_enable_irq.
> [...]
> (*) Jan, does rtdm_irq_enable() have the same meaning, or is it intended
> to be used from the ISR too in order to revalidate the source at PIC
> level?
Jan Kiszka wrote:
> Nope, rtdm_irq_enable() was never intended to re-enable an IRQ line
> after an interrupt, and the documentation does not suggest this either.
> I see no problem here.

But RTDM needs a rtdm_irq_end() function as well, in case the user
wants to re-enable the interrupt outside the ISR, I think.

Wolfgang.
Re: [Xenomai-core] Missing IRQ end function on PowerPC
Wolfgang Grandegger wrote:
> [...]
>> Nope, rtdm_irq_enable() was never intended to re-enable an IRQ line
>> after an interrupt, and the documentation does not suggest this either.
>> I see no problem here.
>
> But RTDM needs a rtdm_irq_end() function as well, in case the user
> wants to re-enable the interrupt outside the ISR, I think.

If this is a valid use-case, it should be really straightforward to add
this abstraction to RTDM. We should just document that rtdm_irq_end()
shall not be invoked from IRQ context - to avoid breaking the chain in
the shared-IRQ scenario. RTDM_IRQ_ENABLE must remain the way to
re-enable the line from the handler.

Jan
Re: [Xenomai-core] Missing IRQ end function on PowerPC
Jan Kiszka wrote:
> Wolfgang Grandegger wrote:
> [...]
>> But RTDM needs a rtdm_irq_end() function as well, in case the user
>> wants to re-enable the interrupt outside the ISR, I think.
>
> If this is a valid use-case, it should be really straightforward to add
> this abstraction to RTDM. We should just document that rtdm_irq_end()
> shall not be invoked from IRQ context -

It's the other way around: ->end() would indeed be called from the ISR
epilogue, and ->enable() would not.

> to avoid breaking the chain in the shared-IRQ scenario. RTDM_IRQ_ENABLE
> must remain the way to re-enable the line from the handler.

--
Philippe.
Re: [Xenomai-core] Missing IRQ end function on PowerPC
Jan Kiszka wrote:
> Philippe Gerum wrote:
> [...]
>> (*) Jan, does rtdm_irq_enable() have the same meaning, or is it
>> intended to be used from the ISR too in order to revalidate the source
>> at PIC level?
>
> Nope, rtdm_irq_enable() was never intended to re-enable an IRQ line
> after an interrupt, and the documentation does not suggest this either.
> I see no problem here.
>
> Ok, so no change would be needed here to implement what's described
> above.
>
> Jan

--
Philippe.
Re: [Xenomai-core] Missing IRQ end function on PowerPC
Philippe Gerum wrote:
> Jan Kiszka wrote:
>> Wolfgang Grandegger wrote:
>> [...]
>>>> Nope, rtdm_irq_enable() was never intended to re-enable an IRQ line
>>>> after an interrupt, and the documentation does not suggest this
>>>> either. I see no problem here.
>>>
>>> But RTDM needs a rtdm_irq_end() function as well, in case the user
>>> wants to re-enable the interrupt outside the ISR, I think.
>>
>> If this is a valid use-case, it should be really straightforward to add
>> this abstraction to RTDM. We should just document that rtdm_irq_end()
>> shall not be invoked from IRQ context -
>
> It's the other way around: ->end() would indeed be called from the ISR
> epilogue, and ->enable() would not.

I think we are talking about different things: Wolfgang was asking for
an equivalent of RTDM_IRQ_ENABLE outside the IRQ handler. That's the
user API. You are now referring to the back-end, which had to provide
some end mechanism to be used both by the nucleus' ISR epilogue and by
that rtdm_irq_end(), and that mechanism shall be told apart from the IRQ
enable/disable API. Well, that's at least how I think to have got it...

>> to avoid breaking the chain in the shared-IRQ scenario.
>> RTDM_IRQ_ENABLE must remain the way to re-enable the line from the
>> handler.

Jan
Re: [Xenomai-core] [BUG] racy xnshadow_harden under CONFIG_PREEMPT
Philippe Gerum wrote:
> Jan Kiszka wrote:
>> Hi,
>>
>> well, if I'm not totally wrong, we have a design problem in the
>> RT-thread hardening path. I dug into the crash Jeroen reported and I'm
>> quite sure that this is the reason. So that's the bad news. The good
>> one is that we can at least work around it by switching off
>> CONFIG_PREEMPT for Linux (this implicitly means that it's a 2.6-only
>> issue).
>>
>> @Jeroen: Did you verify that your setup also works fine without
>> CONFIG_PREEMPT?
>>
>> But let's start with two assumptions my further analysis is based on:
>>
>> [Xenomai]
>> o Shadow threads have only one stack, i.e. one context. If the
>>   real-time part is active (this includes it being blocked on some
>>   xnsynch object or delayed), the original Linux task must NEVER EVER
>>   be executed, even if it will immediately fall asleep again. That's
>>   because the stack is in use by the real-time part at that time. And
>>   this condition is checked in do_schedule_event() [1].
>>
>> [Linux]
>> o A Linux task which has called set_current_state() will remain in the
>>   run-queue as long as it calls schedule() on its own. This means that
>>   it can be preempted (if CONFIG_PREEMPT is set) between
>>   set_current_state() and schedule() and then even be resumed again.
>>   Only the explicit call of schedule() will trigger deactivate_task(),
>>   which will in turn remove current from the run-queue.
>>
>> Ok, if this is true, let's have a look at xnshadow_harden(): after
>> grabbing the gatekeeper sem and putting itself in gk->thread, a task
>> going for RT then marks itself TASK_INTERRUPTIBLE and wakes up the
>> gatekeeper [2]. This does not include a Linux reschedule due to the
>> _sync version of wake_up_interruptible. What can happen now?
>>
>> 1) No interruption until we have called schedule() [3]. All fine, as
>>    we will not be removed from the run-queue before the gatekeeper
>>    starts kicking our RT part, thus no conflict in using the thread's
>>    stack.
>>
>> 2) Interruption by an RT IRQ. This would just delay the path described
>>    above, even if some RT threads get executed. Once they are
>>    finished, we continue in xnshadow_harden() - given that the RT part
>>    does not trigger the following case:
>>
>> 3) Interruption by some Linux IRQ. This may cause other threads to
>>    become runnable as well, but the gatekeeper has the highest prio
>>    and will therefore be the next. The problem is that the
>>    rescheduling on Linux IRQ exit will PREEMPT our task in
>>    xnshadow_harden(), it will NOT remove it from the Linux run-queue.
>>    And now we are in real trouble: the gatekeeper will kick off our RT
>>    part, which will take over the thread's stack. As soon as the RT
>>    domain falls asleep and Linux takes over again, it will continue
>>    our non-RT part as well! Actually, this seems to be the reason for
>>    the panic in do_schedule_event(). Without CONFIG_XENO_OPT_DEBUG and
>>    this check, we will run both parts AT THE SAME TIME now, thus
>>    violating my first assumption. The system gets fatally corrupted.
>
> Yep, that's it. And we may not lock out the interrupts before calling
> schedule() to prevent that.
>
>> Well, I would be happy if someone can prove me wrong here. The problem
>> is that I don't see a solution, because Linux does not provide an
>> atomic wake-up + schedule-out under CONFIG_PREEMPT. I'm currently
>> considering a hack to remove the migrating Linux thread manually from
>> the run-queue, but this could easily break the Linux scheduler.
>
> Maybe the best way would be to provide atomic wakeup-and-schedule
> support in the Adeos patch for Linux tasks; previous attempts to fix
> this by circumventing the potential for preemption from outside of the
> scheduler code have all failed, and this bug is uselessly lingering for
> that reason.

Having slept on this, I'm going to add a simple extension to the Linux
scheduler available from Adeos, in order to get an atomic/unpreemptable
path from the statement where the current task's state is changed for
suspension (e.g. TASK_INTERRUPTIBLE) to the point where schedule()
normally enters its atomic section. This looks like the sanest way to
solve this issue, i.e. without gory hackery all over the place. Patch
will follow later for testing this approach.

>> Jan
>>
>> PS: Out of curiosity, I also checked RTAI's migration mechanism in
>> this regard. It's similar, except for the fact that it does the
>> gatekeeper's work in the Linux scheduler's tail (i.e. after the next
>> context switch). And RTAI seems to suffer from the very same race. So
>> this is either a fundamental issue - or I'm fundamentally wrong.
>>
>> [1] http://www.rts.uni-hannover.de/xenomai/lxr/source/ksrc/nucleus/shadow.c?v=SVN-trunk#L1573
>> [2] http://www.rts.uni-hannover.de/xenomai/lxr/source/ksrc/nucleus/shadow.c?v=SVN-trunk#L461
>> [3] http://www.rts.uni-hannover.de/xenomai/lxr/source/ksrc/nucleus/shadow.c?v=SVN-trunk#L481

--
Philippe.
RE: [Xenomai-core] broken docs
Dear Xenomai workers,

Would it be possible to have an updated API documentation for Xenomai
2.0.x? (I mean with formal parameters in function prototypes.)

I tried to regenerate it with make generate-doc, but it seems that an
SVN working dir is required.

It would be great. Thanks a lot

Daniel

> -----Original Message-----
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On behalf of Jan Kiszka
> Sent: Wednesday, 18 January 2006 20:03
> To: xenomai-core
> Subject: [Xenomai-core] broken docs
>
> Hi,
>
> I noticed that the doxygen API output is partly broken. Under the
> nucleus and native skin modules most functions became variables. I
> haven't looked at the source yet, but I guess it should be resolvable,
> especially as the RTDM docs are fine. This mail is to file the issue,
> maybe I will have a look later - /maybe/.
>
> Moreover, I was lacking a reference to RT_MUTEX_INFO. Does this
> structure just need to be added to the correct doxygen group? I guess
> there are other undocumented structures out there as well
> (RT_TASK_INFO, ...?).
>
> Jan
Re: [Xenomai-core] Missing IRQ end function on PowerPC
Jan Kiszka wrote:
> Philippe Gerum wrote:
> [...]
> I think we are talking about different things: Wolfgang was asking for
> an equivalent of RTDM_IRQ_ENABLE outside the IRQ handler. That's the
> user API. You are now referring to the back-end, which had to provide
> some end mechanism to be used both by the nucleus' ISR epilogue and by
> that rtdm_irq_end(), and that mechanism shall be told apart from the
> IRQ enable/disable API. Well, that's at least how I think to have got
> it...

My point was only about naming here: *_end() should be reserved for the
epilogue stuff, hence be callable from ISR context. Aside of that, I'm
ok with the abstraction you described though.

--
Philippe.
Re: [Xenomai-core] Missing IRQ end function on PowerPC
Jan Kiszka wrote:
> Philippe Gerum wrote:
> [...]
> I think we are talking about different things: Wolfgang was asking for
> an equivalent of RTDM_IRQ_ENABLE outside the IRQ handler. That's the
> user API. You are now referring to the back-end, which had to provide
> some end mechanism to be used both by the nucleus' ISR epilogue and by
> that rtdm_irq_end(), and that mechanism shall be told apart from the
> IRQ enable/disable API. Well, that's at least how I think to have got
> it...

Yep, I was thinking of deferred interrupt handling in a real-time task.
Then rtdm_irq_end() would be required.

Wolfgang.
Re: [Xenomai-core] [PATCH] fix pthread cancellation in native skin
Jan Kiszka wrote:
> Hi,
>
> Gilles' work on cancellation for the posix skin reminded me of this
> issue I once discovered in the native skin:
>
> https://mail.gna.org/public/xenomai-core/2005-12/msg00014.html
>
> I found out that this can easily be fixed by switching the pthread of a
> native task to PTHREAD_CANCEL_ASYNCHRONOUS. See attached patch.
>
> At this chance I discovered that calling rt_task_delete for a task that
> was created and started with T_SUSP mode but was not yet resumed locks
> up the system. More precisely: it raises a fatal warning when
> XENO_OPT_DEBUG is on. It might be the case that it just works on
> systems without this switched on. Either this is a real bug, or the
> warning needs to be fixed. (Deleting a task after rt_task_suspend
> works.)

Actually, the fatal warning happens when starting with rt_task_start the
task which was created with the T_SUSP bit. The task needs to wake up in
xnshadow_wait_barrier until it gets really suspended in primary mode by
the final xnshadow_harden. This situation triggers the fatal warning,
because the thread has the nucleus XNSUSP bit and is running in
secondary mode.

This is not the only situation where a thread with a nucleus suspension
bit needs to run shortly in secondary mode: it also occurs when
suspending with xnpod_suspend_thread() a thread running in secondary
mode; the thread receives the SIGCHLD signal and needs to execute
shortly with the suspension bit set in order to cause a migration to
primary mode. So the only case when we are sure that a user-space thread
cannot be scheduled by Linux seems to be when this thread does not have
the XNRELAX bit.

--
Gilles Chanteperdrix.
Re: [Xenomai-core] [BUG] racy xnshadow_harden under CONFIG_PREEMPT
>> I have not checked it yet, but my presupposition is that something as
>> easy as:
>>
>>   preempt_disable();
>>
>>   wake_up_interruptible_sync();
>>   schedule();
>>
>>   preempt_enable();
>
> It's a no-go: "scheduling while atomic". One of my first attempts to
> solve it.

My fault. I meant the way preempt_schedule() and preempt_schedule_irq()
call schedule() while being non-preemptible. To this end, PREEMPT_ACTIVE
is set up. The use of preempt_enable/disable() here is wrong.

> The only way to enter schedule() without being preemptible is via
> PREEMPT_ACTIVE. But the effect of that flag should be well-known now.
> Kind of Gordian knot. :(

Maybe I have missed something, so just for my curiosity: what does
prevent the use of PREEMPT_ACTIVE here? We don't have a "scheduling
while atomic" message here, as it seems to be a legal way to call
schedule() with that flag being set up.

>> could work... err.. and don't blame me if no, it's someone else who
>> has written that nonsense :o)

--
Best regards,
Dmitry Adamushko
Re: [Xenomai-core] [BUG] racy xnshadow_harden under CONFIG_PREEMPT
Dmitry Adamushko wrote:
> [...]
> Maybe I have missed something, so just for my curiosity: what does
> prevent the use of PREEMPT_ACTIVE here?
> We don't have a "scheduling while atomic" message here, as it seems to
> be a legal way to call schedule() with that flag being set up.

When PREEMPT_ACTIVE is set, the task gets /preempted/ but not removed
from the run-queue - independent of its current state. But having the
migrating task removed from the run-queue is precisely what the
hardening path needs.

Jan
Re: [Xenomai-core] broken docs
ROSSIER Daniel wrote:
> Dear Xenomai workers,
>
> Would it be possible to have an updated API documentation for Xenomai
> 2.0.x? (I mean with formal parameters in function prototypes)
>
> I tried to regenerate it with make generate-doc, but it seems that a
> SVN working dir is required.
>
> It would be great.

I just had a "quick" look at the status of the documentation in SVN-trunk
(2.1). Unfortunately, doxygen is a terrible tool (to express it politely)
when it comes to tracking down bugs in your formatting. Something is broken
in all modules except RTDM, and although I spent *a lot* of time in getting
RTDM correctly formatted, I cannot tell what's wrong with the rest. This
will require some long evenings of continuously patching the docs,
recompiling, and checking the result. Any volunteers? I'm lacking the
time. :-/

Jan

signature.asc
Description: OpenPGP digital signature

___
Xenomai-core mailing list
Xenomai-core@gna.org
https://mail.gna.org/listinfo/xenomai-core
Re: [Xenomai-core] broken docs
Jan Kiszka wrote:
> ROSSIER Daniel wrote:
>> Dear Xenomai workers,
>>
>> Would it be possible to have an updated API documentation for Xenomai
>> 2.0.x? (I mean with formal parameters in function prototypes)
>>
>> I tried to regenerate it with make generate-doc, but it seems that a
>> SVN working dir is required.

make generate-doc is needed for maintenance only. If you want to generate
doxygen documentation, simply add --enable-dox-doc to the Xenomai configure
command line.

>> It would be great.
>
> I just had a "quick" look at the status of the documentation in
> SVN-trunk (2.1). Unfortunately, doxygen is a terrible tool (to express
> it politely) when it comes to tracking down bugs in your formatting.
> Something is broken in all modules except RTDM, and although I spent *a
> lot* of time in getting RTDM correctly formatted, I cannot tell what's
> wrong with the rest. This will require some long evenings of
> continuously patching the docs, recompiling, and checking the result.
> Any volunteers? I'm lacking the time. :-/

The difference between the RTDM documentation blocks and those of the other
modules is that the other modules use the "fn" tag. Removing the "fn" tag
from the other modules' documentation blocks seems to solve the issue.

--
Gilles Chanteperdrix.

___
Xenomai-core mailing list
Xenomai-core@gna.org
https://mail.gna.org/listinfo/xenomai-core
Re: [Xenomai-core] broken docs
Gilles Chanteperdrix wrote: > Jan Kiszka wrote: > > ROSSIER Daniel wrote: > > > Dear Xenomai workers, > > > > > > Would it be possible to have an updated API documentation for Xenomai > > > 2.0.x ? (I mean with formal parameters in function prototypes) > > > > > > I tried to regenerate it with make generate-doc, but it seems that a > > > SVN working dir is required. > > make generate-doc is needed for maintenance only. If you want to > generate doxygen documentation, simply add --enable-dox-doc to Xenomai > configure command line. > > > > > > > It would be great. > > > > I just had a "quick" look at the status of the documentation in > > SVN-trunk (2.1). Unfortunately, doxygen is a terrible tool (to express > > it politely) when it comes to tracking down bugs in your formatting. > > Something is broken in all modules except RTDM, and although I spent *a > > lot* of time in getting RTDM correctly formatted, I cannot tell what's > > wrong with the rest. This will require some long evenings of > > continuous patching the docs, recompiling, and checking the result. Any > > volunteers - I'm lacking the time? :-/ > > Looking at the difference between RTDM documentation blocks and the > other modules is that the other modules use the "fn" tag. Removing the > "fn" tag from other modules documentation blocks seems to solve the > issue. > Indeed, works. Amazingly blind I was. Anyway, it still needs some work to remove that stuff (I wonder what the "correct" usage of @fn is...) and to wrap functions without bodies via "#ifdef DOXYGEN_CPP" like RTDM does. At this chance, I would also suggest to replace all \tag by @tag for the sake of a unified style (and who knows what side effects mixing up both may have). Jan signature.asc Description: OpenPGP digital signature ___ Xenomai-core mailing list Xenomai-core@gna.org https://mail.gna.org/listinfo/xenomai-core
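[Editor's sketch] The style Jan suggests - no @fn line, a body-less prototype wrapped for doxygen only, and uniform @-style tags - might look roughly like this; rt_task_example() is a hypothetical name used purely for illustration, not an actual Xenomai call:

```c
/**
 * @brief Start a task (rt_task_example() is a hypothetical name,
 *        shown for illustration only).
 *
 * Note there is no @fn line: doxygen derives the signature from the
 * prototype that follows, which avoids the formatting breakage seen
 * with explicit "fn" tags in the other modules.
 *
 * @param prio Priority to assign to the task.
 *
 * @return 0 on success, otherwise a negative error code.
 */
#ifdef DOXYGEN_CPP /* body-less prototype, exposed to doxygen only */
int rt_task_example(int prio);
#endif /* DOXYGEN_CPP */
```

The #ifdef DOXYGEN_CPP guard follows the pattern RTDM already uses, so the prototype never reaches the compiler.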
Re: [Xenomai-core] [BUG] racy xnshadow_harden under CONFIG_PREEMPT
On 30/01/06, Jan Kiszka <[EMAIL PROTECTED]> wrote:
> Dmitry Adamushko wrote:
>>>> ...
>>>> I have not checked it yet, but my presupposition is that something as
>>>> easy as:
>>>>
>>>>   preempt_disable();
>>>>   wake_up_interruptible_sync();
>>>>   schedule();
>>>>   preempt_enable();
>>>
>>> It's a no-go: "scheduling while atomic". One of my first attempts to
>>> solve it.
>>
>> My fault. I meant the way preempt_schedule() and preempt_schedule_irq()
>> call schedule() while being non-preemptible.
>> To this end, PREEMPT_ACTIVE is set up.
>> The use of preempt_enable/disable() here is wrong.
>>
>>> The only way to enter schedule() without being preemptible is via
>>> PREEMPT_ACTIVE. But the effect of that flag should be well-known now.
>>> Kind of Gordian knot. :(
>>
>> Maybe I have missed something, so just for my curiosity: what prevents
>> the use of PREEMPT_ACTIVE here?
>> We don't get a "scheduling while atomic" message here, as it seems to be
>> a legal way to call schedule() with that flag being set.
>
> When PREEMPT_ACTIVE is set, the task gets /preempted/ but not removed
> from the run queue - independent of its current status.

Err... that's exactly the reason I explained in my first mail of this
thread :) Blah.. I wish I had been smoking something special before, so
that I could point to that as the reason for my forgetfulness.

Actually, we could indeed use PREEMPT_ACTIVE, plus something else (probably
another flag) to distinguish the case where PREEMPT_ACTIVE is set by Linux
from the case where it is set by xnshadow_harden():

xnshadow_harden()
{
        struct task_struct *this_task = current;
        ...
        xnthread_t *thread = xnshadow_thread(this_task);

        if (!thread)
                return;
        ...
        gk->thread = thread;

+       add_preempt_count(PREEMPT_ACTIVE); // should be checked in schedule()
+       xnthread_set_flags(thread, XNATOMIC_TRANSIT);

        set_current_state(TASK_INTERRUPTIBLE);
        wake_up_interruptible_sync(&gk->waitq);

+       schedule();
+       sub_preempt_count(PREEMPT_ACTIVE);
        ...
}

Then, something like the following code should be called from schedule():

void ipipe_transit_cleanup(struct task_struct *task, runqueue_t *rq)
{
        xnthread_t *thread = xnshadow_thread(task);

        if (!thread)
                return;

        if (xnthread_test_flags(thread, XNATOMIC_TRANSIT)) {
                xnthread_clear_flags(thread, XNATOMIC_TRANSIT);
                deactivate_task(task, rq);
        }
}

- schedule.c:

        ...
        switch_count = &prev->nivcsw;
        if (prev->state && !(preempt_count() & PREEMPT_ACTIVE)) {
                switch_count = &prev->nvcsw;
                if (unlikely((prev->state & TASK_INTERRUPTIBLE) &&
                                unlikely(signal_pending(prev))))
                        prev->state = TASK_RUNNING;
                else {
                        if (prev->state == TASK_UNINTERRUPTIBLE)
                                rq->nr_uninterruptible++;
                        deactivate_task(prev, rq);
                }
        }

+       // removes a task from the active queue if PREEMPT_ACTIVE +
+       // XNATOMIC_TRANSIT
+       #ifdef CONFIG_IPIPE
+       ipipe_transit_cleanup(prev, rq);
+       #endif /* CONFIG_IPIPE */
        ...

Not very graceful maybe, but it could work - or am I missing something
important?

--
Best regards,
Dmitry Adamushko

___
Xenomai-core mailing list
Xenomai-core@gna.org
https://mail.gna.org/listinfo/xenomai-core
[Xenomai-core] [BUG] Interrupt problem on powerpc
On a PrPMC800 (PPC 7410 processor) with Xenomai-2.1-rc2, I get the following if the interrupt handler takes too long (i.e. the next interrupt gets generated before the previous one has finished):

[ 42.543765] [c00c2008] spin_bug+0xa8/0xc4
[ 42.597617] [c00c22d4] _raw_spin_lock+0x180/0x184
[ 42.660637] [c000f388] __ipipe_ack_irq+0x88/0x130
[ 42.723657] [c000efe4] __ipipe_handle_irq+0x140/0x268
[ 42.791259] [c000f144] __ipipe_grab_irq+0x38/0xa4
[ 42.854279] [c0005058] __ipipe_ret_from_except+0x0/0xc
[ 42.923029] [] 0x0
[ 42.959695] [c0038348] __do_IRQ+0x134/0x164
[ 43.015839] [c000ed04] __ipipe_do_IRQ+0x2c/0x44
[ 43.076567] [c000eb08] __ipipe_sync_stage+0x1ec/0x228
[ 43.144170] [c0039420] ipipe_suspend_domain+0x7c/0xc4
[ 43.211774] [c000f0b0] __ipipe_handle_irq+0x20c/0x268
[ 43.279377] [c000f144] __ipipe_grab_irq+0x38/0xa4
[ 43.342396] [c0005058] __ipipe_ret_from_except+0x0/0xc
[ 43.411145] [c0006524] default_idle+0x10/0x60

Any ideas of where to look?

Regards

Anders Blomdell

___
Xenomai-core mailing list
Xenomai-core@gna.org
https://mail.gna.org/listinfo/xenomai-core
Re: [Xenomai-core] [BUG] racy xnshadow_harden under CONFIG_PREEMPT
Jan Kiszka wrote: Gilles Chanteperdrix wrote: Jeroen Van den Keybus wrote: > Hello, > > > I'm currently not at a level to participate in your discussion. Although I'm > willing to supply you with stresstests, I would nevertheless like to learn > more from task migration as this debugging session proceeds. In order to do > so, please confirm the following statements or indicate where I went wrong. > I hope others may learn from this as well. > > xn_shadow_harden(): This is called whenever a Xenomai thread performs a > Linux (root domain) system call (notified by Adeos ?). xnshadow_harden() is called whenever a thread running in secondary mode (that is, running as a regular Linux thread, handled by Linux scheduler) is switching to primary mode (where it will run as a Xenomai thread, handled by Xenomai scheduler). Migrations occur for some system calls. More precisely, Xenomai skin system calls tables associates a few flags with each system call, and some of these flags cause migration of the caller when it issues the system call. Each Xenomai user-space thread has two contexts, a regular Linux thread context, and a Xenomai thread called "shadow" thread. Both contexts share the same stack and program counter, so that at any time, at least one of the two contexts is seen as suspended by the scheduler which handles it. Before xnshadow_harden is called, the Linux thread is running, and its shadow is seen in suspended state with XNRELAX bit by Xenomai scheduler. After xnshadow_harden, the Linux context is seen suspended with INTERRUPTIBLE state by Linux scheduler, and its shadow is seen as running by Xenomai scheduler. The migrating thread > (nRT) is marked INTERRUPTIBLE and run by the Linux kernel > wake_up_interruptible_sync() call. Is this thread actually run or does it > merely put the thread in some Linux to-do list (I assumed the first case) ? 
Here, I am not sure, but it seems that when calling wake_up_interruptible_sync the woken up task is put in the current CPU runqueue, and this task (i.e. the gatekeeper), will not run until the current thread (i.e. the thread running xnshadow_harden) marks itself as suspended and calls schedule(). Maybe, marking the running thread as Depends on CONFIG_PREEMPT. If set, we get a preempt_schedule already here - and a switch if the prio of the woken up task is higher. BTW, an easy way to enforce the current trouble is to remove the "_sync" from wake_up_interruptible. As I understand it this _sync is just an optimisation hint for Linux to avoid needless scheduler runs. You could not guarantee the following execution sequence doing so either, i.e. 1- current wakes up the gatekeeper 2- current goes sleeping to exit the Linux runqueue in schedule() 3- the gatekeeper resumes the shadow-side of the old current The point is all about making 100% sure that current is going to be unlinked from the Linux runqueue before the gatekeeper processes the resumption request, whatever event the kernel is processing asynchronously in the meantime. This is the reason why, as you already noticed, preempt_schedule_irq() nicely breaks our toy by stealing the CPU from the hardening thread whilst keeping it linked to the runqueue: upon return from such preemption, the gatekeeper might have run already, hence the newly hardened thread ends up being seen as runnable by both the Linux and Xeno schedulers. Rainy day indeed. We could rely on giving "current" the highest SCHED_FIFO priority in xnshadow_harden() before waking up the gk, until the gk eventually promotes it to the Xenomai scheduling mode and downgrades this priority back to normal, but we would pay additional latencies induced by each aborted rescheduling attempt that may occur during the atomic path we want to enforce. 
The other way is to make sure that no in-kernel preemption of the hardening task could occur after step 1) and until step 2) is performed, given that we cannot currently call schedule() with interrupts or preemption off. I'm on it. suspended is not needed, since the gatekeeper may have a high priority, and calling schedule() is enough. In any case, the waken up thread does not seem to be run immediately, so this rather look like the second case. Since in xnshadow_harden, the running thread marks itself as suspended before running wake_up_interruptible_sync, the gatekeeper will run when schedule() get called, which in turn, depend on the CONFIG_PREEMPT* configuration. In the non-preempt case, the current thread will be suspended and the gatekeeper will run when schedule() is explicitely called in xnshadow_harden(). In the preempt case, schedule gets called when the outermost spinlock is unlocked in wake_up_interruptible_sync(). > And how does it terminate: is only the system call migrated or is the thread > allowed to continue run (at a priority level equal to the Xenomai > priority level) until it hits something of the Xenomai API (or trivially: > explicitly go to RT using th
Re: [Xenomai-core] [BUG] Interrupt problem on powerpc
Anders Blomdell wrote: > On a PrPMC800 (PPC 7410 processor) withe Xenomai-2.1-rc2, I get the > following if the interrupt handler takes too long (i.e. next interrupt > gets generated before the previous one has finished) > > [ 42.543765] [c00c2008] spin_bug+0xa8/0xc4 > [ 42.597617] [c00c22d4] _raw_spin_lock+0x180/0x184 > [ 42.660637] [c000f388] __ipipe_ack_irq+0x88/0x130 > [ 42.723657] [c000efe4] __ipipe_handle_irq+0x140/0x268 > [ 42.791259] [c000f144] __ipipe_grab_irq+0x38/0xa4 > [ 42.854279] [c0005058] __ipipe_ret_from_except+0x0/0xc > [ 42.923029] [] 0x0 > [ 42.959695] [c0038348] __do_IRQ+0x134/0x164 > [ 43.015839] [c000ed04] __ipipe_do_IRQ+0x2c/0x44 > [ 43.076567] [c000eb08] __ipipe_sync_stage+0x1ec/0x228 > [ 43.144170] [c0039420] ipipe_suspend_domain+0x7c/0xc4 > [ 43.211774] [c000f0b0] __ipipe_handle_irq+0x20c/0x268 > [ 43.279377] [c000f144] __ipipe_grab_irq+0x38/0xa4 > [ 43.342396] [c0005058] __ipipe_ret_from_except+0x0/0xc > [ 43.411145] [c0006524] default_idle+0x10/0x60 > I think some probably important information is missing above this back-trace. What does the kernel state before these lines? Jan signature.asc Description: OpenPGP digital signature ___ Xenomai-core mailing list Xenomai-core@gna.org https://mail.gna.org/listinfo/xenomai-core
[Xenomai-core] [BUG?] dead code in ipipe_grab_irq
In the following code (ppc), shouldn't first be either declared static or deleted? To me it looks like first is always equal to one when the else clause is evaluated.

asmlinkage int __ipipe_grab_irq(struct pt_regs *regs)
{
        extern int ppc_spurious_interrupts;
        ipipe_declare_cpuid;
        int irq, first = 1;

        if ((irq = ppc_md.get_irq(regs)) >= 0) {
                __ipipe_handle_irq(irq, regs);
                first = 0;
        } else if (irq != -2 && first)
                ppc_spurious_interrupts++;

        ipipe_load_cpuid();

        return (ipipe_percpu_domain[cpuid] == ipipe_root_domain &&
                !test_bit(IPIPE_STALL_FLAG,
                          &ipipe_root_domain->cpudata[cpuid].status));
}

Regards

Anders Blomdell

___
Xenomai-core mailing list
Xenomai-core@gna.org
https://mail.gna.org/listinfo/xenomai-core
Re: [Xenomai-core] [BUG] racy xnshadow_harden under CONFIG_PREEMPT
Philippe Gerum wrote: Jan Kiszka wrote: Gilles Chanteperdrix wrote: Jeroen Van den Keybus wrote: > Hello, > > > I'm currently not at a level to participate in your discussion. Although I'm > willing to supply you with stresstests, I would nevertheless like to learn > more from task migration as this debugging session proceeds. In order to do > so, please confirm the following statements or indicate where I went wrong. > I hope others may learn from this as well. > > xn_shadow_harden(): This is called whenever a Xenomai thread performs a > Linux (root domain) system call (notified by Adeos ?). xnshadow_harden() is called whenever a thread running in secondary mode (that is, running as a regular Linux thread, handled by Linux scheduler) is switching to primary mode (where it will run as a Xenomai thread, handled by Xenomai scheduler). Migrations occur for some system calls. More precisely, Xenomai skin system calls tables associates a few flags with each system call, and some of these flags cause migration of the caller when it issues the system call. Each Xenomai user-space thread has two contexts, a regular Linux thread context, and a Xenomai thread called "shadow" thread. Both contexts share the same stack and program counter, so that at any time, at least one of the two contexts is seen as suspended by the scheduler which handles it. Before xnshadow_harden is called, the Linux thread is running, and its shadow is seen in suspended state with XNRELAX bit by Xenomai scheduler. After xnshadow_harden, the Linux context is seen suspended with INTERRUPTIBLE state by Linux scheduler, and its shadow is seen as running by Xenomai scheduler. The migrating thread > (nRT) is marked INTERRUPTIBLE and run by the Linux kernel > wake_up_interruptible_sync() call. Is this thread actually run or does it > merely put the thread in some Linux to-do list (I assumed the first case) ? 
Here, I am not sure, but it seems that when calling wake_up_interruptible_sync the woken up task is put in the current CPU runqueue, and this task (i.e. the gatekeeper), will not run until the current thread (i.e. the thread running xnshadow_harden) marks itself as suspended and calls schedule(). Maybe, marking the running thread as Depends on CONFIG_PREEMPT. If set, we get a preempt_schedule already here - and a switch if the prio of the woken up task is higher. BTW, an easy way to enforce the current trouble is to remove the "_sync" from wake_up_interruptible. As I understand it this _sync is just an optimisation hint for Linux to avoid needless scheduler runs. You could not guarantee the following execution sequence doing so either, i.e. 1- current wakes up the gatekeeper 2- current goes sleeping to exit the Linux runqueue in schedule() 3- the gatekeeper resumes the shadow-side of the old current The point is all about making 100% sure that current is going to be unlinked from the Linux runqueue before the gatekeeper processes the resumption request, whatever event the kernel is processing asynchronously in the meantime. This is the reason why, as you already noticed, preempt_schedule_irq() nicely breaks our toy by stealing the CPU from the hardening thread whilst keeping it linked to the runqueue: upon return from such preemption, the gatekeeper might have run already, hence the newly hardened thread ends up being seen as runnable by both the Linux and Xeno schedulers. Rainy day indeed. We could rely on giving "current" the highest SCHED_FIFO priority in xnshadow_harden() before waking up the gk, until the gk eventually promotes it to the Xenomai scheduling mode and downgrades this priority back to normal, but we would pay additional latencies induced by each aborted rescheduling attempt that may occur during the atomic path we want to enforce. 
The other way is to make sure that no in-kernel preemption of the hardening task could occur after step 1) and until step 2) is performed, given that we cannot currently call schedule() with interrupts or preemption off. I'm on it. > Could anyone interested in this issue test the following couple of patches? > atomic-switch-state.patch is to be applied against Adeos-1.1-03/x86 for 2.6.15 > atomic-wakeup-and-schedule.patch is to be applied against Xeno 2.1-rc2 > Both patches are needed to fix the issue. > TIA, And now, Ladies and Gentlemen, with the patches attached. -- Philippe. --- 2.6.15-x86/kernel/sched.c 2006-01-07 15:18:31.0 +0100 +++ 2.6.15-ipipe/kernel/sched.c 2006-01-30 15:15:27.0 +0100 @@ -2963,7 +2963,7 @@ * Otherwise, whine if we are scheduling when we should not be. */ if (likely(!current->exit_state)) { - if (unlikely(in_atomic())) { + if (unlikely(!(current->state & TASK_ATOMICSWITCH) && in_atomic())) { printk(KERN_ERR "scheduling while atomic: " "%s/0x%08x/%d\n", current->comm, preempt_count(), current->pid); @@ -2972,8 +2972,13 @@ } profile_hit(SCHED_PROFILING, __builtin_return_ad
Re: [Xenomai-core] [BUG] racy xnshadow_harden under CONFIG_PREEMPT
Philippe Gerum wrote: Jan Kiszka wrote: Gilles Chanteperdrix wrote: Jeroen Van den Keybus wrote: > Hello, > > > I'm currently not at a level to participate in your discussion. Although I'm > willing to supply you with stresstests, I would nevertheless like to learn > more from task migration as this debugging session proceeds. In order to do > so, please confirm the following statements or indicate where I went wrong. > I hope others may learn from this as well. > > xn_shadow_harden(): This is called whenever a Xenomai thread performs a > Linux (root domain) system call (notified by Adeos ?). xnshadow_harden() is called whenever a thread running in secondary mode (that is, running as a regular Linux thread, handled by Linux scheduler) is switching to primary mode (where it will run as a Xenomai thread, handled by Xenomai scheduler). Migrations occur for some system calls. More precisely, Xenomai skin system calls tables associates a few flags with each system call, and some of these flags cause migration of the caller when it issues the system call. Each Xenomai user-space thread has two contexts, a regular Linux thread context, and a Xenomai thread called "shadow" thread. Both contexts share the same stack and program counter, so that at any time, at least one of the two contexts is seen as suspended by the scheduler which handles it. Before xnshadow_harden is called, the Linux thread is running, and its shadow is seen in suspended state with XNRELAX bit by Xenomai scheduler. After xnshadow_harden, the Linux context is seen suspended with INTERRUPTIBLE state by Linux scheduler, and its shadow is seen as running by Xenomai scheduler. The migrating thread > (nRT) is marked INTERRUPTIBLE and run by the Linux kernel > wake_up_interruptible_sync() call. Is this thread actually run or does it > merely put the thread in some Linux to-do list (I assumed the first case) ? 
Here, I am not sure, but it seems that when calling wake_up_interruptible_sync the woken up task is put in the current CPU runqueue, and this task (i.e. the gatekeeper), will not run until the current thread (i.e. the thread running xnshadow_harden) marks itself as suspended and calls schedule(). Maybe, marking the running thread as Depends on CONFIG_PREEMPT. If set, we get a preempt_schedule already here - and a switch if the prio of the woken up task is higher. BTW, an easy way to enforce the current trouble is to remove the "_sync" from wake_up_interruptible. As I understand it this _sync is just an optimisation hint for Linux to avoid needless scheduler runs. You could not guarantee the following execution sequence doing so either, i.e. 1- current wakes up the gatekeeper 2- current goes sleeping to exit the Linux runqueue in schedule() 3- the gatekeeper resumes the shadow-side of the old current The point is all about making 100% sure that current is going to be unlinked from the Linux runqueue before the gatekeeper processes the resumption request, whatever event the kernel is processing asynchronously in the meantime. This is the reason why, as you already noticed, preempt_schedule_irq() nicely breaks our toy by stealing the CPU from the hardening thread whilst keeping it linked to the runqueue: upon return from such preemption, the gatekeeper might have run already, hence the newly hardened thread ends up being seen as runnable by both the Linux and Xeno schedulers. Rainy day indeed. We could rely on giving "current" the highest SCHED_FIFO priority in xnshadow_harden() before waking up the gk, until the gk eventually promotes it to the Xenomai scheduling mode and downgrades this priority back to normal, but we would pay additional latencies induced by each aborted rescheduling attempt that may occur during the atomic path we want to enforce. 
The other way is to make sure that no in-kernel preemption of the hardening task could occur after step 1) and until step 2) is performed, given that we cannot currently call schedule() with interrupts or preemption off. I'm on it. Could anyone interested in this issue test the following couple of patches? atomic-switch-state.patch is to be applied against Adeos-1.1-03/x86 for 2.6.15 atomic-wakeup-and-schedule.patch is to be applied against Xeno 2.1-rc2 Both patches are needed to fix the issue. TIA, -- Philippe. ___ Xenomai-core mailing list Xenomai-core@gna.org https://mail.gna.org/listinfo/xenomai-core
Re: [Xenomai-core] [BUG?] dead code in ipipe_grab_irq
Anders Blomdell wrote:
> In the following code (ppc), shouldn't first be either declared static or
> deleted? To me it looks like first is always equal to one when the else
> clause is evaluated.

You're right. "first" doesn't need to be there at all; it's probably an old
copy of something in the kernel.

> asmlinkage int __ipipe_grab_irq(struct pt_regs *regs)
> {
>         extern int ppc_spurious_interrupts;
>         ipipe_declare_cpuid;
>         int irq, first = 1;
>
>         if ((irq = ppc_md.get_irq(regs)) >= 0) {
>                 __ipipe_handle_irq(irq, regs);
>                 first = 0;
>         } else if (irq != -2 && first)
>                 ppc_spurious_interrupts++;
>
>         ipipe_load_cpuid();
>
>         return (ipipe_percpu_domain[cpuid] == ipipe_root_domain &&
>                 !test_bit(IPIPE_STALL_FLAG,
>                           &ipipe_root_domain->cpudata[cpuid].status));
> }
>
> Regards
>
> Anders Blomdell

___
Xenomai-core mailing list
Xenomai-core@gna.org
https://mail.gna.org/listinfo/xenomai-core
Re: [Xenomai-core] [BUG?] dead code in ipipe_grab_irq
Heikki Lindholm wrote:
> Anders Blomdell wrote:
>> In the following code (ppc), shouldn't first be either declared static
>> or deleted? To me it looks like first is always equal to one when the
>> else clause is evaluated.
>
> You're right. "first" doesn't need to be there at all; it's probably an
> old copy of something in the kernel.

Yep; it used to be a while() loop in the original implementation, which we
do not perform here.

>> asmlinkage int __ipipe_grab_irq(struct pt_regs *regs)
>> {
>>         extern int ppc_spurious_interrupts;
>>         ipipe_declare_cpuid;
>>         int irq, first = 1;
>>
>>         if ((irq = ppc_md.get_irq(regs)) >= 0) {
>>                 __ipipe_handle_irq(irq, regs);
>>                 first = 0;
>>         } else if (irq != -2 && first)
>>                 ppc_spurious_interrupts++;
>>
>>         ipipe_load_cpuid();
>>
>>         return (ipipe_percpu_domain[cpuid] == ipipe_root_domain &&
>>                 !test_bit(IPIPE_STALL_FLAG,
>>                           &ipipe_root_domain->cpudata[cpuid].status));
>> }
>>
>> Regards
>>
>> Anders Blomdell

--
Philippe.

___
Xenomai-core mailing list
Xenomai-core@gna.org
https://mail.gna.org/listinfo/xenomai-core
Re: [Xenomai-core] [BUG] racy xnshadow_harden under CONFIG_PREEMPT
Philippe Gerum wrote: > Philippe Gerum wrote: >> Jan Kiszka wrote: >> >>> Gilles Chanteperdrix wrote: >>> Jeroen Van den Keybus wrote: > Hello, > > > I'm currently not at a level to participate in your discussion. Although I'm > willing to supply you with stresstests, I would nevertheless like to learn > more from task migration as this debugging session proceeds. In order to do > so, please confirm the following statements or indicate where I went wrong. > I hope others may learn from this as well. > > xn_shadow_harden(): This is called whenever a Xenomai thread performs a > Linux (root domain) system call (notified by Adeos ?). xnshadow_harden() is called whenever a thread running in secondary mode (that is, running as a regular Linux thread, handled by Linux scheduler) is switching to primary mode (where it will run as a Xenomai thread, handled by Xenomai scheduler). Migrations occur for some system calls. More precisely, Xenomai skin system calls tables associates a few flags with each system call, and some of these flags cause migration of the caller when it issues the system call. Each Xenomai user-space thread has two contexts, a regular Linux thread context, and a Xenomai thread called "shadow" thread. Both contexts share the same stack and program counter, so that at any time, at least one of the two contexts is seen as suspended by the scheduler which handles it. Before xnshadow_harden is called, the Linux thread is running, and its shadow is seen in suspended state with XNRELAX bit by Xenomai scheduler. After xnshadow_harden, the Linux context is seen suspended with INTERRUPTIBLE state by Linux scheduler, and its shadow is seen as running by Xenomai scheduler. The migrating thread > (nRT) is marked INTERRUPTIBLE and run by the Linux kernel > wake_up_interruptible_sync() call. Is this thread actually run or does it > merely put the thread in some Linux to-do list (I assumed the first case) ? 
Here, I am not sure, but it seems that when calling wake_up_interruptible_sync the woken up task is put in the current CPU runqueue, and this task (i.e. the gatekeeper), will not run until the current thread (i.e. the thread running xnshadow_harden) marks itself as suspended and calls schedule(). Maybe, marking the running thread as >>> >>> >>> >>> Depends on CONFIG_PREEMPT. If set, we get a preempt_schedule already >>> here - and a switch if the prio of the woken up task is higher. >>> >>> BTW, an easy way to enforce the current trouble is to remove the "_sync" >>> from wake_up_interruptible. As I understand it this _sync is just an >>> optimisation hint for Linux to avoid needless scheduler runs. >>> >> >> You could not guarantee the following execution sequence doing so >> either, i.e. >> >> 1- current wakes up the gatekeeper >> 2- current goes sleeping to exit the Linux runqueue in schedule() >> 3- the gatekeeper resumes the shadow-side of the old current >> >> The point is all about making 100% sure that current is going to be >> unlinked from the Linux runqueue before the gatekeeper processes the >> resumption request, whatever event the kernel is processing >> asynchronously in the meantime. This is the reason why, as you already >> noticed, preempt_schedule_irq() nicely breaks our toy by stealing the >> CPU from the hardening thread whilst keeping it linked to the >> runqueue: upon return from such preemption, the gatekeeper might have >> run already, hence the newly hardened thread ends up being seen as >> runnable by both the Linux and Xeno schedulers. Rainy day indeed. >> >> We could rely on giving "current" the highest SCHED_FIFO priority in >> xnshadow_harden() before waking up the gk, until the gk eventually >> promotes it to the Xenomai scheduling mode and downgrades this >> priority back to normal, but we would pay additional latencies induced >> by each aborted rescheduling attempt that may occur during the atomic >> path we want to enforce. 
>> >> The other way is to make sure that no in-kernel preemption of the >> hardening task could occur after step 1) and until step 2) is >> performed, given that we cannot currently call schedule() with >> interrupts or preemption off. I'm on it. >> > > Could anyone interested in this issue test the following couple of patches? > > atomic-switch-state.patch is to be applied against Adeos-1.1-03/x86 for > 2.6.15 > atomic-wakeup-and-schedule.patch is to be applied against Xeno 2.1-rc2 > > Both patches are needed to fix the issue. > > TIA, > Looks good. I tried Jeroen's test-case and I was not able to reproduce the crash anymore. I think it's time for a new ipipe-release. ;) At this chance: any comments on the panic-freeze extension for the tracer? I need to rework the Xenomai patch, but the ipipe side should be re
Re: [Xenomai-core] [BUG] Interrupt problem on powerpc
Jan Kiszka wrote: Anders Blomdell wrote: On a PrPMC800 (PPC 7410 processor) with Xenomai-2.1-rc2, I get the following if the interrupt handler takes too long (i.e. next interrupt gets generated before the previous one has finished) [ 42.543765] [c00c2008] spin_bug+0xa8/0xc4 [ 42.597617] [c00c22d4] _raw_spin_lock+0x180/0x184 [ 42.660637] [c000f388] __ipipe_ack_irq+0x88/0x130 [ 42.723657] [c000efe4] __ipipe_handle_irq+0x140/0x268 [ 42.791259] [c000f144] __ipipe_grab_irq+0x38/0xa4 [ 42.854279] [c0005058] __ipipe_ret_from_except+0x0/0xc [ 42.923029] [] 0x0 [ 42.959695] [c0038348] __do_IRQ+0x134/0x164 [ 43.015839] [c000ed04] __ipipe_do_IRQ+0x2c/0x44 [ 43.076567] [c000eb08] __ipipe_sync_stage+0x1ec/0x228 [ 43.144170] [c0039420] ipipe_suspend_domain+0x7c/0xc4 [ 43.211774] [c000f0b0] __ipipe_handle_irq+0x20c/0x268 [ 43.279377] [c000f144] __ipipe_grab_irq+0x38/0xa4 [ 43.342396] [c0005058] __ipipe_ret_from_except+0x0/0xc [ 43.411145] [c0006524] default_idle+0x10/0x60 I think some probably important information is missing above this back-trace. You are so right! > What does the kernel state before these lines?
[ 42.346643] BUG: spinlock recursion on CPU#0, swapper/0 [ 42.415438] lock: c01c943c, .magic: dead4ead, .owner: swapper/0, .owner_cpu: 0 [ 42.511681] Call trace: [ 42.543765] [c00c2008] spin_bug+0xa8/0xc4 [ 42.597617] [c00c22d4] _raw_spin_lock+0x180/0x184 [ 42.660637] [c000f388] __ipipe_ack_irq+0x88/0x130 [ 42.723657] [c000efe4] __ipipe_handle_irq+0x140/0x268 [ 42.791259] [c000f144] __ipipe_grab_irq+0x38/0xa4 [ 42.854279] [c0005058] __ipipe_ret_from_except+0x0/0xc [ 42.923029] [] 0x0 [ 42.959695] [c0038348] __do_IRQ+0x134/0x164 [ 43.015839] [c000ed04] __ipipe_do_IRQ+0x2c/0x44 [ 43.076567] [c000eb08] __ipipe_sync_stage+0x1ec/0x228 [ 43.144170] [c0039420] ipipe_suspend_domain+0x7c/0xc4 [ 43.211774] [c000f0b0] __ipipe_handle_irq+0x20c/0x268 [ 43.279377] [c000f144] __ipipe_grab_irq+0x38/0xa4 [ 43.342396] [c0005058] __ipipe_ret_from_except+0x0/0xc [ 43.411145] [c0006524] default_idle+0x10/0x60 It might be that the problem is related to the fact that the interrupt is a shared one (Harrier chip, "Functional Exception"), that is used for both message-passing (should be RT) and UART (Linux, i.e. non-RT), my current IRQ handler always pends the interrupt to the Linux domain (RTDM_IRQ_PROPAGATE), because all other attempts (RTDM_IRQ_ENABLE when it wasn't a UART interrupt) have left the interrupts turned off. What I believe should be done, is 1. When UART interrupt is received, disable further non-RT interrupts on this IRQ-line, pend interrupt to Linux. 2. Handle RT interrupts on this IRQ line 3. When Linux has finished the pended interrupt, reenable non-RT interrupts. but I have neither been able to achieve this, nor to verify that it is the right thing to do... Regards Anders Blomdell ___ Xenomai-core mailing list Xenomai-core@gna.org https://mail.gna.org/listinfo/xenomai-core
Re: [Xenomai-core] [BUG] Interrupt problem on powerpc
Anders Blomdell wrote: > Jan Kiszka wrote: >> Anders Blomdell wrote: >> >>> On a PrPMC800 (PPC 7410 processor) with Xenomai-2.1-rc2, I get the >>> following if the interrupt handler takes too long (i.e. next interrupt >>> gets generated before the previous one has finished) >>> >>> [ 42.543765] [c00c2008] spin_bug+0xa8/0xc4 >>> [ 42.597617] [c00c22d4] _raw_spin_lock+0x180/0x184 >>> [ 42.660637] [c000f388] __ipipe_ack_irq+0x88/0x130 >>> [ 42.723657] [c000efe4] __ipipe_handle_irq+0x140/0x268 >>> [ 42.791259] [c000f144] __ipipe_grab_irq+0x38/0xa4 >>> [ 42.854279] [c0005058] __ipipe_ret_from_except+0x0/0xc >>> [ 42.923029] [] 0x0 >>> [ 42.959695] [c0038348] __do_IRQ+0x134/0x164 >>> [ 43.015839] [c000ed04] __ipipe_do_IRQ+0x2c/0x44 >>> [ 43.076567] [c000eb08] __ipipe_sync_stage+0x1ec/0x228 >>> [ 43.144170] [c0039420] ipipe_suspend_domain+0x7c/0xc4 >>> [ 43.211774] [c000f0b0] __ipipe_handle_irq+0x20c/0x268 >>> [ 43.279377] [c000f144] __ipipe_grab_irq+0x38/0xa4 >>> [ 43.342396] [c0005058] __ipipe_ret_from_except+0x0/0xc >>> [ 43.411145] [c0006524] default_idle+0x10/0x60 >>> >> >> >> I think some probably important information is missing above this >> back-trace. > You are so right! > >> What does the kernel state before these lines?
> > [ 42.346643] BUG: spinlock recursion on CPU#0, swapper/0 > [ 42.415438] lock: c01c943c, .magic: dead4ead, .owner: swapper/0, > .owner_cpu: 0 > [ 42.511681] Call trace: > [ 42.543765] [c00c2008] spin_bug+0xa8/0xc4 > [ 42.597617] [c00c22d4] _raw_spin_lock+0x180/0x184 > [ 42.660637] [c000f388] __ipipe_ack_irq+0x88/0x130 > [ 42.723657] [c000efe4] __ipipe_handle_irq+0x140/0x268 > [ 42.791259] [c000f144] __ipipe_grab_irq+0x38/0xa4 > [ 42.854279] [c0005058] __ipipe_ret_from_except+0x0/0xc > [ 42.923029] [] 0x0 > [ 42.959695] [c0038348] __do_IRQ+0x134/0x164 > [ 43.015839] [c000ed04] __ipipe_do_IRQ+0x2c/0x44 > [ 43.076567] [c000eb08] __ipipe_sync_stage+0x1ec/0x228 > [ 43.144170] [c0039420] ipipe_suspend_domain+0x7c/0xc4 > [ 43.211774] [c000f0b0] __ipipe_handle_irq+0x20c/0x268 > [ 43.279377] [c000f144] __ipipe_grab_irq+0x38/0xa4 > [ 43.342396] [c0005058] __ipipe_ret_from_except+0x0/0xc > [ 43.411145] [c0006524] default_idle+0x10/0x60 > > > It might be that the problem is related to the fact that the interrupt > is a shared one (Harrier chip, "Functional Exception"), that is used for > both message-passing (should be RT) and UART (Linux, i.e. non-RT), my > current IRQ handler always pends the interrupt to the Linux domain > (RTDM_IRQ_PROPAGATE), because all other attempts (RTDM_IRQ_ENABLE when > it wasn't a UART interrupt) have left the interrupts turned off. > > What I believe should be done, is > > 1. When UART interrupt is received, disable further non-RT interrupts > on this IRQ-line, pend interrupt to Linux. > 2. Handle RT interrupts on this IRQ line > 3. When Linux has finished the pended interrupt, reenable non-RT > interrupts. > > but I have neither been able to achieve this, nor to verify that it is > the right thing to do... Your approach is basically what I proposed some years back on rtai-dev for handling unresolvable shared RT/NRT IRQs. I once successfully tested such a setup with two network cards, one RT, the other Linux.
So when you are really doomed and cannot change the IRQ line of your RT device, this is a kind of emergency workaround. Not nice and generic (you have to write the stub for disabling the NRT IRQ source), but it should work. Anyway, I do not understand what made your spinlock recurse. This shared IRQ scenario should only cause indeterminism to the RT driver (by blocking the line until the Linux handler can release it), but it must not trigger this bug. Jan
Re: [Xenomai-core] [BUG] Interrupt problem on powerpc
Anders Blomdell wrote: On a PrPMC800 (PPC 7410 processor) with Xenomai-2.1-rc2, I get the following if the interrupt handler takes too long (i.e. next interrupt gets generated before the previous one has finished) [ 42.543765] [c00c2008] spin_bug+0xa8/0xc4 [ 42.597617] [c00c22d4] _raw_spin_lock+0x180/0x184 Someone (in arch/ppc64/kernel/*.c?) is spinlocking+irqsave desc->lock for any given IRQ without using the Adeos *_hw() spinlock variant that masks the interrupt at hw level. So we seem to have: spin_lock_irqsave(&desc->lock) __ipipe_grab_irq __ipipe_handle_irq __ipipe_ack_irq spin_lock...(&desc->lock) deadlock. The point is about having spin_lock_irqsave only _virtually_ masking the interrupts by preventing their associated Linux handler from being called, but despite this, Adeos still actually acquires and acknowledges the incoming hw events before logging them, even if their associated actions happen to be postponed until spin_unlock_irqrestore() is called. To solve this, all spinlocks potentially touched by the ipipe's primary IRQ handler and/or the code it calls indirectly, _must_ be operated using the _hw() call variant all over the kernel, so that no hw IRQ can be taken while those spinlocks are held by Linux. Usually, only the spinlock(s) protecting the interrupt descriptors or the PIC hardware are concerned.
[ 42.660637] [c000f388] __ipipe_ack_irq+0x88/0x130 [ 42.723657] [c000efe4] __ipipe_handle_irq+0x140/0x268 [ 42.791259] [c000f144] __ipipe_grab_irq+0x38/0xa4 [ 42.854279] [c0005058] __ipipe_ret_from_except+0x0/0xc [ 42.923029] [] 0x0 [ 42.959695] [c0038348] __do_IRQ+0x134/0x164 [ 43.015839] [c000ed04] __ipipe_do_IRQ+0x2c/0x44 [ 43.076567] [c000eb08] __ipipe_sync_stage+0x1ec/0x228 [ 43.144170] [c0039420] ipipe_suspend_domain+0x7c/0xc4 [ 43.211774] [c000f0b0] __ipipe_handle_irq+0x20c/0x268 [ 43.279377] [c000f144] __ipipe_grab_irq+0x38/0xa4 [ 43.342396] [c0005058] __ipipe_ret_from_except+0x0/0xc [ 43.411145] [c0006524] default_idle+0x10/0x60 Any ideas of where to look? Regards Anders Blomdell -- Philippe.
Re: [Xenomai-core] [BUG] Interrupt problem on powerpc
Jan Kiszka wrote: Anders Blomdell wrote: Jan Kiszka wrote: Anders Blomdell wrote: On a PrPMC800 (PPC 7410 processor) with Xenomai-2.1-rc2, I get the following if the interrupt handler takes too long (i.e. next interrupt gets generated before the previous one has finished) [ 42.543765] [c00c2008] spin_bug+0xa8/0xc4 [ 42.597617] [c00c22d4] _raw_spin_lock+0x180/0x184 [ 42.660637] [c000f388] __ipipe_ack_irq+0x88/0x130 [ 42.723657] [c000efe4] __ipipe_handle_irq+0x140/0x268 [ 42.791259] [c000f144] __ipipe_grab_irq+0x38/0xa4 [ 42.854279] [c0005058] __ipipe_ret_from_except+0x0/0xc [ 42.923029] [] 0x0 [ 42.959695] [c0038348] __do_IRQ+0x134/0x164 [ 43.015839] [c000ed04] __ipipe_do_IRQ+0x2c/0x44 [ 43.076567] [c000eb08] __ipipe_sync_stage+0x1ec/0x228 [ 43.144170] [c0039420] ipipe_suspend_domain+0x7c/0xc4 [ 43.211774] [c000f0b0] __ipipe_handle_irq+0x20c/0x268 [ 43.279377] [c000f144] __ipipe_grab_irq+0x38/0xa4 [ 43.342396] [c0005058] __ipipe_ret_from_except+0x0/0xc [ 43.411145] [c0006524] default_idle+0x10/0x60 I think some probably important information is missing above this back-trace. You are so right! What does the kernel state before these lines?
[ 42.346643] BUG: spinlock recursion on CPU#0, swapper/0 [ 42.415438] lock: c01c943c, .magic: dead4ead, .owner: swapper/0, .owner_cpu: 0 [ 42.511681] Call trace: [ 42.543765] [c00c2008] spin_bug+0xa8/0xc4 [ 42.597617] [c00c22d4] _raw_spin_lock+0x180/0x184 [ 42.660637] [c000f388] __ipipe_ack_irq+0x88/0x130 [ 42.723657] [c000efe4] __ipipe_handle_irq+0x140/0x268 [ 42.791259] [c000f144] __ipipe_grab_irq+0x38/0xa4 [ 42.854279] [c0005058] __ipipe_ret_from_except+0x0/0xc [ 42.923029] [] 0x0 [ 42.959695] [c0038348] __do_IRQ+0x134/0x164 [ 43.015839] [c000ed04] __ipipe_do_IRQ+0x2c/0x44 [ 43.076567] [c000eb08] __ipipe_sync_stage+0x1ec/0x228 [ 43.144170] [c0039420] ipipe_suspend_domain+0x7c/0xc4 [ 43.211774] [c000f0b0] __ipipe_handle_irq+0x20c/0x268 [ 43.279377] [c000f144] __ipipe_grab_irq+0x38/0xa4 [ 43.342396] [c0005058] __ipipe_ret_from_except+0x0/0xc [ 43.411145] [c0006524] default_idle+0x10/0x60 It might be that the problem is related to the fact that the interrupt is a shared one (Harrier chip, "Functional Exception"), that is used for both message-passing (should be RT) and UART (Linux, i.e. non-RT), my current IRQ handler always pends the interrupt to the Linux domain (RTDM_IRQ_PROPAGATE), because all other attempts (RTDM_IRQ_ENABLE when it wasn't a UART interrupt) have left the interrupts turned off. What I believe should be done, is 1. When UART interrupt is received, disable further non-RT interrupts on this IRQ-line, pend interrupt to Linux. 2. Handle RT interrupts on this IRQ line 3. When Linux has finished the pended interrupt, reenable non-RT interrupts. but I have neither been able to achieve this, nor to verify that it is the right thing to do... Your approach is basically what I proposed some years back on rtai-dev for handling unresolvable shared RT/NRT IRQs. I once successfully tested such a setup with two network cards, one RT, the other Linux.
So when you are really doomed and cannot change the IRQ line of your RT device, this is a kind of emergency workaround. Not nice and generic (you have to write the stub for disabling the NRT IRQ source), but it should work. I'm doomed, the interrupts live in the same chip... The problem is that I have not found any good place to reenable the non-RT interrupts. Anyway, I do not understand what made your spinlock recurse. This shared IRQ scenario should only cause indeterminism to the RT driver (by blocking the line until the Linux handler can release it), but it must not trigger this bug. OK, seems like we have two problems then, I'll try to hunt it down /Anders
Re: [Xenomai-core] [BUG] Interrupt problem on powerpc
Philippe Gerum wrote: Anders Blomdell wrote: On a PrPMC800 (PPC 7410 processor) with Xenomai-2.1-rc2, I get the following if the interrupt handler takes too long (i.e. next interrupt gets generated before the previous one has finished) [ 42.543765] [c00c2008] spin_bug+0xa8/0xc4 [ 42.597617] [c00c22d4] _raw_spin_lock+0x180/0x184 Someone (in arch/ppc64/kernel/*.c?) is spinlocking+irqsave desc->lock more likely arch/ppc/kernel/*.c :-) for any given IRQ without using the Adeos *_hw() spinlock variant that masks the interrupt at hw level. So we seem to have: spin_lock_irqsave(&desc->lock) __ipipe_grab_irq __ipipe_handle_irq __ipipe_ack_irq spin_lock...(&desc->lock) deadlock. The point is about having spin_lock_irqsave only _virtually_ masking the interrupts by preventing their associated Linux handler from being called, but despite this, Adeos still actually acquires and acknowledges the incoming hw events before logging them, even if their associated actions happen to be postponed until spin_unlock_irqrestore() is called. To solve this, all spinlocks potentially touched by the ipipe's primary IRQ handler and/or the code it calls indirectly, _must_ be operated using the _hw() call variant all over the kernel, so that no hw IRQ can be taken while those spinlocks are held by Linux. Usually, only the spinlock(s) protecting the interrupt descriptors or the PIC hardware are concerned. So you will expect an addition to the ipipe patch then? /Anders
Re: [Xenomai-core] [BUG] Interrupt problem on powerpc
Anders Blomdell wrote: Philippe Gerum wrote: Anders Blomdell wrote: On a PrPMC800 (PPC 7410 processor) with Xenomai-2.1-rc2, I get the following if the interrupt handler takes too long (i.e. next interrupt gets generated before the previous one has finished) [ 42.543765] [c00c2008] spin_bug+0xa8/0xc4 [ 42.597617] [c00c22d4] _raw_spin_lock+0x180/0x184 Someone (in arch/ppc64/kernel/*.c?) is spinlocking+irqsave desc->lock more likely arch/ppc/kernel/*.c :-) Gah... looks like I'm still confused by ia64 issues I'm chasing right now. (Why on earth do we need so many bits on our CPUs that only serve the purpose of raising so many problems?) for any given IRQ without using the Adeos *_hw() spinlock variant that masks the interrupt at hw level. So we seem to have: spin_lock_irqsave(&desc->lock) __ipipe_grab_irq __ipipe_handle_irq __ipipe_ack_irq spin_lock...(&desc->lock) deadlock. The point is about having spin_lock_irqsave only _virtually_ masking the interrupts by preventing their associated Linux handler from being called, but despite this, Adeos still actually acquires and acknowledges the incoming hw events before logging them, even if their associated actions happen to be postponed until spin_unlock_irqrestore() is called. To solve this, all spinlocks potentially touched by the ipipe's primary IRQ handler and/or the code it calls indirectly, _must_ be operated using the _hw() call variant all over the kernel, so that no hw IRQ can be taken while those spinlocks are held by Linux. Usually, only the spinlock(s) protecting the interrupt descriptors or the PIC hardware are concerned. So you will expect an addition to the ipipe patch then? Yep. We first need to find out who's grabbing the shared spinlock using the vanilla Linux primitives. /Anders -- Philippe.
[Xenomai-core] [PATCH] rt_heap reminder
Hi all, as a reminder (userspace, native skin, shared heap) [1]: API documentation: "If the heap is shared, this value can be either zero, or the same value given to rt_heap_create()." This is not true. As the heapsize gets altered in rt_heap_create for page size alignment, the following call to rt_heap_alloc with the same value will fail. Ex: rt_heap_create( ..., ..., 1, ... ) rt_heap_alloc( ..., 1, ..., ) -> This call fails I suggest only accepting zero as a valid size for shared heaps. about attached patch: 1) not tested 2) there are possible better names than H_ALL 3) the comments could be in a better english 4) i hope you get the idea thx kisda [1] https://mail.gna.org/public/xenomai-core/2006-01/msg00177.html Index: include/native/heap.h === --- include/native/heap.h (Revision 465) +++ include/native/heap.h (Arbeitskopie) @@ -32,6 +32,9 @@ #define H_DMA0x100 /* Use memory suitable for DMA. */ #define H_SHARED 0x200 /* Use mappable shared memory. */ +/* Operation flags. */ +#define H_ALL0x0 /* Entire heap space. */ + typedef struct rt_heap_info { int nwaiters; /* !< Number of pending tasks. */ Index: ksrc/skins/native/heap.c === --- ksrc/skins/native/heap.c (Revision 465) +++ ksrc/skins/native/heap.c (Arbeitskopie) @@ -410,10 +410,9 @@ * from. * * @param size The requested size in bytes of the block. If the heap - * is shared, this value can be either zero, or the same value given - * to rt_heap_create(). In any case, the same block covering the - * entire heap space will always be returned to all callers of this - * service. + * is shared, H_ALL should be passed, as always the same block + * covering the entire heap space will be returned to all callers of + * this service. * * @param timeout The number of clock ticks to wait for a block of * sufficient size to be available from a local heap (see @@ -432,8 +431,7 @@ * @return 0 is returned upon success. Otherwise: * * - -EINVAL is returned if @a heap is not a heap descriptor, or @a - * heap is shared (i.e. 
H_SHARED mode) and @a size is non-zero but - * does not match the actual heap size passed to rt_heap_create(). + * heap is shared (i.e. H_SHARED mode) and @a size is not H_ALL. * * - -EIDRM is returned if @a heap is a deleted heap descriptor. * @@ -503,12 +501,7 @@ if (!block) { - /* It's ok to pass zero for size here, since the requested - size is implicitely the whole heap space; but if - non-zero is given, it must match the actual heap - size. */ - - if (size > 0 && size != xnheap_size(&heap->heap_base)) + if (size != H_ALL) { err = -EINVAL; goto unlock_and_exit;
Re: [Xenomai-core] [BUG] racy xnshadow_harden under CONFIG_PREEMPT
Jan Kiszka wrote: Philippe Gerum wrote: Philippe Gerum wrote: Jan Kiszka wrote: Gilles Chanteperdrix wrote: Jeroen Van den Keybus wrote: Hello, I'm currently not at a level to participate in your discussion. Although I'm willing to supply you with stresstests, I would nevertheless like to learn more from task migration as this debugging session proceeds. In order to do so, please confirm the following statements or indicate where I went wrong. I hope others may learn from this as well. xn_shadow_harden(): This is called whenever a Xenomai thread performs a Linux (root domain) system call (notified by Adeos ?). xnshadow_harden() is called whenever a thread running in secondary mode (that is, running as a regular Linux thread, handled by Linux scheduler) is switching to primary mode (where it will run as a Xenomai thread, handled by Xenomai scheduler). Migrations occur for some system calls. More precisely, Xenomai skin system calls tables associates a few flags with each system call, and some of these flags cause migration of the caller when it issues the system call. Each Xenomai user-space thread has two contexts, a regular Linux thread context, and a Xenomai thread called "shadow" thread. Both contexts share the same stack and program counter, so that at any time, at least one of the two contexts is seen as suspended by the scheduler which handles it. Before xnshadow_harden is called, the Linux thread is running, and its shadow is seen in suspended state with XNRELAX bit by Xenomai scheduler. After xnshadow_harden, the Linux context is seen suspended with INTERRUPTIBLE state by Linux scheduler, and its shadow is seen as running by Xenomai scheduler. The migrating thread (nRT) is marked INTERRUPTIBLE and run by the Linux kernel wake_up_interruptible_sync() call. Is this thread actually run or does it merely put the thread in some Linux to-do list (I assumed the first case) ? 
Here, I am not sure, but it seems that when calling wake_up_interruptible_sync the woken up task is put in the current CPU runqueue, and this task (i.e. the gatekeeper), will not run until the current thread (i.e. the thread running xnshadow_harden) marks itself as suspended and calls schedule(). Maybe, marking the running thread as Depends on CONFIG_PREEMPT. If set, we get a preempt_schedule already here - and a switch if the prio of the woken up task is higher. BTW, an easy way to enforce the current trouble is to remove the "_sync" from wake_up_interruptible. As I understand it this _sync is just an optimisation hint for Linux to avoid needless scheduler runs. You could not guarantee the following execution sequence doing so either, i.e. 1- current wakes up the gatekeeper 2- current goes sleeping to exit the Linux runqueue in schedule() 3- the gatekeeper resumes the shadow-side of the old current The point is all about making 100% sure that current is going to be unlinked from the Linux runqueue before the gatekeeper processes the resumption request, whatever event the kernel is processing asynchronously in the meantime. This is the reason why, as you already noticed, preempt_schedule_irq() nicely breaks our toy by stealing the CPU from the hardening thread whilst keeping it linked to the runqueue: upon return from such preemption, the gatekeeper might have run already, hence the newly hardened thread ends up being seen as runnable by both the Linux and Xeno schedulers. Rainy day indeed. We could rely on giving "current" the highest SCHED_FIFO priority in xnshadow_harden() before waking up the gk, until the gk eventually promotes it to the Xenomai scheduling mode and downgrades this priority back to normal, but we would pay additional latencies induced by each aborted rescheduling attempt that may occur during the atomic path we want to enforce. 
The other way is to make sure that no in-kernel preemption of the hardening task could occur after step 1) and until step 2) is performed, given that we cannot currently call schedule() with interrupts or preemption off. I'm on it. Could anyone interested in this issue test the following couple of patches? atomic-switch-state.patch is to be applied against Adeos-1.1-03/x86 for 2.6.15 atomic-wakeup-and-schedule.patch is to be applied against Xeno 2.1-rc2 Both patches are needed to fix the issue. TIA, Looks good. I tried Jeroen's test-case and I was not able to reproduce the crash anymore. I think it's time for a new ipipe-release. ;) Looks like, indeed. At this chance: any comments on the panic-freeze extension for the tracer? I need to rework the Xenomai patch, but the ipipe side should be ready for merge. No issue with the ipipe side since it only touches the tracer support code. No issue either at first sight with the Xeno side, aside of the trace being frozen twice in do_schedule_event? (once in this routine, twice in xnpod_fatal); but maybe it's wanted to freeze the situation before the stack is dumped
Re: [Xenomai-core] Initialization of a nucleus pod
Germain Olivier wrote: Thank you for your response So rootcb isn't the "scheduler task". I was thinking it was this task which was determining what thread to run, depending on its parameters (priority, periodicity, scheduling mode). I go back to the code to understand how it works ... Use the simulator to understand the dynamics of this code: it brings you single-stepping of the entire Xenomai core over GDB, at source code level. Germain xnthread_init does part of the initialization. The low level part of rootcb (its xnarchtcb_t member) is initialized twice, first by the call to xnarch_init_tcb in xnthread_init, and then overridden by xnarch_init_root_tcb in xnpod_init. For any other thread than root, the thread would be given a stack and entry point by the call to xnarch_init_thread in xnpod_start_thread. But the root thread is Xenomai's idle task, a placeholder for whatever task Linux is currently running. At the time when xnpod_init is called, the root thread is the current context, so already has a stack and is already running. -- Gilles Chanteperdrix. -- Philippe.
Re: [Xenomai-core] Scheduling while atomic
Jan Kiszka wrote: Jan Kiszka wrote: ... [Update] While writing this mail and letting your test run for a while, I *did* get a hard lock-up. Hold on, digging deeper... And here are its last words, spoken via serial console: c31dfab0 0086 c30d1a90 c02a2500 c482a360 0001 0001 0020 c012e564 0022 0246 c30d1a90 c4866ce0 0033 c482 c482a360 c4866ca0 c48293a4 c48524e1 0002 Call Trace: [] __ipipe_dispatch_event+0x56/0xdd [] e100_hw_init+0x3ad/0xa81 [e100] [] xnpod_suspend_thread+0x714/0x76d [xeno_nucleus] [] xnsynch_sleep_on+0x76d/0x7a7 [xeno_nucleus] [] rt_sem_p+0xa6/0x10a [xeno_native] [] __rt_sem_p+0x5d/0x66 [xeno_native] [] hisyscall_event+0x1cb/0x2d3 [xeno_nucleus] [] __ipipe_dispatch_event+0x56/0xdd [] __ipipe_syscall_root+0x53/0xbe [] system_call+0x20/0x41 Xenomai: fatal: blocked thread main[863] rescheduled?! (status=0x300082, sig=0, prev=gatekeeper/0[809]) CPU PIDPRI TIMEOUT STAT NAME 0 0 30 000500080 ROOT 0 86430 000300180 task0 0 86529 000300288 task1 0 8631000300082 main Timer: oneshot [tickval=1 ns, elapsed=175144731477] c31e1f14 c4860572 c3188000 c31dfab0 00300082 c02a2500 0286 c02a2500 c030cbec c012e564 0022 c02a2500 c30d1a90 c30d1a90 0022 0001 c02a2500 c30d1a90 c08e4623 0028 c31e1fa0 c0266ed5 f610 c030cd80 Call Trace: [] __ipipe_dispatch_event+0x56/0xdd [] schedule+0x3ef/0x5ed [] gatekeeper_thread+0x0/0x179 [xeno_nucleus] [] gatekeeper_thread+0x9a/0x179 [xeno_nucleus] [] default_wake_function+0x0/0x12 [] kthread+0x68/0x95 [] kthread+0x0/0x95 [] kernel_thread_helper+0x5/0xb Any bells already ringing? Yes; the bad news is that this looks like the same bug than you reported recently, which I only partially fixed, it seems. xnshadow_harden() is still not working properly under certain preemption situation induced by CONFIG_PREEMPT, and the hardening thread is likely unexpectedly moved back to the Linux runqueue while transitioning to Xenomai. The good news is that it's a well identified issue, at least... Will try Gilles' patch now... 
Jan -- Philippe.
Re: [Xenomai-core] [BUG] racy xnshadow_harden under CONFIG_PREEMPT
Jan Kiszka wrote: Hi, well, if I'm not totally wrong, we have a design problem in the RT-thread hardening path. I dug into the crash Jeroen reported and I'm quite sure that this is the reason. So that's the bad news. The good one is that we can at least work around it by switching off CONFIG_PREEMPT for Linux (this implicitly means that it's a 2.6-only issue). @Jeroen: Did you verify that your setup also works fine without CONFIG_PREEMPT? But let's start with two assumptions my further analysis is based on: [Xenomai] o Shadow threads have only one stack, i.e. one context. If the real-time part is active (this includes it is blocked on some xnsynch object or delayed), the original Linux task must NEVER EVER be executed, even if it will immediately fall asleep again. That's because the stack is in use by the real-time part at that time. And this condition is checked in do_schedule_event() [1]. [Linux] o A Linux task which has called set_current_state() will remain in the run-queue until it calls schedule() on its own. This means that it can be preempted (if CONFIG_PREEMPT is set) between set_current_state() and schedule() and then even be resumed again. Only the explicit call of schedule() will trigger deactivate_task() which will in turn remove current from the run-queue. Ok, if this is true, let's have a look at xnshadow_harden(): After grabbing the gatekeeper sem and putting itself in gk->thread, a task going for RT then marks itself TASK_INTERRUPTIBLE and wakes up the gatekeeper [2]. This does not include a Linux reschedule due to the _sync version of wake_up_interruptible. What can happen now? 1) No interruption until we can call schedule() [3]. All fine as we will not be removed from the run-queue before the gatekeeper starts kicking our RT part, thus no conflict in using the thread's stack. 2) Interruption by an RT IRQ. This would just delay the path described above, even if some RT threads get executed.
Once they are finished, we continue in xnshadow_harden() - given that the RT part does not trigger the following case: 3) Interruption by some Linux IRQ. This may cause other threads to become runnable as well, but the gatekeeper has the highest prio and will therefore be the next. The problem is that the rescheduling on Linux IRQ exit will PREEMPT our task in xnshadow_harden(), it will NOT remove it from the Linux run-queue. And now we are in real troubles: The gatekeeper will kick off our RT part which will take over the thread's stack. As soon as the RT domain falls asleep and Linux takes over again, it will continue our non-RT part as well! Actually, this seems to be the reason for the panic in do_schedule_event(). Without CONFIG_XENO_OPT_DEBUG and this check, we will run both parts AT THE SAME TIME now, thus violating my first assumption. The system gets fatally corrupted. Yep, that's it. And we may not lock out the interrupts before calling schedule to prevent that. Well, I would be happy if someone can prove me wrong here. The problem is that I don't see a solution because Linux does not provide an atomic wake-up + schedule-out under CONFIG_PREEMPT. I'm currently considering a hack to remove the migrating Linux thread manually from the run-queue, but this could easily break the Linux scheduler. Maybe the best way would be to provide atomic wakeup-and-schedule support into the Adeos patch for Linux tasks; previous attempts to fix this by circumventing the potential for preemption from outside of the scheduler code have all failed, and this bug is uselessly lingering for that reason. Jan PS: Out of curiosity I also checked RTAI's migration mechanism in this regard. It's similar except for the fact that it does the gatekeeper's work in the Linux scheduler's tail (i.e. after the next context switch). And RTAI seems it suffers from the very same race. So this is either a fundamental issue - or I'm fundamentally wrong. 
[1]http://www.rts.uni-hannover.de/xenomai/lxr/source/ksrc/nucleus/shadow.c?v=SVN-trunk#L1573 [2]http://www.rts.uni-hannover.de/xenomai/lxr/source/ksrc/nucleus/shadow.c?v=SVN-trunk#L461 [3]http://www.rts.uni-hannover.de/xenomai/lxr/source/ksrc/nucleus/shadow.c?v=SVN-trunk#L481 -- Philippe.
Re: [Xenomai-core] Missing IRQ end function on PowerPC
> Philippe Gerum wrote: > > Gilles Chanteperdrix wrote: > >> Wolfgang Grandegger wrote: > >> > Therefore we need a dedicated function to re-enable interrupts in > >> the > ISR. We could name it *_end_irq, but maybe *_enable_isr_irq is > >> more > obvious. On non-PPC archs it would translate to *_irq_enable. > >> I > realized, that *_irq_enable is used in various place/skins and > >> therefore > I have not yet provided a patch. > >> > >> The function xnarch_irq_enable seems to be called in only two functions, > >> xintr_enable and xnintr_irq_handler when the flag XN_ISR_ENABLE is set. > >> > >> In any case, since I am not sure if this has to be done at the Adeos > >> level or in Xenomai, we will wait for Philippe to come back and decide. > >> > > > > ->enable() and ->end() all mixed up illustrates a silly x86 bias I once > > had. We do need to differentiate the mere enabling from the IRQ epilogue > > at PIC level since Linux does it - i.e. we don't want to change the > > semantics here. > > > > I would go for adding xnarch_end_irq -> rthal_irq_end to stick with the > > Linux naming scheme, and have the proper epilogue done from there on a > > per-arch basis. > > > > Current uses of xnarch_enable_irq() should be reserved to the > > non-epilogue case, like xnintr_enable() i.e. forcibly unmasking the IRQ > > source at PIC level outside of any ISR context for such interrupt (*). > > XN_ISR_ENABLE would trigger a call to xnarch_end_irq, instead of > > xnarch_enable_irq. I see no reason for this fix to leak to the Adeos > > layer, since the HAL already controls the way interrupts are ended > > actually; it just does it improperly on some platforms. > > > > (*) Jan, does rtdm_irq_enable() have the same meaning, or is it intended > > to be used from the ISR too in order to revalidate the source at PIC level? 
> > > > Nope, rtdm_irq_enable() was never intended to re-enable an IRQ line > after an interrupt, and the documentation does not suggest this either. > I see no problem here. But RTDM needs a rtdm_irq_end() function as well in case the user wants to re-enable the interrupt outside the ISR, I think. Wolfgang.
Re: [Xenomai-core] Missing IRQ end function on PowerPC
Wolfgang Grandegger wrote: > >> This is an OpenPGP/MIME signed message (RFC 2440 and 3156) >> >> Philippe Gerum wrote: >>> Gilles Chanteperdrix wrote: Wolfgang Grandegger wrote: > Therefore we need a dedicated function to re-enable interrupts in the > ISR. We could name it *_end_irq, but maybe *_enable_isr_irq is more > obvious. On non-PPC archs it would translate to *_irq_enable. I > realized, that *_irq_enable is used in various place/skins and therefore > I have not yet provided a patch. The function xnarch_irq_enable seems to be called in only two > functions, xintr_enable and xnintr_irq_handler when the flag XN_ISR_ENABLE is set. In any case, since I am not sure if this has to be done at the Adeos level or in Xenomai, we will wait for Philippe to come back and decide. >>> ->enable() and ->end() all mixed up illustrates a silly x86 bias I once >>> had. We do need to differentiate the mere enabling from the IRQ epilogue >>> at PIC level since Linux does it - i.e. we don't want to change the >>> semantics here. >>> >>> I would go for adding xnarch_end_irq -> rthal_irq_end to stick with the >>> Linux naming scheme, and have the proper epilogue done from there on a >>> per-arch basis. >>> >>> Current uses of xnarch_enable_irq() should be reserved to the >>> non-epilogue case, like xnintr_enable() i.e. forcibly unmasking the IRQ >>> source at PIC level outside of any ISR context for such interrupt (*). >>> XN_ISR_ENABLE would trigger a call to xnarch_end_irq, instead of >>> xnarch_enable_irq. I see no reason for this fix to leak to the Adeos >>> layer, since the HAL already controls the way interrupts are ended >>> actually; it just does it improperly on some platforms. >>> >>> (*) Jan, does rtdm_irq_enable() have the same meaning, or is it intended >>> to be used from the ISR too in order to revalidate the source at PIC > level? 
>> Nope, rtdm_irq_enable() was never intended to re-enable an IRQ line >> after an interrupt, and the documentation does not suggest this either. >> I see no problem here. > > But RTDM needs a rtdm_irq_end() functions as well in case the > user wants to reenable the interrupt outside the ISR, I think. If this is a valid use-case, it should be really straightforward to add this abstraction to RTDM. We should just document that rtdm_irq_end() shall not be invoked from IRQ context - to avoid breaking the chain in the shared-IRQ scenario. RTDM_IRQ_ENABLE must remain the way to re-enable the line from the handler. Jan
Re: [Xenomai-core] Missing IRQ end function on PowerPC
Jan Kiszka wrote: Wolfgang Grandegger wrote: This is an OpenPGP/MIME signed message (RFC 2440 and 3156) Philippe Gerum wrote: Gilles Chanteperdrix wrote: Wolfgang Grandegger wrote: > Therefore we need a dedicated function to re-enable interrupts in the > ISR. We could name it *_end_irq, but maybe *_enable_isr_irq is more > obvious. On non-PPC archs it would translate to *_irq_enable. I > realized, that *_irq_enable is used in various place/skins and therefore > I have not yet provided a patch. The function xnarch_irq_enable seems to be called in only two functions, xintr_enable and xnintr_irq_handler when the flag XN_ISR_ENABLE is set. In any case, since I am not sure if this has to be done at the Adeos level or in Xenomai, we will wait for Philippe to come back and decide. ->enable() and ->end() all mixed up illustrates a silly x86 bias I once had. We do need to differentiate the mere enabling from the IRQ epilogue at PIC level since Linux does it - i.e. we don't want to change the semantics here. I would go for adding xnarch_end_irq -> rthal_irq_end to stick with the Linux naming scheme, and have the proper epilogue done from there on a per-arch basis. Current uses of xnarch_enable_irq() should be reserved to the non-epilogue case, like xnintr_enable() i.e. forcibly unmasking the IRQ source at PIC level outside of any ISR context for such interrupt (*). XN_ISR_ENABLE would trigger a call to xnarch_end_irq, instead of xnarch_enable_irq. I see no reason for this fix to leak to the Adeos layer, since the HAL already controls the way interrupts are ended actually; it just does it improperly on some platforms. (*) Jan, does rtdm_irq_enable() have the same meaning, or is it intended to be used from the ISR too in order to revalidate the source at PIC level? Nope, rtdm_irq_enable() was never intended to re-enable an IRQ line after an interrupt, and the documentation does not suggest this either. I see no problem here. 
But RTDM needs a rtdm_irq_end() functions as well in case the user wants to reenable the interrupt outside the ISR, I think. If this is a valid use-case, it should be really straightforward to add this abstraction to RTDM. We should just document that rtdm_irq_end() shall not be invoked from IRQ context - It's the other way around: ->end() would indeed be called from the ISR epilogue, and ->enable() would not. to avoid breaking the chain in the shared-IRQ scenario. RTDM_IRQ_ENABLE must remain the way to re-enable the line from the handler. Jan -- Philippe.
Re: [Xenomai-core] Missing IRQ end function on PowerPC
Jan Kiszka wrote: Philippe Gerum wrote: Gilles Chanteperdrix wrote: Wolfgang Grandegger wrote: > Therefore we need a dedicated function to re-enable interrupts in the > ISR. We could name it *_end_irq, but maybe *_enable_isr_irq is more > obvious. On non-PPC archs it would translate to *_irq_enable. I > realized, that *_irq_enable is used in various place/skins and therefore > I have not yet provided a patch. The function xnarch_irq_enable seems to be called in only two functions, xintr_enable and xnintr_irq_handler when the flag XN_ISR_ENABLE is set. In any case, since I am not sure if this has to be done at the Adeos level or in Xenomai, we will wait for Philippe to come back and decide. ->enable() and ->end() all mixed up illustrates a silly x86 bias I once had. We do need to differentiate the mere enabling from the IRQ epilogue at PIC level since Linux does it - i.e. we don't want to change the semantics here. I would go for adding xnarch_end_irq -> rthal_irq_end to stick with the Linux naming scheme, and have the proper epilogue done from there on a per-arch basis. Current uses of xnarch_enable_irq() should be reserved to the non-epilogue case, like xnintr_enable() i.e. forcibly unmasking the IRQ source at PIC level outside of any ISR context for such interrupt (*). XN_ISR_ENABLE would trigger a call to xnarch_end_irq, instead of xnarch_enable_irq. I see no reason for this fix to leak to the Adeos layer, since the HAL already controls the way interrupts are ended actually; it just does it improperly on some platforms. (*) Jan, does rtdm_irq_enable() have the same meaning, or is it intended to be used from the ISR too in order to revalidate the source at PIC level? Nope, rtdm_irq_enable() was never intended to re-enable an IRQ line after an interrupt, and the documentation does not suggest this either. I see no problem here. Ok, so no change would be needed here to implement what's described above. Jan -- Philippe.
Re: [Xenomai-core] Missing IRQ end function on PowerPC
Philippe Gerum wrote: > Jan Kiszka wrote: >> Wolfgang Grandegger wrote: >> This is an OpenPGP/MIME signed message (RFC 2440 and 3156) Philippe Gerum wrote: > Gilles Chanteperdrix wrote: > >> Wolfgang Grandegger wrote: >> > Therefore we need a dedicated function to re-enable interrupts in >> the > ISR. We could name it *_end_irq, but maybe *_enable_isr_irq is >> more > obvious. On non-PPC archs it would translate to *_irq_enable. >> I > realized, that *_irq_enable is used in various place/skins and >> therefore > I have not yet provided a patch. >> >> The function xnarch_irq_enable seems to be called in only two >>> >>> functions, >>> >> xintr_enable and xnintr_irq_handler when the flag XN_ISR_ENABLE is >> set. >> >> In any case, since I am not sure if this has to be done at the Adeos >> level or in Xenomai, we will wait for Philippe to come back and >> decide. >> > > ->enable() and ->end() all mixed up illustrates a silly x86 bias I > once > had. We do need to differentiate the mere enabling from the IRQ > epilogue > at PIC level since Linux does it - i.e. we don't want to change the > semantics here. > > I would go for adding xnarch_end_irq -> rthal_irq_end to stick with > the > Linux naming scheme, and have the proper epilogue done from there on a > per-arch basis. > > Current uses of xnarch_enable_irq() should be reserved to the > non-epilogue case, like xnintr_enable() i.e. forcibly unmasking the > IRQ > source at PIC level outside of any ISR context for such interrupt (*). > XN_ISR_ENABLE would trigger a call to xnarch_end_irq, instead of > xnarch_enable_irq. I see no reason for this fix to leak to the Adeos > layer, since the HAL already controls the way interrupts are ended > actually; it just does it improperly on some platforms. > > (*) Jan, does rtdm_irq_enable() have the same meaning, or is it > intended > to be used from the ISR too in order to revalidate the source at PIC >>> >>> level? 
>>> Nope, rtdm_irq_enable() was never intended to re-enable an IRQ line after an interrupt, and the documentation does not suggest this either. I see no problem here. >>> >>> But RTDM needs a rtdm_irq_end() functions as well in case the >>> user wants to reenable the interrupt outside the ISR, I think. >> >> >> If this is a valid use-case, it should be really straightforward to add >> this abstraction to RTDM. We should just document that rtdm_irq_end() >> shall not be invoked from IRQ context - > > It's the other way around: ->end() would indeed be called from the ISR > epilogue, and ->enable() would not. I think we are talking about different things: Wolfgang was asking for an equivalent of RTDM_IRQ_ENABLE outside the IRQ handler. That's the user API. You are now referring to the back-end which has to provide some end-mechanism to be used both by the nucleus' ISR epilogue and that rtdm_irq_end(), and that mechanism shall be told apart from the IRQ enable/disable API. Well, that's at least how I think I got it... > > to avoid breaking the chain in >> the shared-IRQ scenario. RTDM_IRQ_ENABLE must remain the way to >> re-enable the line from the handler. >> >> Jan >> >> > > Jan
Re: [Xenomai-core] [BUG] racy xnshadow_harden under CONFIG_PREEMPT
Philippe Gerum wrote: Jan Kiszka wrote: Hi, well, if I'm not totally wrong, we have a design problem in the RT-thread hardening path. I dug into the crash Jeroen reported and I'm quite sure that this is the reason. So that's the bad news. The good one is that we can at least work around it by switching off CONFIG_PREEMPT for Linux (this implicitly means that it's a 2.6-only issue). @Jeroen: Did you verify that your setup also works fine without CONFIG_PREEMPT? But let's start with two assumptions my further analysis is based on: [Xenomai] o Shadow threads have only one stack, i.e. one context. If the real-time part is active (this includes being blocked on some xnsynch object or delayed), the original Linux task must NEVER EVER be executed, even if it will immediately fall asleep again. That's because the stack is in use by the real-time part at that time. And this condition is checked in do_schedule_event() [1]. [Linux] o A Linux task which has called set_current_state() will remain in the run-queue until it calls schedule() on its own. This means that it can be preempted (if CONFIG_PREEMPT is set) between set_current_state() and schedule() and then even be resumed again. Only the explicit call of schedule() will trigger deactivate_task() which will in turn remove current from the run-queue. Ok, if this is true, let's have a look at xnshadow_harden(): After grabbing the gatekeeper sem and putting itself in gk->thread, a task going for RT then marks itself TASK_INTERRUPTIBLE and wakes up the gatekeeper [2]. This does not include a Linux reschedule due to the _sync version of wake_up_interruptible. What can happen now? 1) No interruption until we have called schedule() [3]. All fine as we will not be removed from the run-queue before the gatekeeper starts kicking our RT part, thus no conflict in using the thread's stack. 2) Interruption by an RT IRQ. This would just delay the path described above, even if some RT threads get executed. 
Once they are finished, we continue in xnshadow_harden() - given that the RT part does not trigger the following case: 3) Interruption by some Linux IRQ. This may cause other threads to become runnable as well, but the gatekeeper has the highest prio and will therefore be the next. The problem is that the rescheduling on Linux IRQ exit will PREEMPT our task in xnshadow_harden(); it will NOT remove it from the Linux run-queue. And now we are in real trouble: The gatekeeper will kick off our RT part, which will take over the thread's stack. As soon as the RT domain falls asleep and Linux takes over again, it will continue our non-RT part as well! Actually, this seems to be the reason for the panic in do_schedule_event(). Without CONFIG_XENO_OPT_DEBUG and this check, we will run both parts AT THE SAME TIME now, thus violating my first assumption. The system gets fatally corrupted. Yep, that's it. And we may not lock out the interrupts before calling schedule to prevent that. Well, I would be happy if someone can prove me wrong here. The problem is that I don't see a solution because Linux does not provide an atomic wake-up + schedule-out under CONFIG_PREEMPT. I'm currently considering a hack to remove the migrating Linux thread manually from the run-queue, but this could easily break the Linux scheduler. Maybe the best way would be to provide atomic wakeup-and-schedule support into the Adeos patch for Linux tasks; previous attempts to fix this by circumventing the potential for preemption from outside of the scheduler code have all failed, and this bug is uselessly lingering for that reason. Having slept on this, I'm going to add a simple extension to the Linux scheduler available from Adeos, in order to get an atomic/unpreemptable path from the statement when the current task's state is changed for suspension (e.g. TASK_INTERRUPTIBLE), to the point where schedule() normally enters its atomic section, which looks like the sanest way to solve this issue, i.e. 
without gory hackery all over the place. Patch will follow later for testing this approach. Jan PS: Out of curiosity I also checked RTAI's migration mechanism in this regard. It's similar except for the fact that it does the gatekeeper's work in the Linux scheduler's tail (i.e. after the next context switch). And it seems RTAI suffers from the very same race. So this is either a fundamental issue - or I'm fundamentally wrong. [1]http://www.rts.uni-hannover.de/xenomai/lxr/source/ksrc/nucleus/shadow.c?v=SVN-trunk#L1573 [2]http://www.rts.uni-hannover.de/xenomai/lxr/source/ksrc/nucleus/shadow.c?v=SVN-trunk#L461 [3]http://www.rts.uni-hannover.de/xenomai/lxr/source/ksrc/nucleus/shadow.c?v=SVN-trunk#L481 -- Philippe.
RE: [Xenomai-core] broken docs
Dear Xenomai workers, Would it be possible to have updated API documentation for Xenomai 2.0.x? (I mean with formal parameters in function prototypes) I tried to regenerate it with make generate-doc, but it seems that an SVN working dir is required. It would be great. Thanks a lot Daniel > -Original Message- > From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On behalf > of Jan Kiszka > Sent: Wednesday, 18 January 2006 20:03 > To: xenomai-core > Subject: [Xenomai-core] broken docs > > Hi, > > I noticed that the doxygen API output is partly broken. Under the > nucleus and native skin modules most functions became variables. I > haven't looked at the source yet, but I guess it should be resolvable, > especially as the RTDM docs are fine. This mail is to file the issue, > maybe I will have a look later - /maybe/. > > Moreover, I was lacking a reference to RT_MUTEX_INFO. Does this > structure just need to be added to the correct doxygen group? I guess > there are other undocumented structures out there as well (RT_TASK_INFO, > ...?). > > Jan
Re: [Xenomai-core] Missing IRQ end function on PowerPC
> This is an OpenPGP/MIME signed message (RFC 2440 and 3156) > > Philippe Gerum wrote: > > Jan Kiszka wrote: > >> Wolfgang Grandegger wrote: > >> > This is an OpenPGP/MIME signed message (RFC 2440 and 3156) > > Philippe Gerum wrote: > > > Gilles Chanteperdrix wrote: > > > >> Wolfgang Grandegger wrote: > >> > Therefore we need a dedicated function to re-enable interrupts in > >> the > ISR. We could name it *_end_irq, but maybe *_enable_isr_irq is > >> more > obvious. On non-PPC archs it would translate to *_irq_enable. > >> I > realized, that *_irq_enable is used in various place/skins and > >> therefore > I have not yet provided a patch. > >> > >> The function xnarch_irq_enable seems to be called in only two > >>> > >>> functions, > >>> > >> xintr_enable and xnintr_irq_handler when the flag XN_ISR_ENABLE is > >> set. > >> > >> In any case, since I am not sure if this has to be done at the Adeos > >> level or in Xenomai, we will wait for Philippe to come back and > >> decide. > >> > > > > ->enable() and ->end() all mixed up illustrates a silly x86 bias I > > once > > had. We do need to differentiate the mere enabling from the IRQ > > epilogue > > at PIC level since Linux does it - i.e. we don't want to change the > > semantics here. > > > > I would go for adding xnarch_end_irq -> rthal_irq_end to stick with > > the > > Linux naming scheme, and have the proper epilogue done from there on a > > per-arch basis. > > > > Current uses of xnarch_enable_irq() should be reserved to the > > non-epilogue case, like xnintr_enable() i.e. forcibly unmasking the > > IRQ > > source at PIC level outside of any ISR context for such interrupt (*). > > XN_ISR_ENABLE would trigger a call to xnarch_end_irq, instead of > > xnarch_enable_irq. I see no reason for this fix to leak to the Adeos > > layer, since the HAL already controls the way interrupts are ended > > actually; it just does it improperly on some platforms. 
> > > > (*) Jan, does rtdm_irq_enable() have the same meaning, or is it > > intended > > to be used from the ISR too in order to revalidate the source at PIC > >>> > >>> level? > >>> > Nope, rtdm_irq_enable() was never intended to re-enable an IRQ line > after an interrupt, and the documentation does not suggest this either. > I see no problem here. > >>> > >>> But RTDM needs a rtdm_irq_end() functions as well in case the > >>> user wants to reenable the interrupt outside the ISR, I think. > >> > >> > >> If this is a valid use-case, it should be really straightforward to add > >> this abstraction to RTDM. We should just document that rtdm_irq_end() > >> shall not be invoked from IRQ context - > > > > It's the other way around: ->end() would indeed be called from the ISR > > epilogue, and ->enable() would not. > > I think we are talking about different things: Wolfgang was asking for > an equivalent of RTDM_IRQ_ENABLE outside the IRQ handler. That's the > user API. You are now referring to the back-end which had to provide > some end-mechanism to be used both by the nucleus' ISR epilogue and that > rtdm_irq_end(), and that mechanism shall be told apart from the IRQ > enable/disable API. Well, that's at least how I think to have got it... Yep, I was thinking of deferred interrupt handling in a real-time task. Then rtdm_irq_end() would be required. > > > > to avoid breaking the chain in > >> the shared-IRQ scenario. RTDM_IRQ_ENABLE must remain the way to > >> re-enable the line from the handler. > >> > >> Jan > >> > >> > > > > > > Jan > > > Wolfgang.
Re: [Xenomai-core] Missing IRQ end function on PowerPC
Jan Kiszka wrote: Philippe Gerum wrote: Jan Kiszka wrote: Wolfgang Grandegger wrote: This is an OpenPGP/MIME signed message (RFC 2440 and 3156) Philippe Gerum wrote: Gilles Chanteperdrix wrote: Wolfgang Grandegger wrote: Therefore we need a dedicated function to re-enable interrupts in the > ISR. We could name it *_end_irq, but maybe *_enable_isr_irq is more > obvious. On non-PPC archs it would translate to *_irq_enable. I > realized, that *_irq_enable is used in various place/skins and therefore > I have not yet provided a patch. The function xnarch_irq_enable seems to be called in only two functions, xintr_enable and xnintr_irq_handler when the flag XN_ISR_ENABLE is set. In any case, since I am not sure if this has to be done at the Adeos level or in Xenomai, we will wait for Philippe to come back and decide. ->enable() and ->end() all mixed up illustrates a silly x86 bias I once had. We do need to differentiate the mere enabling from the IRQ epilogue at PIC level since Linux does it - i.e. we don't want to change the semantics here. I would go for adding xnarch_end_irq -> rthal_irq_end to stick with the Linux naming scheme, and have the proper epilogue done from there on a per-arch basis. Current uses of xnarch_enable_irq() should be reserved to the non-epilogue case, like xnintr_enable() i.e. forcibly unmasking the IRQ source at PIC level outside of any ISR context for such interrupt (*). XN_ISR_ENABLE would trigger a call to xnarch_end_irq, instead of xnarch_enable_irq. I see no reason for this fix to leak to the Adeos layer, since the HAL already controls the way interrupts are ended actually; it just does it improperly on some platforms. (*) Jan, does rtdm_irq_enable() have the same meaning, or is it intended to be used from the ISR too in order to revalidate the source at PIC level? Nope, rtdm_irq_enable() was never intended to re-enable an IRQ line after an interrupt, and the documentation does not suggest this either. I see no problem here. 
But RTDM needs a rtdm_irq_end() functions as well in case the user wants to reenable the interrupt outside the ISR, I think. If this is a valid use-case, it should be really straightforward to add this abstraction to RTDM. We should just document that rtdm_irq_end() shall not be invoked from IRQ context - It's the other way around: ->end() would indeed be called from the ISR epilogue, and ->enable() would not. I think we are talking about different things: Wolfgang was asking for an equivalent of RTDM_IRQ_ENABLE outside the IRQ handler. That's the user API. You are now referring to the back-end which had to provide some end-mechanism to be used both by the nucleus' ISR epilogue and that rtdm_irq_end(), and that mechanism shall be told apart from the IRQ enable/disable API. Well, that's at least how I think to have got it... My point was only about naming here: *_end() should be reserved for the epilogue stuff, hence be callable from ISR context. Aside of that, I'm ok with the abstraction you described though. to avoid breaking the chain in the shared-IRQ scenario. RTDM_IRQ_ENABLE must remain the way to re-enable the line from the handler. Jan Jan -- Philippe.
Re: [Xenomai-core] [PATCH] fix pthread cancellation in native skin
Jan Kiszka wrote: > Hi, > > Gilles' work on cancellation for the posix skin reminded me of this > issue I once discovered in the native skin: > > https://mail.gna.org/public/xenomai-core/2005-12/msg00014.html > > I found out that this can easily be fixed by switching the pthread of a > native task to PTHREAD_CANCEL_ASYNCHRONOUS. See attached patch. > > > At this chance I discovered that calling rt_task_delete for a task that > was created and started with T_SUSP mode but was not yet resumed, locks > up the system. More precisely: it raises a fatal warning when > XENO_OPT_DEBUG is on. Might be the case that it just works on system > without this switched on. Either this is a real bug, or the warning > needs to be fixed. (Deleting a task after rt_task_suspend works.) Actually, the fatal warning happens when starting with rt_task_start the task which was created with the T_SUSP bit. The task needs to wake up in xnshadow_wait_barrier until it gets really suspended in primary mode by the final xnshadow_harden. This situation triggers the fatal warning, because the thread has the nucleus XNSUSP bit and is running in secondary mode. This is not the only situation where a thread with a nucleus suspension bit needs to run briefly in secondary mode: it also occurs when suspending with xnpod_suspend_thread() a thread running in secondary mode; the thread receives the SIGCHLD signal and needs to execute briefly with the suspension bit set in order to cause a migration to primary mode. So, the only case when we are sure that a user-space thread cannot be scheduled by Linux seems to be when this thread does not have the XNRELAX bit. -- Gilles Chanteperdrix.
Re: [Xenomai-core] [BUG] racy xnshadow_harden under CONFIG_PREEMPT
>> ... > I have not checked it yet but my presupposition that something as easy as: >> preempt_disable() >> >> wake_up_interruptible_sync(); >> schedule(); >> >> preempt_enable(); > It's a no-go: "scheduling while atomic". One of my first attempts to solve it. My fault. I meant the way preempt_schedule() and preempt_irq_schedule() call schedule() while being non-preemptible. To this end, ACTIVE_PREEMPT is set up. The use of preempt_enable/disable() here is wrong. > The only way to enter schedule() without being preemptible is via > ACTIVE_PREEMPT. But the effect of that flag should be well-known now. > Kind of Gordian knot. :( Maybe I have missed something so just for my curiosity: what does prevent the use of PREEMPT_ACTIVE here? We don't have a "preempted while atomic" message here as it seems to be a legal way to call schedule() with that flag being set up. >>> could work... err.. and don't blame me if no, it's some one else who has >>> written that nonsense :o) >> -- >> Best regards, >> Dmitry Adamushko > Jan -- Best regards, Dmitry Adamushko
Re: [Xenomai-core] [BUG] racy xnshadow_harden under CONFIG_PREEMPT
Dmitry Adamushko wrote: >>> ... > >> I have not checked it yet but my presupposition that something as easy as >> : >>> preempt_disable() >>> >>> wake_up_interruptible_sync(); >>> schedule(); >>> >>> preempt_enable(); >> It's a no-go: "scheduling while atomic". One of my first attempts to >> solve it. > > > My fault. I meant the way preempt_schedule() and preempt_irq_schedule() call > schedule() while being non-preemptible. > To this end, ACTIVE_PREEMPT is set up. > The use of preempt_enable/disable() here is wrong. > > > The only way to enter schedule() without being preemptible is via >> ACTIVE_PREEMPT. But the effect of that flag should be well-known now. >> Kind of Gordian knot. :( > > > Maybe I have missed something so just for my curiosity : what does prevent > the use of PREEMPT_ACTIVE here? > We don't have a "preempted while atomic" message here as it seems to be a > legal way to call schedule() with that flag being set up. When PREEMPT_ACTIVE is set, the task gets /preempted/ but not removed from the run queue - independent of its current status. > > >>> could work... err.. and don't blame me if no, it's some one else who has >>> written that nonsense :o) >>> >>> -- >>> Best regards, >>> Dmitry Adamushko >>> >> Jan >> >> > -- > Best regards, > Dmitry Adamushko
Re: [Xenomai-core] broken docs
ROSSIER Daniel wrote: > Dear Xenomai workers, > > Would it be possible to have an updated API documentation for Xenomai > 2.0.x ? (I mean with formal parameters in function prototypes) > > I tried to regenerate it with make generate-doc, but it seems that a > SVN working dir is required. > > It would be great. I just had a "quick" look at the status of the documentation in SVN-trunk (2.1). Unfortunately, doxygen is a terrible tool (to express it politely) when it comes to tracking down bugs in your formatting. Something is broken in all modules except RTDM, and although I spent *a lot* of time in getting RTDM correctly formatted, I cannot tell what's wrong with the rest. This will require some long evenings of continuous patching the docs, recompiling, and checking the result. Any volunteers - I'm lacking the time? :-/ Jan
Re: [Xenomai-core] broken docs
Jan Kiszka wrote:
> ROSSIER Daniel wrote:
>> Dear Xenomai workers,
>>
>> Would it be possible to have an updated API documentation for Xenomai
>> 2.0.x? (I mean with formal parameters in function prototypes)
>>
>> I tried to regenerate it with make generate-doc, but it seems that a
>> SVN working dir is required.

make generate-doc is needed for maintenance only. If you want to
generate doxygen documentation, simply add --enable-dox-doc to the
Xenomai configure command line.

>> It would be great.
>
> I just had a "quick" look at the status of the documentation in
> SVN-trunk (2.1). Unfortunately, doxygen is a terrible tool (to express
> it politely) when it comes to tracking down bugs in your formatting.
> Something is broken in all modules except RTDM, and although I spent *a
> lot* of time in getting RTDM correctly formatted, I cannot tell what's
> wrong with the rest. This will require some long evenings of
> continuous patching the docs, recompiling, and checking the result. Any
> volunteers - I'm lacking the time? :-/

Comparing the RTDM documentation blocks with the other modules, the
difference is that the other modules use the "fn" tag. Removing the
"fn" tag from the other modules' documentation blocks seems to solve
the issue.

--
Gilles Chanteperdrix.
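For illustration, here is what the fix amounts to on a hypothetical (non-Xenomai) function: the documentation block carries no "fn" tag, so doxygen derives the prototype from the definition itself, just as the RTDM blocks already do. The function name and contents below are invented for the example.

```c
#include <assert.h>

/**
 * @brief Square an integer.
 *
 * Note the absence of an "@fn int example_square(int x)" line:
 * doxygen picks the prototype up from the definition that follows,
 * which is the style that keeps the RTDM blocks rendering correctly.
 *
 * @param x value to square.
 * @return x * x.
 */
int example_square(int x)
{
	return x * x;
}
```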
Re: [Xenomai-core] broken docs
Gilles Chanteperdrix wrote:
> Jan Kiszka wrote:
>> ROSSIER Daniel wrote:
>>> Dear Xenomai workers,
>>>
>>> Would it be possible to have an updated API documentation for Xenomai
>>> 2.0.x? (I mean with formal parameters in function prototypes)
>>>
>>> I tried to regenerate it with make generate-doc, but it seems that a
>>> SVN working dir is required.
>
> make generate-doc is needed for maintenance only. If you want to
> generate doxygen documentation, simply add --enable-dox-doc to the
> Xenomai configure command line.
>
>>> It would be great.
>>
>> I just had a "quick" look at the status of the documentation in
>> SVN-trunk (2.1). Unfortunately, doxygen is a terrible tool (to express
>> it politely) when it comes to tracking down bugs in your formatting.
>> Something is broken in all modules except RTDM, and although I spent
>> *a lot* of time in getting RTDM correctly formatted, I cannot tell
>> what's wrong with the rest. This will require some long evenings of
>> continuous patching the docs, recompiling, and checking the result.
>> Any volunteers - I'm lacking the time? :-/
>
> Looking at the difference between RTDM documentation blocks and the
> other modules is that the other modules use the "fn" tag. Removing the
> "fn" tag from other modules documentation blocks seems to solve the
> issue.

Indeed, works. Amazingly blind I was. Anyway, it still needs some work
to remove that stuff (I wonder what the "correct" usage of @fn is...)
and to wrap functions without bodies via "#ifdef DOXYGEN_CPP" like RTDM
does. At this chance, I would also suggest replacing all \tag by @tag
for the sake of a unified style (and who knows what side effects mixing
up both may have).

Jan
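The "#ifdef DOXYGEN_CPP" trick Jan mentions can be sketched like this (hypothetical function, not actual RTDM code): the doc parser is fed a body-less prototype, while real builds compile the implementation.

```c
#include <assert.h>

#ifdef DOXYGEN_CPP
/**
 * @brief Increment a counter value.
 *
 * Doxygen only ever sees this body-less variant, so the generated
 * documentation shows a clean prototype.
 *
 * @param v current value.
 * @return v + 1.
 */
int example_increment(int v);
#else /* !DOXYGEN_CPP */
int example_increment(int v)
{
	return v + 1;	/* real implementation, hidden from doxygen */
}
#endif /* DOXYGEN_CPP */
```

In a normal build DOXYGEN_CPP is undefined, so only the implementation branch is compiled; the doxygen run predefines DOXYGEN_CPP to pick up the documented prototype instead.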
Re: [Xenomai-core] [BUG] racy xnshadow_harden under CONFIG_PREEMPT
On 30/01/06, Jan Kiszka <[EMAIL PROTECTED]> wrote:
> Dmitry Adamushko wrote:
>>>> ...
>>>> I have not checked it yet but my presupposition that something as
>>>> easy as:
>>>>
>>>>     preempt_disable();
>>>>
>>>>     wake_up_interruptible_sync();
>>>>     schedule();
>>>>
>>>>     preempt_enable();
>>>
>>> It's a no-go: "scheduling while atomic". One of my first attempts to
>>> solve it.
>>
>> My fault. I meant the way preempt_schedule() and
>> preempt_schedule_irq() call schedule() while being non-preemptible.
>> To this end, PREEMPT_ACTIVE is set up.
>> The use of preempt_enable/disable() here is wrong.
>>
>>> The only way to enter schedule() without being preemptible is via
>>> PREEMPT_ACTIVE. But the effect of that flag should be well-known now.
>>> Kind of Gordian knot. :(
>>
>> Maybe I have missed something so just for my curiosity: what does
>> prevent the use of PREEMPT_ACTIVE here?
>> We don't have a "scheduling while atomic" message here as it seems to
>> be a legal way to call schedule() with that flag being set up.
>
> When PREEMPT_ACTIVE is set, the task gets /preempted/ but not removed
> from the run queue - independent of its current status.

Err... that's exactly the reason I have explained in my first mail for
this thread :) Blah.. I wish I was smoking something special before so
I would point that as the reason of my forgetfulness.

Actually, we could use PREEMPT_ACTIVE indeed + something else (probably
another flag) to distinguish between the case when PREEMPT_ACTIVE is
set by Linux and the case when it's set by xnshadow_harden():

xnshadow_harden()
{
        struct task_struct *this_task = current;
        ...
        xnthread_t *thread = xnshadow_thread(this_task);

        if (!thread)
                return;
        ...
        gk->thread = thread;
+       add_preempt_count(PREEMPT_ACTIVE); /* should be checked in schedule() */
+       xnthread_set_flags(thread, XNATOMIC_TRANSIT);
        set_current_state(TASK_INTERRUPTIBLE);
        wake_up_interruptible_sync(&gk->waitq);
+       schedule();
+       sub_preempt_count(PREEMPT_ACTIVE);
        ...
}

Then, something like the following code should be called from schedule():

void ipipe_transit_cleanup(struct task_struct *task, runqueue_t *rq)
{
        xnthread_t *thread = xnshadow_thread(task);

        if (!thread)
                return;

        if (xnthread_test_flags(thread, XNATOMIC_TRANSIT)) {
                xnthread_clear_flags(thread, XNATOMIC_TRANSIT);
                deactivate_task(task, rq);
        }
}

- schedule.c:

        ...
        switch_count = &prev->nivcsw;
        if (prev->state && !(preempt_count() & PREEMPT_ACTIVE)) {
                switch_count = &prev->nvcsw;
                if (unlikely((prev->state & TASK_INTERRUPTIBLE) &&
                             unlikely(signal_pending(prev))))
                        prev->state = TASK_RUNNING;
                else {
                        if (prev->state == TASK_UNINTERRUPTIBLE)
                                rq->nr_uninterruptible++;
                        deactivate_task(prev, rq);
                }
        }
+       /* removes a task from the active queue if PREEMPT_ACTIVE +
+        * XNATOMIC_TRANSIT */
+#ifdef CONFIG_IPIPE
+       ipipe_transit_cleanup(prev, rq);
+#endif /* CONFIG_IPIPE */
        ...

Not very graceful maybe, but could work - or am I missing something
important?

--
Best regards,
Dmitry Adamushko
[Xenomai-core] [BUG] Interrupt problem on powerpc
On a PrPMC800 (PPC 7410 processor) with Xenomai-2.1-rc2, I get the
following if the interrupt handler takes too long (i.e. the next
interrupt gets generated before the previous one has finished):

[ 42.543765] [c00c2008] spin_bug+0xa8/0xc4
[ 42.597617] [c00c22d4] _raw_spin_lock+0x180/0x184
[ 42.660637] [c000f388] __ipipe_ack_irq+0x88/0x130
[ 42.723657] [c000efe4] __ipipe_handle_irq+0x140/0x268
[ 42.791259] [c000f144] __ipipe_grab_irq+0x38/0xa4
[ 42.854279] [c0005058] __ipipe_ret_from_except+0x0/0xc
[ 42.923029] [] 0x0
[ 42.959695] [c0038348] __do_IRQ+0x134/0x164
[ 43.015839] [c000ed04] __ipipe_do_IRQ+0x2c/0x44
[ 43.076567] [c000eb08] __ipipe_sync_stage+0x1ec/0x228
[ 43.144170] [c0039420] ipipe_suspend_domain+0x7c/0xc4
[ 43.211774] [c000f0b0] __ipipe_handle_irq+0x20c/0x268
[ 43.279377] [c000f144] __ipipe_grab_irq+0x38/0xa4
[ 43.342396] [c0005058] __ipipe_ret_from_except+0x0/0xc
[ 43.411145] [c0006524] default_idle+0x10/0x60

Any ideas of where to look?

Regards

Anders Blomdell
Re: [Xenomai-core] [BUG] Interrupt problem on powerpc
Anders Blomdell wrote:
> On a PrPMC800 (PPC 7410 processor) with Xenomai-2.1-rc2, I get the
> following if the interrupt handler takes too long (i.e. the next
> interrupt gets generated before the previous one has finished):
>
> [ 42.543765] [c00c2008] spin_bug+0xa8/0xc4
> [ 42.597617] [c00c22d4] _raw_spin_lock+0x180/0x184
> [ 42.660637] [c000f388] __ipipe_ack_irq+0x88/0x130
> [ 42.723657] [c000efe4] __ipipe_handle_irq+0x140/0x268
> [ 42.791259] [c000f144] __ipipe_grab_irq+0x38/0xa4
> [ 42.854279] [c0005058] __ipipe_ret_from_except+0x0/0xc
> [ 42.923029] [] 0x0
> [ 42.959695] [c0038348] __do_IRQ+0x134/0x164
> [ 43.015839] [c000ed04] __ipipe_do_IRQ+0x2c/0x44
> [ 43.076567] [c000eb08] __ipipe_sync_stage+0x1ec/0x228
> [ 43.144170] [c0039420] ipipe_suspend_domain+0x7c/0xc4
> [ 43.211774] [c000f0b0] __ipipe_handle_irq+0x20c/0x268
> [ 43.279377] [c000f144] __ipipe_grab_irq+0x38/0xa4
> [ 43.342396] [c0005058] __ipipe_ret_from_except+0x0/0xc
> [ 43.411145] [c0006524] default_idle+0x10/0x60

I think some probably important information is missing above this
back-trace. What does the kernel state before these lines?

Jan
Re: [Xenomai-core] [BUG] racy xnshadow_harden under CONFIG_PREEMPT
Jan Kiszka wrote: Gilles Chanteperdrix wrote: Jeroen Van den Keybus wrote: > Hello, > > > I'm currently not at a level to participate in your discussion. Although I'm > willing to supply you with stresstests, I would nevertheless like to learn > more from task migration as this debugging session proceeds. In order to do > so, please confirm the following statements or indicate where I went wrong. > I hope others may learn from this as well. > > xn_shadow_harden(): This is called whenever a Xenomai thread performs a > Linux (root domain) system call (notified by Adeos ?). xnshadow_harden() is called whenever a thread running in secondary mode (that is, running as a regular Linux thread, handled by Linux scheduler) is switching to primary mode (where it will run as a Xenomai thread, handled by Xenomai scheduler). Migrations occur for some system calls. More precisely, Xenomai skin system calls tables associates a few flags with each system call, and some of these flags cause migration of the caller when it issues the system call. Each Xenomai user-space thread has two contexts, a regular Linux thread context, and a Xenomai thread called "shadow" thread. Both contexts share the same stack and program counter, so that at any time, at least one of the two contexts is seen as suspended by the scheduler which handles it. Before xnshadow_harden is called, the Linux thread is running, and its shadow is seen in suspended state with XNRELAX bit by Xenomai scheduler. After xnshadow_harden, the Linux context is seen suspended with INTERRUPTIBLE state by Linux scheduler, and its shadow is seen as running by Xenomai scheduler. The migrating thread > (nRT) is marked INTERRUPTIBLE and run by the Linux kernel > wake_up_interruptible_sync() call. Is this thread actually run or does it > merely put the thread in some Linux to-do list (I assumed the first case) ? 
Here, I am not sure, but it seems that when calling wake_up_interruptible_sync the woken up task is put in the current CPU runqueue, and this task (i.e. the gatekeeper), will not run until the current thread (i.e. the thread running xnshadow_harden) marks itself as suspended and calls schedule(). Maybe, marking the running thread as Depends on CONFIG_PREEMPT. If set, we get a preempt_schedule already here - and a switch if the prio of the woken up task is higher. BTW, an easy way to enforce the current trouble is to remove the "_sync" from wake_up_interruptible. As I understand it this _sync is just an optimisation hint for Linux to avoid needless scheduler runs. You could not guarantee the following execution sequence doing so either, i.e. 1- current wakes up the gatekeeper 2- current goes sleeping to exit the Linux runqueue in schedule() 3- the gatekeeper resumes the shadow-side of the old current The point is all about making 100% sure that current is going to be unlinked from the Linux runqueue before the gatekeeper processes the resumption request, whatever event the kernel is processing asynchronously in the meantime. This is the reason why, as you already noticed, preempt_schedule_irq() nicely breaks our toy by stealing the CPU from the hardening thread whilst keeping it linked to the runqueue: upon return from such preemption, the gatekeeper might have run already, hence the newly hardened thread ends up being seen as runnable by both the Linux and Xeno schedulers. Rainy day indeed. We could rely on giving "current" the highest SCHED_FIFO priority in xnshadow_harden() before waking up the gk, until the gk eventually promotes it to the Xenomai scheduling mode and downgrades this priority back to normal, but we would pay additional latencies induced by each aborted rescheduling attempt that may occur during the atomic path we want to enforce. 
The other way is to make sure that no in-kernel preemption of the hardening task could occur after step 1) and until step 2) is performed, given that we cannot currently call schedule() with interrupts or preemption off. I'm on it. suspended is not needed, since the gatekeeper may have a high priority, and calling schedule() is enough. In any case, the waken up thread does not seem to be run immediately, so this rather look like the second case. Since in xnshadow_harden, the running thread marks itself as suspended before running wake_up_interruptible_sync, the gatekeeper will run when schedule() get called, which in turn, depend on the CONFIG_PREEMPT* configuration. In the non-preempt case, the current thread will be suspended and the gatekeeper will run when schedule() is explicitely called in xnshadow_harden(). In the preempt case, schedule gets called when the outermost spinlock is unlocked in wake_up_interruptible_sync(). > And how does it terminate: is only the system call migrated or is the thread > allowed to continue run (at a priority level equal to the Xenomai > priority level) until it hits something of the Xenomai API (or trivially: > explicitly go to RT using th
[Xenomai-core] [BUG?] dead code in ipipe_grab_irq
In the following code (ppc), shouldn't "first" be either declared
static or deleted? To me it looks like "first" is always equal to one
when the else clause is evaluated.

asmlinkage int __ipipe_grab_irq(struct pt_regs *regs)
{
	extern int ppc_spurious_interrupts;
	ipipe_declare_cpuid;
	int irq, first = 1;

	if ((irq = ppc_md.get_irq(regs)) >= 0) {
		__ipipe_handle_irq(irq, regs);
		first = 0;
	} else if (irq != -2 && first)
		ppc_spurious_interrupts++;

	ipipe_load_cpuid();

	return (ipipe_percpu_domain[cpuid] == ipipe_root_domain &&
		!test_bit(IPIPE_STALL_FLAG,
			  &ipipe_root_domain->cpudata[cpuid].status));
}

Regards

Anders Blomdell
Re: [Xenomai-core] [BUG] racy xnshadow_harden under CONFIG_PREEMPT
Philippe Gerum wrote: Jan Kiszka wrote: Gilles Chanteperdrix wrote: Jeroen Van den Keybus wrote: > Hello, > > > I'm currently not at a level to participate in your discussion. Although I'm > willing to supply you with stresstests, I would nevertheless like to learn > more from task migration as this debugging session proceeds. In order to do > so, please confirm the following statements or indicate where I went wrong. > I hope others may learn from this as well. > > xn_shadow_harden(): This is called whenever a Xenomai thread performs a > Linux (root domain) system call (notified by Adeos ?). xnshadow_harden() is called whenever a thread running in secondary mode (that is, running as a regular Linux thread, handled by Linux scheduler) is switching to primary mode (where it will run as a Xenomai thread, handled by Xenomai scheduler). Migrations occur for some system calls. More precisely, Xenomai skin system calls tables associates a few flags with each system call, and some of these flags cause migration of the caller when it issues the system call. Each Xenomai user-space thread has two contexts, a regular Linux thread context, and a Xenomai thread called "shadow" thread. Both contexts share the same stack and program counter, so that at any time, at least one of the two contexts is seen as suspended by the scheduler which handles it. Before xnshadow_harden is called, the Linux thread is running, and its shadow is seen in suspended state with XNRELAX bit by Xenomai scheduler. After xnshadow_harden, the Linux context is seen suspended with INTERRUPTIBLE state by Linux scheduler, and its shadow is seen as running by Xenomai scheduler. The migrating thread > (nRT) is marked INTERRUPTIBLE and run by the Linux kernel > wake_up_interruptible_sync() call. Is this thread actually run or does it > merely put the thread in some Linux to-do list (I assumed the first case) ? 
Here, I am not sure, but it seems that when calling wake_up_interruptible_sync the woken up task is put in the current CPU runqueue, and this task (i.e. the gatekeeper), will not run until the current thread (i.e. the thread running xnshadow_harden) marks itself as suspended and calls schedule(). Maybe, marking the running thread as Depends on CONFIG_PREEMPT. If set, we get a preempt_schedule already here - and a switch if the prio of the woken up task is higher. BTW, an easy way to enforce the current trouble is to remove the "_sync" from wake_up_interruptible. As I understand it this _sync is just an optimisation hint for Linux to avoid needless scheduler runs. You could not guarantee the following execution sequence doing so either, i.e. 1- current wakes up the gatekeeper 2- current goes sleeping to exit the Linux runqueue in schedule() 3- the gatekeeper resumes the shadow-side of the old current The point is all about making 100% sure that current is going to be unlinked from the Linux runqueue before the gatekeeper processes the resumption request, whatever event the kernel is processing asynchronously in the meantime. This is the reason why, as you already noticed, preempt_schedule_irq() nicely breaks our toy by stealing the CPU from the hardening thread whilst keeping it linked to the runqueue: upon return from such preemption, the gatekeeper might have run already, hence the newly hardened thread ends up being seen as runnable by both the Linux and Xeno schedulers. Rainy day indeed. We could rely on giving "current" the highest SCHED_FIFO priority in xnshadow_harden() before waking up the gk, until the gk eventually promotes it to the Xenomai scheduling mode and downgrades this priority back to normal, but we would pay additional latencies induced by each aborted rescheduling attempt that may occur during the atomic path we want to enforce. 
The other way is to make sure that no in-kernel preemption of the hardening task could occur after step 1) and until step 2) is performed, given that we cannot currently call schedule() with interrupts or preemption off. I'm on it. Could anyone interested in this issue test the following couple of patches? atomic-switch-state.patch is to be applied against Adeos-1.1-03/x86 for 2.6.15 atomic-wakeup-and-schedule.patch is to be applied against Xeno 2.1-rc2 Both patches are needed to fix the issue. TIA, -- Philippe.
Re: [Xenomai-core] [BUG] racy xnshadow_harden under CONFIG_PREEMPT
Philippe Gerum wrote: Jan Kiszka wrote: Gilles Chanteperdrix wrote: Jeroen Van den Keybus wrote: > Hello, > > > I'm currently not at a level to participate in your discussion. Although I'm > willing to supply you with stresstests, I would nevertheless like to learn > more from task migration as this debugging session proceeds. In order to do > so, please confirm the following statements or indicate where I went wrong. > I hope others may learn from this as well. > > xn_shadow_harden(): This is called whenever a Xenomai thread performs a > Linux (root domain) system call (notified by Adeos ?). xnshadow_harden() is called whenever a thread running in secondary mode (that is, running as a regular Linux thread, handled by Linux scheduler) is switching to primary mode (where it will run as a Xenomai thread, handled by Xenomai scheduler). Migrations occur for some system calls. More precisely, Xenomai skin system calls tables associates a few flags with each system call, and some of these flags cause migration of the caller when it issues the system call. Each Xenomai user-space thread has two contexts, a regular Linux thread context, and a Xenomai thread called "shadow" thread. Both contexts share the same stack and program counter, so that at any time, at least one of the two contexts is seen as suspended by the scheduler which handles it. Before xnshadow_harden is called, the Linux thread is running, and its shadow is seen in suspended state with XNRELAX bit by Xenomai scheduler. After xnshadow_harden, the Linux context is seen suspended with INTERRUPTIBLE state by Linux scheduler, and its shadow is seen as running by Xenomai scheduler. The migrating thread > (nRT) is marked INTERRUPTIBLE and run by the Linux kernel > wake_up_interruptible_sync() call. Is this thread actually run or does it > merely put the thread in some Linux to-do list (I assumed the first case) ? 
Here, I am not sure, but it seems that when calling wake_up_interruptible_sync the woken up task is put in the current CPU runqueue, and this task (i.e. the gatekeeper), will not run until the current thread (i.e. the thread running xnshadow_harden) marks itself as suspended and calls schedule(). Maybe, marking the running thread as Depends on CONFIG_PREEMPT. If set, we get a preempt_schedule already here - and a switch if the prio of the woken up task is higher. BTW, an easy way to enforce the current trouble is to remove the "_sync" from wake_up_interruptible. As I understand it this _sync is just an optimisation hint for Linux to avoid needless scheduler runs. You could not guarantee the following execution sequence doing so either, i.e. 1- current wakes up the gatekeeper 2- current goes sleeping to exit the Linux runqueue in schedule() 3- the gatekeeper resumes the shadow-side of the old current The point is all about making 100% sure that current is going to be unlinked from the Linux runqueue before the gatekeeper processes the resumption request, whatever event the kernel is processing asynchronously in the meantime. This is the reason why, as you already noticed, preempt_schedule_irq() nicely breaks our toy by stealing the CPU from the hardening thread whilst keeping it linked to the runqueue: upon return from such preemption, the gatekeeper might have run already, hence the newly hardened thread ends up being seen as runnable by both the Linux and Xeno schedulers. Rainy day indeed. We could rely on giving "current" the highest SCHED_FIFO priority in xnshadow_harden() before waking up the gk, until the gk eventually promotes it to the Xenomai scheduling mode and downgrades this priority back to normal, but we would pay additional latencies induced by each aborted rescheduling attempt that may occur during the atomic path we want to enforce. 
The other way is to make sure that no in-kernel preemption of the
hardening task could occur after step 1) and until step 2) is
performed, given that we cannot currently call schedule() with
interrupts or preemption off. I'm on it.

> Could anyone interested in this issue test the following couple of
> patches?
> atomic-switch-state.patch is to be applied against Adeos-1.1-03/x86
> for 2.6.15
> atomic-wakeup-and-schedule.patch is to be applied against Xeno 2.1-rc2
> Both patches are needed to fix the issue.
> TIA,

And now, Ladies and Gentlemen, with the patches attached.

--
Philippe.

--- 2.6.15-x86/kernel/sched.c	2006-01-07 15:18:31.0 +0100
+++ 2.6.15-ipipe/kernel/sched.c	2006-01-30 15:15:27.0 +0100
@@ -2963,7 +2963,7 @@
 	 * Otherwise, whine if we are scheduling when we should not be.
 	 */
 	if (likely(!current->exit_state)) {
-		if (unlikely(in_atomic())) {
+		if (unlikely(!(current->state & TASK_ATOMICSWITCH) && in_atomic())) {
 			printk(KERN_ERR "scheduling while atomic: "
 				"%s/0x%08x/%d\n",
 				current->comm, preempt_count(), current->pid);
@@ -2972,8 +2972,13 @@
 	profile_hit(SCHED_PROFILING, __builtin_return_ad
Re: [Xenomai-core] [BUG?] dead code in ipipe_grab_irq
Anders Blomdell wrote:
> In the following code (ppc), shouldn't "first" be either declared
> static or deleted? To me it looks like "first" is always equal to one
> when the else clause is evaluated.

You're right. "first" doesn't need to be there at all, it's probably an
old copy of something in the kernel.

> asmlinkage int __ipipe_grab_irq(struct pt_regs *regs)
> {
> 	extern int ppc_spurious_interrupts;
> 	ipipe_declare_cpuid;
> 	int irq, first = 1;
>
> 	if ((irq = ppc_md.get_irq(regs)) >= 0) {
> 		__ipipe_handle_irq(irq, regs);
> 		first = 0;
> 	} else if (irq != -2 && first)
> 		ppc_spurious_interrupts++;
>
> 	ipipe_load_cpuid();
>
> 	return (ipipe_percpu_domain[cpuid] == ipipe_root_domain &&
> 		!test_bit(IPIPE_STALL_FLAG,
> 			  &ipipe_root_domain->cpudata[cpuid].status));
> }
>
> Regards
>
> Anders Blomdell
Re: [Xenomai-core] [BUG?] dead code in ipipe_grab_irq
Heikki Lindholm wrote:
> Anders Blomdell wrote:
>> In the following code (ppc), shouldn't "first" be either declared
>> static or deleted? To me it looks like "first" is always equal to one
>> when the else clause is evaluated.
>
> You're right. "first" doesn't need to be there at all, it's probably
> an old copy of something in the kernel.

Yep; used to be a while() loop in the original implementation we do not
perform here.

>> asmlinkage int __ipipe_grab_irq(struct pt_regs *regs)
>> {
>> 	extern int ppc_spurious_interrupts;
>> 	ipipe_declare_cpuid;
>> 	int irq, first = 1;
>>
>> 	if ((irq = ppc_md.get_irq(regs)) >= 0) {
>> 		__ipipe_handle_irq(irq, regs);
>> 		first = 0;
>> 	} else if (irq != -2 && first)
>> 		ppc_spurious_interrupts++;
>>
>> 	ipipe_load_cpuid();
>>
>> 	return (ipipe_percpu_domain[cpuid] == ipipe_root_domain &&
>> 		!test_bit(IPIPE_STALL_FLAG,
>> 			  &ipipe_root_domain->cpudata[cpuid].status));
>> }
>>
>> Regards
>>
>> Anders Blomdell

--
Philippe.
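With "first" gone, the dispatch collapses to a plain if/else. As a sanity check that the simplification preserves the behavior, here is a user-space model of just that control flow (hypothetical counters standing in for __ipipe_handle_irq() and ppc_spurious_interrupts; -2 models the "no interrupt pending" code):

```c
#include <assert.h>

static int handled;	/* stands in for __ipipe_handle_irq() calls */
static int spurious;	/* stands in for ppc_spurious_interrupts++ */

/* Simplified dispatch, with the always-true "first" test removed. */
static void model_grab_irq(int irq)
{
	if (irq >= 0)
		handled++;
	else if (irq != -2)
		spurious++;
}
```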
Re: [Xenomai-core] [BUG] racy xnshadow_harden under CONFIG_PREEMPT
Philippe Gerum wrote: > Philippe Gerum wrote: >> Jan Kiszka wrote: >> >>> Gilles Chanteperdrix wrote: >>> Jeroen Van den Keybus wrote: > Hello, > > > I'm currently not at a level to participate in your discussion. Although I'm > willing to supply you with stresstests, I would nevertheless like to learn > more from task migration as this debugging session proceeds. In order to do > so, please confirm the following statements or indicate where I went wrong. > I hope others may learn from this as well. > > xn_shadow_harden(): This is called whenever a Xenomai thread performs a > Linux (root domain) system call (notified by Adeos ?). xnshadow_harden() is called whenever a thread running in secondary mode (that is, running as a regular Linux thread, handled by Linux scheduler) is switching to primary mode (where it will run as a Xenomai thread, handled by Xenomai scheduler). Migrations occur for some system calls. More precisely, Xenomai skin system calls tables associates a few flags with each system call, and some of these flags cause migration of the caller when it issues the system call. Each Xenomai user-space thread has two contexts, a regular Linux thread context, and a Xenomai thread called "shadow" thread. Both contexts share the same stack and program counter, so that at any time, at least one of the two contexts is seen as suspended by the scheduler which handles it. Before xnshadow_harden is called, the Linux thread is running, and its shadow is seen in suspended state with XNRELAX bit by Xenomai scheduler. After xnshadow_harden, the Linux context is seen suspended with INTERRUPTIBLE state by Linux scheduler, and its shadow is seen as running by Xenomai scheduler. The migrating thread > (nRT) is marked INTERRUPTIBLE and run by the Linux kernel > wake_up_interruptible_sync() call. Is this thread actually run or does it > merely put the thread in some Linux to-do list (I assumed the first case) ? 
Here, I am not sure, but it seems that when calling wake_up_interruptible_sync the woken up task is put in the current CPU runqueue, and this task (i.e. the gatekeeper), will not run until the current thread (i.e. the thread running xnshadow_harden) marks itself as suspended and calls schedule(). Maybe, marking the running thread as >>> >>> >>> >>> Depends on CONFIG_PREEMPT. If set, we get a preempt_schedule already >>> here - and a switch if the prio of the woken up task is higher. >>> >>> BTW, an easy way to enforce the current trouble is to remove the "_sync" >>> from wake_up_interruptible. As I understand it this _sync is just an >>> optimisation hint for Linux to avoid needless scheduler runs. >>> >> >> You could not guarantee the following execution sequence doing so >> either, i.e. >> >> 1- current wakes up the gatekeeper >> 2- current goes sleeping to exit the Linux runqueue in schedule() >> 3- the gatekeeper resumes the shadow-side of the old current >> >> The point is all about making 100% sure that current is going to be >> unlinked from the Linux runqueue before the gatekeeper processes the >> resumption request, whatever event the kernel is processing >> asynchronously in the meantime. This is the reason why, as you already >> noticed, preempt_schedule_irq() nicely breaks our toy by stealing the >> CPU from the hardening thread whilst keeping it linked to the >> runqueue: upon return from such preemption, the gatekeeper might have >> run already, hence the newly hardened thread ends up being seen as >> runnable by both the Linux and Xeno schedulers. Rainy day indeed. >> >> We could rely on giving "current" the highest SCHED_FIFO priority in >> xnshadow_harden() before waking up the gk, until the gk eventually >> promotes it to the Xenomai scheduling mode and downgrades this >> priority back to normal, but we would pay additional latencies induced >> by each aborted rescheduling attempt that may occur during the atomic >> path we want to enforce. 
>> >> The other way is to make sure that no in-kernel preemption of the >> hardening task could occur after step 1) and until step 2) is >> performed, given that we cannot currently call schedule() with >> interrupts or preemption off. I'm on it. >> > > Could anyone interested in this issue test the following couple of patches? > > atomic-switch-state.patch is to be applied against Adeos-1.1-03/x86 for > 2.6.15 > atomic-wakeup-and-schedule.patch is to be applied against Xeno 2.1-rc2 > > Both patches are needed to fix the issue. > > TIA, > Looks good. I tried Jeroen's test-case and I was not able to reproduce the crash anymore. I think it's time for a new ipipe-release. ;) At this chance: any comments on the panic-freeze extension for the tracer? I need to rework the Xenomai patch, but the ipipe side should be re
Re: [Xenomai-core] [BUG] Interrupt problem on powerpc
Jan Kiszka wrote:
> Anders Blomdell wrote:
>> On a PrPMC800 (PPC 7410 processor) with Xenomai-2.1-rc2, I get the
>> following if the interrupt handler takes too long (i.e. the next
>> interrupt gets generated before the previous one has finished)
>>
>> [ 42.543765] [c00c2008] spin_bug+0xa8/0xc4
>> [ 42.597617] [c00c22d4] _raw_spin_lock+0x180/0x184
>> [ 42.660637] [c000f388] __ipipe_ack_irq+0x88/0x130
>> [ 42.723657] [c000efe4] __ipipe_handle_irq+0x140/0x268
>> [ 42.791259] [c000f144] __ipipe_grab_irq+0x38/0xa4
>> [ 42.854279] [c0005058] __ipipe_ret_from_except+0x0/0xc
>> [ 42.923029] [] 0x0
>> [ 42.959695] [c0038348] __do_IRQ+0x134/0x164
>> [ 43.015839] [c000ed04] __ipipe_do_IRQ+0x2c/0x44
>> [ 43.076567] [c000eb08] __ipipe_sync_stage+0x1ec/0x228
>> [ 43.144170] [c0039420] ipipe_suspend_domain+0x7c/0xc4
>> [ 43.211774] [c000f0b0] __ipipe_handle_irq+0x20c/0x268
>> [ 43.279377] [c000f144] __ipipe_grab_irq+0x38/0xa4
>> [ 43.342396] [c0005058] __ipipe_ret_from_except+0x0/0xc
>> [ 43.411145] [c0006524] default_idle+0x10/0x60
>
> I think some probably important information is missing above this
> back-trace.

You are so right!

> What does the kernel state before these lines?
[ 42.346643] BUG: spinlock recursion on CPU#0, swapper/0
[ 42.415438]  lock: c01c943c, .magic: dead4ead, .owner: swapper/0, .owner_cpu: 0
[ 42.511681] Call trace:
[ 42.543765] [c00c2008] spin_bug+0xa8/0xc4
[ 42.597617] [c00c22d4] _raw_spin_lock+0x180/0x184
[ 42.660637] [c000f388] __ipipe_ack_irq+0x88/0x130
[ 42.723657] [c000efe4] __ipipe_handle_irq+0x140/0x268
[ 42.791259] [c000f144] __ipipe_grab_irq+0x38/0xa4
[ 42.854279] [c0005058] __ipipe_ret_from_except+0x0/0xc
[ 42.923029] [] 0x0
[ 42.959695] [c0038348] __do_IRQ+0x134/0x164
[ 43.015839] [c000ed04] __ipipe_do_IRQ+0x2c/0x44
[ 43.076567] [c000eb08] __ipipe_sync_stage+0x1ec/0x228
[ 43.144170] [c0039420] ipipe_suspend_domain+0x7c/0xc4
[ 43.211774] [c000f0b0] __ipipe_handle_irq+0x20c/0x268
[ 43.279377] [c000f144] __ipipe_grab_irq+0x38/0xa4
[ 43.342396] [c0005058] __ipipe_ret_from_except+0x0/0xc
[ 43.411145] [c0006524] default_idle+0x10/0x60

It might be that the problem is related to the fact that the interrupt
is a shared one (Harrier chip, "Functional Exception"), used for both
message-passing (should be RT) and the UART (Linux, i.e. non-RT). My
current IRQ handler always pends the interrupt to the Linux domain
(RTDM_IRQ_PROPAGATE), because all other attempts (RTDM_IRQ_ENABLE when
it wasn't a UART interrupt) have left the interrupts turned off.

What I believe should be done is:

1. When a UART interrupt is received, disable further non-RT interrupts
   on this IRQ line, pend the interrupt to Linux.
2. Handle RT interrupts on this IRQ line.
3. When Linux has finished the pended interrupt, re-enable non-RT
   interrupts.

but I have neither been able to achieve this, nor to verify that it is
the right thing to do...

Regards

Anders Blomdell
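The three-step policy above can be modeled in user space to check that the bookkeeping balances. Everything below is a hypothetical sketch - the flag variables and function names are invented for illustration, not RTDM API:

```c
#include <assert.h>

static int nonrt_masked;	/* UART (non-RT) source masked at PIC level */
static int pending_linux;	/* interrupts pended to the Linux domain */
static int rt_handled;		/* message-passing (RT) work done in the ISR */

/* Steps 1 and 2: the RT ISR sorts the shared line's sources. */
static void rt_isr(int uart_asserted, int rt_asserted)
{
	if (uart_asserted) {
		nonrt_masked = 1;	/* 1. mask further non-RT interrupts */
		pending_linux++;	/*    ... and pend the IRQ to Linux */
	}
	if (rt_asserted)
		rt_handled++;		/* 2. RT work stays in the primary domain */
}

/* Step 3: Linux's IRQ epilogue re-enables the non-RT source. */
static void linux_irq_done(void)
{
	pending_linux--;
	nonrt_masked = 0;
}
```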
Re: [Xenomai-core] [BUG] Interrupt problem on powerpc
Anders Blomdell wrote:
> On a PrPMC800 (PPC 7410 processor) with Xenomai-2.1-rc2, I get the
> following if the interrupt handler takes too long (i.e. the next
> interrupt gets generated before the previous one has finished)
>
> [ 42.543765] [c00c2008] spin_bug+0xa8/0xc4
> [ 42.597617] [c00c22d4] _raw_spin_lock+0x180/0x184

Someone (in arch/ppc64/kernel/*.c?) is spinlocking+irqsave'ing desc->lock
for any given IRQ without using the Adeos *_hw() spinlock variant that
masks the interrupt at hw level. So we seem to have:

spin_lock_irqsave(&desc->lock)
  __ipipe_grab_irq
    __ipipe_handle_irq
      __ipipe_ack_irq
        spin_lock...(&desc->lock)

deadlock.

The point is that spin_lock_irqsave() only _virtually_ masks the
interrupts, by preventing their associated Linux handlers from being
called; despite this, Adeos still actually acquires and acknowledges the
incoming hw events before logging them, even if their associated action
happens to be postponed until spin_unlock_irqrestore() is called. To
solve this, all spinlocks potentially touched by the ipipe's primary IRQ
handler and/or the code it calls indirectly _must_ be operated using the
_hw() call variant all over the kernel, so that no hw IRQ can be taken
while those spinlocks are held by Linux. Usually, only the spinlock(s)
protecting the interrupt descriptors or the PIC hardware are concerned.
> [ 42.660637] [c000f388] __ipipe_ack_irq+0x88/0x130
> [ 42.723657] [c000efe4] __ipipe_handle_irq+0x140/0x268
> [ 42.791259] [c000f144] __ipipe_grab_irq+0x38/0xa4
> [ 42.854279] [c0005058] __ipipe_ret_from_except+0x0/0xc
> [ 42.923029] [] 0x0
> [ 42.959695] [c0038348] __do_IRQ+0x134/0x164
> [ 43.015839] [c000ed04] __ipipe_do_IRQ+0x2c/0x44
> [ 43.076567] [c000eb08] __ipipe_sync_stage+0x1ec/0x228
> [ 43.144170] [c0039420] ipipe_suspend_domain+0x7c/0xc4
> [ 43.211774] [c000f0b0] __ipipe_handle_irq+0x20c/0x268
> [ 43.279377] [c000f144] __ipipe_grab_irq+0x38/0xa4
> [ 43.342396] [c0005058] __ipipe_ret_from_except+0x0/0xc
> [ 43.411145] [c0006524] default_idle+0x10/0x60
>
> Any ideas of where to look?
>
> Regards
>
> Anders Blomdell

--
Philippe.
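For illustration, the lock recursion Philippe describes can be modeled in a few lines of plain C. This is only a single-CPU simulation of virtual versus hardware interrupt masking; the names (try_lock, irq_can_fire, the two scenario functions) are invented for the sketch, not kernel symbols:

```c
#include <stdbool.h>

/* Model of the spinlock recursion: desc->lock is non-recursive, and
 * only masking interrupts at CPU level keeps the ipipe ack path from
 * re-entering it while it is held. */

static bool desc_lock_held;   /* models desc->lock, non-recursive     */
static bool hard_irqs_off;    /* _hw() variant: real CPU-level masking */

/* Returns false on recursion -- the "BUG: spinlock recursion" case. */
static bool try_lock(void)
{
    if (desc_lock_held)
        return false;
    desc_lock_held = true;
    return true;
}

/* A hw IRQ can still be taken unless interrupts are masked for real;
 * virtual masking only defers the Linux handler, not the ack. */
static bool irq_can_fire(void)
{
    return !hard_irqs_off;
}

/* Scenario 1: plain spin_lock_irqsave(), i.e. virtual masking only. */
static bool scenario_vanilla(void)
{
    desc_lock_held = false;
    hard_irqs_off = false;

    try_lock();               /* spin_lock_irqsave(&desc->lock, ...)  */
    if (irq_can_fire())       /* hw IRQ arrives despite virtual mask  */
        return try_lock();    /* __ipipe_ack_irq -> same lock: fails  */
    return true;
}

/* Scenario 2: the _hw() variant masks interrupts at CPU level. */
static bool scenario_hw(void)
{
    desc_lock_held = false;
    hard_irqs_off = false;

    hard_irqs_off = true;     /* ~ spin_lock_irqsave_hw(&desc->lock)  */
    try_lock();
    if (irq_can_fire())
        return try_lock();
    return true;              /* IRQ deferred until unlock: no recursion */
}
```

The vanilla scenario reproduces the deadlock, the _hw() scenario does not, which is exactly why the fix must convert every lock on the ipipe ack path.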
Re: [Xenomai-core] [BUG] Interrupt problem on powerpc
Anders Blomdell wrote:
> Jan Kiszka wrote:
>> Anders Blomdell wrote:
>>
>>> On a PrPMC800 (PPC 7410 processor) with Xenomai-2.1-rc2, I get the
>>> following if the interrupt handler takes too long (i.e. the next
>>> interrupt gets generated before the previous one has finished)
>>>
>>> [ 42.543765] [c00c2008] spin_bug+0xa8/0xc4
>>> [ 42.597617] [c00c22d4] _raw_spin_lock+0x180/0x184
>>> [ 42.660637] [c000f388] __ipipe_ack_irq+0x88/0x130
>>> [ 42.723657] [c000efe4] __ipipe_handle_irq+0x140/0x268
>>> [ 42.791259] [c000f144] __ipipe_grab_irq+0x38/0xa4
>>> [ 42.854279] [c0005058] __ipipe_ret_from_except+0x0/0xc
>>> [ 42.923029] [] 0x0
>>> [ 42.959695] [c0038348] __do_IRQ+0x134/0x164
>>> [ 43.015839] [c000ed04] __ipipe_do_IRQ+0x2c/0x44
>>> [ 43.076567] [c000eb08] __ipipe_sync_stage+0x1ec/0x228
>>> [ 43.144170] [c0039420] ipipe_suspend_domain+0x7c/0xc4
>>> [ 43.211774] [c000f0b0] __ipipe_handle_irq+0x20c/0x268
>>> [ 43.279377] [c000f144] __ipipe_grab_irq+0x38/0xa4
>>> [ 43.342396] [c0005058] __ipipe_ret_from_except+0x0/0xc
>>> [ 43.411145] [c0006524] default_idle+0x10/0x60
>>
>> I think some probably important information is missing above this
>> back-trace.
> You are so right!
>
>> What does the kernel state before these lines?
> [ 42.346643] BUG: spinlock recursion on CPU#0, swapper/0
> [ 42.415438] lock: c01c943c, .magic: dead4ead, .owner: swapper/0,
> .owner_cpu: 0
> [ 42.511681] Call trace:
> [ 42.543765] [c00c2008] spin_bug+0xa8/0xc4
> [ 42.597617] [c00c22d4] _raw_spin_lock+0x180/0x184
> [ 42.660637] [c000f388] __ipipe_ack_irq+0x88/0x130
> [ 42.723657] [c000efe4] __ipipe_handle_irq+0x140/0x268
> [ 42.791259] [c000f144] __ipipe_grab_irq+0x38/0xa4
> [ 42.854279] [c0005058] __ipipe_ret_from_except+0x0/0xc
> [ 42.923029] [] 0x0
> [ 42.959695] [c0038348] __do_IRQ+0x134/0x164
> [ 43.015839] [c000ed04] __ipipe_do_IRQ+0x2c/0x44
> [ 43.076567] [c000eb08] __ipipe_sync_stage+0x1ec/0x228
> [ 43.144170] [c0039420] ipipe_suspend_domain+0x7c/0xc4
> [ 43.211774] [c000f0b0] __ipipe_handle_irq+0x20c/0x268
> [ 43.279377] [c000f144] __ipipe_grab_irq+0x38/0xa4
> [ 43.342396] [c0005058] __ipipe_ret_from_except+0x0/0xc
> [ 43.411145] [c0006524] default_idle+0x10/0x60
>
> It might be that the problem is related to the fact that the interrupt
> is a shared one (Harrier chip, "Functional Exception") that is used for
> both message-passing (should be RT) and UART (Linux, i.e. non-RT). My
> current IRQ handler always pends the interrupt to the Linux domain
> (RTDM_IRQ_PROPAGATE), because all other attempts (RTDM_IRQ_ENABLE when
> it wasn't a UART interrupt) have left the interrupts turned off.
>
> What I believe should be done is:
>
> 1. When a UART interrupt is received, disable further non-RT interrupts
>    on this IRQ line and pend the interrupt to Linux.
> 2. Handle RT interrupts on this IRQ line.
> 3. When Linux has finished the pended interrupt, re-enable non-RT
>    interrupts.
>
> but I have neither been able to achieve this, nor to verify that it is
> the right thing to do...

Your approach is basically what I proposed some years back on rtai-dev
for handling unresolvable shared RT/NRT IRQs. I once successfully tested
such a setup with two network cards, one RT, the other Linux.
So when you are really doomed and cannot change the IRQ line of your RT
device, this is a kind of emergency workaround. Not nice and generic
(you have to write the stub for disabling the NRT IRQ source), but it
should work.

Anyway, I do not understand what made your spinlock recurse. This shared
IRQ scenario should only cause indeterminism to the RT driver (by
blocking the line until the Linux handler can release it), but it must
not trigger this bug.

Jan
Re: [Xenomai-core] [BUG] Interrupt problem on powerpc
Jan Kiszka wrote:
> Anders Blomdell wrote:
>> Jan Kiszka wrote:
>>> Anders Blomdell wrote:
>>>> On a PrPMC800 (PPC 7410 processor) with Xenomai-2.1-rc2, I get the
>>>> following if the interrupt handler takes too long (i.e. the next
>>>> interrupt gets generated before the previous one has finished)
>>>>
>>>> [ 42.543765] [c00c2008] spin_bug+0xa8/0xc4
>>>> [ 42.597617] [c00c22d4] _raw_spin_lock+0x180/0x184
>>>> [ 42.660637] [c000f388] __ipipe_ack_irq+0x88/0x130
>>>> [ 42.723657] [c000efe4] __ipipe_handle_irq+0x140/0x268
>>>> [ 42.791259] [c000f144] __ipipe_grab_irq+0x38/0xa4
>>>> [ 42.854279] [c0005058] __ipipe_ret_from_except+0x0/0xc
>>>> [ 42.923029] [] 0x0
>>>> [ 42.959695] [c0038348] __do_IRQ+0x134/0x164
>>>> [ 43.015839] [c000ed04] __ipipe_do_IRQ+0x2c/0x44
>>>> [ 43.076567] [c000eb08] __ipipe_sync_stage+0x1ec/0x228
>>>> [ 43.144170] [c0039420] ipipe_suspend_domain+0x7c/0xc4
>>>> [ 43.211774] [c000f0b0] __ipipe_handle_irq+0x20c/0x268
>>>> [ 43.279377] [c000f144] __ipipe_grab_irq+0x38/0xa4
>>>> [ 43.342396] [c0005058] __ipipe_ret_from_except+0x0/0xc
>>>> [ 43.411145] [c0006524] default_idle+0x10/0x60
>>>
>>> I think some probably important information is missing above this
>>> back-trace.
>> You are so right!
>>> What does the kernel state before these lines?
>> [ 42.346643] BUG: spinlock recursion on CPU#0, swapper/0
>> [ 42.415438] lock: c01c943c, .magic: dead4ead, .owner: swapper/0,
>> .owner_cpu: 0
>> [ 42.511681] Call trace:
>> [ 42.543765] [c00c2008] spin_bug+0xa8/0xc4
>> [ 42.597617] [c00c22d4] _raw_spin_lock+0x180/0x184
>> [ 42.660637] [c000f388] __ipipe_ack_irq+0x88/0x130
>> [ 42.723657] [c000efe4] __ipipe_handle_irq+0x140/0x268
>> [ 42.791259] [c000f144] __ipipe_grab_irq+0x38/0xa4
>> [ 42.854279] [c0005058] __ipipe_ret_from_except+0x0/0xc
>> [ 42.923029] [] 0x0
>> [ 42.959695] [c0038348] __do_IRQ+0x134/0x164
>> [ 43.015839] [c000ed04] __ipipe_do_IRQ+0x2c/0x44
>> [ 43.076567] [c000eb08] __ipipe_sync_stage+0x1ec/0x228
>> [ 43.144170] [c0039420] ipipe_suspend_domain+0x7c/0xc4
>> [ 43.211774] [c000f0b0] __ipipe_handle_irq+0x20c/0x268
>> [ 43.279377] [c000f144] __ipipe_grab_irq+0x38/0xa4
>> [ 43.342396] [c0005058] __ipipe_ret_from_except+0x0/0xc
>> [ 43.411145] [c0006524] default_idle+0x10/0x60
>>
>> It might be that the problem is related to the fact that the interrupt
>> is a shared one (Harrier chip, "Functional Exception") that is used for
>> both message-passing (should be RT) and UART (Linux, i.e. non-RT). My
>> current IRQ handler always pends the interrupt to the Linux domain
>> (RTDM_IRQ_PROPAGATE), because all other attempts (RTDM_IRQ_ENABLE when
>> it wasn't a UART interrupt) have left the interrupts turned off.
>>
>> What I believe should be done is:
>>
>> 1. When a UART interrupt is received, disable further non-RT interrupts
>>    on this IRQ line and pend the interrupt to Linux.
>> 2. Handle RT interrupts on this IRQ line.
>> 3. When Linux has finished the pended interrupt, re-enable non-RT
>>    interrupts.
>>
>> but I have neither been able to achieve this, nor to verify that it is
>> the right thing to do...
>
> Your approach is basically what I proposed some years back on rtai-dev
> for handling unresolvable shared RT/NRT IRQs. I once successfully tested
> such a setup with two network cards, one RT, the other Linux.
> So when you are really doomed and cannot change the IRQ line of your RT
> device, this is a kind of emergency workaround. Not nice and generic
> (you have to write the stub for disabling the NRT IRQ source), but it
> should work.

I'm doomed, the interrupts live in the same chip... The problem is that
I have not found any good place to re-enable the non-RT interrupts.

> Anyway, I do not understand what made your spinlock recurse. This
> shared IRQ scenario should only cause indeterminism to the RT driver
> (by blocking the line until the Linux handler can release it), but it
> must not trigger this bug.

OK, seems like we have two problems then, I'll try to hunt it down

/Anders
Re: [Xenomai-core] [BUG] Interrupt problem on powerpc
Philippe Gerum wrote:
> Anders Blomdell wrote:
>> On a PrPMC800 (PPC 7410 processor) with Xenomai-2.1-rc2, I get the
>> following if the interrupt handler takes too long (i.e. the next
>> interrupt gets generated before the previous one has finished)
>>
>> [ 42.543765] [c00c2008] spin_bug+0xa8/0xc4
>> [ 42.597617] [c00c22d4] _raw_spin_lock+0x180/0x184
>
> Someone (in arch/ppc64/kernel/*.c?) is spinlocking+irqsave'ing
> desc->lock

more likely arch/ppc/kernel/*.c :-)

> for any given IRQ without using the Adeos *_hw() spinlock variant that
> masks the interrupt at hw level. So we seem to have:
>
> spin_lock_irqsave(&desc->lock)
>   __ipipe_grab_irq
>     __ipipe_handle_irq
>       __ipipe_ack_irq
>         spin_lock...(&desc->lock)
>
> deadlock.
>
> The point is that spin_lock_irqsave() only _virtually_ masks the
> interrupts, by preventing their associated Linux handlers from being
> called; despite this, Adeos still actually acquires and acknowledges
> the incoming hw events before logging them, even if their associated
> action happens to be postponed until spin_unlock_irqrestore() is
> called. To solve this, all spinlocks potentially touched by the
> ipipe's primary IRQ handler and/or the code it calls indirectly _must_
> be operated using the _hw() call variant all over the kernel, so that
> no hw IRQ can be taken while those spinlocks are held by Linux.
> Usually, only the spinlock(s) protecting the interrupt descriptors or
> the PIC hardware are concerned.

So you will expect an addition to the ipipe patch then?

/Anders
Re: [Xenomai-core] [BUG] Interrupt problem on powerpc
Anders Blomdell wrote:
> Philippe Gerum wrote:
>> Anders Blomdell wrote:
>>> On a PrPMC800 (PPC 7410 processor) with Xenomai-2.1-rc2, I get the
>>> following if the interrupt handler takes too long (i.e. the next
>>> interrupt gets generated before the previous one has finished)
>>>
>>> [ 42.543765] [c00c2008] spin_bug+0xa8/0xc4
>>> [ 42.597617] [c00c22d4] _raw_spin_lock+0x180/0x184
>>
>> Someone (in arch/ppc64/kernel/*.c?) is spinlocking+irqsave'ing
>> desc->lock
>
> more likely arch/ppc/kernel/*.c :-)

Gah... looks like I'm still confused by ia64 issues I'm chasing right
now. (Why on earth do we need so many bits on our CPUs that only serve
the purpose of raising so many problems?)

>> for any given IRQ without using the Adeos *_hw() spinlock variant that
>> masks the interrupt at hw level. So we seem to have:
>>
>> spin_lock_irqsave(&desc->lock)
>>   __ipipe_grab_irq
>>     __ipipe_handle_irq
>>       __ipipe_ack_irq
>>         spin_lock...(&desc->lock)
>>
>> deadlock.
>>
>> The point is that spin_lock_irqsave() only _virtually_ masks the
>> interrupts, by preventing their associated Linux handlers from being
>> called; despite this, Adeos still actually acquires and acknowledges
>> the incoming hw events before logging them, even if their associated
>> action happens to be postponed until spin_unlock_irqrestore() is
>> called. To solve this, all spinlocks potentially touched by the
>> ipipe's primary IRQ handler and/or the code it calls indirectly
>> _must_ be operated using the _hw() call variant all over the kernel,
>> so that no hw IRQ can be taken while those spinlocks are held by
>> Linux. Usually, only the spinlock(s) protecting the interrupt
>> descriptors or the PIC hardware are concerned.
>
> So you will expect an addition to the ipipe patch then?

Yep. We first need to find out who's grabbing the shared spinlock using
the vanilla Linux primitives.

> /Anders

--
Philippe.
[Xenomai-core] [PATCH] rt_heap reminder
Hi all,

as a reminder (userspace, native skin, shared heap) [1]:

API documentation: "If the heap is shared, this value can be either
zero, or the same value given to rt_heap_create()."

This is not true. As the heap size gets altered in rt_heap_create() for
page-size alignment, a following call to rt_heap_alloc() with the same
value will fail. Ex:

rt_heap_create( ..., ..., 1, ... )
rt_heap_alloc( ..., 1, ..., )   -> This call fails

I suggest only accepting zero as a valid size for shared heaps.

About the attached patch:

1) not tested
2) there are possibly better names than H_ALL
3) the comments could be in better English
4) I hope you get the idea

thx
kisda

[1] https://mail.gna.org/public/xenomai-core/2006-01/msg00177.html

Index: include/native/heap.h
===================================================================
--- include/native/heap.h	(Revision 465)
+++ include/native/heap.h	(Arbeitskopie)
@@ -32,6 +32,9 @@
 #define H_DMA		0x100	/* Use memory suitable for DMA. */
 #define H_SHARED	0x200	/* Use mappable shared memory. */
 
+/* Operation flags. */
+#define H_ALL		0x0	/* Entire heap space. */
+
 typedef struct rt_heap_info {
 
     int nwaiters;	/* !< Number of pending tasks. */
Index: ksrc/skins/native/heap.c
===================================================================
--- ksrc/skins/native/heap.c	(Revision 465)
+++ ksrc/skins/native/heap.c	(Arbeitskopie)
@@ -410,10 +410,9 @@
  * from.
  *
  * @param size The requested size in bytes of the block. If the heap
- * is shared, this value can be either zero, or the same value given
- * to rt_heap_create(). In any case, the same block covering the
- * entire heap space will always be returned to all callers of this
- * service.
+ * is shared, H_ALL should be passed, as always the same block
+ * covering the entire heap space will be returned to all callers of
+ * this service.
  *
  * @param timeout The number of clock ticks to wait for a block of
  * sufficient size to be available from a local heap (see
@@ -432,8 +431,7 @@
  * @return 0 is returned upon success. Otherwise:
  *
  * - -EINVAL is returned if @a heap is not a heap descriptor, or @a
- * heap is shared (i.e. H_SHARED mode) and @a size is non-zero but
- * does not match the actual heap size passed to rt_heap_create().
+ * heap is shared (i.e. H_SHARED mode) and @a size is not H_ALL.
  *
  * - -EIDRM is returned if @a heap is a deleted heap descriptor.
  *
@@ -503,12 +501,7 @@
 
     if (!block)
 	{
-	/* It's ok to pass zero for size here, since the requested
-	   size is implicitely the whole heap space; but if
-	   non-zero is given, it must match the actual heap
-	   size. */
-
-	if (size > 0 && size != xnheap_size(&heap->heap_base))
+	if (size != H_ALL)
 	    {
 	    err = -EINVAL;
 	    goto unlock_and_exit;
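For what it's worth, the mismatch kisda describes can be reproduced with a toy model: the create path rounds the requested size up to page granularity, so a later comparison of the caller's original value against the stored heap size fails. PAGE_SIZE and the two helpers below are illustrative stand-ins, not the actual native-skin internals:

```c
#include <stddef.h>

/* Sketch of why rt_heap_alloc(heap, 1, ...) fails after
 * rt_heap_create(heap, ..., 1, ...): creation rounds the size up to
 * page granularity, so the stored heap size no longer equals the
 * value the caller originally passed. */

#define PAGE_SIZE 4096

/* Models the page-size alignment applied at heap creation time. */
static size_t round_to_pages(size_t size)
{
    return (size + PAGE_SIZE - 1) & ~(size_t)(PAGE_SIZE - 1);
}

/* Models the old check: a non-zero size must match the (already
 * rounded) heap size, which the create-time value no longer does. */
static int old_alloc_check(size_t heap_size, size_t requested)
{
    if (requested > 0 && requested != heap_size)
        return -22;   /* -EINVAL */
    return 0;
}
```

Only zero passes the old check, which is exactly the argument for making a single H_ALL constant the documented (and enforced) way to map a shared heap.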