Am 12.11.2010 09:48, Philippe Gerum wrote:
> On Tue, 2010-11-09 at 14:12 +0100, Jan Kiszka wrote:
>> Am 09.11.2010 10:36, Philippe Gerum wrote:
>>> On Tue, 2010-11-09 at 09:39 +0100, Jan Kiszka wrote:
>>>> Am 09.11.2010 09:26, Philippe Gerum wrote:
>>>>> On Tue, 2010-11-09 at 09:01 +0100, Jan Kiszka wrote:
>>>>>> Am 07.11.2010 17:22, Jan Kiszka wrote:
>>>>>>> Am 07.11.2010 16:15, Philippe Gerum wrote:
>>>>>>>> The following patches implements the teardown approach. The basic idea
>>>>>>>> is:
>>>>>>>> - neither break nor improve old setups with legacy I-pipe patches not
>>>>>>>> providing the revised ipipe_control_irq call.
>>>>>>>> - fix the SMP race when detaching interrupts.
>>>>>>>
>>>>>>> Looks good.
>>>>>>
>>>>>> This actually causes one regression: I've just learned that people are
>>>>>> already happily using MSIs with Xenomai in the field. This is perfectly
>>>>>> fine as long as you don't fiddle with rtdm_irq_disable/enable in
>>>>>> non-root contexts or while hard IRQs are disable. The latter requirement
>>>>>> would be violated by this fix now.
>>>>>
>>>>> What we could do is handle this corner-case in the ipipe directly, going
>>>>> for a nop when IRQs are off on a per-arch basis only to please those
>>>>> users,
>>>>
>>>> Don't we disable hard IRQs also then the root domain is the only
>>>> registered one? I'm worried about pushing regressions around, then to
>>>> plain Linux use-cases of MSI (which are not broken in anyway - except
>>>> for powerpc).
>>>
>>> The idea is to provide an ad hoc ipipe service for this, to be used by
>>> the HAL. A service that would check the controller for the target IRQ,
>>> and handle MSI ones conditionally. For sure, we just can't put those
>>> conditionally bluntly into the chip mask handler and expect the kernel
>>> to be happy.
>>>
>>> In fact, we already have __ipipe_enable/disable_irq from the internal
>>> Adeos interface avail, but they are mostly wrappers for now. We could
>>> make them a bit more smart, and handle the MSI issue as well. We would
>>> then tell the HAL to switch to using those arch-agnostic helpers
>>> generally, instead of peeking directly into the chip controller structs
>>> like today.
>>
>> This belongs to I-pipe, like we already have ipipe_end, just properly
>> wrapped to avoid descriptor access. That's specifically important if we
>> want to emulate MSI masking in software. I've the generic I-pipe
>> infrastructure ready, but the backend, so far consisting of x86 MSI
>> hardening, unfortunately needs to be rewritten.
>>
>>>
>>> If that ipipe "feature" is not detected by the HAL, then we would
>>> refrain from disabling the IRQ in xnintr_detach. In effect, this would
>>> leave the SMP race window open, but since we need recent ipipes to get
>>> it plugged already anyway (for the revised ipipe_control_irq), we would
>>> still remain in the current situation:
>>> - old patches? no SMP race fix, no regression
>>> - new patches? SMP race fix avail, no regression
>>
>> Sounds good.
> 
> Now that I slept on it, I find the approach of working around pipeline
> limitations this way, to be incorrect.
> 
> Basically, the issue is that we still don't have 100% reliable handling
> of MSI interrupts (actually, we only have partial handling, and solely
> for x86), but this is no reason to introduce code in the pipeline
> interface which would perpetuate this fact. I see this as a "all or
> nothing" issue: either MSI is fully handled and there shall be no
> restriction on applying common operations such as masking/unmasking on
> the related IRQs, or it is not, and we should not export "conditionally
> working" APIs.
> 
> In the latter case, the responsibility to rely on MSI support belongs to
> the user, which then should know about the pending restrictions, and
> decides for himself whether to use MSI. So I'm heading to this solution
> instead:
> 
> - when detaching the last handler for a given IRQ, instead of forcibly
> disabling the IRQ line, the nucleus would just make sure that such IRQ
> is already in a disabled state, and bail out on error if not (probably
> with a kernel warning to make the issue obvious).

Fiddling with the IRQ "line" state is a workaround for the missing
synchronize_irq service in Xenomai/I-pipe. If we had this, all this
disabling become unneeded.

> 
> - track the IRQ line state from xnintr_enable/xnintr_disable routines,
> so that xnintr_detach can determine whether the call is legit. Of
> course, this also means that any attempt to take sideways to
> enable/disable nucleus managed interrupts at PIC level would break that
> logic, but doing so would be the root bug anyway.
> 
> The advantage of doing so would be three-fold:
> 
> - no pipeline code to acknowledge (or even perpetuate) the fact that MSI
> support is half working, half broken. We need to fix it properly, so
> that we can use it 100% reliably, from whatever context commonly allowed
> for enabling/disabling IRQs (and not "from root domain with IRQs on"
> only). Typically, I fail to see how one would cope with such limitation,
> if a real-time handler detects that some device is going wild and really
> needs to shut it down before the whole system crashes.

MSIs are edge-triggered. Only broken hardware continuously sending bogus
messages can theoretically cause troubles. In practice (ie. in absence
of broken HW), we see a single spurious IRQ at worst.

> 
> - we enforce the API usage requirement to disable an interrupt line with
> rtdm_irq_disable(), before eventually detaching the last IRQ handler for
> it, which is common sense anyway.

That's an easy-to-get-wrong API. It would apply to non-shared IRQs only
(aka MSIs). No-go IMHO.

> 
> - absolutely no change for people who currently rely on partial MSI
> support, provided they duly disable IRQ lines before detaching their
> last handler via the appropriate RTDM interface.
> 
> Can we deal on this?
> 

Nope, don't think so. The only option I see (besides using my original
proposal of a dummy handler for deregistering - still much simpler than
the current patches) is to emulate MSI masking in the same run, thus
providing solutions for both issues.

Jan

Attachment: signature.asc
Description: OpenPGP digital signature

_______________________________________________
Xenomai-help mailing list
[email protected]
https://mail.gna.org/listinfo/xenomai-help

Reply via email to