Jan Kiszka wrote:
> Philippe Gerum wrote:
>> Philippe Gerum wrote:
>>> Jan Kiszka wrote:
>>>> Philippe Gerum wrote:
>>>>> Jan Kiszka wrote:
>>>>>> Philippe Gerum wrote:
>>>>>>> Jan Kiszka wrote:
>>>>>>>> commit 728fc8970e2032b3280971788f1223f3ad82d80d
>>>>>>>> Author: Jan Kiszka <jan.kis...@siemens.com>
>>>>>>>> Date:   Thu Jan 15 11:10:24 2009 +0100
>>>>>>>>     xnpipe: Fix racy callback handlers
>>>>>>>>     Invocation of input, output and alloc handler must take place under
>>>>>>>>     nklock to properly synchronize with xnpipe_disconnect. Change all
>>>>>>>>     callers to comply with this policy.
>>>>>>> That one is under investigation. I agree on the bug report (thanks 
>>>>>>> btw), but I
>>>>>>> disagree on the fix. Basically, we can't run all hooks under nklock. For
>>>>>>> instance, the alloc_handler may issue kmalloc() calls when issued from 
>>>>>>> the Linux
>>>>>>> write endpoint.
>>>>>> You mean it /could/? Because no in-tree user (ie. native) calls
>>>>>> rt-unsafe services from its alloc_handler.
>>>>> When you export a public interface, it is better not to make it 
>>>>> incompatible
>>>>> unless there is no other way to fix a situation. Doing so is last resort 
>>>>> for me.
>>>> OTH, there is nothing documented yet about those callback handlers or
>>>> xnpipe_connect. So we could only break _assumptions_ about this
>>>> interface.
>>> Actually, we would break existing implementations, but I understand your 
>>> point.
>>>  But, of course, I would be happy if we could continue to keep
>>>> the critical section length short. I just don't see how to achieve this
>>>> without significant restrictions on the callback handlers and their use
>>>> cases.
>>> That is because the semantics of those callouts is not that smart. If we 
>>> have to
>>> break the API to solve the locking issue, I want to get the semantics fixed 
>>> in
>>> the same move (which may help a lot in solving the locking issue as well), 
>>> so we
>>> don't end up with two subsequent breakage.
>> As suspected, the callback management in the message pipe code had very 
>> serious
>> issues, including the xnpipe_disconnect / xnpipe_release race. At least:
>> - the latency would simply skyrocket when the input queue (i.e. userland to
>> kernel) is flushed from xnpipe_cleanup_user_conn, because the latter holds 
>> the
>> nklock with interrupts off when doing so. I've tested this case on v2.4.x, 
>> the
>> results on a latency test when the pipe test code is running in parallel are 
>> not
>> pretty (i.e. > 220 us latency spot).
>> - message buffers waiting in the output queue while the latter is flushed 
>> would
>> NOT be returned to the memory pool when an input handler is present, which is
>> the case with the native API that provides one. All the confusion went from 
>> the
>> output notification and free buffer logic that were combined into the output
>> handler. This went unnoticed probably because messages sent via the write()
>> endpoint are immediately consumed by the receiving side in a RT task that
>> preempts, but still, the bug is there.
>> The disconnect vs release race can be fixed in a (nearly) lockless way using 
>> a
>> lingering close sequence, that prevents xnpipe_disconnect() to wipe out the
>> extra state (e.g. RT_PIPE struct) until xnpipe_release() does not need it
>> anymore. In short, if the userland side is connected, xnpipe_disconnect() 
>> will
>> defer the actual release of that extra state to xnpipe_release(); otherwise, 
>> it
>> will discard it immediately. A new callback, namely .release has been added 
>> for
>> that purpose.
> Just had a short glance so far. Looks good - except that there is still
> a use (cookie, now xstate, from xnpipe_state) after release
> (xnpipe_minor_free). Can't we push the minor_free after calling the
> release handler?

We should indeed postpone this just in case the upper layer indexes the extra
state on the minor value. We can also simplify a few things doing so.

--- ksrc/nucleus/pipe.c (revision 4565)
+++ ksrc/nucleus/pipe.c (working copy)
@@ -77,11 +77,9 @@

 static inline void xnpipe_minor_free(int minor)
-       if (minor < 0 || minor >= XNPIPE_NDEVS)
-               return;
-       __clrbits(xnpipe_bitmap[minor / BITS_PER_LONG],
-                 1UL << (minor % BITS_PER_LONG));
+       /* May be called with nklock free. */
+       clrbits(xnpipe_bitmap[minor / BITS_PER_LONG],
+               1UL << (minor % BITS_PER_LONG));

 static inline void xnpipe_enqueue_wait(struct xnpipe_state *state, int mask)
@@ -413,9 +411,9 @@
                __setbits(state->status, XNPIPE_KERN_LCLOSE);
                xnlock_put_irqrestore(&nklock, s);
        } else {
-               xnpipe_minor_free(minor);
                xnlock_put_irqrestore(&nklock, s);
+               xnpipe_minor_free(minor);

        if (need_sched)
@@ -621,9 +619,9 @@
                __clrbits((__state)->status, XNPIPE_USER_CONN);         \
                if (testbits((__state)->status, XNPIPE_KERN_LCLOSE)) {  \
                        clrbits((__state)->status, XNPIPE_KERN_LCLOSE); \
-                       xnpipe_minor_free(xnminor_from_state(__state)); \
                        xnlock_put_irqrestore(&nklock, (__s));          \
                        (__state)->ops.release((__state)->xstate);      \
+                       xnpipe_minor_free(xnminor_from_state(__state)); \
                        xnlock_get_irqsave(&nklock, (__s));             \
                }                                                       \
        } while(0)

>> The internal xnpipe API had to be fixed with respect to the new set of 
>> callbacks
>> it accepts, but the overall logic remained the same. The handlers were simply
>> made more consistent with what the message pipe machinery expects from them.
>> There is no user visible change.
>> I have committed the changes that fix all the issues above; the new code
>> survived significant stress testing here, which only means that this seems 
>> to be
>> a reasonable framework to start working with.
> Will see that I can quickly integrate this in our environment to broaden
> the test base.

Great, thanks.

>> Given the severity of the bugs, I definitely plan to backport those changes 
>> to
>> v2.4.x at some point (the earlier, the better), since we could not reasonably
>> live with such issues in a stable version.
> Yep, would be good.
> Thanks,
> Jan


Xenomai-core mailing list

Reply via email to