Hi everybody,

I have run into a few synchronization-related problems while adding support for RT shared IRQs to
the nucleus layer. But first, let's take a look at some Adeos code that, well, possibly has the
same problem.

Let's say the primary domain is interested in irq == TEST_IRQ.

CPU 0:

-  TEST_IRQ occurs.

-  ipipe_handle_irq() is entered; local interrupts are off on entry.

    test_bit(IPIPE_HANDLE_FLAG, &ipd->irqs[irq].control) tells whether a given domain is interested in handling the irq.

    From there on, mostly per-CPU data is updated, e.g. cpudata->irq_hits[irq]++ and the corresponding changes to irq_pending_lo/hi.

CPU 1:

-  ... -> rthal_irq_release() ->  ipipe_virtualize_irq(TEST_IRQ, ... handler = NULL, ...)

-  Here, __ipipe_pipe_lock is held and interrupts are off.

    o  ipipe_virtualize_irq() clears IPIPE_HANDLE_FLAG and sets ipd->irqs[irq].handler to NULL.
    First observation: at this very moment TEST_IRQ may still be marked as pending on some CPU
    (i.e. in some_cpudata->irq_pending_lo/hi and irq_hits)!

CPU 0:

-  Actually, ipipe_handle_irq() may now (if it has not done so already) call __ipipe_set_irq_bit(ipd, cpuid, irq),
    even though nobody is interested in irq == TEST_IRQ any more. No matter; the fact is that
    the cpudata->irq_* fields of a given CPU now mark the irq as pending, so let's go further.

-  Later on, ipipe_sync_stage() is called for a given domain and CPU.

    It plays all irqs that are marked as pending for that domain and CPU, so the decision is based solely
    on the cpudata->irq_pending_hi/lo and irq_hits fields.
    Recall that TEST_IRQ is still marked there as well...
    One way or another (via ipipe_call_root_*irq_handler() or directly through ipd->irqs[irq].handler()) the ISR handler is called,
    and boom! it's NULL.
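
Spelled out as code, the interleaving I have in mind looks roughly like this (a simplified sketch with names taken from the ipipe code, not the literal implementation):

/* CPU 0, in ipipe_handle_irq(): the irq is accepted and marked pending,
 * since IPIPE_HANDLE_FLAG is still set at this point. */
if (test_bit(IPIPE_HANDLE_FLAG, &ipd->irqs[irq].control)) {
    cpudata->irq_hits[irq]++;
    __ipipe_set_irq_bit(ipd, cpuid, irq);   /* irq_pending_lo/hi updated */
}

/* CPU 1, in ipipe_virtualize_irq(..., handler == NULL, ...): the flag is
 * cleared and the handler dropped, but the per-CPU pending state recorded
 * above on CPU 0 is left untouched. */
clear_bit(IPIPE_HANDLE_FLAG, &ipd->irqs[irq].control);
ipd->irqs[irq].handler = NULL;

/* CPU 0, later, in ipipe_sync_stage(): dispatch is driven purely by the
 * per-CPU pending state, so the stale entry is still played. */
while (cpudata->irq_hits[irq] > 0) {
    cpudata->irq_hits[irq]--;
    ipd->irqs[irq].handler(irq);            /* boom: handler == NULL */
}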

Have I missed something that prevents this?


In a sense, the synchronization problem I mentioned at the beginning of this mail resembles this scenario.

A draft patch is enclosed just to illustrate the idea.

There are 2 problems:

1) We probably don't want to hold any lock while processing the handlers list (xnintr_intr_handler(): iterating over all shirq->handlers).

Here we may reuse the approach Linux takes in manage.c::free_irq() vs. handle.c::__do_IRQ(), which calls handle_IRQ_event() without desc->lock being held.
The magic is in free_irq(): it unlinks the "action" item from the list, then busy-waits until the IRQ_INPROGRESS flag is cleared, and only then frees the "action" item.
During that window the "action" item is no longer on the list but still points to the valid tail of the list, so an iteration may proceed even if the current item is the "deleted" one.
The presupposition here, I guess, is that the unlink operation (free_irq():: *pp = action->next) is atomic.
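
Roughly, the scheme looks like this (a simplified sketch of the idea, not the literal 2.6 kernel code):

/* Interrupt path (__do_IRQ() -> handle_IRQ_event()): the action list is
 * walked without desc->lock held; IRQ_INPROGRESS marks the walk as live. */
desc->status |= IRQ_INPROGRESS;
for (action = desc->action; action != NULL; action = action->next)
    action->handler(irq, action->dev_id, regs);
desc->status &= ~IRQ_INPROGRESS;

/* free_irq(): unlink the entry with a single pointer update (done under
 * desc->lock), then spin until no CPU is inside the walk, and only then
 * free it.  A walker that still holds a pointer to the removed entry can
 * keep following ->next, which still points to the valid tail. */
spin_lock_irqsave(&desc->lock, flags);
*pp = action->next;
spin_unlock_irqrestore(&desc->lock, flags);

while (desc->status & IRQ_INPROGRESS)
    cpu_relax();

kfree(action);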

2) xnintr_irq_handler() gets a cookie (== xnshared_irq_t*, see xnintr_attach()) that may already be invalid by that time or, and that's the real problem, may become invalid during the execution of xnintr_irq_handler().
To prevent that, we could add a flag similar to IRQ_INPROGRESS, but

either we have to set/clear it at the Adeos layer, before control is passed to xnintr_irq_handler() (to make sure the cookie is not xnfree()'d; xnintr_detach() would then busy-wait),

or we may set/clear the flag inside xnintr_irq_handler() itself, but then we have to ignore the passed "cookie" and
re-fetch it as cookie = rthal_irq_cookie(ipd, irq). Mmm, not very graceful, I'd say. A sketch of this second variant follows below.
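
For the sake of discussion, the second variant could look something like this. XN_ISR_INPROGRESS, the shirq->status field, the intr->shirq back-pointer and the exact rthal_irq_cookie() call form are just placeholders I made up for illustration, not existing nucleus API:

typedef struct xnshared_irq {
    unsigned long status;       /* XN_ISR_INPROGRESS would live here */
    xnintr_t *handlers;         /* chain of attached xnintr_t objects */
    /* other fields omitted */
} xnshared_irq_t;

static void xnintr_irq_handler(unsigned irq, void *cookie)
{
    /* Ignore the passed cookie and re-fetch it from the pipeline. */
    xnshared_irq_t *shirq = (xnshared_irq_t *)rthal_irq_cookie(&rthal_domain, irq);

    if (shirq == NULL)
        return;                 /* already detached */

    set_bit(XN_ISR_INPROGRESS, &shirq->status);
    /* ... walk shirq->handlers and invoke the attached ISRs ... */
    clear_bit(XN_ISR_INPROGRESS, &shirq->status);
}

int xnintr_detach(xnintr_t *intr)
{
    xnshared_irq_t *shirq = intr->shirq;    /* hypothetical back-pointer */

    /* ... unlink intr from shirq->handlers under the proper lock ... */

    /* Busy-wait until no CPU is inside the handler walk before the shared
     * descriptor may be xnfree()'d. */
    while (test_bit(XN_ISR_INPROGRESS, &shirq->status))
        cpu_relax();

    /* ... xnfree(shirq) once the last handler has been removed ... */
    return 0;
}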

OK, that's enough for New Year's Eve.

Happy New Year to everybody! I wish you all the best for the New Year :o)

Best regards,
Dmitry Adamushko

Attachment: nucleus_intr.c.patch
Description: Binary data

Attachment: nucleus_intr.h.patch
Description: Binary data
