On 2011-06-18 14:09, Gilles Chanteperdrix wrote:
> On 06/18/2011 12:21 PM, Jan Kiszka wrote:
>> On 2011-06-17 20:55, Gilles Chanteperdrix wrote:
>>> On 06/17/2011 07:03 PM, Jan Kiszka wrote:
>>>> On 2011-06-17 18:53, Gilles Chanteperdrix wrote:
>>>>> On 06/17/2011 04:38 PM, GIT version control wrote:
>>>>>> Module: xenomai-jki
>>>>>> Branch: for-upstream
>>>>>> Commit: 7203b1a66ca0825d5bcda1c3abab9ca048177914
>>>>>> URL:
>>>>>> http://git.xenomai.org/?p=xenomai-jki.git;a=commit;h=7203b1a66ca0825d5bcda1c3abab9ca048177914
>>>>>>
>>>>>> Author: Jan Kiszka <jan.kis...@siemens.com>
>>>>>> Date: Fri Jun 17 09:46:19 2011 +0200
>>>>>>
>>>>>> nucleus: Fix interrupt handler tails
>>>>>>
>>>>>> Our current interrupt handlers assume that they leave over the same task
>>>>>> and CPU they entered. But commit f6af9b831c broke this assumption:
>>>>>> xnpod_schedule invoked from the handler tail can now actually trigger a
>>>>>> domain migration, and that can also include a CPU migration. This causes
>>>>>> subtle corruptions as invalid xnstat_exectime_t objects may be restored
>>>>>> and - even worse - we may improperly flush XNHTICK of the old CPU,
>>>>>> leaving Linux timer-wise dead there (as happened to us).
>>>>>>
>>>>>> Fix this by moving XNHTICK replay and exectime accounting before the
>>>>>> scheduling point. Note that this introduces a tiny imprecision in the
>>>>>> accounting.
>>>>>
>>>>> I am not sure I understand why moving the XNHTICK replay is needed: if
>>>>> we switch to secondary mode, the HTICK is handled by xnpod_schedule
>>>>> anyway, or am I missing something?
>>>>
>>>> The replay can work on an invalid sched (after CPU migration in
>>>> secondary mode). We could reload the sched, but just moving the replay
>>>> is simpler.
>>>
>>> But does it not remove the purpose of this delayed replay?
>>
>> Hmm, yes, in the corner case of coalesced timed RT task wakeup and host
>> tick over a root thread. Well, then we actually have to reload sched and
>> keep the ordering to catch that as well.
>>
>>>
>>> Note that if you want to reload the sched, you also have to shut
>>> interrupts off, because upon return from xnpod_schedule after migration,
>>> interrupts are on.
>>
>> That would be another severe bug if we left an interrupt handler with
>> hard IRQs enabled - the interrupt tail code of ipipe would break.
>>
>> Fortunately, only xnpod_suspend_thread re-enables IRQs and returns.
>> xnpod_schedule also re-enables but then terminates the context (in
>> xnshadow_exit). So we are safe.
>
> I do not think we are, at least on platforms where context switches
> happen with irqs on.

Can you sketch a problematic path?

Jan
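
For readers following the thread, the stale-sched problem and the two fixes discussed above (replaying XNHTICK before the scheduling point, or reloading sched after it) can be illustrated with a small user-space toy model. This is only a sketch under stated assumptions: toy_sched, toy_schedule, toy_replay_htick and the three *_tail functions are hypothetical names invented for illustration, not nucleus code, and the model ignores locking, nesting and the hard-IRQ state that the thread is still debating.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS 2

/* Hypothetical stand-in for a per-CPU scheduler slot; not the real xnsched. */
struct toy_sched {
    bool htick_pending; /* models XNHTICK: a host tick awaits replay */
    int host_ticks;     /* host ticks actually delivered on this CPU */
};

static struct toy_sched scheds[NR_CPUS];
static int current_cpu;

static void toy_reset(void)
{
    for (int i = 0; i < NR_CPUS; i++) {
        scheds[i].htick_pending = false;
        scheds[i].host_ticks = 0;
    }
    current_cpu = 0;
}

static struct toy_sched *toy_current_sched(void)
{
    return &scheds[current_cpu];
}

/* Models a scheduling point that may return on another CPU.
 * Pass -1 for "no migration". */
static void toy_schedule(int migrate_to)
{
    assert(migrate_to < NR_CPUS);
    if (migrate_to >= 0)
        current_cpu = migrate_to;
}

/* Clears the pending flag of the sched it is given, but the tick is
 * delivered on whatever CPU we are running on right now (assumption
 * of this model). */
static void toy_replay_htick(struct toy_sched *sched)
{
    if (sched->htick_pending) {
        sched->htick_pending = false;
        scheds[current_cpu].host_ticks++;
    }
}

/* Handler tail that keeps using the sched sampled on entry. */
static void stale_tail(int migrate_to)
{
    struct toy_sched *sched = toy_current_sched();

    toy_schedule(migrate_to);
    toy_replay_htick(sched); /* sched may belong to the old CPU */
}

/* Variant 1: replay before the scheduling point. */
static void replay_first_tail(int migrate_to)
{
    struct toy_sched *sched = toy_current_sched();

    toy_replay_htick(sched);
    toy_schedule(migrate_to);
}

/* Variant 2: keep the ordering, but reload sched after scheduling. */
static void reload_tail(int migrate_to)
{
    struct toy_sched *sched;

    toy_schedule(migrate_to);
    sched = toy_current_sched();
    toy_replay_htick(sched);
}
```

To try it, call toy_reset(), set scheds[0].htick_pending = true, and run one of the tails with a migration to CPU 1. In this model stale_tail flushes the flag of CPU 0 without CPU 0 ever receiving its tick, replay_first_tail delivers the tick on CPU 0 before migrating, and reload_tail leaves the flag of CPU 0 pending for a later replay there.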
_______________________________________________
Xenomai-core mailing list
Xenomai-core@gna.org
https://mail.gna.org/listinfo/xenomai-core