On Thu, 2009-05-14 at 14:52 +0200, Gilles Chanteperdrix wrote:
> Philippe Gerum wrote:
> > On Thu, 2009-05-14 at 12:20 +0200, Jan Kiszka wrote:
> >> Philippe Gerum wrote:
> >>> On Wed, 2009-05-13 at 18:10 +0200, Jan Kiszka wrote:
> >>>> Philippe Gerum wrote:
> >>>>> On Wed, 2009-05-13 at 17:28 +0200, Jan Kiszka wrote:
> >>>>>> Philippe Gerum wrote:
> >>>>>>> On Wed, 2009-05-13 at 15:18 +0200, Jan Kiszka wrote:
> >>>>>>>> Gilles Chanteperdrix wrote:
> >>>>>>>>> Jan Kiszka wrote:
> >>>>>>>>>> Hi Gilles,
> >>>>>>>>>>
> >>>>>>>>>> I'm currently facing a nasty effect with switchtest over latest
> >>>>>>>>>> git head (only tested this so far): running it inside my test VM
> >>>>>>>>>> (ie. with frequent excessive latencies) I get a stalled Linux
> >>>>>>>>>> timer IRQ quite quickly. System is otherwise still responsive,
> >>>>>>>>>> Xenomai timers are still being delivered, other Linux IRQs too.
> >>>>>>>>>> switchtest complained about
> >>>>>>>>>>
> >>>>>>>>>> "Warning: Linux is compiled to use FPU in kernel-space."
> >>>>>>>>>>
> >>>>>>>>>> when it was started. Kernels are 2.6.28.9/ipipe-x86-2.2-07 and
> >>>>>>>>>> 2.6.29.3/ipipe-x86-2.3-01 (LTTng patched in, but unused), both
> >>>>>>>>>> show the same effect.
> >>>>>>>>>>
> >>>>>>>>>> Seen this before?
> >>>>>>>>> The warning about Linux being compiled to use FPU in kernel-space
> >>>>>>>>> means that you enabled soft RAID or compiled for K7, Geode, or any
> >>>>>>>>> other
> >>>>>>>> RAID is on (ordinary server config).
> >>>>>>>>
> >>>>>>>>> configuration using 3DNow for such simple operations as memcpy.
> >>>>>>>>> It is harmless, it simply means that switchtest can not use fpu
> >>>>>>>>> in kernel-space.
> >>>>>>>>>
> >>>>>>>>> The bug you have is probably the same as the one described here,
> >>>>>>>>> which I am able to reproduce on my atom:
> >>>>>>>>> https://mail.gna.org/public/xenomai-help/2009-04/msg00200.html
> >>>>>>>>>
> >>>>>>>>> Unfortunately, I for one am working on ARM issues and am not
> >>>>>>>>> available to debug x86 issues. I think Philippe is busy too...
> >>>>>>>> OK, looks like I got the same flu here.
> >>>>>>>>
> >>>>>>>> Philippe, did you find out any more details in the meantime? Then
> >>>>>>>> I'm afraid I have to pick this up.
> >>>>>>> No, I did not resume this task yet. Working from the powerpc side
> >>>>>>> of the universe here.
> >>>>>> Hoho, don't think this rain here over x86 would have never made it
> >>>>>> down to ARM or PPC land! ;)
> >>>>>>
> >>>>>> Martin, could you check if this helps you, too?
> >>>>>>
> >>>>>> Jan
> >>>>>>
> >>>>>> (as usual, ready to be pulled from 'for-upstream')
> >>>>>>
> >>>>>> --------->
> >>>>>>
> >>>>>> Host IRQs may not only be triggered from non-root domains.
> >>>>> Are you sure of this? I can't find any spot where this assumption
> >>>>> would be wrong. host_pend() is basically there to relay RT timer
> >>>>> ticks and device IRQs, and this only happens on behalf of the
> >>>>> pipeline head. At least, this is how rthal_irq_host_pend() should be
> >>>>> used in any case. If you did find a spot where this interface is
> >>>>> being called from the lower stage, then this is the root bug to fix.
> >>>> I haven't studied the I-pipe trace /wrt this in details yet, but I
> >>>> could imagine that some shadow task is interrupted in primary mode by
> >>>> the timer IRQ and then leaves the handler in secondary mode due to
> >>>> whatever events between schedule-out and in at the end of
> >>>> xnintr_clock_handler.
> >>>>
> >>> You need a thread context to move to secondary, I just can't see how
> >>> such scenario would be possible.
> >> Here is the trace of events:
> >>
> >> => Shadow task starts migration to secondary
> >> => in xnpod_suspend_thread, nklock is briefly released before
> >>    xnpod_schedule
> >
> > Which is the root bug. Blame on me; this recent change in -head breaks
> > a basic rule a lot of code is based on: a self-suspending thread may
> > not be preempted while scheduling out, i.e. suspension and rescheduling
> > must be performed atomically. xnshadow_relax() counts on this too.
>
> Actually, I think the idea was mine in the first place... Maybe we can
> specify a special flag to xnpod_suspend_thread to ask for the atomic
> suspension (maybe reuse XNATOMIC?).
>
I don't think so. We really need the basic assumption to hold in any
case, because this is expected by most of the callers, and this
micro-optimization is not worth the risk of introducing a race if
misused.

-- 
Philippe.

_______________________________________________
Xenomai-core mailing list
Xenomai-core@gna.org
https://mail.gna.org/listinfo/xenomai-core