Philippe Gerum wrote:
> On Thu, 2009-05-14 at 14:52 +0200, Gilles Chanteperdrix wrote:
>> Philippe Gerum wrote:
>>> On Thu, 2009-05-14 at 12:20 +0200, Jan Kiszka wrote:
>>>> Philippe Gerum wrote:
>>>>> On Wed, 2009-05-13 at 18:10 +0200, Jan Kiszka wrote:
>>>>>> Philippe Gerum wrote:
>>>>>>> On Wed, 2009-05-13 at 17:28 +0200, Jan Kiszka wrote:
>>>>>>>> Philippe Gerum wrote:
>>>>>>>>> On Wed, 2009-05-13 at 15:18 +0200, Jan Kiszka wrote:
>>>>>>>>>> Gilles Chanteperdrix wrote:
>>>>>>>>>>> Jan Kiszka wrote:
>>>>>>>>>>>> Hi Gilles,
>>>>>>>>>>>> I'm currently facing a nasty effect with switchtest over latest
>>>>>>>>>>>> git head (only tested this so far): running it inside my test VM
>>>>>>>>>>>> (i.e. with frequent excessive latencies), I quickly get a stalled
>>>>>>>>>>>> Linux timer IRQ. The system is otherwise still responsive; Xenomai
>>>>>>>>>>>> timers are still being delivered, other Linux IRQs too. switchtest
>>>>>>>>>>>> complained about
>>>>>>>>>>>>     "Warning: Linux is compiled to use FPU in kernel-space."
>>>>>>>>>>>> when it was started. Kernels are and (LTTng patched in, but
>>>>>>>>>>>> unused), both show the same effect.
>>>>>>>>>>>> Seen this before?
>>>>>>>>>>> The warning about Linux being compiled to use FPU in kernel-space
>>>>>>>>>>> means that you enabled soft RAID, or compiled for K7, Geode, or any
>>>>>>>>>>> other
>>>>>>>>>> RAID is on (ordinary server config).
>>>>>>>>>>> configuration using 3DNow! for such simple operations as memcpy. It
>>>>>>>>>>> is harmless; it simply means that switchtest cannot use the FPU in
>>>>>>>>>>> kernel-space.
>>>>>>>>>>> The bug you have is probably the same as the one described here,
>>>>>>>>>>> which I am able to reproduce on my Atom:
>>>>>>>>>>> Unfortunately, I for one am working on ARM issues and am not
>>>>>>>>>>> available to debug x86 issues. I think Philippe is busy too...
>>>>>>>>>> OK, looks like I got the same flu here.
>>>>>>>>>> Philippe, did you find out any more details in the meantime? Then I'm
>>>>>>>>>> afraid I have to pick this up.
>>>>>>>>> No, I have not resumed this task yet. Working from the powerpc side
>>>>>>>>> of the universe here.
>>>>>>>> Hoho, don't think this rain here over x86 would never have made it
>>>>>>>> down to ARM or PPC land! ;)
>>>>>>>> Martin, could you check if this helps you, too?
>>>>>>>> Jan
>>>>>>>> (as usual, ready to be pulled from 'for-upstream')
>>>>>>>> --------->
>>>>>>>> Host IRQs may not only be triggered from non-root domains.
>>>>>>> Are you sure of this? I can't find any spot where this assumption would
>>>>>>> be wrong. host_pend() is basically there to relay RT timer ticks and
>>>>>>> device IRQs, and this only happens on behalf of the pipeline head. At
>>>>>>> least, this is how rthal_irq_host_pend() should be used in any case. If
>>>>>>> you did find a spot where this interface is being called from the lower
>>>>>>> stage, then this is the root bug to fix.
>>>>>> I haven't studied the I-pipe trace w.r.t. this in detail yet, but I
>>>>>> could imagine that some shadow task is interrupted in primary mode by
>>>>>> the timer IRQ and then leaves the handler in secondary mode, due to
>>>>>> whatever happens between its schedule-out and schedule-in at the end of
>>>>>> xnintr_clock_handler.
>>>>> You need a thread context to move to secondary; I just can't see how
>>>>> such a scenario would be possible.
>>>> Here is the trace of events:
>>>> => Shadow task starts migration to secondary
>>>> => in xnpod_suspend_thread, nklock is briefly released before
>>>>    xnpod_schedule
>>> Which is the root bug. Blame on me; this recent change in -head breaks a
>>> basic rule a lot of code is based on: a self-suspending thread may not
>>> be preempted while scheduling out, i.e. suspension and rescheduling must
>>> be atomically performed. xnshadow_relax() counts on this too.
>> Actually, I think the idea was mine in the first place... Maybe we can
>> add a special flag to xnpod_suspend_thread to ask for atomic suspension
>> (maybe reuse XNATOMIC?).
> I don't think so. We really need the basic assumption to hold in any
> case, because this is expected by most of the callers, and this
> micro-optimization is not worth the risk of introducing a race if
> misused.

Well, I tend to disagree. The assumption that the thread is suspended
from the scheduler's point of view still holds even when the nklock is
released, and that is what callers like rt_cond_wait expect. The
assumption made by xnshadow_relax does not seem to me like a common one.

To go further, maybe forwarding the host tick in the
xnintr_clock_handler epilogue is wrong in the first place, precisely
because the current thread, in whose context xnintr_clock_handler runs,
may be scheduled out for a long time by the clock handler itself.
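Concretely, the ordering I am worried about looks roughly like this (simplified sketch; the call names are assumed for illustration, not copied from the source):

```c
/* Hypothetical shape of the clock ISR under discussion. */
void xnintr_clock_handler(void)
{
	xntimer_tick();      /* may ready a higher-priority RT thread */
	xnpod_schedule();    /* current context can be switched out here
	                      * for an arbitrarily long time... */
	relay_host_tick();   /* ...so the host tick is only forwarded
	                      * once this context runs again */
}
```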


Xenomai-core mailing list
