Jan Kiszka wrote:
> Philippe Gerum wrote:
>> Philippe Gerum wrote:
>>> Gilles Chanteperdrix wrote:
>>>
>>>> Jan Kiszka wrote:
>>>>  > Hi,
>>>>  >
>>>>  > my colleagues and I need some hint where to continue our search for the
>>>>  > cause of a weird cleanup issue:
>>>>  >
>>>>  > An application of our robotics framework sometimes terminates (though
>>>>  > successfully) in a way that the system timer IRQ no longer arrives
>>>>  > afterwards or no re-program takes place anymore. All other Linux IRQs
>>>>  > are fine (Ethernet, keyboard, etc.). I cannot provide an easy test case
>>>>  > yet as besides the framework some expensive gyroscope and the 16550A
>>>>  > driver are involved.
>>>>
>>>> I observed a similar issue when xnpod_stop_timer was called when
>>>> shutting down the posix skin. I assumed that the problem was that
>>>> xnpod_shutdown already called xnpod_stop_timer, so xnpod_stop_timer (and
>>>> in particular xnarch_stop_timer) ended up being called twice.
>>>>
>>> Err, sorry. Forget about my previous reply: xnarch_stop_timer is _not_
>>> protected by the XNTIMED flag, but only the last part of the
>>> housekeeping chores performed upon stopping the systimer are. IOW,
>>> this is a latent bug, and xnpod_stop_timer should be fixed.
>>>
>> Commit 884 should do that.
>>
> 
> Sorry for replying late: nope, this has no influence on our issue.
> 
> Well, someone put that damn piece of hardware on my desk, saying: "It
> doesn't work." What he did not say is that there are multiple issues
> contained :-/. I found and fixed (patch will follow) a severe bug in the
> 16550A driver, but the strange timer issue stays (though it's still
> tricky to reproduce).
> 
> The point is - and that's likely why your patch doesn't help - that we
> do not stop the system timer, i.e. unload all skins. We just terminate
> an application. I did some research but failed to find a test case (only
> our software "manages" to trigger this). Actually, it seems the hardware
> timer is no longer working, because also other RT-tasks no longer time
> out. Moreover, I checked nkpod->htimer.status, but it remains 0 all the
> time. I need more time...
> 

Attached is an ipipe-freeze of the frozen system. It's taken at the time
the main thread of the terminating application has successfully
rt_task_join'ed the last remaining RT-thread. I took 2000 trace points
before and after that point and additionally instrumented
rthal_timer_program_shot() (special trace 0x01, the argument is the
delay). The interesting stuff happens around 600 us after the freeze: it
seems the scheduled Linux timer arrives then but doesn't get much
attention beyond that from ipipe.
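
For readers without the trace at hand, the instrumentation amounts to
something like the user-space sketch below. It is only a model under stated
assumptions, not the actual patch: trace_special() is a hypothetical stand-in
for the tracer's special-point hook, timer_program_shot() stands in for
rthal_timer_program_shot(), and the buffer size is a toy value rather than
the 2000 points mentioned above.

```c
#include <assert.h>
#include <stddef.h>

#define TRACE_POINTS     8     /* toy capacity, assumption for this sketch */
#define TRACE_TIMER_SHOT 0x01  /* special trace id used for timer programming */

struct trace_point {
    unsigned char id;      /* which special point was hit */
    unsigned long value;   /* argument recorded with it (here: the delay) */
};

static struct trace_point trace_buf[TRACE_POINTS];
static size_t trace_count;  /* total number of points recorded so far */

/* Hypothetical stand-in for the tracer hook: store one point in a ring buffer. */
static void trace_special(unsigned char id, unsigned long value)
{
    struct trace_point *p = &trace_buf[trace_count % TRACE_POINTS];

    p->id = id;
    p->value = value;
    trace_count++;
}

/* Model of the instrumented one-shot path: log the delay, then program. */
static void timer_program_shot(unsigned long delay)
{
    trace_special(TRACE_TIMER_SHOT, delay);
    /* ... the real function would program the hardware timer here ... */
}

/* Delay of the most recent timer shot still in the buffer, or -1 if none. */
static long last_timer_shot_delay(void)
{
    size_t n = trace_count < TRACE_POINTS ? trace_count : TRACE_POINTS;

    assert(n <= TRACE_POINTS);
    for (size_t i = 0; i < n; i++) {
        const struct trace_point *p =
            &trace_buf[(trace_count - 1 - i) % TRACE_POINTS];

        if (p->id == TRACE_TIMER_SHOT)
            return (long)p->value;
    }
    return -1;
}
```

With such a hook in place, a missing 0x01 entry after the timer interrupt
would suggest that no re-programming took place, while a present entry
followed by no further tick would point at the hardware or the interrupt
path instead.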

Any idea what to look for next? I have a "perfect" test system now,
though I still see no light at the end of the tunnel as to how to
reproduce this on other boxes.

Enough for today.

Jan


PS: This trace was taken on 2.6.15 to exclude any issues with the new
2.6.16. Both kernels show the same effect.


_______________________________________________
Xenomai-core mailing list
Xenomai-core@gna.org
https://mail.gna.org/listinfo/xenomai-core
