Jan Kiszka wrote:
> Philippe Gerum wrote:
>> Philippe Gerum wrote:
>>> Gilles Chanteperdrix wrote:
>>>
>>>> Jan Kiszka wrote:
>>>> > Hi,
>>>> >
>>>> > my colleagues and I need some hint where to continue our search for the
>>>> > cause of a weird cleanup issue:
>>>> >
>>>> > An application of our robotics framework sometimes terminates (though
>>>> > successfully) in a way that the system timer IRQ no longer arrives
>>>> > afterwards or no re-program takes place anymore. All other Linux IRQs
>>>> > are fine (Ethernet, keyboard, etc.). I cannot provide an easy test case
>>>> > yet as besides the framework some expensive gyroscope and the 16550A
>>>> > driver are involved.
>>>>
>>>> I observed a similar issue when xnpod_stop_timer was called when
>>>> shutting down the posix skin. I assumed that the problem was that
>>>> xnpod_shutdown already called xnpod_stop_timer, so xnpod_stop_timer (and
>>>> in particular xnarch_stop_timer) ended up being called twice.
>>>>
>>> Err, sorry. Forget about my previous reply: xnarch_stop_timer is _not_
>>> protected by the XNTIMED flag, but only the last part of the
>>> housekeeping chores performed upon stopping the systimer are. IOW,
>>> this is a latent bug, and xnpod_stop_timer should be fixed.
>>>
>> Commit 884 should do that.
>>
>
> Sorry for replying late: nope, this has no influence on our issue.
>
> Well, someone put that damn piece of hardware on my desk, saying: "It
> doesn't work." What he did not say is that there are multiple issues
> contained :-/. I found and fixed (patch will follow) a severe bug in the
> 16550A driver, but the strange timer issue stays (though it's still
> tricky to reproduce).
>
> The point is - and that's likely why your patch doesn't help - that we
> do not stop the system timer, i.e. unload all skins. We just terminate
> an application. I did some research but failed to find a test case (only
> our software "manages" to trigger this). Actually, it seems the hardware
> timer is no longer working, because also other RT-tasks no longer time
> out. Moreover, I checked nkpod->htimer.status, but it remains 0 all the
> time. I need more time...
>
Attached is an ipipe-freeze of the frozen system. It's taken at the time
the main thread of the terminating application has successfully
rt_task_join'ed the last remaining RT-thread. I took 2000 trace points
before and after that point and additionally instrumented
rthal_timer_program_shot() (special trace 0x01, the argument is the
delay).

The interesting stuff happens around 600 us after the freeze: it seems
the scheduled Linux timer arrives then but doesn't get much attention
beyond from ipipe. Any idea what to look for next?

I have a "perfect" test system now, though I still see no light at the
end of the tunnel how to export it to other boxes. Enough for today.

Jan

PS: This trace was taken over 2.6.15 to exclude any issues with the new
2.6.16. Both kernels show the same effect.
_______________________________________________
Xenomai-core mailing list
Xenomaiemail@example.com
https://mail.gna.org/listinfo/xenomai-core