The watchdog is currently broken in trunk ("zombie [...] would not
die..."). In fact, it should be broken in older versions as well, but only
the recent thread termination rework made this visible.

When a Xenomai CPU hog is caught by the watchdog, xnpod_delete_thread is
invoked, causing the current thread to be set in zombie state and
scheduled out. But as its Linux mate still exists, hell breaks loose once
Linux tries to get rid of it (the Xenomai zombie is scheduled in again).
In short: calling xnpod_delete_thread(<self>) for a shadow thread does
not work, and probably never worked cleanly.

There are basically two approaches to fix it: The first one is to find a
different way to kill (or only suspend?) the current shadow thread when
the watchdog strikes. The second one brought me to another issue: Raise
SIGKILL for the current thread and make sure that it can be processed by
Linux (e.g. via xnpod_suspend_thread(<cpu-hog>)). Unfortunately, there is
no way to force a shadow thread into secondary mode to handle pending
Linux signals unless that thread issues a syscall once in a while. And
that raises the question whether we shouldn't improve this as well while
we are at it.

Granted, non-broken Xenomai user space threads always issue frequent
syscalls, otherwise the system would starve (and the watchdog would come
around). On the other hand, delaying signals till syscall prologues is
different from plain Linux behaviour...
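
To make the signal-delay problem concrete, here is a toy model in plain
C. It is only a sketch: none of the toy_* names are Xenomai or Linux
APIs, and "rounds" merely stand in for time slices. It assumes the
watchdog only marks SIGKILL as pending and that the pending signal is
noticed solely in the syscall prologue, as described above.

```c
/* Toy model only: the toy_* names are hypothetical, not Xenomai APIs. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_shadow {
    bool primary_mode;    /* true while under the real-time scheduler */
    bool sigkill_pending; /* set by the watchdog, consumed on relax */
    bool alive;
};

/* Watchdog strikes: only mark the signal, do not delete the thread. */
static void toy_watchdog_kill(struct toy_shadow *t)
{
    assert(t != NULL);
    t->sigkill_pending = true;
}

/* Syscall prologue: the only point where a pending signal is noticed. */
static void toy_syscall_prologue(struct toy_shadow *t)
{
    assert(t != NULL);
    if (t->sigkill_pending) {
        t->primary_mode = false;   /* relax to secondary mode */
        t->sigkill_pending = false;
        t->alive = false;          /* "Linux" processes SIGKILL */
    }
}

/*
 * Run up to `rounds` loop rounds. The watchdog fires in round `fire_at`;
 * the thread issues a syscall every `syscall_every` rounds (0 = never).
 * Returns how many rounds the thread kept running with the kill raised.
 */
static unsigned toy_rounds_after_kill(unsigned rounds,
                                      unsigned syscall_every,
                                      unsigned fire_at)
{
    struct toy_shadow t = {
        .primary_mode = true, .sigkill_pending = false, .alive = true
    };
    unsigned late = 0;

    for (unsigned i = 1; i <= rounds && t.alive; i++) {
        if (i == fire_at)
            toy_watchdog_kill(&t);
        if (syscall_every != 0 && i % syscall_every == 0)
            toy_syscall_prologue(&t);
        if (t.alive && i >= fire_at)
            late++;
    }
    return late;
}
```

In this model a thread that never issues a syscall (syscall_every == 0)
keeps running until the rounds are used up, which is exactly the CPU-hog
case the watchdog is meant to handle.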

Comments, ideas?



Xenomai-core mailing list
