Someone just dumped another kernel oops on my desk, and it points to the
xnpipe subsystem: xnpipe_release was called, which invoked
__pipe_input_handler, i.e. the registered input_handler of the native pipe
services. That handler then tried to call an invalid monitor callback,
even though no one had ever set a monitor handler.
So I looked at how the nucleus handles input_handler registration,
deregistration, and invocation, the latter also passing a cookie that
here points to the native pipe object. That code looks racy w.r.t.
concurrent cleanup on the kernel and user side.
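To make the race concrete, here is a small user-space model. It is only a sketch: the struct, the function names, and the "alive" flag are hypothetical, and a pthread mutex stands in for nklock. It replays, deterministically and in a single thread, the "read handler and cookie under the lock, drop the lock, then call" pattern with a cleanup slipping in between the unlock and the call.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical cookie object. Real cleanup would free it, which is what
 * turns a stale cookie into an oops; the flag makes the problem visible
 * without undefined behavior. */
struct cookie_model {
    int alive;
};

typedef int (*input_handler_t)(void *cookie);

/* Hypothetical, simplified stand-in for the relevant xnpipe_state_t fields. */
struct pipe_state_model {
    input_handler_t input_handler;
    void *cookie;
};

static pthread_mutex_t model_lock = PTHREAD_MUTEX_INITIALIZER; /* plays nklock */

/* Returns 0 for a live cookie, -1 if it was handed a dead one. */
static int model_handler(void *cookie)
{
    struct cookie_model *c = cookie;

    assert(c != NULL);
    return c->alive ? 0 : -1;
}

/* Cleanup: clear the tuple under the lock, then kill the cookie. */
static void model_cleanup(struct pipe_state_model *st)
{
    struct cookie_model *c;

    pthread_mutex_lock(&model_lock);
    c = st->cookie;
    st->input_handler = NULL;
    st->cookie = NULL;
    pthread_mutex_unlock(&model_lock);
    if (c != NULL)
        c->alive = 0; /* real code would free the object here */
}

/* Snapshot under the lock, unlock, let cleanup run, then call. */
static int replay_snapshot_race(struct pipe_state_model *st)
{
    input_handler_t handler;
    void *cookie;

    pthread_mutex_lock(&model_lock);
    handler = st->input_handler;
    cookie = st->cookie;
    pthread_mutex_unlock(&model_lock);

    model_cleanup(st); /* the concurrent cleanup, serialized for the demo */

    return handler != NULL ? handler(cookie) : -2;
}
```

With a registered handler and a live cookie, replay_snapshot_race returns -1: the handler was called with a cookie that cleanup had already invalidated, even though every individual access to the tuple was locked.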
What is the intended locking policy when dereferencing the
xnpipe_state_t.input_handler / xnpipe_state_t.cookie pair? Sometimes the
handler is called under nklock; sometimes both values are merely read
under nklock, which is then dropped before the handler is invoked (which
is bogus, as cookie may become invalid in the meantime). Even worse,
xnpipe_release does not take any lock at all when calling input_handler
as its last step, and that is obviously where my customer just hit the
oops.
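For contrast, a sketch of a consistent policy in the same hypothetical user-space model (all names are illustrative, and the mutex again only stands in for nklock): registration, deregistration, and invocation all take the one lock, and invocation keeps it held across both the NULL check and the call, so the handler can never be paired with a cookie that has already been cleared.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical model of the (input_handler, cookie) pair; not nucleus code. */
typedef int (*input_handler_t)(void *cookie);

struct pipe_state_model {
    input_handler_t input_handler;
    void *cookie;
};

static pthread_mutex_t model_lock = PTHREAD_MUTEX_INITIALIZER; /* plays nklock */

/* Publish handler and cookie together. */
static void model_register(struct pipe_state_model *st,
                           input_handler_t handler, void *cookie)
{
    pthread_mutex_lock(&model_lock);
    st->input_handler = handler;
    st->cookie = cookie;
    pthread_mutex_unlock(&model_lock);
}

/* Clear both under the same lock before the cookie goes away. */
static void model_unregister(struct pipe_state_model *st)
{
    pthread_mutex_lock(&model_lock);
    st->input_handler = NULL;
    st->cookie = NULL;
    pthread_mutex_unlock(&model_lock);
}

/* Check and call without dropping the lock; -1 means no handler. */
static int model_invoke(struct pipe_state_model *st)
{
    int ret = -1;

    pthread_mutex_lock(&model_lock);
    if (st->input_handler != NULL)
        ret = st->input_handler(st->cookie);
    pthread_mutex_unlock(&model_lock);
    return ret;
}

/* Example handler: counts how often it was called via its cookie. */
static int count_handler(void *cookie)
{
    int *calls = cookie;

    assert(calls != NULL);
    return ++*calls;
}
```

The price of this design is that the handler runs with the lock held, so in this model it must neither block nor try to take model_lock itself.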
In other words: Can't we always hold nklock while checking for
input_handler != NULL and then invoking it with the corresponding
cookie? This will just require us to properly clear input_handler on
Siemens AG, Corporate Technology, CT SE 26
Corporate Competence Center Embedded Linux
Xenomai-core mailing list