Reading the 'options LUA' discussion reminded me of a feature that I wish our kernel had. I'll sketch it out here and hope that it's interesting enough to someone who has enough spare time to go and program it. :-)

It would be Nice(TM) if the kernel linker understood weak/strong aliases in the kernel and in kernel modules so that using weak aliases one could provide a stub implementation for an optional subsystem in the kernel, and using strong aliases a loadable module could provide a full-fledged implementation.

Taking bpf(4) as an example, the kernel could provide weak aliases to stub routines for, e.g., bpf_attach(), bpf_mtap(), etc:

        __weak_alias(bpf_attach, voidop);
        __weak_alias(bpf_mtap, voidop);
        . . .

and the BPF kernel module could provide strong aliases to the actual implementation:

        __strong_alias(bpf_attach, bpf_attach_impl);
        __strong_alias(bpf_mtap, bpf_mtap_impl);
        . . .

There are a couple of reasons, I think, to prefer this to the scheme that bpf(4) uses now to provide a stub implementation and a modular "real" implementation. One, using the aliases scheme lets the kernel patch in direct calls, so we could avoid indirect calls through the bpf_ops vector. Also, it's not necessary to create an operations vector like bpf_ops for every module that we want to provide stub/real implementations for.

A rough idea for how to implement this in the kernel linker is this: when the kernel linker finds a strong alias bpf_attach -> bpf_attach_impl in a kernel module that overrides an existing weak alias bpf_attach -> voidop, it can "push" the old alias onto a stack corresponding to the symbol 'bpf_attach':

        push(aliases['bpf_attach'], voidop)

When it unloads the kernel module, it re-assigns the alias:

        bpf_attach -> pop(aliases['bpf_attach'])

Let's say for now that the height of this stack is just 1 or 0.

Of course, you don't want to unload a kernel module while the kernel is in it. That is, you don't want for the text of, say, bpf_attach() to go away while the kernel cv_wait()s inside it.
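
To make the push/pop bookkeeping above a little more concrete, here is a minimal user-space sketch in C. It is only an illustration under assumptions, not kernel linker code: function pointers stand in for the relocations that a real linker would patch into direct calls, and the names struct alias, alias_lookup(), alias_override(), alias_restore() and ALIAS_STACK_MAX are hypothetical. The stub voidop and the name bpf_attach_impl come from the example above, with made-up bodies.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define ALIAS_STACK_MAX 1	/* stack height is just 0 or 1 for now */

typedef int (*target_fn)(void);

/* One entry per aliased symbol (hypothetical layout). */
struct alias {
	const char *name;			/* e.g. "bpf_attach" */
	target_fn   current;			/* what the symbol resolves to now */
	target_fn   saved[ALIAS_STACK_MAX];	/* overridden targets */
	size_t      depth;			/* number of saved targets */
};

/* Stand-ins for the kernel stub and the module's implementation. */
static int voidop(void) { return 0; }
static int bpf_attach_impl(void) { return 1; }

/* The kernel starts out with the weak alias bpf_attach -> voidop. */
static struct alias aliases[] = {
	{ "bpf_attach", voidop, { NULL }, 0 },
};

static struct alias *
alias_lookup(const char *name)
{
	size_t i;

	for (i = 0; i < sizeof(aliases) / sizeof(aliases[0]); i++) {
		if (strcmp(aliases[i].name, name) == 0)
			return &aliases[i];
	}
	return NULL;
}

/*
 * Module load: a strong alias overrides the existing alias.
 * Push the old target, install the new one.  Returns 0, or -1 if
 * the symbol is unknown or the stack is already full.
 */
static int
alias_override(const char *name, target_fn impl)
{
	struct alias *a = alias_lookup(name);

	assert(impl != NULL);
	if (a == NULL || a->depth == ALIAS_STACK_MAX)
		return -1;
	a->saved[a->depth++] = a->current;
	a->current = impl;
	return 0;
}

/*
 * Module unload: pop the saved target and re-assign the alias.
 * Returns 0, or -1 if there is nothing to restore.
 */
static int
alias_restore(const char *name)
{
	struct alias *a = alias_lookup(name);

	if (a == NULL || a->depth == 0)
		return -1;
	a->current = a->saved[--a->depth];
	return 0;
}
```

In this sketch, loading the BPF module corresponds to alias_override("bpf_attach", bpf_attach_impl) and unloading it to alias_restore("bpf_attach"), after which the symbol resolves to voidop again. A second override while one is already in place is refused, which matches the height limit of 1.
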
I believe you can handle that problem using the entrance/exit-counting scheme for softc's that I've described earlier (Subject: kicking everybody out of the softc) in conjunction with a new modcmd, MODULE_CMD_CATCH, that tells a module to change its behavior while the kernel unloads it.

Roughly, unloading a kernel module would go something like this:

1 Prepare the module to catch new threads as they try to enter the module, modcmd(MODULE_CMD_CATCH). Preparation may entail creating a mutex/condvar pair. Threads that subsequently enter the module may have to acquire the mutex on the way in, signal the condvar and release the mutex on the way out.

2 Re-link the kernel's stub implementations (e.g., bpf_attach -> voidop). In this way, no more threads may enter the module, so we can hope for the next step to finish.

3 Module-specific cleanup, modcmd(MODULE_CMD_FINI), may acquire the mutex installed in step 1, and wait for every thread to quit the module---i.e., entrance count equals exit count---using the condvar installed in step 1. Sometimes this step may fail. Putting things back the way they were should be possible, but it could be tricky.

4 Finish unlinking the module. Reclaim the module's text/data memory.

Taking this a step further, suppose we want to layer one implementation on another. I.e., some module is stubbed out in the kernel. We load a module that provides implementation A. Then we load another module providing implementation B that refines implementation A. Or vice versa: we load implementation B first, implementation A second, and A refines B.

I've been contemplating this in the context of bus_space(9): one module may provide some debug instrumentation such as an mmap(2)-able ring buffer of bus_space(9) access records looking sort of like [I/O read | I/O write | memory read | ..., address, width, value]. A second module may provide advanced I/O exception handling.
And a third module may re-order or delay reads and writes between bus barriers in order to simulate important corner-cases of bus operation. Any module may refine either the behavior of the previously-loaded modules or the behavior of the default implementation.

For example, let's consider modules that override bus_space_read_4(). Say the default implementation is in _bus_space_read_4:

        __weak_alias(bus_space_read_4, _bus_space_read_4)

The module with debug instrumentation, bus_space_debug.kmod, has a weak alias, bus_space_read_4, for its implementation called debug_bus_space_read_4,

        __weak_alias(bus_space_read_4, debug_bus_space_read_4)

It also reserves a private symbol for calling the implementation that it overrides. Call that symbol super_bus_space_read_4.

The module with exception handling, bus_space_xh.kmod, has xh_bus_space_read_4,

        __weak_alias(bus_space_read_4, xh_bus_space_read_4)

and likewise reserves a private symbol for calling the implementation it overrode, also called super_bus_space_read_4.

If we load the modules bus_space_debug.kmod and bus_space_xh.kmod in that order, then a call to bus_space_read_4 gets the xh_bus_space_read_4 implementation, which does its work and calls (through its symbol super_bus_space_read_4) debug_bus_space_read_4, which does its work and calls (through its super_bus_space_read_4) the default implementation, _bus_space_read_4.

I think that to implement loading/unloading modules that refine each other in this way, you could also use the aliases[symbol] stacks, but they would grow taller than 0 or 1 items.

It is strange to use a weak alias to override a weak alias (why should a loadable module's weak alias override the kernel's weak alias?); it may be necessary to have a new kind of alias or else some meta-information about each alias so that there is no ambiguity about what the kernel linker should do.

Dave

-- 
David Young
[email protected]    Urbana, IL    (217) 721-9981
