Hey folks,
I occasionally (about once every other month) see a kernel crash on sparc64 that looks like the kernel jumped through a bogus function pointer. At the time of the crash the stack seems to be corrupted; at least, the ddb backtrace does not work. The register window information seems to point at a call stack from the vnode cleaner thread via VCALL() -> VOCALL, and the cause might be anything from a compiler bug to random memory corruption or just some use-after-free vnode/filesystem bug.
To catch it earlier (and have better ddb data available), I would like to avoid the deadly jump and have the VOCALL macro verify the integrity of the ops vector. I thought about an MD macro to KASSERT general function pointers (on sparc*, for example, check for % 4 == 0, test for >= kernel_start and maybe <= the end of the module map region); for DEBUG kernels that macro could even do a pmap_extract(). Additionally (and maybe the easiest start), we could add some magic pointer value to the start (and end?) of the ops vector and have VOCALL verify that magic value.
Is that a sane idea? Could we easily identify all ops vectors?
Martin
P.S.: In sys/sys/vnode.h, the comments about VOCALL not being used seem to be exaggerated, or I misread them.
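For illustration, the two checks proposed above could look roughly like the following. This is a minimal userland sketch, not kernel code: FAKE_TEXT_START, FAKE_TEXT_END, VOPS_MAGIC, struct guarded_ops, funcptr_sane(), guarded_ops_ok() and vocall_checked() are all made-up names, the address bounds are arbitrary numbers standing in for kernel_start and the end of the module map region, and assert() stands in for KASSERT. The real ops vector layout may well differ from the wrapper struct used here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical bounds. A real MD macro would use the kernel text start
 * and the end of the module map region instead of these numbers.
 */
#define FAKE_TEXT_START	((uintptr_t)0x01000000)
#define FAKE_TEXT_END	((uintptr_t)0x02000000)

/* Arbitrary guard value, chosen only for this sketch. */
#define VOPS_MAGIC	((uintptr_t)0x564f5053)
#define NOPS		4

typedef int (*vop_t)(void *);

/* Hypothetical ops vector with a magic value at the start and the end. */
struct guarded_ops {
	uintptr_t go_head;
	vop_t     go_ops[NOPS];
	uintptr_t go_tail;
};

/*
 * Plausibility test for a code address, sparc flavour: instructions are
 * 4 bytes wide, so an entry point must be 4-byte aligned, and it must
 * lie inside the (assumed) text range.
 */
static bool
funcptr_sane(uintptr_t addr)
{
	return (addr % 4) == 0 &&
	    addr >= FAKE_TEXT_START && addr <= FAKE_TEXT_END;
}

/* Both guard words must still hold the magic value. */
static bool
guarded_ops_ok(const struct guarded_ops *g)
{
	return g->go_head == VOPS_MAGIC && g->go_tail == VOPS_MAGIC;
}

/* Trivial operation, used only to exercise the checked call. */
static int
sample_op(void *ap)
{
	return *(int *)ap + 1;
}

/*
 * Checked variant of an indirect call through the ops vector: verify
 * the guards and the slot before jumping, so a corrupted vector is
 * caught while the stack is still intact.
 */
static int
vocall_checked(const struct guarded_ops *g, size_t off, void *ap)
{
	assert(guarded_ops_ok(g));	/* stand-in for KASSERT */
	assert(off < NOPS && g->go_ops[off] != NULL);
	return (*g->go_ops[off])(ap);
}
```

In a kernel version the slot test would presumably also run the pointer through the MD sanity macro; that is left out here because userland function addresses do not fall into the made-up range.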
