On Mon, Jun 22, 2020 at 9:12 PM <[email protected]> wrote:
> Are SSE instructions allowed in the AMD64 kernel? Is #ifdef __SSE__
> a sufficient guard?
>
> I have a rasops32 putchar with SSE that is 2x faster.

As Bryan and Patrick noted: it's possible, but there are restrictions
and costs.

The main restriction is that the code must not permit a context-switch
between the fpu_kernel_enter() and fpu_kernel_exit() calls.  No taking
an rwlock or calling any of the sleep functions, for example.

If you're using more than the minimal level of SSE which is already
required by the kernel (for lfence, etc) then you should also check
whether the necessary extension bits are present in
curcpu()->ci_feature_* and fall back to the current code if not
present.

The cost is that if the thread doing this isn't a system thread, then
the first fpu_kernel_enter() call after the userspace->kernel
transition has to save and reset the FPU registers (XSAVEOPT + XRSTOR
on newish CPUs).  Every fpu_kernel_exit(), regardless of thread type,
resets them again (XRSTOR).

If the restriction isn't a problem and the cost of those is worth the
gain, then sure, go for it.  We already do it for AES stuff in the
kernel, for example.  c.f. /usr/src/sys/arch/amd64/amd64/aesni.c

Philip Guenther
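
For illustration only, here is a rough userland sketch of the shape described above: check for the extension, fall back to plain C if it is missing, and otherwise bracket the vector code with fpu_kernel_enter()/fpu_kernel_exit().  This is not kernel code and not the rasops32 routine.  The two fpu_kernel_*() functions are replaced by counting stubs so it builds outside the kernel, have_simd_ext is a hypothetical flag standing in for a test of bits in curcpu()->ci_feature_*, and fill_row() and its helpers are made-up names.

```c
#include <stddef.h>
#include <stdint.h>
#ifdef __SSE2__
#include <emmintrin.h>
#endif

/*
 * Userland stand-ins (assumption): in the kernel, fpu_kernel_enter()
 * and fpu_kernel_exit() save and reset FPU state.  Here they only
 * count, so the enter/exit pairing can be checked.
 */
static int fpu_depth;
static int fpu_enters;
static void fpu_kernel_enter(void) { fpu_depth++; fpu_enters++; }
static void fpu_kernel_exit(void)  { fpu_depth--; }

/* Hypothetical flag standing in for a curcpu()->ci_feature_* bit test. */
static int have_simd_ext;

/* Plain C path, used when the extension is not present. */
static void
fill_scalar(uint32_t *dst, uint32_t color, size_t n)
{
	for (size_t i = 0; i < n; i++)
		dst[i] = color;
}

/* Vector path: four pixels per store when built with SSE2. */
static void
fill_simd(uint32_t *dst, uint32_t color, size_t n)
{
	size_t i = 0;
#ifdef __SSE2__
	__m128i v = _mm_set1_epi32((int)color);
	for (; i + 4 <= n; i += 4)
		_mm_storeu_si128((__m128i *)(dst + i), v);
#endif
	/* Tail pixels (or everything, when built without SSE2). */
	for (; i < n; i++)
		dst[i] = color;
}

void
fill_row(uint32_t *dst, uint32_t color, size_t n)
{
	if (!have_simd_ext) {
		/* Extension missing: stay on the existing code, no FPU cost. */
		fill_scalar(dst, color, n);
		return;
	}
	fpu_kernel_enter();
	/* Nothing in here may sleep or take an rwlock. */
	fill_simd(dst, color, n);
	fpu_kernel_exit();
}
```

The fallback path never touches the enter/exit pair, so it pays none of the save/reset cost; the vector path keeps the bracketed region as short as possible and free of anything that could context-switch.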
