[patch 12/24] Immediate Values - Architecture Independent Code
Immediate values are read-mostly variables that are rarely updated. They use
code patching to modify the values inscribed in the instruction stream, which
saves precious cache lines that would otherwise have to be used by these
variables.

There is a generic _imv_read() version, which uses standard global variables,
and optimized per-architecture imv_read() implementations, which use a load
immediate to remove a data cache hit. When the immediate values functionality
is disabled in the kernel, it falls back to global variables.

It adds a new rodata section "__imv" to place the pointers to the enable
value. The immediate values activation functions sit in kernel/immediate.c.

Immediate values refer to the memory address of a previously declared integer.
This integer holds the information about the state of the immediate values
associated, and must be accessed through the API found in linux/immediate.h.

At module load time, each immediate value is checked to see if it must be
enabled. It would be the case if the variable they refer to is exported from
another module and already enabled.

In the early stages of start_kernel(), the immediate values are updated to
reflect the state of the variable they refer to.

* Why should this be merged *

It improves performance on heavy memory I/O workloads.

An interesting result shows the potential this infrastructure has by showing
the slowdown a simple system call such as getppid() suffers when it is used
under heavy user-space cache thrashing:

Random walk L1 and L2 thrashing surrounding a getppid() call:
(note: in this test, do_syscall_trace was taken at each system call, see
Documentation/immediate.txt in these patches for details)
- No memory pressure   : getppid() takes  1573 cycles
- With memory pressure : getppid() takes 15589 cycles

We therefore have a slowdown of roughly 10 times just to get the kernel
variables from memory. Another test on the same architecture (Intel P4)
measured the memory latency to be 559 cycles.
Therefore, each cache line removed from the hot path would improve the syscall
time by about 3.5% (559 of the 15589 cycles) in these conditions.

Changelog:
- section __imv is already SHF_ALLOC
- Because of the wonders of ELF, section 0 has sh_addr and sh_size 0. So
  the if (immediateindex) is unnecessary here.
- Remove module_mutex usage: depend on functions implemented in module.c for
  that.
- Does not update tainted module's immediate values.
- remove imv_*_t types, add DECLARE_IMV() and DEFINE_IMV().
  - imv_read() becomes imv_read(var) because of this.
- Adding a new EXPORT_IMV_SYMBOL(_GPL).
- remove imv_if(). Should use if (unlikely(imv_read(var))) instead.
  - Wait until we have gcc support before we add the imv_if macro, since
    its form may have to change.
- Don't declare the __imv section in vmlinux.lds.h, just put the content
  in the rodata section.
- Simplify interface : remove imv_set_early, keep track of kernel boot
  status internally.
- Remove the ALIGN(8) before the __imv section. It is packed now.
- Uses an IPI busy-loop on each CPU with interrupts disabled as a simple,
  architecture agnostic, update mechanism.
- Use imv_* instead of immediate_*.

Signed-off-by: Mathieu Desnoyers <[EMAIL PROTECTED]>
CC: Rusty Russell <[EMAIL PROTECTED]>
---
 include/asm-generic/vmlinux.lds.h |    3
 include/linux/immediate.h         |   94 +++
 include/linux/module.h            |   16 +++
 init/main.c                       |    8 +
 kernel/Makefile                   |    1
 kernel/immediate.c                |  187 ++
 kernel/module.c                   |   50 +-
 7 files changed, 358 insertions(+), 1 deletion(-)

Index: linux-2.6-lttng/include/linux/immediate.h
===
--- /dev/null	1970-01-01 00:00:00.0 +
+++ linux-2.6-lttng/include/linux/immediate.h	2007-11-28 09:32:04.0 -0500
@@ -0,0 +1,94 @@
+#ifndef _LINUX_IMMEDIATE_H
+#define _LINUX_IMMEDIATE_H
+
+/*
+ * Immediate values, can be updated at runtime and save cache lines.
+ *
+ * (C) Copyright 2007 Mathieu Desnoyers <[EMAIL PROTECTED]>
+ *
+ * This file is released under the GPLv2.
+ * See the file COPYING for more details.
+ */
+
+#ifdef CONFIG_IMMEDIATE
+
+struct __imv {
+	unsigned long var;	/* Pointer to the identifier variable of the
+				 * immediate value
+				 */
+	unsigned long imv;	/*
+				 * Pointer to the memory location of the
+				 * immediate value within the instruction.
+				 */
+	unsigned char size;	/* Type size. */
+} __attribute__ ((packed));
+
+#include <asm/immediate.h>
+
+/**
+ * imv_set - set immediate variable (with locking)
+ * @name: immediate value name
+ * @i: required value
+ *
+ * Sets the value of @name, taking the module_mutex if required by
+ * the architecture.
+ */
+#define
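
For illustration only, here is a minimal user-space sketch of how the fallback
path from the patch description might look when the immediate values
functionality is disabled and an immediate value is just a global variable.
The macro names (DEFINE_IMV(), DECLARE_IMV(), _imv_read(), imv_read(),
imv_set()) come from the description and changelog, but their bodies here are
assumptions, not the patch's definitions. The names trace_enabled, events,
hot_path() and imv_demo() are hypothetical, and unlikely() is a stand-in for
the kernel's branch hint. It relies on the GCC/Clang extensions __typeof__ and
__builtin_expect.

```c
#include <assert.h>

/*
 * Assumed fallback definitions: an immediate value is a plain global
 * variable. These bodies are illustrative guesses, not the patch's code.
 */
#define DEFINE_IMV(type, name)	__typeof__(type) name##__imv
#define DECLARE_IMV(type, name)	extern __typeof__(type) name##__imv
#define _imv_read(name)		(name##__imv)
#define imv_read(name)		_imv_read(name)
#define imv_set(name, i)	(name##__imv = (i))

/* Stand-in for the kernel's unlikely() branch hint. */
#define unlikely(x)		__builtin_expect(!!(x), 0)

/* Hypothetical read-mostly flag tested on a hot path. */
DEFINE_IMV(char, trace_enabled) = 0;

/* Hypothetical counter bumped only while the flag is set. */
static int events;

/* Hot path: reads the flag on every call and rarely finds it set. */
static void hot_path(void)
{
	if (unlikely(imv_read(trace_enabled)))
		events++;
}

/* Exercises the flag and returns the number of recorded events. */
int imv_demo(void)
{
	events = 0;
	imv_set(trace_enabled, 0);

	hot_path();			/* flag clear: nothing recorded */
	assert(events == 0);

	imv_set(trace_enabled, 1);	/* the rare update */
	hot_path();			/* flag set: one event recorded */

	return events;
}
```

The if (unlikely(imv_read(var))) form follows the changelog's recommendation
in place of imv_if(). With an optimized per-architecture imv_read(), the same
call site would presumably compile to a load immediate patched in place by
imv_set(), rather than a read of the global variable.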