Re: [PATCH 10/11] x86/xip: resolve alternative instructions at build
On Mon, Mar 23, 2015 at 09:33:02AM +0100, Borislav Petkov wrote:
> On Mon, Mar 23, 2015 at 12:46:39AM -0700, Jim Kukunas wrote:
> > Since the .text section can't be updated at run-time, remove the
> > .alternatives sections and update the .text at build time. To pick the
> > proper instructions, Kconfig options are exposed for each X86_FEATURE
> > that needed to be resolved. Each X86_FEATURE gets a corresponding
> > CONFIG_XIP_ENABLE_X86_FEATURE_ option. Based on whether this option
> > is set, a resolver macro is setup to either generate that instruction,
> > or the fallback. The resolver needs to be defined for each FEATURE, and
> > the proper one is chosen via preprocessor string pasting.
> >
> > This approach is horrific and ugly.
>
> You said it.
>
> And with XIP enabled - whatever that means, your announce message could
> explain a lot more and more verbosely what this whole patchset is all
> about - this kernel is not going to be generic anymore but it will be
> destined only for the machine it is being built for, correct?

Please see my response to Ingo for more information about the patchset.

Yes, regardless of how it's implemented, selecting alternatives at build
time will produce a non-generic kernel (unless all alternative
instructions are disabled and just the fallbacks are used).

> If that is so, how are distros supposed to ship one kernel with XIP or
> is this some obscure feature distros won't have to enable anyway?

XIP isn't a general feature that distros are going to be enabling. It's
designed for a very specific usage where people are building very custom
kernels.

> Concerning this particular patch, I'd suggest a switch which simply
> disables alternatives patching at run time so that you don't add this
> ifdeffery to alternative.c

I'll look into this.

> Btw, why do you even have to disable the alternatives? I see this in
> your patch 1/11:
>
> + location in RAM. As a result, when the kernel is located in storage
> + that is addressable by the CPU, the kernel text and read-only data
> + segments are never loaded into memory, thereby using less RAM.
>
> is this it? To save some memory? Probably embedded, maybe some light
> bulb running linux... Yuck.

Alternatives are disabled because the kernel text will be read-only. For
example, consider the kernel image being stored in and executing from
ROM.

Cutting the kernel text and read-only data out of RAM really helps Linux
scale down to smaller systems. So yes, it's for embedded. Linux has a
lot to offer in that area.

Thanks.

-- 
Jim Kukunas
Intel Open Source Technology Center
Re: [RFC] x86 XIP
On Mon, Mar 23, 2015 at 09:07:14AM +0100, Ingo Molnar wrote:
> * Jim Kukunas wrote:
>
> > Hi Folks,
> >
> > This patchset introduces eXecute-In-Place (XIP) support for x86.
> > [...]
>
> So we'd need a lot better high level description than this:

In future patch revisions, I'll update my cover letter to include the
information below.

> - a bit of background description: what are the advantages of having
>   the kernel image in non-RAM (flash), etc.

Currently, for tiny memory-constrained embedded systems, the kernel
configuration is usually stripped down in order to reduce the kernel's
RAM footprint, freeing up more precious memory for user space and
allowing the kernel to fit into smaller systems. With XIP, the kernel's
text and read-only data sections are never loaded into RAM, thereby
reducing the kernel's memory usage. Also, since a significant portion of
the kernel is never loaded into RAM, a larger kernel configuration can
be used without bloating memory usage.

I haven't done any performance analysis yet, but it's probably safe to
say that executing from storage will negatively affect performance.

> - on what hardware/bootloaders is or will be XIP supported?

With regards to supported hardware, these patches aren't targeting any
specific platform. As mentioned in the cover letter, there are current
limits on the supported configurations (32-bit only, no SMP, no PAE),
but these are not technical limits ... I just need to implement support
for them.

With regards to supported bootloaders, I've been testing with a small
bootloader that I wrote specifically for XIP. Which other bootloaders I
add support to will depend on the feedback/requests that I get.

> Also, there should probably be some fail-safe mechanism included: such
> as to check whether caching attributes (MTRRs, PAT) are properly set
> for the XIP area (at minimum to not be uncacheable).

Good idea. I'll add that into the next revision.

Thanks.

-- 
Jim Kukunas
Intel Open Source Technology Center
[RFC] x86 XIP
Hi Folks,

This patchset introduces eXecute-In-Place (XIP) support for x86. Right
now only minimal configurations are supported (32-bit only, no SMP, no
PAE, and so on). My goal is to increase the number of supported
configurations in the future based on what functionality is requested.
This patchset only supports storage configurations where the kernel text
and read-only data will always be readable.

I didn't create a special Makefile target for building XIP images, like
ARM's xipImage. Instead, I'm just using the basic vmlinux ELF
executable. The kernel must be built with CONFIG_XIP_BASE set to the
physical address of the vmlinux file.

Additionally, since the .text section is read-only, all of the
alternative instructions need to be resolved at build time. To
accomplish this, the CPU features to enable are selected through a
series of Kconfig options.

In order to boot, the bootloader just needs to fill out the zero page
(whose address startup_32() expects in %esi), switch to 32-bit protected
mode, and then jump into startup_32(), which will be at CONFIG_XIP_BASE
plus one page.

Thanks.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
[PATCH 03/11] x86/xip: copy writable sections into RAM
Loads all writable and non-zero sections into their VMA.

Signed-off-by: Jim Kukunas
---
 arch/x86/include/asm/sections.h |  4 ++++
 arch/x86/kernel/head_32.S       | 22 ++++++++++++++++++++++
 arch/x86/kernel/vmlinux.lds.S   |  4 ++++
 3 files changed, 30 insertions(+)

diff --git a/arch/x86/include/asm/sections.h b/arch/x86/include/asm/sections.h
index 0a52424..9535e95 100644
--- a/arch/x86/include/asm/sections.h
+++ b/arch/x86/include/asm/sections.h
@@ -11,4 +11,8 @@ extern struct exception_table_entry __stop___ex_table[];
 extern char __end_rodata_hpage_align[];
 #endif
 
+#ifdef CONFIG_XIP_KERNEL
+extern char phys_sdata[];
+#endif
+
 #endif	/* _ASM_X86_SECTIONS_H */
diff --git a/arch/x86/kernel/head_32.S b/arch/x86/kernel/head_32.S
index f36bd42..80f344a 100644
--- a/arch/x86/kernel/head_32.S
+++ b/arch/x86/kernel/head_32.S
@@ -86,6 +86,28 @@ RESERVE_BRK(pagetables, INIT_MAP_SIZE)
  */
 __HEAD
 ENTRY(startup_32)
+
+#ifdef CONFIG_XIP_KERNEL
+	/*
+	 * Copy writable sections into RAM
+	 */
+
+	movl %esi, %ebp		# Preserve pointer to zero-page
+
+	leal pa(_sdata), %edi
+	leal phys_edata, %ecx
+	leal phys_sdata, %esi
+	subl %esi, %ecx
+
+	cld
+xip_data_cp:
+	lodsb
+	stosb
+	loop xip_data_cp
+
+	movl %ebp, %esi
+#endif
+
 	movl pa(stack_start),%ecx
 
 	/* test KEEP_SEGMENTS flag to see if the bootloader is asking
diff --git a/arch/x86/kernel/vmlinux.lds.S b/arch/x86/kernel/vmlinux.lds.S
index 414a1ac..59a9edb 100644
--- a/arch/x86/kernel/vmlinux.lds.S
+++ b/arch/x86/kernel/vmlinux.lds.S
@@ -133,6 +133,9 @@ SECTIONS
 	RO_DATA(PAGE_SIZE)
 	X64_ALIGN_DEBUG_RODATA_END
 
+	phys_sdata = LOADADDR(.data);
+	phys_edata = phys_sdata + (_end_nonzero - _sdata);
+
 	/* Data */
 	.data : AT(ADDR(.data) - LOAD_OFFSET) DATA_ALIGN {
 		/* Start of data section */
@@ -319,6 +322,7 @@ SECTIONS
 		NOSAVE_DATA
 	}
 #endif
+	_end_nonzero = .;
 
 	/* BSS */
 	. = ALIGN(PAGE_SIZE);
-- 
2.1.0
[PATCH 01/11] x86/xip: add XIP_KERNEL and XIP_BASE options
The CONFIG_XIP_KERNEL Kconfig option enables eXecute-In-Place (XIP)
support. When XIP_KERNEL is set, XIP_BASE points to the physical address
of the vmlinux ELF file.

Signed-off-by: Jim Kukunas
---
 arch/x86/Kconfig | 19 +++++++++++++++++++
 1 file changed, 19 insertions(+)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index b7d31ca..f5fa02c 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -294,6 +294,25 @@ config ZONE_DMA
 
 	  If unsure, say Y.
 
+config XIP_KERNEL
+	bool "eXecute-In-Place (XIP) support" if (X86_32 && EXPERT && EMBEDDED)
+	depends on !MODULES && !X86_PAE && !SMP
+	default n
+	help
+	  With this option enabled, the text and any read-only segments of
+	  the kernel are not copied from their initial location to their usual
+	  location in RAM. As a result, when the kernel is located in storage
+	  that is addressable by the CPU, the kernel text and read-only data
+	  segments are never loaded into memory, thereby using less RAM.
+
+	  Only enable this option if you know what you're doing.
+
+config XIP_BASE
+	hex "Physical address of XIP kernel" if XIP_KERNEL
+	default "0xFF80"
+	help
+	  The physical address for the beginning of the vmlinux file.
+
 config SMP
 	bool "Symmetric multi-processing support"
 	---help---
-- 
2.1.0
[PATCH 06/11] x86/xip: after paging trampoline, discard PMDs above _brk
In the likely case that XIP_BASE is above PAGE_OFFSET, we want to
discard any early identity mappings. So rather than keeping every PMD
above PAGE_OFFSET, only copy the ones from PAGE_OFFSET to the last PMD
of _end. At this point, the linear address space should look normal.

Signed-off-by: Jim Kukunas
---
 arch/x86/include/asm/pgtable.h | 7 +++++++
 arch/x86/kernel/setup.c        | 2 +-
 2 files changed, 8 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index a0c35bf..5eaba7d 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -660,6 +660,13 @@ static inline int pgd_none(pgd_t pgd)
 #define KERNEL_PGD_BOUNDARY	pgd_index(PAGE_OFFSET)
 #define KERNEL_PGD_PTRS		(PTRS_PER_PGD - KERNEL_PGD_BOUNDARY)
 
+#ifdef CONFIG_XIP_KERNEL
+#define BOOT_PGD_COPY_PTRS \
+	((pgd_index((unsigned long)_end) - pgd_index(PAGE_OFFSET)) + 4)
+#else
+#define BOOT_PGD_COPY_PTRS KERNEL_PGD_PTRS
+#endif
+
 #ifndef __ASSEMBLY__
 
 extern int direct_gbpages;
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index e2d85c4..d276ebf 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -894,7 +894,7 @@ void __init setup_arch(char **cmdline_p)
 	 */
 	clone_pgd_range(swapper_pg_dir     + KERNEL_PGD_BOUNDARY,
 			initial_page_table + KERNEL_PGD_BOUNDARY,
-			KERNEL_PGD_PTRS);
+			BOOT_PGD_COPY_PTRS);
 
 	load_cr3(swapper_pg_dir);
-- 
2.1.0
[PATCH 07/11] x86/xip: make e820_add_kernel_range() a NOP
e820_add_kernel_range() checks whether the kernel text is present in the
e820 map, and marked as usable RAM. If not, it modifies the e820 map
accordingly. For XIP, that is unnecessary since the kernel text won't be
loaded in RAM.

Signed-off-by: Jim Kukunas
---
 arch/x86/kernel/setup.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index d276ebf..74fc6c8 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -787,6 +787,7 @@ static void __init trim_bios_range(void)
 }
 
 /* called before trim_bios_range() to spare extra sanitize */
+#ifndef CONFIG_XIP_KERNEL
 static void __init e820_add_kernel_range(void)
 {
 	u64 start = __pa_symbol(_text);
@@ -806,6 +807,11 @@ static void __init e820_add_kernel_range(void)
 	e820_remove_range(start, size, E820_RAM, 0);
 	e820_add_region(start, size, E820_RAM);
 }
+#else
+static void __init e820_add_kernel_range(void)
+{
+}
+#endif
 
 static unsigned reserve_low = CONFIG_X86_RESERVE_LOW << 10;
-- 
2.1.0
[PATCH 08/11] x86/xip: in setup_arch(), handle resource physical addr
Set code_resource to the proper physical address in setup_arch().

Signed-off-by: Jim Kukunas
---
 arch/x86/kernel/setup.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 74fc6c8..f044453 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -986,9 +986,16 @@ void __init setup_arch(char **cmdline_p)
 
 	mpx_mm_init(&init_mm);
 
+#ifndef CONFIG_XIP_KERNEL
 	code_resource.start = __pa_symbol(_text);
 	code_resource.end = __pa_symbol(_etext)-1;
+	data_resource.start = _pa(_sdata)-1;
+#else
+	code_resource.start = CONFIG_XIP_BASE;
+	code_resource.end = (phys_addr_t)phys_sdata-1;
 	data_resource.start = __pa_symbol(_etext);
+#endif
+
 	data_resource.end = __pa_symbol(_edata)-1;
 	bss_resource.start = __pa_symbol(__bss_start);
 	bss_resource.end = __pa_symbol(__bss_stop)-1;
-- 
2.1.0
[PATCH 09/11] x86/xip: snip the kernel text out of the memory mapping
If the kernel tries to create an identity region for a memory range that
spans the kernel text, split it into two pieces, skipping the text
section. Otherwise, this will set up the standard text mapping, which
will point to the normal RAM location for text instead of the XIP_BASE
location.

Signed-off-by: Jim Kukunas
---
 arch/x86/mm/init.c | 78 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 78 insertions(+)

diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c
index a110efc..07b20c6 100644
--- a/arch/x86/mm/init.c
+++ b/arch/x86/mm/init.c
@@ -391,6 +391,82 @@ bool pfn_range_is_mapped(unsigned long start_pfn, unsigned long end_pfn)
 	return false;
 }
 
+#ifdef CONFIG_XIP_KERNEL
+/*
+ * Cut the .text virtual address out of mem range b/c the mapping
+ * is already correctly setup
+ */
+static inline void snip_xip_text(struct map_range *mr, int *nr_range)
+{
+	int i;
+
+	for (i = 0; i < *nr_range; i++) {
+		long diff;
+
+		if (mr[i].start <= CONFIG_PHYSICAL_START &&
+		    mr[i].end <= CONFIG_PHYSICAL_START)
+			continue;
+		if (mr[i].start >= __pa_symbol(_sdata))
+			continue;
+
+		diff = mr[i].start - CONFIG_PHYSICAL_START;
+		if (diff < 0) {	/* range starts below .text and includes it */
+			diff = mr[i].end - __pa_symbol(_sdata);
+
+			/* shorten segment so it ends just before .text */
+			mr[i].end = CONFIG_PHYSICAL_START;
+
+			/* if segment goes past .text, add 2nd segment */
+			if (diff > 0) {
+				/* move next section down 1 */
+				if (i + 1 < *nr_range) {
+					memmove(&mr[i + 1], &mr[i + 2],
+						sizeof(struct map_range[
+							*nr_range - i - 2]));
+				}
+				mr[i + 1].start = __pa_symbol(_sdata);
+				mr[i + 1].end = mr[i + 1].start + diff;
+				mr[i + 1].page_size_mask = 0;
+				*nr_range = *nr_range + 1;
+				i++;
+			}
+		} else if (diff == 0) {
+			diff = mr[i].end - __pa_symbol(_sdata);
+			if (diff > 0) {
+				mr[i].start = __pa_symbol(_sdata);
+				mr[i].end = mr[i].start + diff;
+				mr[i].page_size_mask = 0;
+			} else {
+				/* delete this range */
+				memmove(&mr[i + 1], &mr[i], sizeof(
+					struct map_range[*nr_range - i - 1]));
+				*nr_range = *nr_range - 1;
+				i--;
+			}
+		} else if (diff > 0) {
+			long ediff = mr[i].end - __pa_symbol(_sdata);
+
+			if (ediff > 0) {
+				mr[i].start = __pa_symbol(_sdata);
+				mr[i].end = mr[i].start + ediff;
+				mr[i].page_size_mask = 0;
+			} else {
+				/* delete this range */
+				memmove(&mr[i + 1], &mr[i], sizeof(
+					struct map_range[*nr_range - i - 1]));
+				*nr_range = *nr_range - 1;
+				i--;
+			}
+		}
+		break;
+	}
+}
+#else
+static inline void snip_xip_text(struct map_range *mr, int *mr_range)
+{
+}
+#endif
+
 /*
  * Setup the direct mapping of the physical memory at PAGE_OFFSET.
  * This runs before bootmem is initialized and gets pages directly from
@@ -409,6 +485,8 @@ unsigned long __init_refok init_memory_mapping(unsigned long start,
 	memset(mr, 0, sizeof(mr));
 	nr_range = split_mem_range(mr, 0, start, end);
 
+	snip_xip_text(mr, &nr_range);
+
 	for (i = 0; i < nr_range; i++)
 		ret = kernel_physical_mapping_init(mr[i].start, mr[i].end,
 						   mr[i].page_size_mask);
-- 
2.1.0
[PATCH 04/11] x86/xip: XIP boot trampoline page tables
Constructs the trampoline page tables for early XIP boot.

Signed-off-by: Jim Kukunas
---
 arch/x86/kernel/head_32.S | 85 +++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 85 insertions(+)

diff --git a/arch/x86/kernel/head_32.S b/arch/x86/kernel/head_32.S
index 80f344a..642d73b 100644
--- a/arch/x86/kernel/head_32.S
+++ b/arch/x86/kernel/head_32.S
@@ -227,6 +227,90 @@
 	movl %eax,pa(initial_pg_pmd+0x1000*KPMDS-8)
 #else	/* Not PAE */
 
+#ifdef CONFIG_XIP_KERNEL
+	movl $pa(__brk_base), %edi
+	movl $pa(initial_page_table), %edx
+
+	movl $PTE_IDENT_ATTR, %eax	/* EAX holds identity mapping addr */
+	movl $__PAGE_OFFSET + PTE_IDENT_ATTR, %ebx	/* EBX holds kernel addr */
+
+.Lxip_mapping:
+/* Allocate or Load Identity PDE */
+	leal -PTE_IDENT_ATTR(%eax), %ebp
+	andl $0xFFC00000, %ebp
+	shrl $20, %ebp
+	movl (%edx, %ebp), %ecx
+
+	test %ecx, %ecx
+	jnz .Lskip_ident_pde_alloc
+	leal PDE_IDENT_ATTR(%edi), %ecx
+	addl $4096, %edi
+	movl %ecx, (%edx, %ebp)
+
+.Lskip_ident_pde_alloc:
+	leal -PDE_IDENT_ATTR(%ecx), %ecx
+	leal -PTE_IDENT_ATTR(%eax), %ebp
+	andl $0x3FF000, %ebp
+	shrl $10, %ebp
+	movl %eax, (%ecx, %ebp)
+
+/* Allocate or Load PAGE_OFFSET PDE */
+	leal -PTE_IDENT_ATTR(%ebx), %ebp
+	andl $0xFFC00000, %ebp
+	shrl $20, %ebp
+	movl (%edx, %ebp), %ecx
+
+	test %ecx, %ecx
+	jnz .Lskip_offset_pde_alloc
+	leal PDE_IDENT_ATTR(%edi), %ecx
+	addl $4096, %edi
+	movl %ecx, (%edx, %ebp)
+
+.Lskip_offset_pde_alloc:
+	leal -PDE_IDENT_ATTR(%ecx), %ecx
+	leal -PTE_IDENT_ATTR(%ebx), %ebp
+	andl $0x3FF000, %ebp
+	shrl $10, %ebp
+	movl %eax, (%ecx, %ebp)
+
+	addl $4096, %eax
+	addl $4096, %ebx
+
+	cmpl $CONFIG_PHYSICAL_START + PTE_IDENT_ATTR, %eax
+	je .Lsetup_text_addr
+
+	cmpl $phys_sdata + PTE_IDENT_ATTR, %eax
+	je .Lsetup_data_addr
+
+	cmpl $pa(_end) + MAPPING_BEYOND_END + PTE_IDENT_ATTR, %eax
+	je .Ldone
+
+	jmp .Lxip_mapping
+
+.Lsetup_text_addr:
+	movl $CONFIG_XIP_BASE + 4096 + PTE_IDENT_ATTR, %eax
+	movl $_text, %ebx
+	addl $PTE_IDENT_ATTR, %ebx
+	jmp .Lxip_mapping
+
+.Lsetup_data_addr:
+	movl $pa(_sdata), %eax
+	addl $PTE_IDENT_ATTR, %eax
+	movl $_sdata, %ebx
+	addl $PTE_IDENT_ATTR, %ebx
+	jmp .Lxip_mapping
+
+.Ldone:
+	addl $__PAGE_OFFSET, %edi
+	movl %edi, pa(_brk_end)
+	movl $pa(_end) + MAPPING_BEYOND_END + PTE_IDENT_ATTR, %eax
+	shrl $12, %eax
+	movl %eax, pa(max_pfn_mapped)
+
+	movl $pa(initial_pg_fixmap) + PTE_IDENT_ATTR, %eax
+	movl %eax, pa(initial_page_table + 0xFFC)
+
+#else
+
 page_pde_offset = (__PAGE_OFFSET >> 20);
 
 	movl $pa(__brk_base), %edi
@@ -257,6 +341,7 @@ page_pde_offset = (__PAGE_OFFSET >> 20);
 	movl $pa(initial_pg_fixmap)+PDE_IDENT_ATTR,%eax
 	movl %eax,pa(initial_page_table+0xffc)
 #endif
+#endif
 
 #ifdef CONFIG_PARAVIRT
 	/* This is can only trip for a broken bootloader... */
-- 
2.1.0
[PATCH 10/11] x86/xip: resolve alternative instructions at build
Since the .text section can't be updated at run-time, remove the
.alternatives sections and update the .text at build time. To pick the
proper instructions, Kconfig options are exposed for each X86_FEATURE
that needed to be resolved. Each X86_FEATURE gets a corresponding
CONFIG_XIP_ENABLE_X86_FEATURE_ option. Based on whether this option
is set, a resolver macro is set up to either generate that instruction,
or the fallback. The resolver needs to be defined for each FEATURE, and
the proper one is chosen via preprocessor string pasting.

This approach is horrific and ugly. A better approach might be to add
an additional build step that, after generating the vmlinux file, goes
through the alternatives section and performs the fixups on the file.
At the very least, a script like mkcapflags.sh could generate the
resolver functions automatically. But since it's adding Kconfig options,
it would need to run unconditionally before any of the config related
Makefile targets.

Signed-off-by: Jim Kukunas
---
 arch/x86/Kconfig                       |  45 ++++++++
 arch/x86/include/asm/alternative-xip.h | 161 +++++++++++++++++++++++++++
 arch/x86/include/asm/alternative.h     |   5 +
 arch/x86/kernel/alternative.c          |   7 ++
 arch/x86/kernel/cpu/bugs.c             |   2 +
 arch/x86/kernel/setup.c                |   2 +
 arch/x86/kernel/smpboot.c              |   2 +
 arch/x86/kernel/vmlinux.lds.S          |   2 +
 arch/x86/vdso/vma.c                    |   2 +
 9 files changed, 228 insertions(+)
 create mode 100644 arch/x86/include/asm/alternative-xip.h

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index f5fa02c..dff781d 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -313,6 +313,51 @@ config XIP_BASE
 	help
 	  The physical address for the beginning of the vmlinux file.
 
+menu "XIP Alternative Instructions"
+	depends on XIP_KERNEL
+
+	config XIP_ENABLE_X86_FEATURE_POPCNT
+		bool "Enable POPCNT alternative instructions"
+		default n
+
+	config XIP_ENABLE_X86_BUG_11AP
+		bool "Enable 11AP alternative instructions"
+		default n
+
+	config XIP_ENABLE_X86_FEATURE_XMM2
+		bool "Enable XMM2 alternative instructions"
+		default n
+
+	config XIP_ENABLE_X86_FEATURE_MFENCE_RDTSC
+		bool "Enable MFENCE_RDTSC alternative instructions"
+		default n
+
+	config XIP_ENABLE_X86_FEATURE_LFENCE_RDTSC
+		bool "Enable LFENCE_RDTSC alternative instructions"
+		default n
+
+	config XIP_ENABLE_X86_FEATURE_3DNOW
+		bool "Enable 3DNOW alternative instructions"
+		default n
+
+	config XIP_ENABLE_X86_FEATURE_XMM
+		bool "Enabled XMM alternative instructions"
+		default n
+
+	config XIP_ENABLE_X86_FEATURE_CLFLUSHOPT
+		bool "Enable CLFLUSHOPT alternative instructions"
+		default n
+
+	config XIP_ENABLE_X86_FEATURE_XSAVEOPT
+		bool "Enable XSAVEOPT alternative instructions"
+		default n
+
+	config XIP_ENABLE_X86_FEATURE_XSAVES
+		bool "Enable XSAVES alternative instructions"
+		default n
+
+endmenu
+
 config SMP
 	bool "Symmetric multi-processing support"
 	---help---
diff --git a/arch/x86/include/asm/alternative-xip.h b/arch/x86/include/asm/alternative-xip.h
new file mode 100644
index 000..84f544e
--- /dev/null
+++ b/arch/x86/include/asm/alternative-xip.h
@@ -0,0 +1,161 @@
+#ifndef _ASM_X86_ALTERNATIVE_XIP_H
+#define _ASM_X86_ALTERNATIVE_XIP_H
+
+/*
+ * Alternative instruction fixup for XIP
+ *
+ * Copyright (C) 2014 Intel Corporation
+ * Author: Jim Kukunas
+ *
+ * Since the kernel text is executing from storage and is
+ * read-only, we can't update the opcodes in-flight. Instead,
+ * resolve the alternatives at build time through preprocessor
+ * (ab)use.
+ */
+
+#ifdef CONFIG_SMP
+#define LOCK_PREFIX "\n\tlock; "
+#else
+#define LOCK_PREFIX ""
+#endif
+
+extern int poke_int3_handler(struct pt_regs *regs);
+
+/* TODO hook up to something like mkcapflags.sh */
+/* Unfortunately, each X86_FEATURE will need a corresponding define like this */
+#ifdef CONFIG_XIP_ENABLE_X86_FEATURE_POPCNT
+#define RESOLVE_X86_FEATURE_POPCNT(old, new) new
+#define RESOLVE_2_X86_FEATURE_POPCNT(old, new1, resolve1, new2) new2
+#else
+#define RESOLVE_X86_FEATURE_POPCNT(old, new) old
+#define RESOLVE_2_X86_FEATURE_POPCNT(old, new1, resolve1, new2) \
+	resolve1(old, new1)
+#endif
+
+#ifdef CONFIG_XIP_ENABLE_X86_BUG_11AP
+#define RESOLVE_X86_BUG_11AP(old, new) new
+#define RESOLVE_2_X86_BUG_11AP(old, new1, resolve1, new2) new2
+#else
+#define RESOLVE_X86_BUG_11AP(old, new) old
+#define RESOLVE_2_X86_BUG_11AP(old, new1, resolve1, new2) \
+	resolve1(old, new1)
+#endif
+
+#ifdef CONFIG_XIP_ENABLE_X86_FEATURE_XMM2
+#define RESOLVE_X86_FEATURE_XMM2(old, new) new
+#define RESOLVE_2_X86_FEATURE_XMM2(old, new
[PATCH 11/11] x86/xip: update _va() and _pa() macros
For obtaining the physical address, we always take the slow path of
slow_virt_to_phys(). In the future, we should probably special case
data addresses to avoid walking the page table.

For obtaining a virtual address, this patch introduces a slow path of
slow_xip_phys_to_virt().

Signed-off-by: Jim Kukunas
---
 arch/x86/include/asm/page.h | 15 +++++++++++++++
 arch/x86/mm/pageattr.c      | 11 +++++++++++
 2 files changed, 26 insertions(+)

diff --git a/arch/x86/include/asm/page.h b/arch/x86/include/asm/page.h
index 802dde3..b54c7be 100644
--- a/arch/x86/include/asm/page.h
+++ b/arch/x86/include/asm/page.h
@@ -6,6 +6,7 @@
 #ifdef __KERNEL__
 
 #include
+#include
 
 #ifdef CONFIG_X86_64
 #include
@@ -37,8 +38,15 @@ static inline void copy_user_page(void *to, void *from, unsigned long vaddr,
 	alloc_page_vma(GFP_HIGHUSER | __GFP_ZERO | movableflags, vma, vaddr)
 #define __HAVE_ARCH_ALLOC_ZEROED_USER_HIGHPAGE
 
+#ifdef CONFIG_XIP_KERNEL	/* TODO special case text translations */
+#define __pa(x)		slow_virt_to_phys((void *)(x))
+#define __pa_nodebug	slow_virt_to_phys((void *)(x))
+#else
 #define __pa(x)		__phys_addr((unsigned long)(x))
 #define __pa_nodebug(x)	__phys_addr_nodebug((unsigned long)(x))
+#endif
+
 /* __pa_symbol should be used for C visible symbols.
    This seems to be the official gcc blessed way to do such arithmetic. */
 
 /*
@@ -51,7 +59,14 @@ static inline void copy_user_page(void *to, void *from, unsigned long vaddr,
 #define __pa_symbol(x) \
 	__phys_addr_symbol(__phys_reloc_hide((unsigned long)(x)))
 
+#ifdef CONFIG_XIP_KERNEL
+extern unsigned long slow_xip_phys_to_virt(phys_addr_t);
+
+#define __va(x)		((void *)slow_xip_phys_to_virt((phys_addr_t)x))
+#else
 #define __va(x)		((void *)((unsigned long)(x)+PAGE_OFFSET))
+#endif
 
 #define __boot_va(x)	__va(x)
 #define __boot_pa(x)	__pa(x)
diff --git a/arch/x86/mm/pageattr.c b/arch/x86/mm/pageattr.c
index 536ea2f..ca9e2ca 100644
--- a/arch/x86/mm/pageattr.c
+++ b/arch/x86/mm/pageattr.c
@@ -383,6 +383,17 @@ static pte_t *_lookup_address_cpa(struct cpa_data *cpa, unsigned long address,
 	return lookup_address(address, level);
 }
 
+#ifdef CONFIG_XIP_KERNEL
+unsigned long slow_xip_phys_to_virt(phys_addr_t x)
+{
+	if (x >= CONFIG_XIP_BASE && x <= (phys_addr_t)phys_sdata) {
+		unsigned long off = x - CONFIG_XIP_BASE;
+
+		return PAGE_OFFSET + off;
+	}
+	return x + PAGE_OFFSET;
+}
+#endif
+
 /*
  * Lookup the PMD entry for a virtual address. Return a pointer to the entry
  * or NULL if not present.
-- 
2.1.0
[PATCH 05/11] x86/xip: reserve memblock for only data
Nothing is loaded at the usual spot for .text, starting at CONFIG_PHYSICAL_START, so we don't reserve it. Additionally, the physical address of the _text isn't going to be physically contiguous with _data. Signed-off-by: Jim Kukunas --- arch/x86/kernel/setup.c | 5 + 1 file changed, 5 insertions(+) diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c index 98dc931..e2d85c4 100644 --- a/arch/x86/kernel/setup.c +++ b/arch/x86/kernel/setup.c @@ -869,8 +869,13 @@ dump_kernel_offset(struct notifier_block *self, unsigned long v, void *p) void __init setup_arch(char **cmdline_p) { +#ifdef CONFIG_XIP_KERNEL + memblock_reserve(__pa_symbol(_sdata), + (unsigned long)__bss_stop - (unsigned long)_sdata); +#else memblock_reserve(__pa_symbol(_text), (unsigned long)__bss_stop - (unsigned long)_text); + #endif early_reserve_initrd(); -- 2.1.0 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH 02/11] x86/xip: Update address of sections in linker script
In order to update the LMA of each section according to CONFIG_XIP_BASE, this patch uses the preprocessor to change the arguments passed to the AT keyword. Each LMA is updated to that symbol's physical address. The text section is aligned to a page so that the ELF header at the beginning of XIP_BASE isn't mapped into the linear address space. The initial location counter is also incremented to account for the ELF header. Signed-off-by: Jim Kukunas james.t.kuku...@linux.intel.com --- arch/x86/include/asm/boot.h | 4 ++++ arch/x86/kernel/vmlinux.lds.S | 17 +++++++++++++++-- 2 files changed, 19 insertions(+), 2 deletions(-) diff --git a/arch/x86/include/asm/boot.h b/arch/x86/include/asm/boot.h index 4fa687a..a128c71 100644 --- a/arch/x86/include/asm/boot.h +++ b/arch/x86/include/asm/boot.h @@ -10,6 +10,10 @@ + (CONFIG_PHYSICAL_ALIGN - 1)) \ & ~(CONFIG_PHYSICAL_ALIGN - 1)) +#ifdef CONFIG_XIP_KERNEL +#define PHYS_XIP_OFFSET (CONFIG_XIP_BASE - (LOAD_OFFSET + LOAD_PHYSICAL_ADDR)) +#endif + /* Minimum kernel alignment, as a power of two */ #ifdef CONFIG_X86_64 #define MIN_KERNEL_ALIGN_LG2 PMD_SHIFT diff --git a/arch/x86/kernel/vmlinux.lds.S b/arch/x86/kernel/vmlinux.lds.S index 00bf300..414a1ac 100644 --- a/arch/x86/kernel/vmlinux.lds.S +++ b/arch/x86/kernel/vmlinux.lds.S @@ -20,6 +20,15 @@ #define LOAD_OFFSET __START_KERNEL_map #endif +#ifdef CONFIG_XIP_KERNEL +#define AT(x) AT(x + LOAD_OFFSET + PHYS_XIP_OFFSET) +#define TEXT_ALIGN ALIGN(0x1000) +#define DATA_ALIGN ALIGN(0x20) +#else +#define TEXT_ALIGN +#define DATA_ALIGN +#endif + #include <asm-generic/vmlinux.lds.h> #include <asm/asm-offsets.h> #include <asm/thread_info.h> @@ -89,8 +98,12 @@ SECTIONS phys_startup_64 = startup_64 - LOAD_OFFSET; #endif +#ifdef CONFIG_XIP_KERNEL + . += SIZEOF_HEADERS; +#endif + /* Text and read-only data */ - .text : AT(ADDR(.text) - LOAD_OFFSET) { + .text : AT(ADDR(.text) - LOAD_OFFSET) TEXT_ALIGN { _text = .; /* bootstrapping code */ HEAD_TEXT @@ -121,7 +134,7 @@ SECTIONS X64_ALIGN_DEBUG_RODATA_END /* Data */ - .data : AT(ADDR(.data) - LOAD_OFFSET) { + .data : AT(ADDR(.data) - LOAD_OFFSET) DATA_ALIGN { /* Start of data section */ _sdata = .; -- 2.1.0
[PATCH 04/11] x86/xip: XIP boot trampoline page tables
Constructs the trampoline page tables for early XIP boot. Signed-off-by: Jim Kukunas james.t.kuku...@linux.intel.com --- arch/x86/kernel/head_32.S | 85 +++ 1 file changed, 85 insertions(+) diff --git a/arch/x86/kernel/head_32.S b/arch/x86/kernel/head_32.S index 80f344a..642d73b 100644 --- a/arch/x86/kernel/head_32.S +++ b/arch/x86/kernel/head_32.S @@ -227,6 +227,90 @@ xip_data_cp: movl %eax,pa(initial_pg_pmd+0x1000*KPMDS-8) #else /* Not PAE */ +#ifdef CONFIG_XIP_KERNEL + movl $pa(__brk_base), %edi + movl $pa(initial_page_table), %edx + + movl $PTE_IDENT_ATTR, %eax /* EAX holds identity mapping addr */ + movl $__PAGE_OFFSET + PTE_IDENT_ATTR, %ebx /* EBX holds kernel addr */ + +.Lxip_mapping: +/* Allocate or Load Identity PDE */ + leal -PTE_IDENT_ATTR(%eax), %ebp + andl $0xFFC00000, %ebp + shrl $20, %ebp + movl (%edx, %ebp), %ecx + + test %ecx, %ecx + jnz .Lskip_ident_pde_alloc + leal PDE_IDENT_ATTR(%edi), %ecx + addl $4096, %edi + movl %ecx, (%edx, %ebp) + +.Lskip_ident_pde_alloc: + leal -PDE_IDENT_ATTR(%ecx), %ecx + leal -PTE_IDENT_ATTR(%eax), %ebp + andl $0x3FF000, %ebp + shrl $10, %ebp + movl %eax, (%ecx, %ebp) + +/* Allocate or Load PAGE_OFFSET PDE */ + leal -PTE_IDENT_ATTR(%ebx), %ebp + andl $0xFFC00000, %ebp + shrl $20, %ebp + movl (%edx, %ebp), %ecx + + test %ecx, %ecx + jnz .Lskip_offset_pde_alloc + leal PDE_IDENT_ATTR(%edi), %ecx + addl $4096, %edi + movl %ecx, (%edx, %ebp) + +.Lskip_offset_pde_alloc: + leal -PDE_IDENT_ATTR(%ecx), %ecx + leal -PTE_IDENT_ATTR(%ebx), %ebp + andl $0x3FF000, %ebp + shrl $10, %ebp + movl %eax, (%ecx, %ebp) + + addl $4096, %eax + addl $4096, %ebx + + cmpl $CONFIG_PHYSICAL_START + PTE_IDENT_ATTR, %eax + je .Lsetup_text_addr + + cmpl $phys_sdata + PTE_IDENT_ATTR, %eax + je .Lsetup_data_addr + + cmpl $pa(_end) + MAPPING_BEYOND_END + PTE_IDENT_ATTR, %eax + je .Ldone + + jmp .Lxip_mapping + +.Lsetup_text_addr: + movl $CONFIG_XIP_BASE + 4096 + PTE_IDENT_ATTR, %eax + movl $_text, %ebx + addl $PTE_IDENT_ATTR, %ebx + jmp .Lxip_mapping + +.Lsetup_data_addr: + movl $pa(_sdata), %eax + addl $PTE_IDENT_ATTR, %eax + movl $_sdata, %ebx + addl $PTE_IDENT_ATTR, %ebx + jmp .Lxip_mapping +.Ldone: + addl $__PAGE_OFFSET, %edi + movl %edi, pa(_brk_end) + movl $pa(_end) + MAPPING_BEYOND_END + PTE_IDENT_ATTR, %eax + shrl $12, %eax + movl %eax, pa(max_pfn_mapped) + + movl $pa(initial_pg_fixmap) + PTE_IDENT_ATTR, %eax + movl %eax, pa(initial_page_table + 0xFFC) + +#else + page_pde_offset = (__PAGE_OFFSET >> 20); movl $pa(__brk_base), %edi @@ -257,6 +341,7 @@ page_pde_offset = (__PAGE_OFFSET >> 20); movl $pa(initial_pg_fixmap)+PDE_IDENT_ATTR,%eax movl %eax,pa(initial_page_table+0xffc) #endif +#endif #ifdef CONFIG_PARAVIRT /* This can only trip for a broken bootloader... */ -- 2.1.0
[PATCH 10/11] x86/xip: resolve alternative instructions at build
Since the .text section can't be updated at run-time, remove the .alternatives sections and update the .text at build time. To pick the proper instructions, Kconfig options are exposed for each X86_FEATURE that needs to be resolved. Each X86_FEATURE gets a corresponding CONFIG_XIP_ENABLE_X86_FEATURE_ option. Based on whether this option is set, a resolver macro is set up to either generate that instruction or the fallback. The resolver needs to be defined for each FEATURE, and the proper one is chosen via preprocessor string pasting. This approach is horrific and ugly. A better approach might be to add an additional build step that, after generating the vmlinux file, goes through the alternatives section and performs the fixups on the file. At the very least, a script like mkcapflags.sh could generate the resolver functions automatically. But since it's adding Kconfig options, it would need to run unconditionally before any of the config-related Makefile targets. Signed-off-by: Jim Kukunas james.t.kuku...@linux.intel.com --- arch/x86/Kconfig | 45 + arch/x86/include/asm/alternative-xip.h | 161 + arch/x86/include/asm/alternative.h | 5 + arch/x86/kernel/alternative.c | 7 ++ arch/x86/kernel/cpu/bugs.c | 2 + arch/x86/kernel/setup.c | 2 + arch/x86/kernel/smpboot.c | 2 + arch/x86/kernel/vmlinux.lds.S | 2 + arch/x86/vdso/vma.c | 2 + 9 files changed, 228 insertions(+) create mode 100644 arch/x86/include/asm/alternative-xip.h diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index f5fa02c..dff781d 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -313,6 +313,51 @@ config XIP_BASE help The physical address for the beginning of the vmlinux file.
+menu "XIP Alternative Instructions" + depends on XIP_KERNEL + + config XIP_ENABLE_X86_FEATURE_POPCNT + bool "Enable POPCNT alternative instructions" + default n + + config XIP_ENABLE_X86_BUG_11AP + bool "Enable 11AP alternative instructions" + default n + + config XIP_ENABLE_X86_FEATURE_XMM2 + bool "Enable XMM2 alternative instructions" + default n + + config XIP_ENABLE_X86_FEATURE_MFENCE_RDTSC + bool "Enable MFENCE_RDTSC alternative instructions" + default n + + config XIP_ENABLE_X86_FEATURE_LFENCE_RDTSC + bool "Enable LFENCE_RDTSC alternative instructions" + default n + + config XIP_ENABLE_X86_FEATURE_3DNOW + bool "Enable 3DNOW alternative instructions" + default n + + config XIP_ENABLE_X86_FEATURE_XMM + bool "Enable XMM alternative instructions" + default n + + config XIP_ENABLE_X86_FEATURE_CLFLUSHOPT + bool "Enable CLFLUSHOPT alternative instructions" + default n + + config XIP_ENABLE_X86_FEATURE_XSAVEOPT + bool "Enable XSAVEOPT alternative instructions" + default n + + config XIP_ENABLE_X86_FEATURE_XSAVES + bool "Enable XSAVES alternative instructions" + default n + +endmenu + config SMP bool "Symmetric multi-processing support" ---help--- diff --git a/arch/x86/include/asm/alternative-xip.h b/arch/x86/include/asm/alternative-xip.h new file mode 100644 index 000..84f544e --- /dev/null +++ b/arch/x86/include/asm/alternative-xip.h @@ -0,0 +1,161 @@ +#ifndef _ASM_X86_ALTERNATIVE_XIP_H +#define _ASM_X86_ALTERNATIVE_XIP_H + +/* + * Alternative instruction fixup for XIP + * + * Copyright (C) 2014 Intel Corporation + * Author: Jim Kukunas james.t.kuku...@linux.intel.com + * + * Since the kernel text is executing from storage and is + * read-only, we can't update the opcodes in-flight. Instead, + * resolve the alternatives at build time through preprocessor + * (ab)use. + */ + +#ifdef CONFIG_SMP +#define LOCK_PREFIX "\n\tlock; " +#else +#define LOCK_PREFIX +#endif + +extern int poke_int3_handler(struct pt_regs *regs); + +/* TODO hook up to something like mkcapflags.sh */ +/* Unfortunately, each X86_FEATURE will need a corresponding define like this */ +#ifdef CONFIG_XIP_ENABLE_X86_FEATURE_POPCNT +#define RESOLVE_X86_FEATURE_POPCNT(old, new) new +#define RESOLVE_2_X86_FEATURE_POPCNT(old, new1, resolve1, new2) new2 +#else +#define RESOLVE_X86_FEATURE_POPCNT(old, new) old +#define RESOLVE_2_X86_FEATURE_POPCNT(old, new1, resolve1, new2) \ + resolve1(old, new1) +#endif + +#ifdef CONFIG_XIP_ENABLE_X86_BUG_11AP +#define RESOLVE_X86_BUG_11AP(old, new) new +#define RESOLVE_2_X86_BUG_11AP(old, new1, resolve1, new2) new2 +#else +#define RESOLVE_X86_BUG_11AP(old, new) old +#define RESOLVE_2_X86_BUG_11AP(old, new1, resolve1, new2) \ + resolve1(old, new1) +#endif + +#ifdef CONFIG_XIP_ENABLE_X86_FEATURE_XMM2 +#define RESOLVE_X86_FEATURE_XMM2(old, new) new +#define RESOLVE_2_X86_FEATURE_XMM2(old, new1, resolve1, new2) new2 +#else +#define RESOLVE_X86_FEATURE_XMM2(old, new) old +#define
[PATCH 08/11] x86/xip: in setup_arch(), handle resource physical addr
Set code_resource to the proper physical address in setup_arch(). Signed-off-by: Jim Kukunas james.t.kuku...@linux.intel.com --- arch/x86/kernel/setup.c | 7 +++++++ 1 file changed, 7 insertions(+) diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c index 74fc6c8..f044453 100644 --- a/arch/x86/kernel/setup.c +++ b/arch/x86/kernel/setup.c @@ -986,9 +986,16 @@ void __init setup_arch(char **cmdline_p) mpx_mm_init(&init_mm); +#ifndef CONFIG_XIP_KERNEL code_resource.start = __pa_symbol(_text); code_resource.end = __pa_symbol(_etext)-1; + data_resource.start = __pa(_sdata)-1; +#else + code_resource.start = CONFIG_XIP_BASE; + code_resource.end = (phys_addr_t)phys_sdata-1; data_resource.start = __pa_symbol(_etext); +#endif + data_resource.end = __pa_symbol(_edata)-1; bss_resource.start = __pa_symbol(__bss_start); bss_resource.end = __pa_symbol(__bss_stop)-1; -- 2.1.0
[PATCH 09/11] x86/xip: snip the kernel text out of the memory mapping
If the kernel tries to create an identity region for a memory range that spans the kernel text, split it into two pieces, skipping the text section. Otherwise, this will set up the standard text mapping, which will point to the normal RAM location for text instead of the XIP_BASE location. Signed-off-by: Jim Kukunas james.t.kuku...@linux.intel.com --- arch/x86/mm/init.c | 78 ++ 1 file changed, 78 insertions(+) diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c index a110efc..07b20c6 100644 --- a/arch/x86/mm/init.c +++ b/arch/x86/mm/init.c @@ -391,6 +391,82 @@ bool pfn_range_is_mapped(unsigned long start_pfn, unsigned long end_pfn) return false; } +#ifdef CONFIG_XIP_KERNEL +/* + * Cut the .text virtual address out of mem range b/c the mapping + * is already correctly setup + */ +static inline void snip_xip_text(struct map_range *mr, int *nr_range) +{ + int i; + + for (i = 0; i < *nr_range; i++) { + long diff; + + if (mr[i].start <= CONFIG_PHYSICAL_START && + mr[i].end <= CONFIG_PHYSICAL_START) + continue; + if (mr[i].start >= __pa_symbol(_sdata)) + continue; + + diff = mr[i].start - CONFIG_PHYSICAL_START; + if (diff < 0) { /* range starts below .text and includes it */ + diff = mr[i].end - __pa_symbol(_sdata); + + /* shorten segment so it ends just before .text */ + mr[i].end = CONFIG_PHYSICAL_START; + + /* if segment goes past .text, add 2nd segment */ + if (diff > 0) { + /* move next section down 1 */ + if (i + 1 < *nr_range) { + memmove(&mr[i + 1], &mr[i + 2], + sizeof(struct map_range[ + *nr_range - i - 2])); + } + mr[i + 1].start = __pa_symbol(_sdata); + mr[i + 1].end = mr[i + 1].start + diff; + mr[i + 1].page_size_mask = 0; + *nr_range = *nr_range + 1; + i++; + } + } else if (diff == 0) { + diff = mr[i].end - __pa_symbol(_sdata); + if (diff > 0) { + mr[i].start = __pa_symbol(_sdata); + mr[i].end = mr[i].start + diff; + mr[i].page_size_mask = 0; + } else { + /* delete this range */ + memmove(&mr[i + 1], &mr[i], sizeof( + struct map_range[*nr_range - i - 1])); + *nr_range = *nr_range - 1; + i--; + } + } else if (diff > 0) { + long ediff = mr[i].end - __pa_symbol(_sdata); + + if (ediff > 0) { + mr[i].start = __pa_symbol(_sdata); + mr[i].end = mr[i].start + ediff; + mr[i].page_size_mask = 0; + } else { + /* delete this range */ + memmove(&mr[i + 1], &mr[i], sizeof( + struct map_range[*nr_range - i - 1])); + *nr_range = *nr_range - 1; + i--; + } + } + break; + } +} +#else +static inline void snip_xip_text(struct map_range *mr, int *nr_range) +{ +} +#endif + /* * Setup the direct mapping of the physical memory at PAGE_OFFSET. * This runs before bootmem is initialized and gets pages directly from @@ -409,6 +485,8 @@ unsigned long __init_refok init_memory_mapping(unsigned long start, memset(mr, 0, sizeof(mr)); nr_range = split_mem_range(mr, 0, start, end); + snip_xip_text(mr, &nr_range); + for (i = 0; i < nr_range; i++) ret = kernel_physical_mapping_init(mr[i].start, mr[i].end, mr[i].page_size_mask); -- 2.1.0
[PATCH 11/11] x86/xip: update _va() and _pa() macros
For obtaining the physical address, we always take the slow path of slow_virt_to_phys(). In the future, we should probably special-case data addresses to avoid walking the page table. For obtaining a virtual address, this patch introduces a slow path of slow_xip_phys_to_virt(). Signed-off-by: Jim Kukunas james.t.kuku...@linux.intel.com --- arch/x86/include/asm/page.h | 15 +++ arch/x86/mm/pageattr.c | 11 +++ 2 files changed, 26 insertions(+) diff --git a/arch/x86/include/asm/page.h b/arch/x86/include/asm/page.h index 802dde3..b54c7be 100644 --- a/arch/x86/include/asm/page.h +++ b/arch/x86/include/asm/page.h @@ -6,6 +6,7 @@ #ifdef __KERNEL__ #include <asm/page_types.h> +#include <asm/pgtable_types.h> #ifdef CONFIG_X86_64 #include <asm/page_64.h> @@ -37,8 +38,15 @@ static inline void copy_user_page(void *to, void *from, unsigned long vaddr, alloc_page_vma(GFP_HIGHUSER | __GFP_ZERO | movableflags, vma, vaddr) #define __HAVE_ARCH_ALLOC_ZEROED_USER_HIGHPAGE + +#ifdef CONFIG_XIP_KERNEL /* TODO special case text translations */ +#define __pa(x) slow_virt_to_phys((void *)(x)) +#define __pa_nodebug(x) slow_virt_to_phys((void *)(x)) +#else #define __pa(x) __phys_addr((unsigned long)(x)) #define __pa_nodebug(x) __phys_addr_nodebug((unsigned long)(x)) +#endif + /* __pa_symbol should be used for C visible symbols. This seems to be the official gcc blessed way to do such arithmetic. */ /* @@ -51,7 +59,14 @@ static inline void copy_user_page(void *to, void *from, unsigned long vaddr, #define __pa_symbol(x) \ __phys_addr_symbol(__phys_reloc_hide((unsigned long)(x))) + +#ifdef CONFIG_XIP_KERNEL +extern unsigned long slow_xip_phys_to_virt(phys_addr_t); + +#define __va(x) ((void *)slow_xip_phys_to_virt((phys_addr_t)(x))) +#else #define __va(x) ((void *)((unsigned long)(x)+PAGE_OFFSET)) +#endif #define __boot_va(x) __va(x) #define __boot_pa(x) __pa(x) diff --git a/arch/x86/mm/pageattr.c b/arch/x86/mm/pageattr.c index 536ea2f..ca9e2ca 100644 --- a/arch/x86/mm/pageattr.c +++ b/arch/x86/mm/pageattr.c @@ -383,6 +383,17 @@ static pte_t *_lookup_address_cpa(struct cpa_data *cpa, unsigned long address, return lookup_address(address, level); } +#ifdef CONFIG_XIP_KERNEL +unsigned long slow_xip_phys_to_virt(phys_addr_t x) +{ + if (x >= CONFIG_XIP_BASE && x <= (phys_addr_t)phys_sdata) { + unsigned long off = x - CONFIG_XIP_BASE; + return PAGE_OFFSET + off; + } + return x + PAGE_OFFSET; +} +#endif + /* * Lookup the PMD entry for a virtual address. Return a pointer to the entry * or NULL if not present. -- 2.1.0
[PATCH 07/11] x86/xip: make e820_add_kernel_range() a NOP
e820_add_kernel_range() checks whether the kernel text is present in the e820 map and marked as usable RAM. If not, it modifies the e820 map accordingly. For XIP, that is unnecessary since the kernel text won't be loaded into RAM. Signed-off-by: Jim Kukunas james.t.kuku...@linux.intel.com --- arch/x86/kernel/setup.c | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c index d276ebf..74fc6c8 100644 --- a/arch/x86/kernel/setup.c +++ b/arch/x86/kernel/setup.c @@ -787,6 +787,7 @@ static void __init trim_bios_range(void) } /* called before trim_bios_range() to spare extra sanitize */ +#ifndef CONFIG_XIP_KERNEL static void __init e820_add_kernel_range(void) { u64 start = __pa_symbol(_text); @@ -806,6 +807,11 @@ static void __init e820_add_kernel_range(void) e820_remove_range(start, size, E820_RAM, 0); e820_add_region(start, size, E820_RAM); } +#else +static void __init e820_add_kernel_range(void) +{ +} +#endif static unsigned reserve_low = CONFIG_X86_RESERVE_LOW << 10; -- 2.1.0
[RFC] x86 XIP
Hi Folks, This patchset introduces eXecute-In-Place (XIP) support for x86. Right now only minimal configurations are supported (32-bit only, no SMP, no PAE, and so on). My goal is to increase the number of supported configurations in the future based on what functionality is requested. This patchset only supports storage configurations where the kernel text and read-only data will always be readable. I didn't create a special Makefile target for building XIP images, as ARM does with xipImage. Instead, I'm just using the basic vmlinux ELF executable. The kernel must be built with CONFIG_XIP_BASE set to the physical address of the vmlinux file. Additionally, since the .text section is read-only, all of the alternative instructions need to be resolved at build time. To accomplish this, the CPU features to enable are selected through a series of Kconfig options. In order to boot, the bootloader just needs to fill out the zero page (whose address startup_32() expects in %esi), switch to 32-bit protected mode, and then jump into startup_32(), which will be at CONFIG_XIP_BASE plus one page. Thanks.
[PATCH 03/11] x86/xip: copy writable sections into RAM
Loads all writable and non-zero sections into their VMA. Signed-off-by: Jim Kukunas james.t.kuku...@linux.intel.com --- arch/x86/include/asm/sections.h | 4 ++++ arch/x86/kernel/head_32.S | 22 ++++++++++++++++++++++ arch/x86/kernel/vmlinux.lds.S | 4 ++++ 3 files changed, 30 insertions(+) diff --git a/arch/x86/include/asm/sections.h b/arch/x86/include/asm/sections.h index 0a52424..9535e95 100644 --- a/arch/x86/include/asm/sections.h +++ b/arch/x86/include/asm/sections.h @@ -11,4 +11,8 @@ extern struct exception_table_entry __stop___ex_table[]; extern char __end_rodata_hpage_align[]; #endif +#ifdef CONFIG_XIP_KERNEL +extern char phys_sdata[]; +#endif + #endif /* _ASM_X86_SECTIONS_H */ diff --git a/arch/x86/kernel/head_32.S b/arch/x86/kernel/head_32.S index f36bd42..80f344a 100644 --- a/arch/x86/kernel/head_32.S +++ b/arch/x86/kernel/head_32.S @@ -86,6 +86,28 @@ RESERVE_BRK(pagetables, INIT_MAP_SIZE) */ __HEAD ENTRY(startup_32) + +#ifdef CONFIG_XIP_KERNEL + /* + * Copy writable sections into RAM + */ + + movl %esi, %ebp # Preserve pointer to zero-page + + leal pa(_sdata), %edi + leal phys_edata, %ecx + leal phys_sdata, %esi + subl %esi, %ecx + + cld +xip_data_cp: + lodsb + stosb + loop xip_data_cp + + movl %ebp, %esi +#endif + movl pa(stack_start),%ecx /* test KEEP_SEGMENTS flag to see if the bootloader is asking diff --git a/arch/x86/kernel/vmlinux.lds.S b/arch/x86/kernel/vmlinux.lds.S index 414a1ac..59a9edb 100644 --- a/arch/x86/kernel/vmlinux.lds.S +++ b/arch/x86/kernel/vmlinux.lds.S @@ -133,6 +133,9 @@ SECTIONS RO_DATA(PAGE_SIZE) X64_ALIGN_DEBUG_RODATA_END + phys_sdata = LOADADDR(.data); + phys_edata = phys_sdata + (_end_nonzero - _sdata); + /* Data */ .data : AT(ADDR(.data) - LOAD_OFFSET) DATA_ALIGN { /* Start of data section */ _sdata = .; @@ -319,6 +322,7 @@ SECTIONS NOSAVE_DATA } #endif + _end_nonzero = .; /* BSS */ . = ALIGN(PAGE_SIZE); -- 2.1.0
[PATCH 01/11] x86/xip: add XIP_KERNEL and XIP_BASE options
The CONFIG_XIP_KERNEL Kconfig option enables eXecute-In-Place (XIP) support. When XIP_KERNEL is set, XIP_BASE points to the physical address of the vmlinux ELF file. Signed-off-by: Jim Kukunas james.t.kuku...@linux.intel.com --- arch/x86/Kconfig | 19 +++++++++++++++++++ 1 file changed, 19 insertions(+) diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index b7d31ca..f5fa02c 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -294,6 +294,25 @@ config ZONE_DMA If unsure, say Y. +config XIP_KERNEL + bool "eXecute-In-Place (XIP) support" if (X86_32 && EXPERT && EMBEDDED) + depends on !MODULES && !X86_PAE && !SMP + default n + help + With this option enabled, the text and any read-only segments of + the kernel are not copied from their initial location to their usual + location in RAM. As a result, when the kernel is located in storage + that is addressable by the CPU, the kernel text and read-only data + segments are never loaded into memory, thereby using less RAM. + + Only enable this option if you know what you're doing. + +config XIP_BASE + hex "Physical address of XIP kernel" if XIP_KERNEL + default 0xFF800000 + help + The physical address for the beginning of the vmlinux file. + config SMP bool "Symmetric multi-processing support" ---help--- -- 2.1.0
[PATCH 06/11] x86/xip: after paging trampoline, discard PMDs above _brk
In the likely case that XIP_BASE is above PAGE_OFFSET, we want to discard any early identity mappings. So rather than keeping every PMD above PAGE_OFFSET, only copy the ones from PAGE_OFFSET to the last PMD of _end. At this point, the linear address space should look normal. Signed-off-by: Jim Kukunas james.t.kuku...@linux.intel.com --- arch/x86/include/asm/pgtable.h | 7 +++++++ arch/x86/kernel/setup.c | 2 +- 2 files changed, 8 insertions(+), 1 deletion(-) diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h index a0c35bf..5eaba7d 100644 --- a/arch/x86/include/asm/pgtable.h +++ b/arch/x86/include/asm/pgtable.h @@ -660,6 +660,13 @@ static inline int pgd_none(pgd_t pgd) #define KERNEL_PGD_BOUNDARY pgd_index(PAGE_OFFSET) #define KERNEL_PGD_PTRS (PTRS_PER_PGD - KERNEL_PGD_BOUNDARY) +#ifdef CONFIG_XIP_KERNEL +#define BOOT_PGD_COPY_PTRS \ + ((pgd_index((unsigned long)_end) - pgd_index(PAGE_OFFSET)) + 4) +#else +#define BOOT_PGD_COPY_PTRS KERNEL_PGD_PTRS +#endif + #ifndef __ASSEMBLY__ extern int direct_gbpages; diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c index e2d85c4..d276ebf 100644 --- a/arch/x86/kernel/setup.c +++ b/arch/x86/kernel/setup.c @@ -894,7 +894,7 @@ void __init setup_arch(char **cmdline_p) */ clone_pgd_range(swapper_pg_dir + KERNEL_PGD_BOUNDARY, initial_page_table + KERNEL_PGD_BOUNDARY, - KERNEL_PGD_PTRS); + BOOT_PGD_COPY_PTRS); load_cr3(swapper_pg_dir); /* -- 2.1.0
[PATCH] lib/deflate: Replace UNALIGNED_OK w/ HAVE_EFFICIENT_UNALIGNED_ACCESS
Zlib implements a byte-by-byte and a word-by-word longest_match() string comparison function. This implementation defaults to the slower byte-by-byte version unless the preprocessor macro UNALIGNED_OK is defined. Currently, nothing is hooked up to define this macro, but we do have CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS, which serves the same purpose. The exact performance improvement of the word-by-word implementation is data-dependent, but on x86 it is typically in the range of a 5-10% cycle reduction. The code is already there, might as well use it ... Signed-off-by: Jim Kukunas --- lib/zlib_deflate/deflate.c | 15 ++++++++------- 1 files changed, 8 insertions(+), 7 deletions(-) diff --git a/lib/zlib_deflate/deflate.c b/lib/zlib_deflate/deflate.c index d20ef45..4920e51 100644 --- a/lib/zlib_deflate/deflate.c +++ b/lib/zlib_deflate/deflate.c @@ -570,9 +570,9 @@ static uInt longest_match( Pos *prev = s->prev; uInt wmask = s->w_mask; -#ifdef UNALIGNED_OK +#ifdef CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS /* Compare two bytes at a time. Note: this is not always beneficial. - * Try with and without -DUNALIGNED_OK to check. + * Try with and without -DCONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS to check. */ register Byte *strend = s->window + s->strstart + MAX_MATCH - 1; register ush scan_start = *(ush*)scan; @@ -606,9 +606,10 @@ static uInt longest_match( /* Skip to next match if the match length cannot increase * or if the match length is less than 2: */ -#if (defined(UNALIGNED_OK) && MAX_MATCH == 258) +#if (defined(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS) && MAX_MATCH == 258) /* This code assumes sizeof(unsigned short) == 2. Do not use - * UNALIGNED_OK if your compiler uses a different size. +* CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS if your compiler uses a +* different size. */ if (*(ush*)(match+best_len-1) != scan_end || *(ush*)match != scan_start) continue; @@ -639,7 +640,7 @@ static uInt longest_match( len = (MAX_MATCH - 1) - (int)(strend-scan); scan = strend - (MAX_MATCH-1); -#else /* UNALIGNED_OK */ +#else /* CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS */ if (match[best_len] != scan_end || match[best_len-1] != scan_end1 || @@ -670,13 +671,13 @@ static uInt longest_match( len = MAX_MATCH - (int)(strend - scan); scan = strend - MAX_MATCH; -#endif /* UNALIGNED_OK */ +#endif /* CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS */ if (len > best_len) { s->match_start = cur_match; best_len = len; if (len >= nice_match) break; -#ifdef UNALIGNED_OK +#ifdef CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS scan_end = *(ush*)(scan+best_len-1); #else scan_end1 = scan[best_len-1]; -- 1.7.1
Re: [PATCH] lib/raid6: Add AVX2 optimized recovery functions
On Fri, Nov 09, 2012 at 10:50:25PM +1100, Neil Brown wrote:
> On Fri, 09 Nov 2012 12:39:05 +0100 "H. Peter Anvin" wrote:
>
> > Sorry, we cannot share those at this time since the hardware is not yet
> > released.
>
> Can I take that to imply "Acked-by: H. Peter Anvin" ??
>
> It would be nice to have at least a statement like:
>   These patches have been tested both with the user-space testing tool and
>   in a RAID6 md array and they pass all tests. While we cannot release
>   performance numbers as the hardware is not released, we can confirm that
>   on that hardware the performance with these patches is faster than
>   without.
>
> I guess I should be able to assume that - surely the patches would not be
> posted if it were not true... But I like to avoid assuming when I can.

Hi Neil,

That assumption is correct. The patch was tested and benchmarked before
submission. You'll notice that this code is very similar to the
SSSE3-optimized recovery routines I wrote earlier. This implementation
extends that same algorithm from 128-bit registers to 256-bit registers.

Thanks.

--
Jim Kukunas
Intel Open Source Technology Center
[PATCH] lib/raid6: Add AVX2 optimized recovery functions
Optimize RAID6 recovery functions to take advantage of the 256-bit YMM
integer instructions introduced in AVX2.

Signed-off-by: Jim Kukunas
---
 arch/x86/Makefile       |   5 +-
 include/linux/raid/pq.h |   1 +
 lib/raid6/Makefile      |   2 +-
 lib/raid6/algos.c       |   3 +
 lib/raid6/recov_avx2.c  | 327
 lib/raid6/test/Makefile |   2 +-
 lib/raid6/x86.h         |  14 ++-
 7 files changed, 345 insertions(+), 9 deletions(-)
 create mode 100644 lib/raid6/recov_avx2.c

diff --git a/arch/x86/Makefile b/arch/x86/Makefile
index 682e9c2..f24c037 100644
--- a/arch/x86/Makefile
+++ b/arch/x86/Makefile
@@ -123,9 +123,10 @@ cfi-sections := $(call as-instr,.cfi_sections .debug_frame,-DCONFIG_AS_CFI_SECTI
 # does binutils support specific instructions?
 asinstr := $(call as-instr,fxsaveq (%rax),-DCONFIG_AS_FXSAVEQ=1)
 avx_instr := $(call as-instr,vxorps %ymm0$(comma)%ymm1$(comma)%ymm2,-DCONFIG_AS_AVX=1)
+avx2_instr :=$(call as-instr,vpbroadcastb %xmm0$(comma)%ymm1,-DCONFIG_AS_AVX2=1)
 
-KBUILD_AFLAGS += $(cfi) $(cfi-sigframe) $(cfi-sections) $(asinstr) $(avx_instr)
-KBUILD_CFLAGS += $(cfi) $(cfi-sigframe) $(cfi-sections) $(asinstr) $(avx_instr)
+KBUILD_AFLAGS += $(cfi) $(cfi-sigframe) $(cfi-sections) $(asinstr) $(avx_instr) $(avx2_instr)
+KBUILD_CFLAGS += $(cfi) $(cfi-sigframe) $(cfi-sections) $(asinstr) $(avx_instr) $(avx2_instr)
 
 LDFLAGS := -m elf_$(UTS_MACHINE)

diff --git a/include/linux/raid/pq.h b/include/linux/raid/pq.h
index 640c69c..3156347 100644
--- a/include/linux/raid/pq.h
+++ b/include/linux/raid/pq.h
@@ -109,6 +109,7 @@ struct raid6_recov_calls {
 
 extern const struct raid6_recov_calls raid6_recov_intx1;
 extern const struct raid6_recov_calls raid6_recov_ssse3;
+extern const struct raid6_recov_calls raid6_recov_avx2;
 
 /* Algorithm list */
 extern const struct raid6_calls * const raid6_algos[];

diff --git a/lib/raid6/Makefile b/lib/raid6/Makefile
index de06dfe..8c2e22b 100644
--- a/lib/raid6/Makefile
+++ b/lib/raid6/Makefile
@@ -1,6 +1,6 @@
 obj-$(CONFIG_RAID6_PQ)	+= raid6_pq.o
 
-raid6_pq-y	+= algos.o recov.o recov_ssse3.o tables.o int1.o int2.o int4.o \
+raid6_pq-y	+= algos.o recov.o recov_ssse3.o recov_avx2.o tables.o int1.o int2.o int4.o \
 		   int8.o int16.o int32.o altivec1.o altivec2.o altivec4.o \
 		   altivec8.o mmx.o sse1.o sse2.o
 hostprogs-y	+= mktables

diff --git a/lib/raid6/algos.c b/lib/raid6/algos.c
index 589f5f5..8b7f55c 100644
--- a/lib/raid6/algos.c
+++ b/lib/raid6/algos.c
@@ -72,6 +72,9 @@ EXPORT_SYMBOL_GPL(raid6_datap_recov);
 
 const struct raid6_recov_calls *const raid6_recov_algos[] = {
 #if (defined(__i386__) || defined(__x86_64__)) && !defined(__arch_um__)
+#ifdef CONFIG_AS_AVX2
+	&raid6_recov_avx2,
+#endif
 	&raid6_recov_ssse3,
 #endif
 	&raid6_recov_intx1,

diff --git a/lib/raid6/recov_avx2.c b/lib/raid6/recov_avx2.c
new file mode 100644
index 000..43a9bab
--- /dev/null
+++ b/lib/raid6/recov_avx2.c
@@ -0,0 +1,327 @@
+/*
+ * Copyright (C) 2012 Intel Corporation
+ * Author: Jim Kukunas
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; version 2
+ * of the License.
+ */
+
+#if (defined(__i386__) || defined(__x86_64__)) && !defined(__arch_um__)
+
+#if CONFIG_AS_AVX2
+
+#include <linux/raid/pq.h>
+#include "x86.h"
+
+static int raid6_has_avx2(void)
+{
+	return boot_cpu_has(X86_FEATURE_AVX2) &&
+		boot_cpu_has(X86_FEATURE_AVX);
+}
+
+static void raid6_2data_recov_avx2(int disks, size_t bytes, int faila,
+		int failb, void **ptrs)
+{
+	u8 *p, *q, *dp, *dq;
+	const u8 *pbmul;	/* P multiplier table for B data */
+	const u8 *qmul;		/* Q multiplier table (for both) */
+	const u8 x0f = 0x0f;
+
+	p = (u8 *)ptrs[disks-2];
+	q = (u8 *)ptrs[disks-1];
+
+	/* Compute syndrome with zero for the missing data pages
+	   Use the dead data pages as temporary storage for
+	   delta p and delta q */
+	dp = (u8 *)ptrs[faila];
+	ptrs[faila] = (void *)raid6_empty_zero_page;
+	ptrs[disks-2] = dp;
+	dq = (u8 *)ptrs[failb];
+	ptrs[failb] = (void *)raid6_empty_zero_page;
+	ptrs[disks-1] = dq;
+
+	raid6_call.gen_syndrome(disks, bytes, ptrs);
+
+	/* Restore pointer table */
+	ptrs[faila] = dp;
+	ptrs[failb] = dq;
+	ptrs[disks-2] = p;
+	ptrs[disks-1] = q;
+
+	/* Now, pick the proper data tables */
+	pbmul = raid6_vgfmul[raid6_gfexi[failb-faila]];
+	qmul = raid6_vgfmul[raid6_gfinv[raid6_gfexp[faila] ^
+		raid6_gfexp[failb]]];
+
+	kernel_fpu_begin();
+
+	/* ymm0 = x0f[16] */
+	asm volatile("vpbroadcastb %0, %%ymm7" : : "m" (x0f));
+
+	while (bytes) {
+#ifdef CONFIG_X86_64
+		asm volatile("