Re: [PATCH 10/11] x86/xip: resolve alternative instructions at build

2015-03-24 Thread Jim Kukunas
On Mon, Mar 23, 2015 at 09:33:02AM +0100, Borislav Petkov wrote:
> On Mon, Mar 23, 2015 at 12:46:39AM -0700, Jim Kukunas wrote:
> > Since the .text section can't be updated at run time, remove the
> > .alternatives sections and update the .text at build time. To pick the
> > proper instructions, Kconfig options are exposed for each X86_FEATURE
> > that needs to be resolved. Each X86_FEATURE gets a corresponding
> > CONFIG_XIP_ENABLE_X86_FEATURE_ option. Based on whether this option
> > is set, a resolver macro is set up to either generate that instruction
> > or the fallback. The resolver needs to be defined for each FEATURE, and
> > the proper one is chosen via preprocessor string pasting.
> > 
> > This approach is horrific and ugly.
> 
> You said it.
> 
> And with XIP enabled - whatever that means, your announce message could
> explain a lot more and more verbosely what this whole patchset is all
> about - this kernel is not going to be generic anymore but it will be
> destined only for the machine it is being built for, correct?

Please see my response to Ingo for more information about the patchset.

Yes, regardless of how it's implemented, selecting alternatives at build
time will produce a non-generic kernel (unless all alternative instructions
are disabled and just the fallbacks are used).

> If that is so, how are distros supposed to ship one kernel with XIP or
> is this some obscure feature distros won't have to enable anyway?

XIP isn't a general feature that distros are going to be enabling. It's
designed for a very specific use case where people are building highly
customized kernels.

> Concerning this particular patch, I'd suggest a switch which simply
> disables alternatives patching at run time so that you don't add this
> ifdeffery to alternative.c

I'll look into this.

> Btw, why do you even have to disable the alternatives? I see this in
> your patch 1/11:
> 
> + location in RAM. As a result, when the kernel is located in storage
> + that is addressable by the CPU, the kernel text and read-only data
> + segments are never loaded into memory, thereby using less RAM.
> 
> is this it? To save some memory? Probably embedded, maybe some light
> bulb running linux... Yuck.

Alternatives are disabled because the kernel text will be read-only. For
example, consider the kernel image being stored in and executing from
ROM. Cutting the kernel text and read-only data out of RAM really helps
Linux scale down to smaller systems.

So yes, it's for embedded. Linux has a lot to offer in that area.

Thanks.

-- 
Jim Kukunas
Intel Open Source Technology Center




Re: [RFC] x86 XIP

2015-03-24 Thread Jim Kukunas
On Mon, Mar 23, 2015 at 09:07:14AM +0100, Ingo Molnar wrote:
> * Jim Kukunas  wrote:
> 
> > 
> > Hi Folks,
> > 
> > This patchset introduces eXecute-In-Place (XIP) support for x86. 
> > [...]
> 
> So we'd need a lot better high level description than this:

In future patch revisions, I'll update my cover letter to include the
information below.

>  - a bit of background description: what are the advantages of having
>the kernel image in non-RAM (flash), etc.

Currently, for tiny memory-constrained embedded systems, the kernel
configuration is usually stripped down to reduce the kernel's RAM
footprint, freeing up more precious memory for user space and allowing
the kernel to fit onto smaller systems. With XIP, the kernel's text and
read-only data sections are never loaded into RAM, reducing the
kernel's memory usage. Also, since a significant portion of the kernel
is never loaded into RAM, a larger kernel configuration can be used without
bloating memory usage. I haven't done any performance analysis yet, but it's
probably safe to say that executing from storage will negatively affect
performance.

>  - on what hardware/bootloaders is or will be XIP supported?

With regards to supported hardware, these patches aren't targeting any
specific platform. As mentioned in the cover letter, there are current
limits on the supported configurations (32-bit only, no SMP, no PAE),
but these are not technical limits ... I just need to implement support
for them.

With regards to supported bootloaders, I've been testing with a small
bootloader that I wrote specifically for XIP. Which other bootloaders
I add support to will depend on the feedback/requests that I get.

> Also, there should probably be some fail-safe mechanism included: such
> as to check whether caching attributes (MTRRs, PAT) are properly set 
> for the XIP area (at minimum to not be uncacheable).

Good idea. I'll add that into the next revision.

Thanks.

-- 
Jim Kukunas
Intel Open Source Technology Center




[RFC] x86 XIP

2015-03-23 Thread Jim Kukunas

Hi Folks,

This patchset introduces eXecute-In-Place (XIP) support for x86. Right now only
minimal configurations are supported (32-bit only, no SMP, no PAE, and so on).
My goal is to increase the number of supported configurations in the future 
based on what functionality is requested. This patchset only supports storage
configurations where the kernel text and read-only data will always be readable.

I didn't create a special Makefile target for building XIP images, as ARM
does with xipImage. Instead, I'm just using the basic vmlinux ELF executable.
The kernel must be built with CONFIG_XIP_BASE set to the physical address of
the vmlinux file. Additionally, since the .text section is read-only, all of
the alternative instructions need to be resolved at build time. To accomplish
this, the CPU features to enable are selected through a series of Kconfig
options. In order to boot, the bootloader just needs to fill out the zero
page (whose address startup_32() expects in %esi), switch to 32-bit protected
mode, and then jump to startup_32(), which will be at CONFIG_XIP_BASE plus
one page.

Thanks.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 03/11] x86/xip: copy writable sections into RAM

2015-03-23 Thread Jim Kukunas
Loads all writable and non-zero sections into their VMA.

Signed-off-by: Jim Kukunas 
---
 arch/x86/include/asm/sections.h |  4 ++++
 arch/x86/kernel/head_32.S   | 22 ++++++++++++++++++
 arch/x86/kernel/vmlinux.lds.S   |  4 ++++
 3 files changed, 30 insertions(+)

diff --git a/arch/x86/include/asm/sections.h b/arch/x86/include/asm/sections.h
index 0a52424..9535e95 100644
--- a/arch/x86/include/asm/sections.h
+++ b/arch/x86/include/asm/sections.h
@@ -11,4 +11,8 @@ extern struct exception_table_entry __stop___ex_table[];
 extern char __end_rodata_hpage_align[];
 #endif
 
+#ifdef CONFIG_XIP_KERNEL
+extern char phys_sdata[];
+#endif
+
 #endif /* _ASM_X86_SECTIONS_H */
diff --git a/arch/x86/kernel/head_32.S b/arch/x86/kernel/head_32.S
index f36bd42..80f344a 100644
--- a/arch/x86/kernel/head_32.S
+++ b/arch/x86/kernel/head_32.S
@@ -86,6 +86,28 @@ RESERVE_BRK(pagetables, INIT_MAP_SIZE)
  */
 __HEAD
 ENTRY(startup_32)
+
+#ifdef CONFIG_XIP_KERNEL
+   /*
+* Copy writable sections into RAM
+*/
+
+   movl %esi, %ebp # Preserve pointer to zero-page
+
+   leal pa(_sdata), %edi
+   leal phys_edata, %ecx
+   leal phys_sdata, %esi
+   subl %esi, %ecx
+
+   cld
+xip_data_cp:
+   lodsb
+   stosb
+   loop xip_data_cp
+
+   movl %ebp, %esi
+#endif
+
movl pa(stack_start),%ecx

/* test KEEP_SEGMENTS flag to see if the bootloader is asking
diff --git a/arch/x86/kernel/vmlinux.lds.S b/arch/x86/kernel/vmlinux.lds.S
index 414a1ac..59a9edb 100644
--- a/arch/x86/kernel/vmlinux.lds.S
+++ b/arch/x86/kernel/vmlinux.lds.S
@@ -133,6 +133,9 @@ SECTIONS
RO_DATA(PAGE_SIZE)
X64_ALIGN_DEBUG_RODATA_END
 
+   phys_sdata = LOADADDR(.data);
+   phys_edata = phys_sdata + (_end_nonzero - _sdata);
+
/* Data */
.data : AT(ADDR(.data) - LOAD_OFFSET) DATA_ALIGN {
/* Start of data section */
@@ -319,6 +322,7 @@ SECTIONS
NOSAVE_DATA
}
 #endif
+   _end_nonzero = .;
 
/* BSS */
. = ALIGN(PAGE_SIZE);
-- 
2.1.0



[PATCH 01/11] x86/xip: add XIP_KERNEL and XIP_BASE options

2015-03-23 Thread Jim Kukunas
The CONFIG_XIP_KERNEL Kconfig option enables eXecute-In-Place
(XIP) support. When XIP_KERNEL is set, XIP_BASE points to the
physical address of the vmlinux ELF file.

Signed-off-by: Jim Kukunas 
---
 arch/x86/Kconfig | 19 +++
 1 file changed, 19 insertions(+)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index b7d31ca..f5fa02c 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -294,6 +294,25 @@ config ZONE_DMA
 
  If unsure, say Y.
 
+config XIP_KERNEL
+   bool "eXecute-In-Place (XIP) support" if (X86_32 && EXPERT && EMBEDDED)
+   depends on !MODULES && !X86_PAE && !SMP
+   default n
+   help
+ With this option enabled, the text and any read-only segments of
+ the kernel are not copied from their initial location to their usual
+ location in RAM. As a result, when the kernel is located in storage
+ that is addressable by the CPU, the kernel text and read-only data
+ segments are never loaded into memory, thereby using less RAM.
+
+ Only enable this option if you know what you're doing.
+
+config XIP_BASE
+   hex "Physical address of XIP kernel" if XIP_KERNEL
   default "0xFF800000"
+   help
+ The physical address for the beginning of the vmlinux file.
+
 config SMP
bool "Symmetric multi-processing support"
---help---
-- 
2.1.0



[PATCH 06/11] x86/xip: after paging trampoline, discard PMDs above _brk

2015-03-23 Thread Jim Kukunas
In the likely case that XIP_BASE is above PAGE_OFFSET, we
want to discard any early identity mappings. So rather than
keeping every PMD above PAGE_OFFSET, only copy the ones from
PAGE_OFFSET to the last PMD of _end. At this point, the linear
address space should look normal.

Signed-off-by: Jim Kukunas 
---
 arch/x86/include/asm/pgtable.h | 7 +++++++
 arch/x86/kernel/setup.c | 2 +-
 2 files changed, 8 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index a0c35bf..5eaba7d 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -660,6 +660,13 @@ static inline int pgd_none(pgd_t pgd)
 #define KERNEL_PGD_BOUNDARYpgd_index(PAGE_OFFSET)
 #define KERNEL_PGD_PTRS(PTRS_PER_PGD - KERNEL_PGD_BOUNDARY)
 
+#ifdef CONFIG_XIP_KERNEL
+#define BOOT_PGD_COPY_PTRS \
+   ((pgd_index((unsigned long)_end)  - pgd_index(PAGE_OFFSET)) + 4)
+#else
+#define BOOT_PGD_COPY_PTRS KERNEL_PGD_PTRS
+#endif
+
 #ifndef __ASSEMBLY__
 
 extern int direct_gbpages;
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index e2d85c4..d276ebf 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -894,7 +894,7 @@ void __init setup_arch(char **cmdline_p)
 */
clone_pgd_range(swapper_pg_dir + KERNEL_PGD_BOUNDARY,
initial_page_table + KERNEL_PGD_BOUNDARY,
-   KERNEL_PGD_PTRS);
+   BOOT_PGD_COPY_PTRS);
 
load_cr3(swapper_pg_dir);
/*
-- 
2.1.0



[PATCH 07/11] x86/xip: make e820_add_kernel_range() a NOP

2015-03-23 Thread Jim Kukunas
e820_add_kernel_range() checks whether the kernel text is present
in the e820 map, and marked as usable RAM. If not, it modifies
the e820 map accordingly.

For XIP, that is unnecessary since the kernel text won't be loaded
in RAM.

Signed-off-by: Jim Kukunas 
---
 arch/x86/kernel/setup.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index d276ebf..74fc6c8 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -787,6 +787,7 @@ static void __init trim_bios_range(void)
 }
 
 /* called before trim_bios_range() to spare extra sanitize */
+#ifndef CONFIG_XIP_KERNEL
 static void __init e820_add_kernel_range(void)
 {
u64 start = __pa_symbol(_text);
@@ -806,6 +807,11 @@ static void __init e820_add_kernel_range(void)
e820_remove_range(start, size, E820_RAM, 0);
e820_add_region(start, size, E820_RAM);
 }
+#else
+static void __init e820_add_kernel_range(void)
+{
+}
+#endif
 
 static unsigned reserve_low = CONFIG_X86_RESERVE_LOW << 10;
 
-- 
2.1.0



[PATCH 08/11] x86/xip: in setup_arch(), handle resource physical addr

2015-03-23 Thread Jim Kukunas
set code_resources to proper physical addr in setup_arch()

Signed-off-by: Jim Kukunas 
---
 arch/x86/kernel/setup.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 74fc6c8..f044453 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -986,9 +986,16 @@ void __init setup_arch(char **cmdline_p)
 
	mpx_mm_init(&init_mm);
 
+#ifndef CONFIG_XIP_KERNEL
code_resource.start = __pa_symbol(_text);
code_resource.end = __pa_symbol(_etext)-1;
+   data_resource.start = _pa(_sdata)-1;
+#else
+   code_resource.start = CONFIG_XIP_BASE;
+   code_resource.end = (phys_addr_t)phys_sdata-1;
data_resource.start = __pa_symbol(_etext);
+#endif
+
data_resource.end = __pa_symbol(_edata)-1;
bss_resource.start = __pa_symbol(__bss_start);
bss_resource.end = __pa_symbol(__bss_stop)-1;
-- 
2.1.0



[PATCH 09/11] x86/xip: snip the kernel text out of the memory mapping

2015-03-23 Thread Jim Kukunas
If the kernel tries to create an identity region for a memory range
that spans the kernel text, split it into two pieces, skipping the
text section. Otherwise, this will setup the standard text mapping,
which will point to the normal RAM location for text instead of the
XIP_BASE location.

Signed-off-by: Jim Kukunas 
---
 arch/x86/mm/init.c | 78 ++
 1 file changed, 78 insertions(+)

diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c
index a110efc..07b20c6 100644
--- a/arch/x86/mm/init.c
+++ b/arch/x86/mm/init.c
@@ -391,6 +391,82 @@ bool pfn_range_is_mapped(unsigned long start_pfn, unsigned long end_pfn)
return false;
 }
 
+#ifdef CONFIG_XIP_KERNEL
+/*
+ * Cut the .text virtual address out of mem range b/c the mapping
+ * is already correctly setup
+ */
+static inline void snip_xip_text(struct map_range *mr, int *nr_range)
+{
+   int i;
+
+   for (i = 0; i < *nr_range; i++) {
+   long diff;
+
+   if (mr[i].start <= CONFIG_PHYSICAL_START &&
+   mr[i].end <= CONFIG_PHYSICAL_START)
+   continue;
+   if (mr[i].start >= __pa_symbol(_sdata))
+   continue;
+
+   diff = mr[i].start - CONFIG_PHYSICAL_START;
+   if (diff < 0) { /* range starts below .text and includes it */
+   diff = mr[i].end - __pa_symbol(_sdata);
+
+   /* shorten segment so it ends just before .text */
+   mr[i].end = CONFIG_PHYSICAL_START;
+
+   /* if segment goes past .text, add 2nd segment*/
+   if (diff > 0) {
+   /* move next section down 1 */
+   if (i + 1 < *nr_range) {
   memmove(&mr[i + 1], &mr[i + 2],
+   sizeof(struct map_range[
+   *nr_range - i - 2]));
+   }
+   mr[i + 1].start = __pa_symbol(_sdata);
+   mr[i + 1].end =  mr[i + 1].start + diff;
+   mr[i + 1].page_size_mask = 0;
+   *nr_range = *nr_range + 1;
+   i++;
+   }
+   } else if (diff == 0) {
+   diff = mr[i].end - __pa_symbol(_sdata);
+   if (diff > 0) {
+   mr[i].start = __pa_symbol(_sdata);
+   mr[i].end = mr[i].start + diff;
+   mr[i].page_size_mask = 0;
+   } else {
+   /* delete this range */
   memmove(&mr[i + 1], &mr[i], sizeof(
+   struct map_range[*nr_range - i - 1]));
+   *nr_range = *nr_range - 1;
+   i--;
+   }
+   } else if (diff > 0) {
+   long ediff = mr[i].end - __pa_symbol(_sdata);
+
+   if (ediff > 0) {
+   mr[i].start = __pa_symbol(_sdata);
+   mr[i].end = mr[i].start + ediff;
+   mr[i].page_size_mask = 0;
+   } else {
+   /* delete this range */
   memmove(&mr[i + 1], &mr[i], sizeof(
+   struct map_range[*nr_range - i - 1]));
+   *nr_range = *nr_range - 1;
+   i--;
+   }
+   }
+   break;
+   }
+}
+#else
+static inline void snip_xip_text(struct map_range *mr, int *mr_range)
+{
+}
+#endif
+
 /*
  * Setup the direct mapping of the physical memory at PAGE_OFFSET.
  * This runs before bootmem is initialized and gets pages directly from
@@ -409,6 +485,8 @@ unsigned long __init_refok init_memory_mapping(unsigned long start,
memset(mr, 0, sizeof(mr));
nr_range = split_mem_range(mr, 0, start, end);
 
+   snip_xip_text(mr, &nr_range);
+
for (i = 0; i < nr_range; i++)
ret = kernel_physical_mapping_init(mr[i].start, mr[i].end,
   mr[i].page_size_mask);
-- 
2.1.0



[PATCH 04/11] x86/xip: XIP boot trampoline page tables

2015-03-23 Thread Jim Kukunas
Constructs the trampoline page tables for early XIP boot.

Signed-off-by: Jim Kukunas 
---
 arch/x86/kernel/head_32.S | 85 +++
 1 file changed, 85 insertions(+)

diff --git a/arch/x86/kernel/head_32.S b/arch/x86/kernel/head_32.S
index 80f344a..642d73b 100644
--- a/arch/x86/kernel/head_32.S
+++ b/arch/x86/kernel/head_32.S
@@ -227,6 +227,90 @@ xip_data_cp:
movl %eax,pa(initial_pg_pmd+0x1000*KPMDS-8)
 #else  /* Not PAE */
 
+#ifdef CONFIG_XIP_KERNEL
+   movl $pa(__brk_base), %edi
+   movl $pa(initial_page_table), %edx
+
+   movl $PTE_IDENT_ATTR, %eax  /* EAX holds identity mapping addr */
+   movl $__PAGE_OFFSET + PTE_IDENT_ATTR, %ebx /* EBX holds kernel addr */
+
+.Lxip_mapping:
+/* Allocate or Load Identity PDE */
+   leal -PTE_IDENT_ATTR(%eax), %ebp
+   andl $0xFFC00000, %ebp
+   shrl $20, %ebp
+   movl (%edx, %ebp), %ecx
+
+   test %ecx, %ecx
+   jnz .Lskip_ident_pde_alloc
+   leal PDE_IDENT_ATTR(%edi), %ecx
+   addl $4096, %edi
+   movl %ecx, (%edx, %ebp)
+
+.Lskip_ident_pde_alloc:
+   leal -PDE_IDENT_ATTR(%ecx), %ecx
+   leal -PTE_IDENT_ATTR(%eax), %ebp
+   andl $0x3FF000, %ebp
+   shrl $10, %ebp
+   movl %eax, (%ecx, %ebp)
+
+/* Allocate or Load PAGE_OFFSET PDE */
+   leal -PTE_IDENT_ATTR(%ebx), %ebp
+   andl $0xFFC00000, %ebp
+   shrl $20, %ebp
+   movl (%edx, %ebp), %ecx
+
+   test %ecx, %ecx
+   jnz .Lskip_offset_pde_alloc
+   leal PDE_IDENT_ATTR(%edi), %ecx
+   addl $4096, %edi
+   movl %ecx, (%edx, %ebp)
+
+.Lskip_offset_pde_alloc:
+   leal -PDE_IDENT_ATTR(%ecx), %ecx
+   leal -PTE_IDENT_ATTR(%ebx), %ebp
+   andl $0x3FF000, %ebp
+   shrl $10, %ebp
+   movl %eax, (%ecx, %ebp)
+
+   addl $4096, %eax
+   addl $4096, %ebx
+
+   cmpl $CONFIG_PHYSICAL_START + PTE_IDENT_ATTR, %eax
+   je   .Lsetup_text_addr
+
+   cmpl $phys_sdata + PTE_IDENT_ATTR, %eax
+   je   .Lsetup_data_addr
+
+   cmpl $pa(_end) + MAPPING_BEYOND_END + PTE_IDENT_ATTR, %eax
+   je   .Ldone
+
+   jmp  .Lxip_mapping
+
+.Lsetup_text_addr:
+   movl $CONFIG_XIP_BASE + 4096 + PTE_IDENT_ATTR, %eax
+   movl $_text, %ebx
+   addl $PTE_IDENT_ATTR, %ebx
+   jmp  .Lxip_mapping
+
+.Lsetup_data_addr:
+   movl $pa(_sdata), %eax
+   addl $PTE_IDENT_ATTR, %eax
+   movl $_sdata, %ebx
+   addl $PTE_IDENT_ATTR, %ebx
+   jmp  .Lxip_mapping
+.Ldone:
+   addl $__PAGE_OFFSET, %edi
+   movl %edi, pa(_brk_end)
+   movl $pa(_end) + MAPPING_BEYOND_END + PTE_IDENT_ATTR, %eax
+   shrl $12, %eax
+   movl %eax, pa(max_pfn_mapped)
+
+   movl $pa(initial_pg_fixmap) + PTE_IDENT_ATTR, %eax
+   movl %eax, pa(initial_page_table + 0xFFC)
+
+#else
+
 page_pde_offset = (__PAGE_OFFSET >> 20);
 
movl $pa(__brk_base), %edi
@@ -257,6 +341,7 @@ page_pde_offset = (__PAGE_OFFSET >> 20);
movl $pa(initial_pg_fixmap)+PDE_IDENT_ATTR,%eax
movl %eax,pa(initial_page_table+0xffc)
 #endif
+#endif
 
 #ifdef CONFIG_PARAVIRT
/* This is can only trip for a broken bootloader... */
-- 
2.1.0



[PATCH 10/11] x86/xip: resolve alternative instructions at build

2015-03-23 Thread Jim Kukunas
Since the .text section can't be updated at run time, remove the
.alternatives sections and update the .text at build time. To pick the
proper instructions, Kconfig options are exposed for each X86_FEATURE
that needs to be resolved. Each X86_FEATURE gets a corresponding
CONFIG_XIP_ENABLE_X86_FEATURE_ option. Based on whether this option
is set, a resolver macro is set up to either generate that instruction
or the fallback. The resolver needs to be defined for each FEATURE, and
the proper one is chosen via preprocessor string pasting.

This approach is horrific and ugly. A better approach might be to add
an additional build step that, after generating the vmlinux file, goes
through the alternatives section and performs the fixups on the file.

At the very least, a script like mkcapflags.sh could generate the
resolver functions automatically. But since it's adding Kconfig
options, it would need to run unconditionally before any of the
config related Makefile targets.

Signed-off-by: Jim Kukunas 
---
 arch/x86/Kconfig   |  45 +
 arch/x86/include/asm/alternative-xip.h | 161 +
 arch/x86/include/asm/alternative.h |   5 +
 arch/x86/kernel/alternative.c  |   7 ++
 arch/x86/kernel/cpu/bugs.c |   2 +
 arch/x86/kernel/setup.c|   2 +
 arch/x86/kernel/smpboot.c  |   2 +
 arch/x86/kernel/vmlinux.lds.S  |   2 +
 arch/x86/vdso/vma.c|   2 +
 9 files changed, 228 insertions(+)
 create mode 100644 arch/x86/include/asm/alternative-xip.h

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index f5fa02c..dff781d 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -313,6 +313,51 @@ config XIP_BASE
help
  The physical address for the beginning of the vmlinux file.
 
+menu "XIP Alternative Instructions"
+   depends on XIP_KERNEL
+
+   config XIP_ENABLE_X86_FEATURE_POPCNT
+   bool "Enable POPCNT alternative instructions"
+   default n
+
+   config XIP_ENABLE_X86_BUG_11AP
+   bool "Enable 11AP alternative instructions"
+   default n
+
+   config XIP_ENABLE_X86_FEATURE_XMM2
+   bool "Enable XMM2 alternative instructions"
+   default n
+
+   config XIP_ENABLE_X86_FEATURE_MFENCE_RDTSC
+   bool "Enable MFENCE_RDTSC alternative instructions"
+   default n
+
+   config XIP_ENABLE_X86_FEATURE_LFENCE_RDTSC
+   bool "Enable LFENCE_RDTSC alternative instructions"
+   default n
+
+   config XIP_ENABLE_X86_FEATURE_3DNOW
+   bool "Enable 3DNOW alternative instructions"
+   default n
+
+   config XIP_ENABLE_X86_FEATURE_XMM
   bool "Enable XMM alternative instructions"
+   default n
+
+   config XIP_ENABLE_X86_FEATURE_CLFLUSHOPT
+   bool "Enable CLFLUSHOPT alternative instructions"
+   default n
+
+   config XIP_ENABLE_X86_FEATURE_XSAVEOPT
+   bool "Enable XSAVEOPT alternative instructions"
+   default n
+
+   config XIP_ENABLE_X86_FEATURE_XSAVES
+   bool "Enable XSAVES alternative instructions"
+   default n
+
+endmenu
+
 config SMP
bool "Symmetric multi-processing support"
---help---
diff --git a/arch/x86/include/asm/alternative-xip.h b/arch/x86/include/asm/alternative-xip.h
new file mode 100644
index 000..84f544e
--- /dev/null
+++ b/arch/x86/include/asm/alternative-xip.h
@@ -0,0 +1,161 @@
+#ifndef _ASM_X86_ALTERNATIVE_XIP_H
+#define _ASM_X86_ALTERNATIVE_XIP_H
+
+/*
+ * Alternative instruction fixup for XIP
+ *
+ * Copyright (C) 2014 Intel Corporation
+ * Author: Jim Kukunas 
+ *
+ * Since the kernel text is executing from storage and is
+ * read-only, we can't update the opcodes in-flight. Instead,
+ * resolve the alternatives at build time through preprocessor
+ * (ab)use.
+ */
+
+#ifdef CONFIG_SMP
+#define LOCK_PREFIX "\n\tlock; "
+#else
+#define LOCK_PREFIX ""
+#endif
+
+extern int poke_int3_handler(struct pt_regs *regs);
+
+/* TODO hook up to something like mkcapflags.sh */
+/* Unfortunately, each X86_FEATURE will need a corresponding define like this */
+#ifdef CONFIG_XIP_ENABLE_X86_FEATURE_POPCNT
+#define RESOLVE_X86_FEATURE_POPCNT(old, new) new
+#define RESOLVE_2_X86_FEATURE_POPCNT(old, new1, resolve1, new2) new2
+#else
+#define RESOLVE_X86_FEATURE_POPCNT(old, new) old
+#define RESOLVE_2_X86_FEATURE_POPCNT(old, new1, resolve1, new2) \
+   resolve1(old, new1)
+#endif
+
+#ifdef CONFIG_XIP_ENABLE_X86_BUG_11AP
+#define RESOLVE_X86_BUG_11AP(old, new) new
+#define RESOLVE_2_X86_BUG_11AP(old, new1, resolve1, new2) new2
+#else
+#define RESOLVE_X86_BUG_11AP(old, new) old
+#define RESOLVE_2_X86_BUG_11AP(old, new1, resolve1, new2) \
+   resolve1(old, new1)
+#endif
+
+#ifdef CONFIG_XIP_ENABLE_X86_FEATURE_XMM2
+#define RESOLVE_X86_FEATURE_XMM2(old, new) new
+#define RESOLVE_2_X86_FEATURE_XMM2(old, new

[PATCH 11/11] x86/xip: update _va() and _pa() macros

2015-03-23 Thread Jim Kukunas
For obtaining the physical address, we always take the slow path
of slow_virt_to_phys(). In the future, we should probably special
case data addresses to avoid walking the page table. For obtaining
a virtual address, this patch introduces a slow path of
slow_xip_phys_to_virt().

Signed-off-by: Jim Kukunas 
---
 arch/x86/include/asm/page.h | 15 +++
 arch/x86/mm/pageattr.c  | 11 +++
 2 files changed, 26 insertions(+)

diff --git a/arch/x86/include/asm/page.h b/arch/x86/include/asm/page.h
index 802dde3..b54c7be 100644
--- a/arch/x86/include/asm/page.h
+++ b/arch/x86/include/asm/page.h
@@ -6,6 +6,7 @@
 #ifdef __KERNEL__
 
 #include 
+#include 
 
 #ifdef CONFIG_X86_64
 #include 
@@ -37,8 +38,15 @@ static inline void copy_user_page(void *to, void *from, unsigned long vaddr,
alloc_page_vma(GFP_HIGHUSER | __GFP_ZERO | movableflags, vma, vaddr)
 #define __HAVE_ARCH_ALLOC_ZEROED_USER_HIGHPAGE
 
+
+#ifdef CONFIG_XIP_KERNEL   /* TODO special case text translations */
+#define __pa(x)	slow_virt_to_phys((void *)(x))
+#define __pa_nodebug(x)	slow_virt_to_phys((void *)(x))
+#else
 #define __pa(x)__phys_addr((unsigned long)(x))
 #define __pa_nodebug(x)__phys_addr_nodebug((unsigned long)(x))
+#endif
+
 /* __pa_symbol should be used for C visible symbols.
This seems to be the official gcc blessed way to do such arithmetic. */
 /*
@@ -51,7 +59,14 @@ static inline void copy_user_page(void *to, void *from, unsigned long vaddr,
 #define __pa_symbol(x) \
__phys_addr_symbol(__phys_reloc_hide((unsigned long)(x)))
 
+
+#ifdef CONFIG_XIP_KERNEL
+extern unsigned long slow_xip_phys_to_virt(phys_addr_t);
+
+#define __va(x)((void *)slow_xip_phys_to_virt((phys_addr_t)x))
+#else
 #define __va(x)((void *)((unsigned long)(x)+PAGE_OFFSET))
+#endif
 
 #define __boot_va(x)   __va(x)
 #define __boot_pa(x)   __pa(x)
diff --git a/arch/x86/mm/pageattr.c b/arch/x86/mm/pageattr.c
index 536ea2f..ca9e2ca 100644
--- a/arch/x86/mm/pageattr.c
+++ b/arch/x86/mm/pageattr.c
@@ -383,6 +383,17 @@ static pte_t *_lookup_address_cpa(struct cpa_data *cpa, 
unsigned long address,
 return lookup_address(address, level);
 }
 
+#ifdef CONFIG_XIP_KERNEL
+unsigned long slow_xip_phys_to_virt(phys_addr_t x)
+{
+   if (x >= CONFIG_XIP_BASE && x <= (phys_addr_t)phys_sdata) {
+   unsigned long off = x - CONFIG_XIP_BASE;
+   return PAGE_OFFSET + off;
+   }
+   return x + PAGE_OFFSET;
+}
+#endif
+
 /*
  * Lookup the PMD entry for a virtual address. Return a pointer to the entry
  * or NULL if not present.
-- 
2.1.0

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 05/11] x86/xip: reserve memblock for only data

2015-03-23 Thread Jim Kukunas
Nothing is loaded at the usual spot for .text, starting at
CONFIG_PHYSICAL_START, so we don't reserve it. Additionally,
the physical address of the _text isn't going to be physically
contiguous with _data.

Signed-off-by: Jim Kukunas 
---
 arch/x86/kernel/setup.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 98dc931..e2d85c4 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -869,8 +869,13 @@ dump_kernel_offset(struct notifier_block *self, unsigned 
long v, void *p)
 
 void __init setup_arch(char **cmdline_p)
 {
+#ifdef CONFIG_XIP_KERNEL
+   memblock_reserve(__pa_symbol(_sdata),
+   (unsigned long)__bss_stop - (unsigned long)_sdata);
+#else
memblock_reserve(__pa_symbol(_text),
 (unsigned long)__bss_stop - (unsigned long)_text);
+#endif
 
early_reserve_initrd();
 
-- 
2.1.0



[PATCH 02/11] x86/xip: Update address of sections in linker script

2015-03-23 Thread Jim Kukunas
In order to update the LMA for each section, according to
CONFIG_XIP_BASE, this patch uses the preprocessor to change
the arguments passed to the AT keyword. Each LMA is updated
to that symbol's physical address.

The text section is aligned to a page so that the ELF
header at the beginning of XIP_BASE isn't mapped into
the linear address space. Also the initial location counter
is incremented to account for the ELF header.

Signed-off-by: Jim Kukunas 
---
 arch/x86/include/asm/boot.h   |  4 
 arch/x86/kernel/vmlinux.lds.S | 17 +++--
 2 files changed, 19 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/boot.h b/arch/x86/include/asm/boot.h
index 4fa687a..a128c71 100644
--- a/arch/x86/include/asm/boot.h
+++ b/arch/x86/include/asm/boot.h
@@ -10,6 +10,10 @@
+ (CONFIG_PHYSICAL_ALIGN - 1)) \
& ~(CONFIG_PHYSICAL_ALIGN - 1))
 
+#ifdef CONFIG_XIP_KERNEL
+#define PHYS_XIP_OFFSET (CONFIG_XIP_BASE - (LOAD_OFFSET + LOAD_PHYSICAL_ADDR))
+#endif
+
 /* Minimum kernel alignment, as a power of two */
 #ifdef CONFIG_X86_64
 #define MIN_KERNEL_ALIGN_LG2   PMD_SHIFT
diff --git a/arch/x86/kernel/vmlinux.lds.S b/arch/x86/kernel/vmlinux.lds.S
index 00bf300..414a1ac 100644
--- a/arch/x86/kernel/vmlinux.lds.S
+++ b/arch/x86/kernel/vmlinux.lds.S
@@ -20,6 +20,15 @@
 #define LOAD_OFFSET __START_KERNEL_map
 #endif
 
+#ifdef CONFIG_XIP_KERNEL
+#define AT(x)  AT(x + LOAD_OFFSET + PHYS_XIP_OFFSET)
+#define TEXT_ALIGN ALIGN(0x1000)
+#define DATA_ALIGN ALIGN(0x20)
+#else
+#define TEXT_ALIGN
+#define DATA_ALIGN
+#endif
+
 #include <asm-generic/vmlinux.lds.h>
 #include <asm/asm-offsets.h>
 #include <asm/thread_info.h>
@@ -89,8 +98,12 @@ SECTIONS
 phys_startup_64 = startup_64 - LOAD_OFFSET;
 #endif
 
+#ifdef CONFIG_XIP_KERNEL
+   . += SIZEOF_HEADERS;
+#endif
+
/* Text and read-only data */
-   .text :  AT(ADDR(.text) - LOAD_OFFSET) {
+   .text :  AT(ADDR(.text) - LOAD_OFFSET) TEXT_ALIGN {
_text = .;
/* bootstrapping code */
HEAD_TEXT
@@ -121,7 +134,7 @@ SECTIONS
X64_ALIGN_DEBUG_RODATA_END
 
/* Data */
-   .data : AT(ADDR(.data) - LOAD_OFFSET) {
+   .data : AT(ADDR(.data) - LOAD_OFFSET) DATA_ALIGN {
/* Start of data section */
_sdata = .;
 
-- 
2.1.0



[PATCH 04/11] x86/xip: XIP boot trampoline page tables

2015-03-23 Thread Jim Kukunas
Constructs the trampoline page tables for early XIP boot.

Signed-off-by: Jim Kukunas james.t.kuku...@linux.intel.com
---
 arch/x86/kernel/head_32.S | 85 +++
 1 file changed, 85 insertions(+)

diff --git a/arch/x86/kernel/head_32.S b/arch/x86/kernel/head_32.S
index 80f344a..642d73b 100644
--- a/arch/x86/kernel/head_32.S
+++ b/arch/x86/kernel/head_32.S
@@ -227,6 +227,90 @@ xip_data_cp:
movl %eax,pa(initial_pg_pmd+0x1000*KPMDS-8)
 #else  /* Not PAE */
 
+#ifdef CONFIG_XIP_KERNEL
+   movl $pa(__brk_base), %edi
+   movl $pa(initial_page_table), %edx
+
+   movl $PTE_IDENT_ATTR, %eax  /* EAX holds identity mapping addr */
+   movl $__PAGE_OFFSET + PTE_IDENT_ATTR, %ebx /* EBX holds kernel addr */
+
+.Lxip_mapping:
+/* Allocate or Load Identity PDE */
+   leal -PTE_IDENT_ATTR(%eax), %ebp
+   andl $0xFFC00000, %ebp
+   shrl $20, %ebp
+   movl (%edx, %ebp), %ecx
+
+   test %ecx, %ecx
+   jnz .Lskip_ident_pde_alloc
+   leal PDE_IDENT_ATTR(%edi), %ecx
+   addl $4096, %edi
+   movl %ecx, (%edx, %ebp)
+
+.Lskip_ident_pde_alloc:
+   leal -PDE_IDENT_ATTR(%ecx), %ecx
+   leal -PTE_IDENT_ATTR(%eax), %ebp
+   andl $0x3FF000, %ebp
+   shrl $10, %ebp
+   movl %eax, (%ecx, %ebp)
+
+/* Allocate or Load PAGE_OFFSET PDE */
+   leal -PTE_IDENT_ATTR(%ebx), %ebp
+   andl $0xFFC00000, %ebp
+   shrl $20, %ebp
+   movl (%edx, %ebp), %ecx
+
+   test %ecx, %ecx
+   jnz .Lskip_offset_pde_alloc
+   leal PDE_IDENT_ATTR(%edi), %ecx
+   addl $4096, %edi
+   movl %ecx, (%edx, %ebp)
+
+.Lskip_offset_pde_alloc:
+   leal -PDE_IDENT_ATTR(%ecx), %ecx
+   leal -PTE_IDENT_ATTR(%ebx), %ebp
+   andl $0x3FF000, %ebp
+   shrl $10, %ebp
+   movl %eax, (%ecx, %ebp)
+
+   addl $4096, %eax
+   addl $4096, %ebx
+
+   cmpl $CONFIG_PHYSICAL_START + PTE_IDENT_ATTR, %eax
+   je   .Lsetup_text_addr
+
+   cmpl $phys_sdata + PTE_IDENT_ATTR, %eax
+   je   .Lsetup_data_addr
+
+   cmpl $pa(_end) + MAPPING_BEYOND_END + PTE_IDENT_ATTR, %eax
+   je   .Ldone
+
+   jmp  .Lxip_mapping
+
+.Lsetup_text_addr:
+   movl $CONFIG_XIP_BASE + 4096 + PTE_IDENT_ATTR, %eax
+   movl $_text, %ebx
+   addl $PTE_IDENT_ATTR, %ebx
+   jmp  .Lxip_mapping
+
+.Lsetup_data_addr:
+   movl $pa(_sdata), %eax
+   addl $PTE_IDENT_ATTR, %eax
+   movl $_sdata, %ebx
+   addl $PTE_IDENT_ATTR, %ebx
+   jmp  .Lxip_mapping
+.Ldone:
+   addl $__PAGE_OFFSET, %edi
+   movl %edi, pa(_brk_end)
+   movl $pa(_end) + MAPPING_BEYOND_END + PTE_IDENT_ATTR, %eax
+   shrl $12, %eax
+   movl %eax, pa(max_pfn_mapped)
+
+   movl $pa(initial_pg_fixmap) + PTE_IDENT_ATTR, %eax
+   movl %eax, pa(initial_page_table + 0xFFC)
+
+#else
+
page_pde_offset = (__PAGE_OFFSET >> 20);
 
movl $pa(__brk_base), %edi
@@ -257,6 +341,7 @@ page_pde_offset = (__PAGE_OFFSET >> 20);
movl $pa(initial_pg_fixmap)+PDE_IDENT_ATTR,%eax
movl %eax,pa(initial_page_table+0xffc)
 #endif
+#endif
 
 #ifdef CONFIG_PARAVIRT
/* This is can only trip for a broken bootloader... */
-- 
2.1.0



[PATCH 10/11] x86/xip: resolve alternative instructions at build

2015-03-23 Thread Jim Kukunas
Since the .text section can't be updated at run-time, remove the
.alternatives sections and update the .text at build time. To pick the
proper instructions, Kconfig options are exposed for each X86_FEATURE
that needed to be resolved. Each X86_FEATURE gets a corresponding
CONFIG_XIP_ENABLE_X86_FEATURE_ option. Based on whether this option
is set, a resolver macro is setup to either generate that instruction,
or the fallback. The resolver needs to be defined for each FEATURE, and
the proper one is chosen via preprocessor string pasting.

This approach is horrific and ugly. A better approach might be to add
an additional build step that, after generating the vmlinux file, goes
through the alternatives section and performs the fixups on the file.

At the very least, a script like mkcapflags.sh could generate the
resolver functions automatically. But since it's adding Kconfig
options, it would need to run unconditionally before any of the
config related Makefile targets.

Signed-off-by: Jim Kukunas james.t.kuku...@linux.intel.com
---
 arch/x86/Kconfig   |  45 +
 arch/x86/include/asm/alternative-xip.h | 161 +
 arch/x86/include/asm/alternative.h |   5 +
 arch/x86/kernel/alternative.c  |   7 ++
 arch/x86/kernel/cpu/bugs.c |   2 +
 arch/x86/kernel/setup.c|   2 +
 arch/x86/kernel/smpboot.c  |   2 +
 arch/x86/kernel/vmlinux.lds.S  |   2 +
 arch/x86/vdso/vma.c|   2 +
 9 files changed, 228 insertions(+)
 create mode 100644 arch/x86/include/asm/alternative-xip.h

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index f5fa02c..dff781d 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -313,6 +313,51 @@ config XIP_BASE
help
  The physical address for the beginning of the vmlinux file.
 
+menu "XIP Alternative Instructions"
+   depends on XIP_KERNEL
+
+   config XIP_ENABLE_X86_FEATURE_POPCNT
+   bool "Enable POPCNT alternative instructions"
+   default n
+
+   config XIP_ENABLE_X86_BUG_11AP
+   bool "Enable 11AP alternative instructions"
+   default n
+
+   config XIP_ENABLE_X86_FEATURE_XMM2
+   bool "Enable XMM2 alternative instructions"
+   default n
+
+   config XIP_ENABLE_X86_FEATURE_MFENCE_RDTSC
+   bool "Enable MFENCE_RDTSC alternative instructions"
+   default n
+
+   config XIP_ENABLE_X86_FEATURE_LFENCE_RDTSC
+   bool "Enable LFENCE_RDTSC alternative instructions"
+   default n
+
+   config XIP_ENABLE_X86_FEATURE_3DNOW
+   bool "Enable 3DNOW alternative instructions"
+   default n
+
+   config XIP_ENABLE_X86_FEATURE_XMM
+   bool "Enable XMM alternative instructions"
+   default n
+
+   config XIP_ENABLE_X86_FEATURE_CLFLUSHOPT
+   bool "Enable CLFLUSHOPT alternative instructions"
+   default n
+
+   config XIP_ENABLE_X86_FEATURE_XSAVEOPT
+   bool "Enable XSAVEOPT alternative instructions"
+   default n
+
+   config XIP_ENABLE_X86_FEATURE_XSAVES
+   bool "Enable XSAVES alternative instructions"
+   default n
+
+endmenu
+
 config SMP
bool "Symmetric multi-processing support"
---help---
diff --git a/arch/x86/include/asm/alternative-xip.h 
b/arch/x86/include/asm/alternative-xip.h
new file mode 100644
index 000..84f544e
--- /dev/null
+++ b/arch/x86/include/asm/alternative-xip.h
@@ -0,0 +1,161 @@
+#ifndef _ASM_X86_ALTERNATIVE_XIP_H
+#define _ASM_X86_ALTERNATIVE_XIP_H
+
+/*
+ * Alternative instruction fixup for XIP
+ *
+ * Copyright (C) 2014 Intel Corporation
+ * Author: Jim Kukunas james.t.kuku...@linux.intel.com
+ *
+ * Since the kernel text is executing from storage and is
+ * read-only, we can't update the opcodes in-flight. Instead,
+ * resolve the alternatives at build time through preprocessor
+ * (ab)use.
+ */
+
+#ifdef CONFIG_SMP
+#define LOCK_PREFIX "\n\tlock; "
+#else
+#define LOCK_PREFIX ""
+#endif
+
+extern int poke_int3_handler(struct pt_regs *regs);
+
+/* TODO hook up to something like mkcapflags.sh */
+/* Unfortunately, each X86_FEATURE will need a corresponding define like this */
+#ifdef CONFIG_XIP_ENABLE_X86_FEATURE_POPCNT
+#define RESOLVE_X86_FEATURE_POPCNT(old, new) new
+#define RESOLVE_2_X86_FEATURE_POPCNT(old, new1, resolve1, new2) new2
+#else
+#define RESOLVE_X86_FEATURE_POPCNT(old, new) old
+#define RESOLVE_2_X86_FEATURE_POPCNT(old, new1, resolve1, new2) \
+   resolve1(old, new1)
+#endif
+
+#ifdef CONFIG_XIP_ENABLE_X86_BUG_11AP
+#define RESOLVE_X86_BUG_11AP(old, new) new
+#define RESOLVE_2_X86_BUG_11AP(old, new1, resolve1, new2) new2
+#else
+#define RESOLVE_X86_BUG_11AP(old, new) old
+#define RESOLVE_2_X86_BUG_11AP(old, new1, resolve1, new2) \
+   resolve1(old, new1)
+#endif
+
+#ifdef CONFIG_XIP_ENABLE_X86_FEATURE_XMM2
+#define RESOLVE_X86_FEATURE_XMM2(old, new) new
+#define RESOLVE_2_X86_FEATURE_XMM2(old, new1, resolve1, new2) new2
+#else
+#define RESOLVE_X86_FEATURE_XMM2(old, new) old
+#define RESOLVE_2_X86_FEATURE_XMM2(old, new1, resolve1, new2) \
+   resolve1(old, new1)
+#endif

[PATCH 08/11] x86/xip: in setup_arch(), handle resource physical addr

2015-03-23 Thread Jim Kukunas
set code_resources to proper physical addr in setup_arch()

Signed-off-by: Jim Kukunas james.t.kuku...@linux.intel.com
---
 arch/x86/kernel/setup.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 74fc6c8..f044453 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -986,9 +986,16 @@ void __init setup_arch(char **cmdline_p)
 
mpx_mm_init(init_mm);
 
+#ifndef CONFIG_XIP_KERNEL
 code_resource.start = __pa_symbol(_text);
 code_resource.end = __pa_symbol(_etext)-1;
 data_resource.start = __pa_symbol(_etext);
+#else
+   code_resource.start = CONFIG_XIP_BASE;
+   code_resource.end = (phys_addr_t)phys_sdata-1;
+   data_resource.start = __pa_symbol(_sdata);
+#endif
+
data_resource.end = __pa_symbol(_edata)-1;
bss_resource.start = __pa_symbol(__bss_start);
bss_resource.end = __pa_symbol(__bss_stop)-1;
-- 
2.1.0



[PATCH 09/11] x86/xip: snip the kernel text out of the memory mapping

2015-03-23 Thread Jim Kukunas
If the kernel tries to create an identity region for a memory range
that spans the kernel text, split it into two pieces, skipping the
text section. Otherwise, this will setup the standard text mapping,
which will point to the normal RAM location for text instead of the
XIP_BASE location.

Signed-off-by: Jim Kukunas james.t.kuku...@linux.intel.com
---
 arch/x86/mm/init.c | 78 ++
 1 file changed, 78 insertions(+)

diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c
index a110efc..07b20c6 100644
--- a/arch/x86/mm/init.c
+++ b/arch/x86/mm/init.c
@@ -391,6 +391,82 @@ bool pfn_range_is_mapped(unsigned long start_pfn, unsigned 
long end_pfn)
return false;
 }
 
+#ifdef CONFIG_XIP_KERNEL
+/*
+ * Cut the .text virtual address out of mem range b/c the mapping
+ * is already correctly setup
+ */
+static inline void snip_xip_text(struct map_range *mr, int *nr_range)
+{
+   int i;
+
+   for (i = 0; i < *nr_range; i++) {
+   long diff;
+
+   if (mr[i].start <= CONFIG_PHYSICAL_START &&
+   mr[i].end <= CONFIG_PHYSICAL_START)
+   continue;
+   if (mr[i].start >= __pa_symbol(_sdata))
+   continue;
+
+   diff = mr[i].start - CONFIG_PHYSICAL_START;
+   if (diff < 0) { /* range starts below .text and includes it */
+   diff = mr[i].end - __pa_symbol(_sdata);
+
+   /* shorten segment so it ends just before .text */
+   mr[i].end = CONFIG_PHYSICAL_START;
+
+   /* if segment goes past .text, add 2nd segment*/
+   if (diff > 0) {
+   /* move next section down 1 */
+   if (i + 1 < *nr_range) {
+   memmove(&mr[i + 1], &mr[i + 2],
+   sizeof(struct map_range[
+   *nr_range - i - 2]));
+   }
+   mr[i + 1].start = __pa_symbol(_sdata);
+   mr[i + 1].end =  mr[i + 1].start + diff;
+   mr[i + 1].page_size_mask = 0;
+   *nr_range = *nr_range + 1;
+   i++;
+   }
+   } else if (diff == 0) {
+   diff = mr[i].end - __pa_symbol(_sdata);
+   if (diff > 0) {
+   mr[i].start = __pa_symbol(_sdata);
+   mr[i].end = mr[i].start + diff;
+   mr[i].page_size_mask = 0;
+   } else {
+   /* delete this range */
+   memmove(&mr[i + 1], &mr[i], sizeof(
+   struct map_range[*nr_range - i - 1]));
+   *nr_range = *nr_range - 1;
+   i--;
+   }
+   } else if (diff > 0) {
+   long ediff = mr[i].end - __pa_symbol(_sdata);
+
+   if (ediff > 0) {
+   mr[i].start = __pa_symbol(_sdata);
+   mr[i].end = mr[i].start + ediff;
+   mr[i].page_size_mask = 0;
+   } else {
+   /* delete this range */
+   memmove(&mr[i + 1], &mr[i], sizeof(
+   struct map_range[*nr_range - i - 1]));
+   *nr_range = *nr_range - 1;
+   i--;
+   }
+   }
+   break;
+   }
+   }
+}
+#else
+static inline void snip_xip_text(struct map_range *mr, int *mr_range)
+{
+}
+#endif
+
 /*
  * Setup the direct mapping of the physical memory at PAGE_OFFSET.
  * This runs before bootmem is initialized and gets pages directly from
@@ -409,6 +485,8 @@ unsigned long __init_refok init_memory_mapping(unsigned 
long start,
memset(mr, 0, sizeof(mr));
nr_range = split_mem_range(mr, 0, start, end);
 
+   snip_xip_text(mr, &nr_range);
+
for (i = 0; i < nr_range; i++)
ret = kernel_physical_mapping_init(mr[i].start, mr[i].end,
   mr[i].page_size_mask);
-- 
2.1.0



[PATCH 07/11] x86/xip: make e820_add_kernel_range() a NOP

2015-03-23 Thread Jim Kukunas
e820_add_kernel_range() checks whether the kernel text is present
in the e820 map, and marked as usable RAM. If not, it modifies
the e820 map accordingly.

For XIP, that is unnecessary since the kernel text won't be loaded
in RAM.

Signed-off-by: Jim Kukunas james.t.kuku...@linux.intel.com
---
 arch/x86/kernel/setup.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index d276ebf..74fc6c8 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -787,6 +787,7 @@ static void __init trim_bios_range(void)
 }
 
 /* called before trim_bios_range() to spare extra sanitize */
+#ifndef CONFIG_XIP_KERNEL
 static void __init e820_add_kernel_range(void)
 {
u64 start = __pa_symbol(_text);
@@ -806,6 +807,11 @@ static void __init e820_add_kernel_range(void)
e820_remove_range(start, size, E820_RAM, 0);
e820_add_region(start, size, E820_RAM);
 }
+#else
+static void __init e820_add_kernel_range(void)
+{
+}
+#endif
 
static unsigned reserve_low = CONFIG_X86_RESERVE_LOW << 10;
 
-- 
2.1.0





[RFC] x86 XIP

2015-03-23 Thread Jim Kukunas

Hi Folks,

This patchset introduces eXecute-In-Place (XIP) support for x86. Right now only
minimal configurations are supported (32-bit only, no SMP, no PAE, and so on).
My goal is to increase the number of supported configurations in the future 
based on what functionality is requested. This patchset only supports storage
configurations where the kernel text and read-only data will always be readable.

I didn't create a special Makefile target for building xip images, like how ARM
has xipImage. Instead, I'm just using the basic vmlinux ELF executable. The 
kernel must be built with CONFIG_XIP_BASE set to the physical address of the 
vmlinux file. Additionally, since the .text section is read-only, all of the
alternative instructions need to be resolved at build-time. To accomplish this,
the cpu features to enable are selected through a series of Kconfig options.
In order to boot, the bootloader just needs to fill out the zero page (whose
address startup_32() expects in esi), switch to 32-bit protected mode and then
jump into startup_32(), which will be at CONFIG_XIP_BASE plus one page.

Thanks.



[PATCH 03/11] x86/xip: copy writable sections into RAM

2015-03-23 Thread Jim Kukunas
Loads all writable and non-zero sections into their VMA.

Signed-off-by: Jim Kukunas james.t.kuku...@linux.intel.com
---
 arch/x86/include/asm/sections.h |  4 
 arch/x86/kernel/head_32.S   | 22 ++
 arch/x86/kernel/vmlinux.lds.S   |  4 
 3 files changed, 30 insertions(+)

diff --git a/arch/x86/include/asm/sections.h b/arch/x86/include/asm/sections.h
index 0a52424..9535e95 100644
--- a/arch/x86/include/asm/sections.h
+++ b/arch/x86/include/asm/sections.h
@@ -11,4 +11,8 @@ extern struct exception_table_entry __stop___ex_table[];
 extern char __end_rodata_hpage_align[];
 #endif
 
+#ifdef CONFIG_XIP_KERNEL
+extern char phys_sdata[];
+#endif
+
 #endif /* _ASM_X86_SECTIONS_H */
diff --git a/arch/x86/kernel/head_32.S b/arch/x86/kernel/head_32.S
index f36bd42..80f344a 100644
--- a/arch/x86/kernel/head_32.S
+++ b/arch/x86/kernel/head_32.S
@@ -86,6 +86,28 @@ RESERVE_BRK(pagetables, INIT_MAP_SIZE)
  */
 __HEAD
 ENTRY(startup_32)
+
+#ifdef CONFIG_XIP_KERNEL
+   /*
+* Copy writable sections into RAM
+*/
+
+   movl %esi, %ebp # Preserve pointer to zero-page
+
+   leal pa(_sdata), %edi
+   leal phys_edata, %ecx
+   leal phys_sdata, %esi
+   subl %esi, %ecx
+
+   cld
+xip_data_cp:
+   lodsb
+   stosb
+   loop xip_data_cp
+
+   movl %ebp, %esi
+#endif
+
movl pa(stack_start),%ecx

/* test KEEP_SEGMENTS flag to see if the bootloader is asking
diff --git a/arch/x86/kernel/vmlinux.lds.S b/arch/x86/kernel/vmlinux.lds.S
index 414a1ac..59a9edb 100644
--- a/arch/x86/kernel/vmlinux.lds.S
+++ b/arch/x86/kernel/vmlinux.lds.S
@@ -133,6 +133,9 @@ SECTIONS
RO_DATA(PAGE_SIZE)
X64_ALIGN_DEBUG_RODATA_END
 
+   phys_sdata = LOADADDR(.data);
+   phys_edata = phys_sdata + (_end_nonzero - _sdata);
+
/* Data */
.data : AT(ADDR(.data) - LOAD_OFFSET) DATA_ALIGN {
/* Start of data section */
@@ -319,6 +322,7 @@ SECTIONS
NOSAVE_DATA
}
 #endif
+   _end_nonzero = .;
 
/* BSS */
. = ALIGN(PAGE_SIZE);
-- 
2.1.0



[PATCH 01/11] x86/xip: add XIP_KERNEL and XIP_BASE options

2015-03-23 Thread Jim Kukunas
The CONFIG_XIP_KERNEL Kconfig option enables eXecute-In-Place
(XIP) support. When XIP_KERNEL is set, XIP_BASE points to the
physical address of the vmlinux ELF file.

Signed-off-by: Jim Kukunas james.t.kuku...@linux.intel.com
---
 arch/x86/Kconfig | 19 +++
 1 file changed, 19 insertions(+)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index b7d31ca..f5fa02c 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -294,6 +294,25 @@ config ZONE_DMA
 
  If unsure, say Y.
 
+config XIP_KERNEL
bool "eXecute-In-Place (XIP) support" if (X86_32 && EXPERT && EMBEDDED)
+   depends on !MODULES && !X86_PAE && !SMP
+   default n
+   help
+ With this option enabled, the text and any read-only segments of
+ the kernel are not copied from their initial location to their usual
+ location in RAM. As a result, when the kernel is located in storage
+ that is addressable by the CPU, the kernel text and read-only data
+ segments are never loaded into memory, thereby using less RAM.
+
+ Only enable this option if you know what you're doing.
+
+config XIP_BASE
hex "Physical address of XIP kernel" if XIP_KERNEL
default 0xFF800000
+   help
+ The physical address for the beginning of the vmlinux file.
+
 config SMP
bool "Symmetric multi-processing support"
---help---
-- 
2.1.0



[PATCH 06/11] x86/xip: after paging trampoline, discard PMDs above _brk

2015-03-23 Thread Jim Kukunas
In the likely case that XIP_BASE is above PAGE_OFFSET, we
want to discard any early identity mappings. So rather than
keeping every PMD above PAGE_OFFSET, only copy the ones from
PAGE_OFFSET to the last PMD of _end. At this point, the linear
address space should look normal.

Signed-off-by: Jim Kukunas james.t.kuku...@linux.intel.com
---
 arch/x86/include/asm/pgtable.h | 7 +++
 arch/x86/kernel/setup.c| 2 +-
 2 files changed, 8 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index a0c35bf..5eaba7d 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -660,6 +660,13 @@ static inline int pgd_none(pgd_t pgd)
 #define KERNEL_PGD_BOUNDARYpgd_index(PAGE_OFFSET)
 #define KERNEL_PGD_PTRS(PTRS_PER_PGD - KERNEL_PGD_BOUNDARY)
 
+#ifdef CONFIG_XIP_KERNEL
+#define BOOT_PGD_COPY_PTRS \
((pgd_index((unsigned long)_end) - pgd_index(PAGE_OFFSET)) + 4)
+#else
+#define BOOT_PGD_COPY_PTRS KERNEL_PGD_PTRS
+#endif
+
 #ifndef __ASSEMBLY__
 
 extern int direct_gbpages;
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index e2d85c4..d276ebf 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -894,7 +894,7 @@ void __init setup_arch(char **cmdline_p)
 */
clone_pgd_range(swapper_pg_dir + KERNEL_PGD_BOUNDARY,
initial_page_table + KERNEL_PGD_BOUNDARY,
-   KERNEL_PGD_PTRS);
+   BOOT_PGD_COPY_PTRS);
 
load_cr3(swapper_pg_dir);
/*
-- 
2.1.0



[PATCH] lib/deflate: Replace UNALIGNED_OK w/ HAVE_EFFICIENT_UNALIGNED_ACCESS

2014-10-14 Thread Jim Kukunas
Zlib implements a byte-by-byte and a word-by-word longest_match() string
comparison function. This implementation defaults to the slower byte-by-byte
version unless the preprocessor macro UNALIGNED_OK is defined.
Currently, nothing is hooked up to define this macro, but we do have
CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS, which serves the same purpose.

The exact performance improvement of the word-by-word implementation is data
dependent, but on x86 it is typically in the range of a 5-10% cycle reduction.

The code is already there, might as well use it ...

Signed-off-by: Jim Kukunas 
---
 lib/zlib_deflate/deflate.c |   15 ++++++++-------
 1 files changed, 8 insertions(+), 7 deletions(-)

diff --git a/lib/zlib_deflate/deflate.c b/lib/zlib_deflate/deflate.c
index d20ef45..4920e51 100644
--- a/lib/zlib_deflate/deflate.c
+++ b/lib/zlib_deflate/deflate.c
@@ -570,9 +570,9 @@ static uInt longest_match(
 Pos *prev = s->prev;
 uInt wmask = s->w_mask;
 
-#ifdef UNALIGNED_OK
+#ifdef CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS
 /* Compare two bytes at a time. Note: this is not always beneficial.
- * Try with and without -DUNALIGNED_OK to check.
+ * Try with and without -DCONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS to check.
  */
 register Byte *strend = s->window + s->strstart + MAX_MATCH - 1;
 register ush scan_start = *(ush*)scan;
@@ -606,9 +606,10 @@ static uInt longest_match(
 /* Skip to next match if the match length cannot increase
  * or if the match length is less than 2:
  */
-#if (defined(UNALIGNED_OK) && MAX_MATCH == 258)
+#if (defined(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS) && MAX_MATCH == 258)
 /* This code assumes sizeof(unsigned short) == 2. Do not use
- * UNALIGNED_OK if your compiler uses a different size.
+* CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS if your compiler uses a
+* different size.
  */
 if (*(ush*)(match+best_len-1) != scan_end ||
 *(ush*)match != scan_start) continue;
@@ -639,7 +640,7 @@ static uInt longest_match(
 len = (MAX_MATCH - 1) - (int)(strend-scan);
 scan = strend - (MAX_MATCH-1);
 
-#else /* UNALIGNED_OK */
+#else /* CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS */
 
 if (match[best_len]   != scan_end  ||
 match[best_len-1] != scan_end1 ||
@@ -670,13 +671,13 @@ static uInt longest_match(
 len = MAX_MATCH - (int)(strend - scan);
 scan = strend - MAX_MATCH;
 
-#endif /* UNALIGNED_OK */
+#endif /* CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS */
 
 if (len > best_len) {
 s->match_start = cur_match;
 best_len = len;
 if (len >= nice_match) break;
-#ifdef UNALIGNED_OK
+#ifdef CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS
 scan_end = *(ush*)(scan+best_len-1);
 #else
 scan_end1  = scan[best_len-1];
-- 
1.7.1




Re: [PATCH] lib/raid6: Add AVX2 optimized recovery functions

2012-11-09 Thread Jim Kukunas
On Fri, Nov 09, 2012 at 10:50:25PM +1100, Neil Brown wrote:
> On Fri, 09 Nov 2012 12:39:05 +0100 "H. Peter Anvin"  wrote:
> 
> > Sorry, we cannot share those at this time since the hardware is not yet 
> > released.
> 
> Can I take that to imply "Acked-by: "H. Peter Anvin" " ??
> 
> It would be nice to have at least a statement like:
>  These patches have been tested both with the user-space testing tool and in 
>  a RAID6 md array and they pass all tests.  While we cannot release performance
>  numbers as the hardware is not released, we can confirm that on that hardware
>  the performance with these patches is faster than without.
> 
> I guess I should be able to assume that - surely the patches would not be
> posted if it were not true...  But I like to avoid assuming when I can.

Hi Neil,

That assumption is correct. The patch was tested and benchmarked before 
submission.

You'll notice that this code is very similar to the SSSE3-optimized
recovery routines I wrote earlier. This implementation extends that same
algorithm from 128-bit registers to 256-bit registers.

Thanks.

-- 
Jim Kukunas
Intel Open Source Technology Center





[PATCH] lib/raid6: Add AVX2 optimized recovery functions

2012-11-08 Thread Jim Kukunas
Optimize RAID6 recovery functions to take advantage of
the 256-bit YMM integer instructions introduced in AVX2.

Signed-off-by: Jim Kukunas 
---
 arch/x86/Makefile   |   5 +-
 include/linux/raid/pq.h |   1 +
 lib/raid6/Makefile  |   2 +-
 lib/raid6/algos.c   |   3 +
 lib/raid6/recov_avx2.c  | 327 
 lib/raid6/test/Makefile |   2 +-
 lib/raid6/x86.h |  14 ++-
 7 files changed, 345 insertions(+), 9 deletions(-)
 create mode 100644 lib/raid6/recov_avx2.c

diff --git a/arch/x86/Makefile b/arch/x86/Makefile
index 682e9c2..f24c037 100644
--- a/arch/x86/Makefile
+++ b/arch/x86/Makefile
@@ -123,9 +123,10 @@ cfi-sections := $(call as-instr,.cfi_sections .debug_frame,-DCONFIG_AS_CFI_SECTI
 # does binutils support specific instructions?
 asinstr := $(call as-instr,fxsaveq (%rax),-DCONFIG_AS_FXSAVEQ=1)
 avx_instr := $(call as-instr,vxorps %ymm0$(comma)%ymm1$(comma)%ymm2,-DCONFIG_AS_AVX=1)
+avx2_instr :=$(call as-instr,vpbroadcastb %xmm0$(comma)%ymm1,-DCONFIG_AS_AVX2=1)
 
-KBUILD_AFLAGS += $(cfi) $(cfi-sigframe) $(cfi-sections) $(asinstr) $(avx_instr)
-KBUILD_CFLAGS += $(cfi) $(cfi-sigframe) $(cfi-sections) $(asinstr) $(avx_instr)
+KBUILD_AFLAGS += $(cfi) $(cfi-sigframe) $(cfi-sections) $(asinstr) $(avx_instr) $(avx2_instr)
+KBUILD_CFLAGS += $(cfi) $(cfi-sigframe) $(cfi-sections) $(asinstr) $(avx_instr) $(avx2_instr)
 
 LDFLAGS := -m elf_$(UTS_MACHINE)
 
diff --git a/include/linux/raid/pq.h b/include/linux/raid/pq.h
index 640c69c..3156347 100644
--- a/include/linux/raid/pq.h
+++ b/include/linux/raid/pq.h
@@ -109,6 +109,7 @@ struct raid6_recov_calls {
 
 extern const struct raid6_recov_calls raid6_recov_intx1;
 extern const struct raid6_recov_calls raid6_recov_ssse3;
+extern const struct raid6_recov_calls raid6_recov_avx2;
 
 /* Algorithm list */
 extern const struct raid6_calls * const raid6_algos[];
diff --git a/lib/raid6/Makefile b/lib/raid6/Makefile
index de06dfe..8c2e22b 100644
--- a/lib/raid6/Makefile
+++ b/lib/raid6/Makefile
@@ -1,6 +1,6 @@
 obj-$(CONFIG_RAID6_PQ) += raid6_pq.o
 
-raid6_pq-y += algos.o recov.o recov_ssse3.o tables.o int1.o int2.o int4.o \
+raid6_pq-y += algos.o recov.o recov_ssse3.o recov_avx2.o tables.o int1.o int2.o int4.o \
   int8.o int16.o int32.o altivec1.o altivec2.o altivec4.o \
   altivec8.o mmx.o sse1.o sse2.o
 hostprogs-y+= mktables
diff --git a/lib/raid6/algos.c b/lib/raid6/algos.c
index 589f5f5..8b7f55c 100644
--- a/lib/raid6/algos.c
+++ b/lib/raid6/algos.c
@@ -72,6 +72,9 @@ EXPORT_SYMBOL_GPL(raid6_datap_recov);
 
 const struct raid6_recov_calls *const raid6_recov_algos[] = {
 #if (defined(__i386__) || defined(__x86_64__)) && !defined(__arch_um__)
+#ifdef CONFIG_AS_AVX2
+   &raid6_recov_avx2,
+#endif
   &raid6_recov_ssse3,
 #endif
_recov_intx1,
diff --git a/lib/raid6/recov_avx2.c b/lib/raid6/recov_avx2.c
new file mode 100644
index 000..43a9bab
--- /dev/null
+++ b/lib/raid6/recov_avx2.c
@@ -0,0 +1,327 @@
+/*
+ * Copyright (C) 2012 Intel Corporation
+ * Author: Jim Kukunas 
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; version 2
+ * of the License.
+ */
+
+#if (defined(__i386__) || defined(__x86_64__)) && !defined(__arch_um__)
+
+#if CONFIG_AS_AVX2
+
+#include <linux/raid/pq.h>
+#include "x86.h"
+
+static int raid6_has_avx2(void)
+{
+   return boot_cpu_has(X86_FEATURE_AVX2) &&
+   boot_cpu_has(X86_FEATURE_AVX);
+}
+
+static void raid6_2data_recov_avx2(int disks, size_t bytes, int faila,
+   int failb, void **ptrs)
+{
+   u8 *p, *q, *dp, *dq;
+   const u8 *pbmul;/* P multiplier table for B data */
+   const u8 *qmul; /* Q multiplier table (for both) */
+   const u8 x0f = 0x0f;
+
+   p = (u8 *)ptrs[disks-2];
+   q = (u8 *)ptrs[disks-1];
+
+   /* Compute syndrome with zero for the missing data pages
+  Use the dead data pages as temporary storage for
+  delta p and delta q */
+   dp = (u8 *)ptrs[faila];
+   ptrs[faila] = (void *)raid6_empty_zero_page;
+   ptrs[disks-2] = dp;
+   dq = (u8 *)ptrs[failb];
+   ptrs[failb] = (void *)raid6_empty_zero_page;
+   ptrs[disks-1] = dq;
+
+   raid6_call.gen_syndrome(disks, bytes, ptrs);
+
+   /* Restore pointer table */
+   ptrs[faila]   = dp;
+   ptrs[failb]   = dq;
+   ptrs[disks-2] = p;
+   ptrs[disks-1] = q;
+
+   /* Now, pick the proper data tables */
+   pbmul = raid6_vgfmul[raid6_gfexi[failb-faila]];
+   qmul  = raid6_vgfmul[raid6_gfinv[raid6_gfexp[faila] ^
+   raid6_gfexp[failb]]];
+
+   kernel_fpu_begin();
+
+   /* ymm0 = x0f[16] */
+   asm volatile("vpbroadcastb %0, %%ymm7" : : "m" (x0f));
+
+   while (bytes) {
+#ifdef CONFIG_X86_64
+   asm volatile("
