[tip:x86/boot] x86/kdump/64: Restrict kdump kernel reservation to <64TB

2019-06-27 Thread tip-bot for Baoquan He
Commit-ID:  8ff80fbe7e9870078b1cc3c2cdd8f3f223b333a9
Gitweb: https://git.kernel.org/tip/8ff80fbe7e9870078b1cc3c2cdd8f3f223b333a9
Author: Baoquan He 
AuthorDate: Fri, 24 May 2019 15:38:10 +0800
Committer:  Thomas Gleixner 
CommitDate: Fri, 28 Jun 2019 07:14:59 +0200

x86/kdump/64: Restrict kdump kernel reservation to <64TB

Restrict kdump to only reserve crashkernel below 64TB.

The reason is that kdump may jump from a 5-level paging mode kernel to a
4-level paging mode kernel. If a 4-level paging mode kdump kernel is put
above 64TB, then the kdump kernel cannot start.

The 1st kernel reserves the kdump kernel region during bootup. At that
point it is not known whether the kdump kernel has 5-level or 4-level
paging support.

To support both, restrict the kdump kernel reservation to the lower 64TB
address space to ensure that a 4-level paging mode kdump kernel can be
loaded and successfully started.
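
As a rough sketch (illustration only; the surrounding reserve_crashkernel()
logic is elided), the reservation then amounts to a memblock search that is
clamped at the new upper limit:

	/*
	 * Sketch: find a block for the crash kernel, but never above
	 * CRASH_ADDR_HIGH_MAX (SZ_64T on 64-bit), so that a 4-level
	 * paging kdump kernel can always address it.
	 */
	crash_base = memblock_find_in_range(CRASH_ALIGN, CRASH_ADDR_HIGH_MAX,
					    crash_size, CRASH_ALIGN);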

[ tglx: Massaged changelog ]

Signed-off-by: Baoquan He 
Signed-off-by: Thomas Gleixner 
Acked-by: Kirill A. Shutemov 
Acked-by: Dave Young 
Cc: b...@alien8.de
Cc: h...@zytor.com
Link: https://lkml.kernel.org/r/20190524073810.24298-4-...@redhat.com

---
 arch/x86/kernel/setup.c | 15 ++++++++++++---
 include/linux/sizes.h   |  1 +
 2 files changed, 13 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 08a5f4a131f5..dcbdf54fb5c1 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -453,15 +453,24 @@ static void __init memblock_x86_reserve_range_setup_data(void)
 #define CRASH_ALIGN		SZ_16M
 
 /*
- * Keep the crash kernel below this limit.  On 32 bits earlier kernels
- * would limit the kernel to the low 512 MiB due to mapping restrictions.
+ * Keep the crash kernel below this limit.
+ *
+ * On 32 bits earlier kernels would limit the kernel to the low 512 MiB
+ * due to mapping restrictions.
+ *
+ * On 64-bit, the kdump kernel needs to be restricted to be under 64TB,
+ * which is the upper limit of system RAM in 4-level paging mode. Since
+ * the kdump jump could be from 5-level paging mode to 4-level paging
+ * mode, the jump will fail if the kernel is put above 64TB, and there's
+ * no way to detect the paging mode of the kernel which will be loaded
+ * for dumping during the 1st kernel bootup.
  */
 #ifdef CONFIG_X86_32
 # define CRASH_ADDR_LOW_MAX	SZ_512M
 # define CRASH_ADDR_HIGH_MAX   SZ_512M
 #else
 # define CRASH_ADDR_LOW_MAX	SZ_4G
-# define CRASH_ADDR_HIGH_MAX   MAXMEM
+# define CRASH_ADDR_HIGH_MAX   SZ_64T
 #endif
 
 static int __init reserve_crashkernel_low(void)
diff --git a/include/linux/sizes.h b/include/linux/sizes.h
index fbde0bc7e882..8651269cb46c 100644
--- a/include/linux/sizes.h
+++ b/include/linux/sizes.h
@@ -47,5 +47,6 @@
 #define SZ_2G				0x80000000
 
 #define SZ_4G				_AC(0x100000000, ULL)
+#define SZ_64T				_AC(0x400000000000, ULL)
 
 #endif /* __LINUX_SIZES_H__ */


[tip:x86/boot] x86/kexec/64: Prevent kexec from 5-level paging to a 4-level only kernel

2019-06-27 Thread tip-bot for Baoquan He
Commit-ID:  ee338b9ee2822e65a85750da6129946c14962410
Gitweb: https://git.kernel.org/tip/ee338b9ee2822e65a85750da6129946c14962410
Author: Baoquan He 
AuthorDate: Fri, 24 May 2019 15:38:09 +0800
Committer:  Thomas Gleixner 
CommitDate: Fri, 28 Jun 2019 07:14:59 +0200

x86/kexec/64: Prevent kexec from 5-level paging to a 4-level only kernel

If the running kernel has 5-level paging activated, the 5-level paging mode
is preserved across kexec. If the kexec'ed kernel does not contain support
for handling active 5-level paging mode in the decompressor, the
decompressor will crash with #GP.

Prevent this situation at load time. If 5-level paging is active, check
the xloadflags to see whether the kexec kernel can handle 5-level paging
at least in the decompressor. If not, reject the load attempt and print
an error message.
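
For reference, xloadflags lives in the bzImage setup header; a minimal
sketch of such a load-time check (mirroring the hunk below, with error
handling simplified) is:

	const struct setup_header *header;

	/* The setup header sits at the boot_params 'hdr' offset. */
	header = (const struct setup_header *)(buf +
				offsetof(struct boot_params, hdr));

	if (pgtable_l5_enabled() && !(header->xloadflags & XLF_5LEVEL))
		return -EINVAL;	/* target cannot run under 5-level paging */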

Signed-off-by: Baoquan He 
Signed-off-by: Thomas Gleixner 
Acked-by: Kirill A. Shutemov 
Cc: b...@alien8.de
Cc: h...@zytor.com
Cc: dyo...@redhat.com
Link: https://lkml.kernel.org/r/20190524073810.24298-3-...@redhat.com

---
 arch/x86/kernel/kexec-bzimage64.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/arch/x86/kernel/kexec-bzimage64.c b/arch/x86/kernel/kexec-bzimage64.c
index 22f60dd26460..7f439739ea3d 100644
--- a/arch/x86/kernel/kexec-bzimage64.c
+++ b/arch/x86/kernel/kexec-bzimage64.c
@@ -321,6 +321,11 @@ static int bzImage64_probe(const char *buf, unsigned long len)
return ret;
}
 
+   if (!(header->xloadflags & XLF_5LEVEL) && pgtable_l5_enabled()) {
+   pr_err("bzImage cannot handle 5-level paging mode.\n");
+   return ret;
+   }
+
/* I've got a bzImage */
pr_debug("It's a relocatable bzImage64\n");
ret = 0;


[tip:x86/boot] x86/boot: Add xloadflags bits to check for 5-level paging support

2019-06-27 Thread tip-bot for Baoquan He
Commit-ID:  f2d08c5d3bcf3f7ef788af122b57a919efa1e9d0
Gitweb: https://git.kernel.org/tip/f2d08c5d3bcf3f7ef788af122b57a919efa1e9d0
Author: Baoquan He 
AuthorDate: Fri, 24 May 2019 15:38:08 +0800
Committer:  Thomas Gleixner 
CommitDate: Fri, 28 Jun 2019 07:14:59 +0200

x86/boot: Add xloadflags bits to check for 5-level paging support

The current kernel supports 5-level paging mode, and supports dynamically
choosing the paging mode during bootup depending on the kernel image,
hardware and kernel parameter settings. This flexibility brings several
issues to kexec/kdump:

1) Dynamic switching between paging modes requires support in the target
   kernel. This means kexec from a 5-level paging kernel into a kernel
   which does not support mode switching is not possible. So the loader
   needs to be able to analyze the supported paging modes of the kexec
   target kernel.

2) If running on a 5-level paging kernel and the kexec target kernel is a
   4-level paging kernel, the target image cannot be loaded above the 64TB
   address space limit. But the kexec loader searches for a load area from
   top to bottom which would eventually put the target kernel above 64TB
   when the machine has large enough RAM size. So the loader needs to be
   able to analyze the paging mode of the target kernel to load it at a
   suitable spot in the address space.

Solution:

Add two bits XLF_5LEVEL and XLF_5LEVEL_ENABLED:

 - Bit XLF_5LEVEL indicates whether 5-level paging mode switching support
   is available. (Issue #1)

 - Bit XLF_5LEVEL_ENABLED indicates whether the kernel was compiled with
   full 5-level paging support (CONFIG_X86_5LEVEL=y). (Issue #2)

The loader will use these bits to verify whether the target kernel is
suitable to be kexec'ed to from a 5-level paging kernel and to determine
the constraints of the target kernel load address.

The flags will be used by the kernel kexec subsystem and the userspace
kexec tools.
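
A hedged sketch of how a loader could evaluate the two bits ('addr_limit'
is an illustrative variable, not an existing kernel symbol):

	/* Refuse targets that cannot boot at all under active 5-level paging. */
	if (pgtable_l5_enabled() && !(xloadflags & XLF_5LEVEL))
		return -EINVAL;

	/* Constrain the load address for 4-level-only targets. */
	if (xloadflags & XLF_5LEVEL_ENABLED)
		addr_limit = MAXMEM;	/* full address range usable */
	else
		addr_limit = SZ_64T;	/* 4-level target: stay below 64TB */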

[ tglx: Massaged changelog ]

Signed-off-by: Baoquan He 
Signed-off-by: Thomas Gleixner 
Acked-by: Kirill A. Shutemov 
Cc: b...@alien8.de
Cc: h...@zytor.com
Cc: dyo...@redhat.com
Link: https://lkml.kernel.org/r/20190524073810.24298-2-...@redhat.com

---
 arch/x86/boot/header.S                | 12 +++++++++++-
 arch/x86/include/uapi/asm/bootparam.h |  2 ++
 2 files changed, 13 insertions(+), 1 deletion(-)

diff --git a/arch/x86/boot/header.S b/arch/x86/boot/header.S
index 850b8762e889..be19f4199727 100644
--- a/arch/x86/boot/header.S
+++ b/arch/x86/boot/header.S
@@ -419,7 +419,17 @@ xloadflags:
 # define XLF4 0
 #endif
 
-   .word XLF0 | XLF1 | XLF23 | XLF4
+#ifdef CONFIG_X86_64
+#ifdef CONFIG_X86_5LEVEL
+#define XLF56 (XLF_5LEVEL|XLF_5LEVEL_ENABLED)
+#else
+#define XLF56 XLF_5LEVEL
+#endif
+#else
+#define XLF56 0
+#endif
+
+   .word XLF0 | XLF1 | XLF23 | XLF4 | XLF56
 
 cmdline_size:   .long   COMMAND_LINE_SIZE-1     #length of the command line,
                                                 #added with boot protocol
diff --git a/arch/x86/include/uapi/asm/bootparam.h b/arch/x86/include/uapi/asm/bootparam.h
index 60733f137e9a..c895df5482c5 100644
--- a/arch/x86/include/uapi/asm/bootparam.h
+++ b/arch/x86/include/uapi/asm/bootparam.h
@@ -29,6 +29,8 @@
 #define XLF_EFI_HANDOVER_32	(1<<2)
 #define XLF_EFI_HANDOVER_64	(1<<3)
 #define XLF_EFI_KEXEC		(1<<4)
+#define XLF_5LEVEL		(1<<5)
+#define XLF_5LEVEL_ENABLED	(1<<6)
 
 #ifndef __ASSEMBLY__
 


[tip:x86/urgent] x86/mm/KASLR: Compute the size of the vmemmap section properly

2019-06-07 Thread tip-bot for Baoquan He
Commit-ID:  00e5a2bbcc31d5fea853f8daeba0f06c1c88c3ff
Gitweb: https://git.kernel.org/tip/00e5a2bbcc31d5fea853f8daeba0f06c1c88c3ff
Author: Baoquan He 
AuthorDate: Thu, 23 May 2019 10:57:44 +0800
Committer:  Borislav Petkov 
CommitDate: Fri, 7 Jun 2019 23:12:13 +0200

x86/mm/KASLR: Compute the size of the vmemmap section properly

The size of the vmemmap section is hardcoded to 1 TB to support the
maximum amount of system RAM in 4-level paging mode - 64 TB.

However, 1 TB is not enough for vmemmap in 5-level paging mode. Assuming
the size of struct page is 64 Bytes, to support 4 PB system RAM in 5-level,
64 TB of vmemmap area is needed:

  4 * 1000^5 PB / 4096 bytes page size * 64 bytes per page struct / 1000^4 TB = 62.5 TB.

This hardcoding may cause vmemmap to corrupt the following
cpu_entry_area section, if KASLR puts vmemmap very close to it and the
actual vmemmap size is bigger than 1 TB.

So calculate the actual size of the vmemmap region needed and then align
it up to a 1 TB boundary.

In 4-level paging mode it is always 1 TB. In 5-level it's adjusted on
demand. The current code reserves 0.5 PB for vmemmap on 5-level. With
this change, the space can be saved and thus used to increase entropy
for the randomization.
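
For example (simple arithmetic, not part of the patch), in 4-level paging
mode the direct mapping is capped at 64 TB, so with TB_SHIFT = 40,
PAGE_SHIFT = 12 and a 64-byte struct page the new computation reproduces
the old fixed size:

   vmemmap_size = (64 << (40 - 12)) * 64 bytes
                = 2^6 * 2^28 * 2^6 bytes = 2^40 bytes = 1 TB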

 [ bp: Spell out how the 64 TB needed for vmemmap is computed and massage commit
   message. ]

Fixes: eedb92abb9bb ("x86/mm: Make virtual memory layout dynamic for CONFIG_X86_5LEVEL=y")
Signed-off-by: Baoquan He 
Signed-off-by: Borislav Petkov 
Reviewed-by: Kees Cook 
Acked-by: Kirill A. Shutemov 
Cc: Andy Lutomirski 
Cc: Dave Hansen 
Cc: "H. Peter Anvin" 
Cc: Ingo Molnar 
Cc: kirill.shute...@linux.intel.com
Cc: Peter Zijlstra 
Cc: stable 
Cc: Thomas Gleixner 
Cc: x86-ml 
Link: https://lkml.kernel.org/r/20190523025744.3756-1-...@redhat.com
---
 arch/x86/mm/kaslr.c | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/arch/x86/mm/kaslr.c b/arch/x86/mm/kaslr.c
index dc3f058bdf9b..dc6182eecefa 100644
--- a/arch/x86/mm/kaslr.c
+++ b/arch/x86/mm/kaslr.c
@@ -52,7 +52,7 @@ static __initdata struct kaslr_memory_region {
 } kaslr_regions[] = {
 	{ &page_offset_base, 0 },
 	{ &vmalloc_base, 0 },
-	{ &vmemmap_base, 1 },
+	{ &vmemmap_base, 0 },
 };
 
 /* Get size in bytes used by the memory region */
@@ -78,6 +78,7 @@ void __init kernel_randomize_memory(void)
unsigned long rand, memory_tb;
struct rnd_state rand_state;
unsigned long remain_entropy;
+   unsigned long vmemmap_size;
 
 	vaddr_start = pgtable_l5_enabled() ? __PAGE_OFFSET_BASE_L5 :
 					     __PAGE_OFFSET_BASE_L4;
vaddr = vaddr_start;
@@ -109,6 +110,14 @@ void __init kernel_randomize_memory(void)
if (memory_tb < kaslr_regions[0].size_tb)
kaslr_regions[0].size_tb = memory_tb;
 
+   /*
+* Calculate the vmemmap region size in TBs, aligned to a TB
+* boundary.
+*/
+   vmemmap_size = (kaslr_regions[0].size_tb << (TB_SHIFT - PAGE_SHIFT)) *
+   sizeof(struct page);
+   kaslr_regions[2].size_tb = DIV_ROUND_UP(vmemmap_size, 1UL << TB_SHIFT);
+
/* Calculate entropy available between regions */
remain_entropy = vaddr_end - vaddr_start;
for (i = 0; i < ARRAY_SIZE(kaslr_regions); i++)


[tip:x86/urgent] x86/mm/KASLR: Fix the size of the direct mapping section

2019-04-18 Thread tip-bot for Baoquan He
Commit-ID:  ec3937107ab43f3e8b2bc9dad95710043c462ff7
Gitweb: https://git.kernel.org/tip/ec3937107ab43f3e8b2bc9dad95710043c462ff7
Author: Baoquan He 
AuthorDate: Thu, 4 Apr 2019 10:03:13 +0800
Committer:  Borislav Petkov 
CommitDate: Thu, 18 Apr 2019 10:42:58 +0200

x86/mm/KASLR: Fix the size of the direct mapping section

kernel_randomize_memory() uses __PHYSICAL_MASK_SHIFT to calculate
the maximum amount of system RAM supported. The size of the direct
mapping section is obtained from the smaller one of the below two
values:

  (actual system RAM size + padding size) vs (max system RAM size supported)

This calculation is wrong since commit

  b83ce5ee9147 ("x86/mm/64: Make __PHYSICAL_MASK_SHIFT always 52").

In it, __PHYSICAL_MASK_SHIFT was changed to be 52, regardless of whether
the kernel is using 4-level or 5-level page tables. Thus, it will always
use 4 PB as the maximum amount of system RAM, even in 4-level paging
mode where it should actually be 64 TB.

Thus, the size of the direct mapping section will always
be the sum of the actual system RAM size plus the padding size.

Even when the amount of system RAM is 64 TB, the following layout will
still be used. Obviously KASLR will be weakened significantly.

   |____|___actual RAM___|_padding_|__________the rest__________|
   0                    64TB                               ~120TB

Instead, it should be like this:

   |____|___actual RAM___|______________the rest______________|
   0                    64TB                             ~120TB

The size of padding region is controlled by
CONFIG_RANDOMIZE_MEMORY_PHYSICAL_PADDING, which is 10 TB by default.

The above issue only exists when
CONFIG_RANDOMIZE_MEMORY_PHYSICAL_PADDING is set to a non-zero value,
which is the case when CONFIG_MEMORY_HOTPLUG is enabled. Otherwise,
using __PHYSICAL_MASK_SHIFT doesn't affect KASLR.

Fix it by replacing __PHYSICAL_MASK_SHIFT with MAX_PHYSMEM_BITS.
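
To make the difference concrete (TB_SHIFT is 40; simple arithmetic, not
part of the patch):

   1 << (__PHYSICAL_MASK_SHIFT - TB_SHIFT) = 1 << (52 - 40) = 4096 TB (4 PB)
   1 << (MAX_PHYSMEM_BITS - TB_SHIFT)      = 1 << (46 - 40) =   64 TB

with MAX_PHYSMEM_BITS being 46 in 4-level paging mode.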

 [ bp: Massage commit message. ]

Fixes: b83ce5ee9147 ("x86/mm/64: Make __PHYSICAL_MASK_SHIFT always 52")
Signed-off-by: Baoquan He 
Signed-off-by: Borislav Petkov 
Reviewed-by: Thomas Garnier 
Acked-by: Kirill A. Shutemov 
Cc: "H. Peter Anvin" 
Cc: Andy Lutomirski 
Cc: Dave Hansen 
Cc: Ingo Molnar 
Cc: Kees Cook 
Cc: Peter Zijlstra 
Cc: Thomas Gleixner 
Cc: frank.ram...@hpe.com
Cc: herb...@gondor.apana.org.au
Cc: kir...@shutemov.name
Cc: mike.tra...@hpe.com
Cc: thgar...@google.com
Cc: x86-ml 
Cc: yamada.masah...@socionext.com
Link: https://lkml.kernel.org/r/20190417083536.GE7065@MiWiFi-R3L-srv
---
 arch/x86/mm/kaslr.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/mm/kaslr.c b/arch/x86/mm/kaslr.c
index 3f452ffed7e9..d669c5e797e0 100644
--- a/arch/x86/mm/kaslr.c
+++ b/arch/x86/mm/kaslr.c
@@ -94,7 +94,7 @@ void __init kernel_randomize_memory(void)
if (!kaslr_memory_enabled())
return;
 
-   kaslr_regions[0].size_tb = 1 << (__PHYSICAL_MASK_SHIFT - TB_SHIFT);
+   kaslr_regions[0].size_tb = 1 << (MAX_PHYSMEM_BITS - TB_SHIFT);
kaslr_regions[1].size_tb = VMALLOC_SIZE_TB;
 
/*


[tip:x86/mm] x86/mm/KASLR: Use only one PUD entry for real mode trampoline

2019-04-05 Thread tip-bot for Baoquan He
Commit-ID:  0925dda5962e9b55e4d38a72eba93858f24bac41
Gitweb: https://git.kernel.org/tip/0925dda5962e9b55e4d38a72eba93858f24bac41
Author: Baoquan He 
AuthorDate: Fri, 8 Mar 2019 10:56:15 +0800
Committer:  Thomas Gleixner 
CommitDate: Fri, 5 Apr 2019 22:13:00 +0200

x86/mm/KASLR: Use only one PUD entry for real mode trampoline

The current code builds identity mapping for the real mode trampoline by
borrowing page tables from the direct mapping section if KASLR is
enabled. It copies present entries of the first PUD table in 4-level paging
mode, or the first P4D table in 5-level paging mode.

However, there's only a very small area under low 1 MB reserved for the
real mode trampoline in reserve_real_mode() so it makes no sense to build
up a really large mapping for it.

Reduce it to one PUD (1GB) entry. This matches the randomization
granularity in 4-level paging mode and allows changing the randomization
granularity in 5-level paging mode from 512GB to 1GB later (see the
arithmetic below).
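
For scale (simple arithmetic, not part of the patch):

   one PUD entry maps 2^30 bytes =   1 GB
   one P4D entry maps 2^39 bytes = 512 GB
   the trampoline area sits below the low 1 MB

so a single PUD entry is more than enough for the trampoline mapping.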

[ tglx: Massaged changelog and comments ]

Signed-off-by: Baoquan He 
Signed-off-by: Thomas Gleixner 
Acked-by: Kirill A. Shutemov 
Cc: dave.han...@linux.intel.com
Cc: l...@kernel.org
Cc: pet...@infradead.org
Cc: b...@alien8.de
Cc: h...@zytor.com
Cc: keesc...@chromium.org
Cc: thgar...@google.com
Link: https://lkml.kernel.org/r/20190308025616.21440-2-...@redhat.com

---
 arch/x86/mm/kaslr.c | 84 ++++++++++++++++++------------------------
 1 file changed, 37 insertions(+), 47 deletions(-)

diff --git a/arch/x86/mm/kaslr.c b/arch/x86/mm/kaslr.c
index 3f452ffed7e9..97813751340d 100644
--- a/arch/x86/mm/kaslr.c
+++ b/arch/x86/mm/kaslr.c
@@ -147,74 +147,64 @@ void __init kernel_randomize_memory(void)
 
 static void __meminit init_trampoline_pud(void)
 {
-   unsigned long paddr, paddr_next;
+   pud_t *pud_page_tramp, *pud, *pud_tramp;
+   p4d_t *p4d_page_tramp, *p4d, *p4d_tramp;
+   unsigned long paddr, vaddr;
pgd_t *pgd;
-   pud_t *pud_page, *pud_page_tramp;
-   int i;
 
pud_page_tramp = alloc_low_page();
 
+   /*
+* There are two mappings for the low 1MB area, the direct mapping
+* and the 1:1 mapping for the real mode trampoline:
+*
+* Direct mapping: virt_addr = phys_addr + PAGE_OFFSET
+* 1:1 mapping:virt_addr = phys_addr
+*/
paddr = 0;
-   pgd = pgd_offset_k((unsigned long)__va(paddr));
-   pud_page = (pud_t *) pgd_page_vaddr(*pgd);
-
-   for (i = pud_index(paddr); i < PTRS_PER_PUD; i++, paddr = paddr_next) {
-   pud_t *pud, *pud_tramp;
-   unsigned long vaddr = (unsigned long)__va(paddr);
+   vaddr = (unsigned long)__va(paddr);
+   pgd = pgd_offset_k(vaddr);
 
-   pud_tramp = pud_page_tramp + pud_index(paddr);
-   pud = pud_page + pud_index(vaddr);
-   paddr_next = (paddr & PUD_MASK) + PUD_SIZE;
-
-   *pud_tramp = *pud;
-   }
+   p4d = p4d_offset(pgd, vaddr);
+   pud = pud_offset(p4d, vaddr);
 
-	set_pgd(&trampoline_pgd_entry,
-		__pgd(_KERNPG_TABLE | __pa(pud_page_tramp)));
-}
-
-static void __meminit init_trampoline_p4d(void)
-{
-   unsigned long paddr, paddr_next;
-   pgd_t *pgd;
-   p4d_t *p4d_page, *p4d_page_tramp;
-   int i;
+   pud_tramp = pud_page_tramp + pud_index(paddr);
+   *pud_tramp = *pud;
 
-   p4d_page_tramp = alloc_low_page();
-
-   paddr = 0;
-   pgd = pgd_offset_k((unsigned long)__va(paddr));
-   p4d_page = (p4d_t *) pgd_page_vaddr(*pgd);
-
-   for (i = p4d_index(paddr); i < PTRS_PER_P4D; i++, paddr = paddr_next) {
-   p4d_t *p4d, *p4d_tramp;
-   unsigned long vaddr = (unsigned long)__va(paddr);
+   if (pgtable_l5_enabled()) {
+   p4d_page_tramp = alloc_low_page();
 
p4d_tramp = p4d_page_tramp + p4d_index(paddr);
-   p4d = p4d_page + p4d_index(vaddr);
-   paddr_next = (paddr & P4D_MASK) + P4D_SIZE;
 
-   *p4d_tramp = *p4d;
-   }
+   set_p4d(p4d_tramp,
+   __p4d(_KERNPG_TABLE | __pa(pud_page_tramp)));
 
-	set_pgd(&trampoline_pgd_entry,
-		__pgd(_KERNPG_TABLE | __pa(p4d_page_tramp)));
+		set_pgd(&trampoline_pgd_entry,
+			__pgd(_KERNPG_TABLE | __pa(p4d_page_tramp)));
+	} else {
+		set_pgd(&trampoline_pgd_entry,
+			__pgd(_KERNPG_TABLE | __pa(pud_page_tramp)));
+   }
 }
 
 /*
- * Create PGD aligned trampoline table to allow real mode initialization
- * of additional CPUs. Consume only 1 low memory page.
+ * The real mode trampoline, which is required for bootstrapping CPUs
+ * occupies only a small area under the low 1MB.  See reserve_real_mode()
+ * for details.
+ *
+ * If KASLR is disabled the first PGD entry of the direct mapping is copied
+ * to map the real mode trampoline.
+ *
+ * If KASLR is enabled, copy only the PUD which covers the low 1MB
+ * area. This limits 

[tip:x86/mm] x86/mm/KASLR: Reduce randomization granularity for 5-level paging to 1GB

2019-04-05 Thread tip-bot for Baoquan He
Commit-ID:  b569c18434987163a05f05a12cdf6a9975c55ff3
Gitweb: https://git.kernel.org/tip/b569c18434987163a05f05a12cdf6a9975c55ff3
Author: Baoquan He 
AuthorDate: Fri, 8 Mar 2019 10:56:16 +0800
Committer:  Thomas Gleixner 
CommitDate: Fri, 5 Apr 2019 22:13:52 +0200

x86/mm/KASLR: Reduce randomization granularity for 5-level paging to 1GB

The current randomization granularity of 5-level is 512 GB. The mapping of
the real mode trampoline has been reduced to one PUD entry, so there is no
restriction anymore.

Reduce the granularity to 1GB for 5-level paging mode which allows better
randomization.
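
In concrete terms (simple arithmetic, not part of the patch):

   P4D_SIZE = 2^39 bytes = 512 GB   (old rounding unit on 5-level)
   PUD_SIZE = 2^30 bytes =   1 GB   (new rounding unit)

so every randomized region gains up to 9 additional low-order bits of
placement freedom.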

[ tglx: Massaged changelog ]

Signed-off-by: Baoquan He 
Signed-off-by: Thomas Gleixner 
Acked-by: Kirill A. Shutemov 
Cc: dave.han...@linux.intel.com
Cc: l...@kernel.org
Cc: pet...@infradead.org
Cc: b...@alien8.de
Cc: h...@zytor.com
Cc: keesc...@chromium.org
Cc: thgar...@google.com
Link: https://lkml.kernel.org/r/20190308025616.21440-3-...@redhat.com
---
 arch/x86/mm/kaslr.c | 10 ++--------
 1 file changed, 2 insertions(+), 8 deletions(-)

diff --git a/arch/x86/mm/kaslr.c b/arch/x86/mm/kaslr.c
index 97813751340d..f6ba2791eeb5 100644
--- a/arch/x86/mm/kaslr.c
+++ b/arch/x86/mm/kaslr.c
@@ -125,10 +125,7 @@ void __init kernel_randomize_memory(void)
 */
entropy = remain_entropy / (ARRAY_SIZE(kaslr_regions) - i);
 	prandom_bytes_state(&rand_state, &rand, sizeof(rand));
-   if (pgtable_l5_enabled())
-   entropy = (rand % (entropy + 1)) & P4D_MASK;
-   else
-   entropy = (rand % (entropy + 1)) & PUD_MASK;
+   entropy = (rand % (entropy + 1)) & PUD_MASK;
vaddr += entropy;
*kaslr_regions[i].base = vaddr;
 
@@ -137,10 +134,7 @@ void __init kernel_randomize_memory(void)
 * randomization alignment.
 */
 	vaddr += get_padding(&kaslr_regions[i]);
-   if (pgtable_l5_enabled())
-   vaddr = round_up(vaddr + 1, P4D_SIZE);
-   else
-   vaddr = round_up(vaddr + 1, PUD_SIZE);
+   vaddr = round_up(vaddr + 1, PUD_SIZE);
remain_entropy -= entropy;
}
 }


[tip:x86/urgent] x86/boot: Fix incorrect ifdeffery scope

2019-03-27 Thread tip-bot for Baoquan He
Commit-ID:  0f02daed4e089c7a380a0ffdc9d93a5989043cf4
Gitweb: https://git.kernel.org/tip/0f02daed4e089c7a380a0ffdc9d93a5989043cf4
Author: Baoquan He 
AuthorDate: Mon, 4 Mar 2019 13:55:46 +0800
Committer:  Borislav Petkov 
CommitDate: Wed, 27 Mar 2019 14:00:51 +0100

x86/boot: Fix incorrect ifdeffery scope

The declarations related to immovable memory handling are outside the
BOOT_COMPRESSED_MISC_H #ifdef scope; wrap them inside it.

Signed-off-by: Baoquan He 
Signed-off-by: Borislav Petkov 
Cc: Chao Fan 
Cc: "H. Peter Anvin" 
Cc: Ingo Molnar 
Cc: Juergen Gross 
Cc: "Kirill A. Shutemov" 
Cc: Thomas Gleixner 
Cc: Tom Lendacky 
Cc: x86-ml 
Link: https://lkml.kernel.org/r/20190304055546.18566-1-...@redhat.com
---
 arch/x86/boot/compressed/misc.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/boot/compressed/misc.h b/arch/x86/boot/compressed/misc.h
index fd13655e0f9b..d2f184165934 100644
--- a/arch/x86/boot/compressed/misc.h
+++ b/arch/x86/boot/compressed/misc.h
@@ -120,8 +120,6 @@ static inline void console_init(void)
 
 void set_sev_encryption_mask(void);
 
-#endif
-
 /* acpi.c */
 #ifdef CONFIG_ACPI
 acpi_physical_address get_rsdp_addr(void);
@@ -135,3 +133,5 @@ int count_immovable_mem_regions(void);
 #else
 static inline int count_immovable_mem_regions(void) { return 0; }
 #endif
+
+#endif /* BOOT_COMPRESSED_MISC_H */


[tip:x86/mm] x86/mm/doc: Clean up the x86-64 virtual memory layout descriptions

2018-10-06 Thread tip-bot for Baoquan He
Commit-ID:  5b12904065798fee8b153a506ac7b72d5ebbe26c
Gitweb: https://git.kernel.org/tip/5b12904065798fee8b153a506ac7b72d5ebbe26c
Author: Baoquan He 
AuthorDate: Sat, 6 Oct 2018 16:43:26 +0800
Committer:  Ingo Molnar 
CommitDate: Sat, 6 Oct 2018 14:46:47 +0200

x86/mm/doc: Clean up the x86-64 virtual memory layout descriptions

In Documentation/x86/x86_64/mm.txt, the description of the x86-64 virtual
memory layout has become a confusing hodgepodge of inconsistencies:

 - there's a hard to read mixture of 'TB' and 'bits' notation
 - the entries sometimes mention a size in the description and sometimes not
 - sometimes they list holes by address, sometimes only as an 'unused hole' line

So make it all a coherent, readable, well organized description.

Signed-off-by: Baoquan He 
Cc: Andy Lutomirski 
Cc: Andy Lutomirski 
Cc: Borislav Petkov 
Cc: Brian Gerst 
Cc: Dave Hansen 
Cc: Denys Vlasenko 
Cc: H. Peter Anvin 
Cc: Linus Torvalds 
Cc: Peter Zijlstra 
Cc: Rik van Riel 
Cc: Thomas Gleixner 
Cc: cor...@lwn.net
Cc: linux-...@vger.kernel.org
Cc: thgar...@google.com
Link: http://lkml.kernel.org/r/20181006084327.27467-3-...@redhat.com
Signed-off-by: Ingo Molnar 
---
 Documentation/x86/x86_64/mm.txt | 84 +++++++++++++++++++++---------------------
 1 file changed, 42 insertions(+), 42 deletions(-)

diff --git a/Documentation/x86/x86_64/mm.txt b/Documentation/x86/x86_64/mm.txt
index 5432a96d31ff..b4bc95c9790e 100644
--- a/Documentation/x86/x86_64/mm.txt
+++ b/Documentation/x86/x86_64/mm.txt
@@ -1,55 +1,55 @@
 
 Virtual memory map with 4 level page tables:
 
-0000000000000000 - 00007fffffffffff (=47 bits) user space, different per mm
-hole caused by [47:63] sign extension
-ffff800000000000 - ffff87ffffffffff (=43 bits) guard hole, reserved for hypervisor
-ffff880000000000 - ffffc7ffffffffff (=64 TB) direct mapping of all phys. memory
-ffffc80000000000 - ffffc8ffffffffff (=40 bits) hole
-ffffc90000000000 - ffffe8ffffffffff (=45 bits) vmalloc/ioremap space
-ffffe90000000000 - ffffe9ffffffffff (=40 bits) hole
-ffffea0000000000 - ffffeaffffffffff (=40 bits) virtual memory map (1TB)
-... unused hole ...
-ffffec0000000000 - fffffbffffffffff (=44 bits) kasan shadow memory (16TB)
-... unused hole ...
+0000000000000000 - 00007fffffffffff (=47 bits,   128 TB) user space, different per mm
+                   hole caused by [47:63] sign extension
+ffff800000000000 - ffff87ffffffffff (=43 bits,     8 TB) guard hole, reserved for hypervisor
+ffff880000000000 - ffffc7ffffffffff (=46 bits,    64 TB) direct mapping of all phys. memory (page_offset_base)
+ffffc80000000000 - ffffc8ffffffffff (=40 bits,     1 TB) unused hole
+ffffc90000000000 - ffffe8ffffffffff (=45 bits,    32 TB) vmalloc/ioremap space (vmalloc_base)
+ffffe90000000000 - ffffe9ffffffffff (=40 bits,     1 TB) unused hole
+ffffea0000000000 - ffffeaffffffffff (=40 bits,     1 TB) virtual memory map (vmemmap_base)
+ffffeb0000000000 - ffffebffffffffff (=40 bits,     1 TB) unused hole
+ffffec0000000000 - fffffbffffffffff (=44 bits,    16 TB) kasan shadow memory
+fffffc0000000000 - fffffdffffffffff (=41 bits,     2 TB) unused hole
                                     vaddr_end for KASLR
-fffffe0000000000 - fffffe7fffffffff (=39 bits) cpu_entry_area mapping
-fffffe8000000000 - fffffeffffffffff (=39 bits) LDT remap for PTI
-ffffff0000000000 - ffffff7fffffffff (=39 bits) %esp fixup stacks
-... unused hole ...
-ffffffef00000000 - fffffffeffffffff (=64 GB) EFI region mapping space
-... unused hole ...
-ffffffff80000000 - ffffffff9fffffff (=512 MB)  kernel text mapping, from phys 0
-ffffffffa0000000 - fffffffffeffffff (1520 MB) module mapping space
+fffffe0000000000 - fffffe7fffffffff (=39 bits,   512 GB) cpu_entry_area mapping
+fffffe8000000000 - fffffeffffffffff (=39 bits,   512 GB) LDT remap for PTI
+ffffff0000000000 - ffffff7fffffffff (=39 bits,   512 GB) %esp fixup stacks
+ffffff8000000000 - ffffffeeffffffff (~39 bits,  ~507 GB) unused hole
+ffffffef00000000 - fffffffeffffffff (=36 bits,    64 GB) EFI region mapping space
+ffffffff00000000 - ffffffff7fffffff (=31 bits,     2 GB) unused hole
+ffffffff80000000 - ffffffff9fffffff (=29 bits,   512 MB) kernel text mapping, from phys 0
+ffffffffa0000000 - fffffffffeffffff (~31 bits,  1520 MB) module mapping space
 [fixmap start]   - ffffffffff5fffff kernel-internal fixmap range
-ffffffffff600000 - ffffffffff600fff (=4 kB) legacy vsyscall ABI
-ffffffffffe00000 - ffffffffffffffff (=2 MB) unused hole
+ffffffffff600000 - ffffffffff600fff ( =4 kB) legacy vsyscall ABI
+ffffffffffe00000 - ffffffffffffffff ( =2 MB) unused hole
 
 Virtual memory map with 5 level page tables:
 
-0000000000000000 - 00ffffffffffffff (=56 bits) user space, different per mm
-hole caused by [56:63] sign extension
-ff00000000000000 - ff0fffffffffffff (=52 bits) guard hole, reserved for hypervisor
-ff10000000000000 - ff8fffffffffffff (=55 bits) direct mapping of all phys. memory
-ff90000000000000 - 

[tip:x86/mm] x86/KASLR: Update KERNEL_IMAGE_SIZE description

2018-10-06 Thread tip-bot for Baoquan He
Commit-ID:  06d4a462e954756f3d3d54e6f3f1bdc2e6f592a9
Gitweb: https://git.kernel.org/tip/06d4a462e954756f3d3d54e6f3f1bdc2e6f592a9
Author: Baoquan He 
AuthorDate: Sat, 6 Oct 2018 16:43:25 +0800
Committer:  Ingo Molnar 
CommitDate: Sat, 6 Oct 2018 14:46:46 +0200

x86/KASLR: Update KERNEL_IMAGE_SIZE description

Currently CONFIG_RANDOMIZE_BASE=y is set by default, which makes some of the
old comments above the KERNEL_IMAGE_SIZE definition out of date. Update them
to the current state of affairs.

Signed-off-by: Baoquan He 
Cc: Andy Lutomirski 
Cc: Andy Lutomirski 
Cc: Borislav Petkov 
Cc: Brian Gerst 
Cc: Dave Hansen 
Cc: Denys Vlasenko 
Cc: H. Peter Anvin 
Cc: Linus Torvalds 
Cc: Peter Zijlstra 
Cc: Rik van Riel 
Cc: Thomas Gleixner 
Cc: cor...@lwn.net
Cc: linux-...@vger.kernel.org
Cc: thgar...@google.com
Link: http://lkml.kernel.org/r/20181006084327.27467-2-...@redhat.com
Signed-off-by: Ingo Molnar 
---
 arch/x86/include/asm/page_64_types.h | 15 +++++++++------
 1 file changed, 9 insertions(+), 6 deletions(-)

diff --git a/arch/x86/include/asm/page_64_types.h b/arch/x86/include/asm/page_64_types.h
index 6afac386a434..cd0cf1c568b4 100644
--- a/arch/x86/include/asm/page_64_types.h
+++ b/arch/x86/include/asm/page_64_types.h
@@ -59,13 +59,16 @@
 #endif
 
 /*
- * Kernel image size is limited to 1GiB due to the fixmap living in the
- * next 1GiB (see level2_kernel_pgt in arch/x86/kernel/head_64.S). Use
- * 512MiB by default, leaving 1.5GiB for modules once the page tables
- * are fully set up. If kernel ASLR is configured, it can extend the
- * kernel page table mapping, reducing the size of the modules area.
+ * Maximum kernel image size is limited to 1 GiB, due to the fixmap living
+ * in the next 1 GiB (see level2_kernel_pgt in arch/x86/kernel/head_64.S).
+ *
+ * On KASLR use 1 GiB by default, leaving 1 GiB for modules once the
+ * page tables are fully set up.
+ *
+ * If KASLR is disabled we can shrink it to 0.5 GiB and increase the size
+ * of the modules area to 1.5 GiB.
  */
-#if defined(CONFIG_RANDOMIZE_BASE)
+#ifdef CONFIG_RANDOMIZE_BASE
 #define KERNEL_IMAGE_SIZE  (1024 * 1024 * 1024)
 #else
 #define KERNEL_IMAGE_SIZE  (512 * 1024 * 1024)


[tip:x86/boot] x86/boot/KASLR: Skip specified number of 1GB huge pages when doing physical randomization (KASLR)

2018-07-03 Thread tip-bot for Baoquan He
Commit-ID:  747ff6265db4c2b77e8c7384f8054916a0c1eb39
Gitweb: https://git.kernel.org/tip/747ff6265db4c2b77e8c7384f8054916a0c1eb39
Author: Baoquan He 
AuthorDate: Mon, 25 Jun 2018 11:16:56 +0800
Committer:  Ingo Molnar 
CommitDate: Tue, 3 Jul 2018 10:50:13 +0200

x86/boot/KASLR: Skip specified number of 1GB huge pages when doing physical randomization (KASLR)

When KASLR is enabled then 1GB huge pages allocations might regress
sporadically.

To reproduce on a KVM guest with 4GB RAM:

- add the following options to the kernel command-line:

   'default_hugepagesz=1G hugepagesz=1G hugepages=1'

- boot the guest and check number of 1GB pages reserved:

# grep HugePages_Total /proc/meminfo

- sporadically, every couple of bootups the output of this
  command shows that when booting with "nokaslr" HugePages_Total is always 1,
  while booting without "nokaslr" sometimes HugePages_Total is set as 0
  (that is, reserving the 1GB page failed).

Note that you may need to boot a few times to trigger the issue,
because it's somewhat non-deterministic.

The root cause is that the kernel may randomly be put into the only good
1GB huge page in the [0x40000000, 0x7fffffff] physical range.

Below is the dmesg output snippet from the KVM guest. We can see that only
the [0x40000000, 0x7fffffff] region is a good 1GB huge page, and
[0x100000000, 0x13fffffff] will be touched by the memblock top-down allocation:

[...] e820: BIOS-provided physical RAM map:
[...] BIOS-e820: [mem 0x0000000000000000-0x000000000009fbff] usable
[...] BIOS-e820: [mem 0x000000000009fc00-0x000000000009ffff] reserved
[...] BIOS-e820: [mem 0x00000000000f0000-0x00000000000fffff] reserved
[...] BIOS-e820: [mem 0x0000000000100000-0x00000000bffdffff] usable
[...] BIOS-e820: [mem 0x00000000bffe0000-0x00000000bfffffff] reserved
[...] BIOS-e820: [mem 0x00000000feffc000-0x00000000feffffff] reserved
[...] BIOS-e820: [mem 0x00000000fffc0000-0x00000000ffffffff] reserved
[...] BIOS-e820: [mem 0x0000000100000000-0x000000013fffffff] usable

Besides, on bare-metal machines with larger memory, one less 1GB huge page
might be available with KASLR enabled. That too is because the kernel
image might be randomized into those "good" 1GB huge pages.

To fix this, first parse the kernel command-line to get how many 1GB huge
pages are specified. Then try to skip the specified number of 1GB huge
pages when deciding which memory region the kernel can be randomized into.

Also rename handle_mem_memmap() to handle_mem_options(), since it handles
not only 'mem=' and 'memmap=' but also 'hugepagesxxx' now (see the example
below).
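
For example, with the reproduction command line above:

   default_hugepagesz=1G hugepagesz=1G hugepages=1

handle_mem_options() hands the 'hugepagesz'/'hugepages' options to
parse_gb_huge_pages(), which records max_gb_huge_pages = 1, and the slot
selection code then keeps one GB-aligned 1GB range out of the
randomization candidates so the hugetlb reservation can still succeed.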

Signed-off-by: Baoquan He 
Cc: Linus Torvalds 
Cc: Peter Zijlstra 
Cc: Thomas Gleixner 
Cc: douly.f...@cn.fujitsu.com
Cc: fanc.f...@cn.fujitsu.com
Cc: indou.ta...@jp.fujitsu.com
Cc: keesc...@chromium.org
Cc: lcapitul...@redhat.com
Cc: yasu.isim...@gmail.com
Link: http://lkml.kernel.org/r/20180625031656.12443-3-...@redhat.com
[ Rewrote the changelog, fixed style problems in the code. ]
Signed-off-by: Ingo Molnar 
---
 arch/x86/boot/compressed/kaslr.c | 13 ++++++++-----
 1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/arch/x86/boot/compressed/kaslr.c b/arch/x86/boot/compressed/kaslr.c
index d97647b5ffb7..531c9876f573 100644
--- a/arch/x86/boot/compressed/kaslr.c
+++ b/arch/x86/boot/compressed/kaslr.c
@@ -244,7 +244,7 @@ static void parse_gb_huge_pages(char *param, char *val)
 }
 
 
-static int handle_mem_memmap(void)
+static int handle_mem_options(void)
 {
char *args = (char *)get_cmd_line_ptr();
size_t len = strlen((char *)args);
@@ -252,7 +252,8 @@ static int handle_mem_memmap(void)
char *param, *val;
u64 mem_size;
 
-   if (!strstr(args, "memmap=") && !strstr(args, "mem="))
+   if (!strstr(args, "memmap=") && !strstr(args, "mem=") &&
+   !strstr(args, "hugepages"))
return 0;
 
tmp_cmdline = malloc(len + 1);
@@ -277,6 +278,8 @@ static int handle_mem_memmap(void)
 
if (!strcmp(param, "memmap")) {
mem_avoid_memmap(val);
+   } else if (strstr(param, "hugepages")) {
+   parse_gb_huge_pages(param, val);
} else if (!strcmp(param, "mem")) {
char *p = val;
 
@@ -416,7 +419,7 @@ static void mem_avoid_init(unsigned long input, unsigned long input_size,
/* We don't need to set a mapping for setup_data. */
 
/* Mark the memmap regions we need to avoid */
-   handle_mem_memmap();
+   handle_mem_options();
 
 #ifdef CONFIG_X86_VERBOSE_BOOTUP
/* Make sure video RAM can be used. */
@@ -629,7 +632,7 @@ static void process_mem_region(struct mem_vector *entry,
 
/* If nothing overlaps, store the region and return. */
 	if (!mem_avoid_overlap(&region, &overlap)) {
-		store_slot_info(&region, image_size);
+		process_gb_huge_pages(&region, image_size);
return;
}
 
@@ -639,7 +642,7 @@ 

[tip:x86/boot] x86/boot/KASLR: Add two new functions for 1GB huge pages handling

2018-07-03 Thread tip-bot for Baoquan He
Commit-ID:  9b912485e0e74a74e042e4f2dd87f262e46fcdf1
Gitweb: https://git.kernel.org/tip/9b912485e0e74a74e042e4f2dd87f262e46fcdf1
Author: Baoquan He 
AuthorDate: Mon, 25 Jun 2018 11:16:55 +0800
Committer:  Ingo Molnar 
CommitDate: Tue, 3 Jul 2018 10:50:12 +0200

x86/boot/KASLR: Add two new functions for 1GB huge pages handling

Introduce two new functions: parse_gb_huge_pages() and process_gb_huge_pages(),
which handle a conflict between KASLR and huge pages of 1GB.

These two functions will be used in the next patch:

- parse_gb_huge_pages() is used to parse kernel command-line to get
  how many 1GB huge pages have been specified. A static global
  variable 'max_gb_huge_pages' is added to store the number.

- process_gb_huge_pages() is used to skip as many 1GB huge pages
  as possible from the passed-in memory region, according to the
  specified number (a worked example follows below).
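
A worked example (illustrative numbers, assuming max_gb_huge_pages == 1
and a 3 GB region starting at physical address 0x3be00000):

   addr = ALIGN(0x3be00000, PUD_SIZE) = 0x40000000
   skipped 1GB page:  [0x40000000, 0x80000000)
   head slots tried:  [0x3be00000, 0x40000000)  (~66 MB)
   tail slots tried:  [0x80000000, 0xfbe00000)  (~1.9 GB)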

Signed-off-by: Baoquan He 
Cc: Linus Torvalds 
Cc: Peter Zijlstra 
Cc: Thomas Gleixner 
Cc: douly.f...@cn.fujitsu.com
Cc: fanc.f...@cn.fujitsu.com
Cc: indou.ta...@jp.fujitsu.com
Cc: keesc...@chromium.org
Cc: lcapitul...@redhat.com
Cc: yasu.isim...@gmail.com
Link: http://lkml.kernel.org/r/20180625031656.12443-2-...@redhat.com
Signed-off-by: Ingo Molnar 
---
 arch/x86/boot/compressed/kaslr.c | 83 ++++++++++++++++++++++++++++++++++++++++
 1 file changed, 83 insertions(+)

diff --git a/arch/x86/boot/compressed/kaslr.c b/arch/x86/boot/compressed/kaslr.c
index b87a7582853d..d97647b5ffb7 100644
--- a/arch/x86/boot/compressed/kaslr.c
+++ b/arch/x86/boot/compressed/kaslr.c
@@ -215,6 +215,35 @@ static void mem_avoid_memmap(char *str)
memmap_too_large = true;
 }
 
+/* Store the number of 1GB huge pages which users specified: */
+static unsigned long max_gb_huge_pages;
+
+static void parse_gb_huge_pages(char *param, char *val)
+{
+   static bool gbpage_sz;
+   char *p;
+
+   if (!strcmp(param, "hugepagesz")) {
+   p = val;
+		if (memparse(p, &p) != PUD_SIZE) {
+   gbpage_sz = false;
+   return;
+   }
+
+   if (gbpage_sz)
+   warn("Repeatedly set hugeTLB page size of 1G!\n");
+   gbpage_sz = true;
+   return;
+   }
+
+   if (!strcmp(param, "hugepages") && gbpage_sz) {
+   p = val;
+		max_gb_huge_pages = simple_strtoull(p, &p, 0);
+   return;
+   }
+}
+
+
 static int handle_mem_memmap(void)
 {
char *args = (char *)get_cmd_line_ptr();
@@ -466,6 +495,60 @@ static void store_slot_info(struct mem_vector *region, unsigned long image_size)
}
 }
 
+/*
+ * Skip as many 1GB huge pages as possible in the passed region
+ * according to the number which users specified:
+ */
+static void
+process_gb_huge_pages(struct mem_vector *region, unsigned long image_size)
+{
+   unsigned long addr, size = 0;
+   struct mem_vector tmp;
+   int i = 0;
+
+   if (!max_gb_huge_pages) {
+   store_slot_info(region, image_size);
+   return;
+   }
+
+   addr = ALIGN(region->start, PUD_SIZE);
+   /* Did we raise the address above the passed in memory entry? */
+   if (addr < region->start + region->size)
+   size = region->size - (addr - region->start);
+
+   /* Check how many 1GB huge pages can be filtered out: */
+   while (size > PUD_SIZE && max_gb_huge_pages) {
+   size -= PUD_SIZE;
+   max_gb_huge_pages--;
+   i++;
+   }
+
+   /* No good 1GB huge pages found: */
+   if (!i) {
+   store_slot_info(region, image_size);
+   return;
+   }
+
+   /*
+* Skip those 'i'*1GB good huge pages, and continue checking and
+* processing the remaining head or tail part of the passed region
+* if available.
+*/
+
+   if (addr >= region->start + image_size) {
+   tmp.start = region->start;
+   tmp.size = addr - region->start;
+		store_slot_info(&tmp, image_size);
+   }
+
+   size  = region->size - (addr - region->start) - i * PUD_SIZE;
+   if (size >= image_size) {
+   tmp.start = addr + i * PUD_SIZE;
+   tmp.size = size;
+		store_slot_info(&tmp, image_size);
+   }
+}
+
 static unsigned long slots_fetch_random(void)
 {
unsigned long slot;


[tip:x86/mm] kdump, vmcoreinfo: Export pgtable_l5_enabled value

2018-03-12 Thread tip-bot for Baoquan He
Commit-ID:  c100a583601d357f923c41af5434dc1f8d07890f
Gitweb: https://git.kernel.org/tip/c100a583601d357f923c41af5434dc1f8d07890f
Author: Baoquan He 
AuthorDate: Fri, 2 Mar 2018 13:18:01 +0800
Committer:  Ingo Molnar 
CommitDate: Mon, 12 Mar 2018 09:43:56 +0100

kdump, vmcoreinfo: Export pgtable_l5_enabled value

User-space utilities examining crash-kernels need to know if the
crashed kernel was in 5-level paging mode or not.

So write 'pgtable_l5_enabled' to vmcoreinfo, which covers these
three cases:

  pgtable_l5_enabled == 0 when:
   - Compiled with !CONFIG_X86_5LEVEL
   - Compiled with CONFIG_X86_5LEVEL=y while CPU has no 'la57' flag

  pgtable_l5_enabled != 0 when:
   - Compiled with CONFIG_X86_5LEVEL=y and CPU has 'la57' flag
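
In both cases the value is emitted into the vmcoreinfo note as a NUMBER()
line which the tools can parse; for example, on a machine running in
5-level paging mode it reads:

   NUMBER(pgtable_l5_enabled)=1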

Signed-off-by: Baoquan He 
Acked-by: Kirill A. Shutemov 
Cc: Linus Torvalds 
Cc: Peter Zijlstra 
Cc: Thomas Gleixner 
Cc: douly.f...@cn.fujitsu.com
Cc: dyo...@redhat.com
Cc: ebied...@xmission.com
Cc: kirill.shute...@linux.intel.com
Cc: vgo...@redhat.com
Link: http://lkml.kernel.org/r/20180302051801.19594-1-...@redhat.com
Signed-off-by: Ingo Molnar 
---
 arch/x86/kernel/machine_kexec_64.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/x86/kernel/machine_kexec_64.c b/arch/x86/kernel/machine_kexec_64.c
index 3b7427aa7d85..02f913cb27b5 100644
--- a/arch/x86/kernel/machine_kexec_64.c
+++ b/arch/x86/kernel/machine_kexec_64.c
@@ -350,6 +350,7 @@ void arch_crash_save_vmcoreinfo(void)
 {
VMCOREINFO_NUMBER(phys_base);
VMCOREINFO_SYMBOL(init_top_pgt);
+   VMCOREINFO_NUMBER(pgtable_l5_enabled);
 
 #ifdef CONFIG_NUMA
VMCOREINFO_SYMBOL(node_data);


[tip:x86/apic] x86/apic: Set up through-local-APIC mode on the boot CPU if 'noapic' specified

2018-02-17 Thread tip-bot for Baoquan He
Commit-ID:  bee3204ec3c49f6f53add9c3962c9012a5c036fa
Gitweb: https://git.kernel.org/tip/bee3204ec3c49f6f53add9c3962c9012a5c036fa
Author: Baoquan He 
AuthorDate: Wed, 14 Feb 2018 13:46:56 +0800
Committer:  Ingo Molnar 
CommitDate: Sat, 17 Feb 2018 11:47:46 +0100

x86/apic: Set up through-local-APIC mode on the boot CPU if 'noapic' specified

Currently the kdump kernel becomes very slow if 'noapic' is specified.
A normal kernel doesn't have this problem.

The kernel parameter 'noapic' is used to disable the IO-APIC in the
system for testing or special purposes. Here the root cause is that in
the kdump kernel the LAPIC is disabled since commit:

  522e664644 ("x86/apic: Disable I/O APIC before shutdown of the local APIC")

In this case we need to set up the through-local-APIC mode on the boot
CPU in setup_local_APIC().

In a normal kernel the legacy IRQ mode is enabled by the BIOS. If it is
virtual wire mode, the local APIC has already been enabled and set up as
through-local-APIC.

Though we fixed the regression introduced by commit 522e664644, to
further improve robustness set up the through-local-APIC mode explicitly
and do not rely on the default boot IRQ mode.

Signed-off-by: Baoquan He 
Reviewed-by: Eric W. Biederman 
Cc: Linus Torvalds 
Cc: Peter Zijlstra 
Cc: Thomas Gleixner 
Cc: douly.f...@cn.fujitsu.com
Cc: j...@8bytes.org
Cc: pra...@redhat.com
Cc: uober...@redhat.com
Link: http://lkml.kernel.org/r/20180214054656.3780-7-...@redhat.com
[ Rewrote the changelog. ]
Signed-off-by: Ingo Molnar 
---
 arch/x86/kernel/apic/apic.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
index 871018d..2ceac9f 100644
--- a/arch/x86/kernel/apic/apic.c
+++ b/arch/x86/kernel/apic/apic.c
@@ -1570,7 +1570,7 @@ static void setup_local_APIC(void)
 * TODO: set up through-local-APIC from through-I/O-APIC? --macro
 */
value = apic_read(APIC_LVT0) & APIC_LVT_MASKED;
-   if (!cpu && (pic_mode || !value)) {
+   if (!cpu && (pic_mode || !value || skip_ioapic_setup)) {
value = APIC_DM_EXTINT;
apic_printk(APIC_VERBOSE, "enabled ExtINT on CPU#%d\n", cpu);
} else {
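
In through-local-APIC (virtual wire) mode the i8259 output is routed through
LINT0 of the boot CPU's local APIC, programmed as ExtINT, so legacy timer
interrupts keep flowing even with the IO-APIC out of the picture. The patched
decision reads as the following condensed sketch (same identifiers as in
setup_local_APIC(); not the full function):

  value = apic_read(APIC_LVT0) & APIC_LVT_MASKED;
  if (!cpu && (pic_mode || !value || skip_ioapic_setup)) {
          /* Boot CPU: accept ExtINT, i.e. through-local-APIC mode. */
          value = APIC_DM_EXTINT;
  } else {
          /* Secondary CPUs, or IO-APIC in use: keep ExtINT masked. */
          value = APIC_DM_EXTINT | APIC_LVT_MASKED;
  }
  apic_write(APIC_LVT0, value);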


[tip:x86/apic] x86/apic: Rename variables and functions related to x86_io_apic_ops

2018-02-17 Thread tip-bot for Baoquan He
Commit-ID:  51b146c572201e3c368e0baa3e565760aefcf25f
Gitweb: https://git.kernel.org/tip/51b146c572201e3c368e0baa3e565760aefcf25f
Author: Baoquan He 
AuthorDate: Wed, 14 Feb 2018 13:46:55 +0800
Committer:  Ingo Molnar 
CommitDate: Sat, 17 Feb 2018 11:47:45 +0100

x86/apic: Rename variables and functions related to x86_io_apic_ops

The names of x86_io_apic_ops and its two member variables are
misleading:

The ->read() member reads an IO-APIC register, while ->disable(), which
is called by native_disable_io_apic()/irq_remapping_disable_io_apic(),
is actually used to restore the boot IRQ mode, not to disable the IO-APIC.

So rename x86_io_apic_ops to 'x86_apic_ops', since it doesn't only
handle the IO-APIC but also the local APIC.

Also rename its member variables and the related callbacks.

Signed-off-by: Baoquan He 
Reviewed-by: Eric W. Biederman 
Cc: Linus Torvalds 
Cc: Peter Zijlstra 
Cc: Thomas Gleixner 
Cc: douly.f...@cn.fujitsu.com
Cc: j...@8bytes.org
Cc: pra...@redhat.com
Cc: uober...@redhat.com
Link: http://lkml.kernel.org/r/20180214054656.3780-6-...@redhat.com
[ Rewrote the changelog. ]
Signed-off-by: Ingo Molnar 
---
 arch/x86/include/asm/io_apic.h  | 6 +++---
 arch/x86/include/asm/x86_init.h | 8 
 arch/x86/kernel/apic/io_apic.c  | 4 ++--
 arch/x86/kernel/x86_init.c  | 6 +++---
 arch/x86/xen/apic.c | 2 +-
 drivers/iommu/irq_remapping.c   | 4 ++--
 6 files changed, 15 insertions(+), 15 deletions(-)

diff --git a/arch/x86/include/asm/io_apic.h b/arch/x86/include/asm/io_apic.h
index 8018fc4..fd20a23 100644
--- a/arch/x86/include/asm/io_apic.h
+++ b/arch/x86/include/asm/io_apic.h
@@ -183,11 +183,11 @@ extern void disable_ioapic_support(void);
 
 extern void __init io_apic_init_mappings(void);
 extern unsigned int native_io_apic_read(unsigned int apic, unsigned int reg);
-extern void native_disable_io_apic(void);
+extern void native_restore_boot_irq_mode(void);
 
 static inline unsigned int io_apic_read(unsigned int apic, unsigned int reg)
 {
-   return x86_io_apic_ops.read(apic, reg);
+   return x86_apic_ops.io_apic_read(apic, reg);
 }
 
 extern void setup_IO_APIC(void);
@@ -229,7 +229,7 @@ static inline void mp_save_irq(struct mpc_intsrc *m) { }
 static inline void disable_ioapic_support(void) { }
 static inline void io_apic_init_mappings(void) { }
 #define native_io_apic_readNULL
-#define native_disable_io_apic NULL
+#define native_restore_boot_irq_mode   NULL
 
 static inline void setup_IO_APIC(void) { }
 static inline void enable_IO_APIC(void) { }
diff --git a/arch/x86/include/asm/x86_init.h b/arch/x86/include/asm/x86_init.h
index fc2f082..8830605 100644
--- a/arch/x86/include/asm/x86_init.h
+++ b/arch/x86/include/asm/x86_init.h
@@ -274,16 +274,16 @@ struct x86_msi_ops {
void (*restore_msi_irqs)(struct pci_dev *dev);
 };
 
-struct x86_io_apic_ops {
-   unsigned int(*read)   (unsigned int apic, unsigned int reg);
-   void(*disable)(void);
+struct x86_apic_ops {
+   unsigned int(*io_apic_read)   (unsigned int apic, unsigned int reg);
+   void(*restore)(void);
 };
 
 extern struct x86_init_ops x86_init;
 extern struct x86_cpuinit_ops x86_cpuinit;
 extern struct x86_platform_ops x86_platform;
 extern struct x86_msi_ops x86_msi;
-extern struct x86_io_apic_ops x86_io_apic_ops;
+extern struct x86_apic_ops x86_apic_ops;
 
 extern void x86_early_init_platform_quirks(void);
 extern void x86_init_noop(void);
diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c
index 9d86b10..68129f1 100644
--- a/arch/x86/kernel/apic/io_apic.c
+++ b/arch/x86/kernel/apic/io_apic.c
@@ -1410,7 +1410,7 @@ void __init enable_IO_APIC(void)
clear_IO_APIC();
 }
 
-void native_disable_io_apic(void)
+void native_restore_boot_irq_mode(void)
 {
/*
 * If the i8259 is routed through an IOAPIC
@@ -1443,7 +1443,7 @@ void restore_boot_irq_mode(void)
if (!nr_legacy_irqs())
return;
 
-   x86_io_apic_ops.disable();
+   x86_apic_ops.restore();
 }
 
 #ifdef CONFIG_X86_32
diff --git a/arch/x86/kernel/x86_init.c b/arch/x86/kernel/x86_init.c
index 1151ccd..2bccd03 100644
--- a/arch/x86/kernel/x86_init.c
+++ b/arch/x86/kernel/x86_init.c
@@ -146,7 +146,7 @@ void arch_restore_msi_irqs(struct pci_dev *dev)
 }
 #endif
 
-struct x86_io_apic_ops x86_io_apic_ops __ro_after_init = {
-   .read   = native_io_apic_read,
-   .disable= native_disable_io_apic,
+struct x86_apic_ops x86_apic_ops __ro_after_init = {
+   .io_apic_read   = native_io_apic_read,
+   .restore= native_restore_boot_irq_mode,
 };
diff --git a/arch/x86/xen/apic.c b/arch/x86/xen/apic.c
index de58533..2163888 100644
--- a/arch/x86/xen/apic.c
+++ b/arch/x86/xen/apic.c
@@ -215,7 +215,7 @@ static void __init xen_apic_check(void)
 }
 void __init xen_init_apic(void)
 {
-   x86_io_apic_ops.read = xen_io_apic_read;
+   x86_apic_ops.io_apic_read = xen_io_apic_read;
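
The rename leaves the underlying pattern untouched: a global ops structure
carries default native callbacks, and paravirt code overrides individual
members at init time, exactly as the Xen hunk above does. A self-contained
sketch of that pattern outside the kernel (illustrative names, not kernel API):

  #include <stdio.h>

  struct apic_ops {
          unsigned int (*io_apic_read)(unsigned int apic, unsigned int reg);
          void (*restore)(void);
  };

  static unsigned int native_read(unsigned int apic, unsigned int reg)
  {
          printf("native io_apic_read: apic=%u reg=%u\n", apic, reg);
          return 0;
  }

  static void native_restore(void)
  {
          printf("native restore of boot IRQ mode\n");
  }

  /* Defaults point at the bare-metal implementations. */
  static struct apic_ops apic_ops = {
          .io_apic_read = native_read,
          .restore      = native_restore,
  };

  static unsigned int xen_read(unsigned int apic, unsigned int reg)
  {
          printf("xen io_apic_read: apic=%u reg=%u\n", apic, reg);
          return 0;
  }

  int main(void)
  {
          /* A hypervisor backend overrides one member at init time... */
          apic_ops.io_apic_read = xen_read;
          /* ...and all callers keep using the same indirection. */
          apic_ops.io_apic_read(0, 1);
          apic_ops.restore();
          return 0;
  }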

[tip:x86/apic] x86/apic: Remove the (now) unused disable_IO_APIC() function

2018-02-17 Thread tip-bot for Baoquan He
Commit-ID:  50374b96d2d30c03c8d42b3f8846d8938748d454
Gitweb: https://git.kernel.org/tip/50374b96d2d30c03c8d42b3f8846d8938748d454
Author: Baoquan He 
AuthorDate: Wed, 14 Feb 2018 13:46:54 +0800
Committer:  Ingo Molnar 
CommitDate: Sat, 17 Feb 2018 11:47:45 +0100

x86/apic: Remove the (now) unused disable_IO_APIC() function

No one uses it anymore.

Signed-off-by: Baoquan He 
Reviewed-by: Eric W. Biederman 
Cc: Linus Torvalds 
Cc: Peter Zijlstra 
Cc: Thomas Gleixner 
Cc: douly.f...@cn.fujitsu.com
Cc: j...@8bytes.org
Cc: pra...@redhat.com
Cc: uober...@redhat.com
Link: http://lkml.kernel.org/r/20180214054656.3780-5-...@redhat.com
[ Rewrote the changelog. ]
Signed-off-by: Ingo Molnar 
---
 arch/x86/include/asm/io_apic.h |  1 -
 arch/x86/kernel/apic/io_apic.c | 13 -
 arch/x86/kernel/machine_kexec_32.c |  5 ++---
 arch/x86/kernel/machine_kexec_64.c |  5 ++---
 4 files changed, 4 insertions(+), 20 deletions(-)

diff --git a/arch/x86/include/asm/io_apic.h b/arch/x86/include/asm/io_apic.h
index 2ae1b424c..8018fc4 100644
--- a/arch/x86/include/asm/io_apic.h
+++ b/arch/x86/include/asm/io_apic.h
@@ -192,7 +192,6 @@ static inline unsigned int io_apic_read(unsigned int apic, unsigned int reg)
 
 extern void setup_IO_APIC(void);
 extern void enable_IO_APIC(void);
-extern void disable_IO_APIC(void);
 extern void clear_IO_APIC(void);
 extern void restore_boot_irq_mode(void);
 extern int IO_APIC_get_PCI_irq_vector(int bus, int devfn, int pin);
diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c
index 2d7cd2d..9d86b10 100644
--- a/arch/x86/kernel/apic/io_apic.c
+++ b/arch/x86/kernel/apic/io_apic.c
@@ -1438,19 +1438,6 @@ void native_disable_io_apic(void)
disconnect_bsp_APIC(ioapic_i8259.pin != -1);
 }
 
-/*
- * Not an __init, needed by the reboot code
- */
-void disable_IO_APIC(void)
-{
-   /*
-* Clear the IO-APIC before rebooting:
-*/
-   clear_IO_APIC();
-
-   restore_boot_irq_mode();
-}
-
 void restore_boot_irq_mode(void)
 {
if (!nr_legacy_irqs())
diff --git a/arch/x86/kernel/machine_kexec_32.c b/arch/x86/kernel/machine_kexec_32.c
index 4cd79d8..60cdec6 100644
--- a/arch/x86/kernel/machine_kexec_32.c
+++ b/arch/x86/kernel/machine_kexec_32.c
@@ -195,9 +195,8 @@ void machine_kexec(struct kimage *image)
/*
 * We need to put APICs in legacy mode so that we can
 * get timer interrupts in second kernel. kexec/kdump
-* paths already have calls to disable_IO_APIC() in
-* one form or other. kexec jump path also need
-* one.
+* paths already have calls to restore_boot_irq_mode()
+* in one form or other. kexec jump path also need one.
 */
clear_IO_APIC();
restore_boot_irq_mode();
diff --git a/arch/x86/kernel/machine_kexec_64.c b/arch/x86/kernel/machine_kexec_64.c
index 2ab14b9..5ffbc55 100644
--- a/arch/x86/kernel/machine_kexec_64.c
+++ b/arch/x86/kernel/machine_kexec_64.c
@@ -293,9 +293,8 @@ void machine_kexec(struct kimage *image)
/*
 * We need to put APICs in legacy mode so that we can
 * get timer interrupts in second kernel. kexec/kdump
-* paths already have calls to disable_IO_APIC() in
-* one form or other. kexec jump path also need
-* one.
+* paths already have calls to restore_boot_irq_mode()
+* in one form or other. kexec jump path also need one.
 */
clear_IO_APIC();
restore_boot_irq_mode();


[tip:x86/apic] x86/apic: Fix restoring boot IRQ mode in reboot and kexec/kdump

2018-02-17 Thread tip-bot for Baoquan He
Commit-ID:  339b2ae0cd5d4a58f9efe06e4ee36adbeca59228
Gitweb: https://git.kernel.org/tip/339b2ae0cd5d4a58f9efe06e4ee36adbeca59228
Author: Baoquan He 
AuthorDate: Wed, 14 Feb 2018 13:46:53 +0800
Committer:  Ingo Molnar 
CommitDate: Sat, 17 Feb 2018 11:47:45 +0100

x86/apic: Fix restoring boot IRQ mode in reboot and kexec/kdump

This is a regression fix.

Before, to fix erratum AVR31, the following commit:

  522e66464467 ("x86/apic: Disable I/O APIC before shutdown of the local APIC")

... moved the lapic_shutdown() call to after disable_IO_APIC() in the reboot
and kexec/kdump code paths.

This introduced the following regression: disable_IO_APIC() not only clears
the IO-APIC, it also restores the boot IRQ mode by setting up the
LAPIC/APIC/IMCR. Calling lapic_shutdown() after disable_IO_APIC()
therefore disables the LAPIC and ruins the virtual wire mode setup which
the code had been trying to achieve all along.

The consequence is that a KVM guest kernel always prints the warning below
during kexec/kdump as the kernel boots up:

  [0.001000] WARNING: CPU: 0 PID: 0 at arch/x86/kernel/apic/apic.c:1467 setup_local_APIC+0x228/0x330
  []
  [0.001000] Call Trace:
  [0.001000]  apic_bsp_setup+0x56/0x74
  [0.001000]  x86_late_time_init+0x11/0x16
  [0.001000]  start_kernel+0x3c9/0x486
  [0.001000]  secondary_startup_64+0xa5/0xb0
  []
  [0.001000] masked ExtINT on CPU#0

To fix this, just call clear_IO_APIC() to stop the IO-APIC where
disable_IO_APIC() was called, and call restore_boot_irq_mode() to
restore boot IRQ mode before a reboot or a kexec/kdump jump.

Signed-off-by: Baoquan He 
Reviewed-by: Eric W. Biederman 
Cc: Linus Torvalds 
Cc: Peter Zijlstra 
Cc: Thomas Gleixner 
Cc: douly.f...@cn.fujitsu.com
Cc: j...@8bytes.org
Cc: pra...@redhat.com
Cc: sta...@vger.kernel.org
Cc: uober...@redhat.com
Fixes: commit 522e66464467 ("x86/apic: Disable I/O APIC before shutdown of the local APIC")
Link: http://lkml.kernel.org/r/20180214054656.3780-4-...@redhat.com
[ Rewrote the changelog. ]
Signed-off-by: Ingo Molnar 
---
 arch/x86/kernel/crash.c  | 3 ++-
 arch/x86/kernel/reboot.c | 3 ++-
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c
index 10e74d4..1f66804 100644
--- a/arch/x86/kernel/crash.c
+++ b/arch/x86/kernel/crash.c
@@ -199,9 +199,10 @@ void native_machine_crash_shutdown(struct pt_regs *regs)
 #ifdef CONFIG_X86_IO_APIC
/* Prevent crash_kexec() from deadlocking on ioapic_lock. */
ioapic_zap_locks();
-   disable_IO_APIC();
+   clear_IO_APIC();
 #endif
lapic_shutdown();
+   restore_boot_irq_mode();
 #ifdef CONFIG_HPET_TIMER
hpet_disable();
 #endif
diff --git a/arch/x86/kernel/reboot.c b/arch/x86/kernel/reboot.c
index 2126b9d..725624b 100644
--- a/arch/x86/kernel/reboot.c
+++ b/arch/x86/kernel/reboot.c
@@ -666,7 +666,7 @@ void native_machine_shutdown(void)
 * Even without the erratum, it still makes sense to quiet IO APIC
 * before disabling Local APIC.
 */
-   disable_IO_APIC();
+   clear_IO_APIC();
 #endif
 
 #ifdef CONFIG_SMP
@@ -680,6 +680,7 @@ void native_machine_shutdown(void)
 #endif
 
lapic_shutdown();
+   restore_boot_irq_mode();
 
 #ifdef CONFIG_HPET_TIMER
hpet_disable();
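
The essence of the fix is pure ordering; as a sketch (function names as in the
kernel, the calls condensed from the hunks above):

  /*
   * Before (regression): the boot IRQ mode was restored inside
   * disable_IO_APIC(), i.e. before lapic_shutdown() disabled the LAPIC:
   *
   *     disable_IO_APIC();      // clear_IO_APIC() + restore_boot_irq_mode()
   *     lapic_shutdown();       // disables the LAPIC, ruins virtual wire mode
   *
   * After: the restore step runs last, so the virtual wire setup survives
   * into the kdump kernel:
   *
   *     clear_IO_APIC();
   *     lapic_shutdown();
   *     restore_boot_irq_mode();
   */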


[tip:x86/apic] x86/apic: Split disable_IO_APIC() into two functions to fix CONFIG_KEXEC_JUMP=y

2018-02-17 Thread tip-bot for Baoquan He
Commit-ID:  3c9e76dbea004b2c7c3ce872022ceaf5ff0dae79
Gitweb: https://git.kernel.org/tip/3c9e76dbea004b2c7c3ce872022ceaf5ff0dae79
Author: Baoquan He 
AuthorDate: Wed, 14 Feb 2018 13:46:52 +0800
Committer:  Ingo Molnar 
CommitDate: Sat, 17 Feb 2018 11:47:44 +0100

x86/apic: Split disable_IO_APIC() into two functions to fix CONFIG_KEXEC_JUMP=y

In the following patches disable_IO_APIC() will be broken up into
clear_IO_APIC() and restore_boot_irq_mode().

These two functions will be called separately where they are needed
to fix a regression introduced by:

  522e66464467 ("x86/apic: Disable I/O APIC before shutdown of the local APIC").

The CONFIG_KEXEC_JUMP=y code doesn't call lapic_shutdown() before the
jump the way kexec/kdump does, so it's not impacted by commit
522e66464467.

Hence make clear_IO_APIC() public and replace disable_IO_APIC() with
clear_IO_APIC() plus restore_boot_irq_mode(), keeping the
CONFIG_KEXEC_JUMP=y code unchanged in essence. No functional change.

Signed-off-by: Baoquan He 
Reviewed-by: Eric W. Biederman 
Cc: Linus Torvalds 
Cc: Peter Zijlstra 
Cc: Thomas Gleixner 
Cc: douly.f...@cn.fujitsu.com
Cc: j...@8bytes.org
Cc: pra...@redhat.com
Cc: uober...@redhat.com
Link: http://lkml.kernel.org/r/20180214054656.3780-3-...@redhat.com
[ Rewrote the changelog. ]
Signed-off-by: Ingo Molnar 
---
 arch/x86/include/asm/io_apic.h | 1 +
 arch/x86/kernel/apic/io_apic.c | 2 +-
 arch/x86/kernel/machine_kexec_32.c | 3 ++-
 arch/x86/kernel/machine_kexec_64.c | 3 ++-
 4 files changed, 6 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/io_apic.h b/arch/x86/include/asm/io_apic.h
index 4e3bb13..2ae1b424c 100644
--- a/arch/x86/include/asm/io_apic.h
+++ b/arch/x86/include/asm/io_apic.h
@@ -193,6 +193,7 @@ static inline unsigned int io_apic_read(unsigned int apic, unsigned int reg)
 extern void setup_IO_APIC(void);
 extern void enable_IO_APIC(void);
 extern void disable_IO_APIC(void);
+extern void clear_IO_APIC(void);
 extern void restore_boot_irq_mode(void);
 extern int IO_APIC_get_PCI_irq_vector(int bus, int devfn, int pin);
 extern void print_IO_APICs(void);
diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c
index 7b73b6b..2d7cd2d 100644
--- a/arch/x86/kernel/apic/io_apic.c
+++ b/arch/x86/kernel/apic/io_apic.c
@@ -587,7 +587,7 @@ static void clear_IO_APIC_pin(unsigned int apic, unsigned int pin)
   mpc_ioapic_id(apic), pin);
 }
 
-static void clear_IO_APIC (void)
+void clear_IO_APIC (void)
 {
int apic, pin;
 
diff --git a/arch/x86/kernel/machine_kexec_32.c b/arch/x86/kernel/machine_kexec_32.c
index edfede7..4cd79d8 100644
--- a/arch/x86/kernel/machine_kexec_32.c
+++ b/arch/x86/kernel/machine_kexec_32.c
@@ -199,7 +199,8 @@ void machine_kexec(struct kimage *image)
 * one form or other. kexec jump path also need
 * one.
 */
-   disable_IO_APIC();
+   clear_IO_APIC();
+   restore_boot_irq_mode();
 #endif
}
 
diff --git a/arch/x86/kernel/machine_kexec_64.c b/arch/x86/kernel/machine_kexec_64.c
index 1f790cf..2ab14b9 100644
--- a/arch/x86/kernel/machine_kexec_64.c
+++ b/arch/x86/kernel/machine_kexec_64.c
@@ -297,7 +297,8 @@ void machine_kexec(struct kimage *image)
 * one form or other. kexec jump path also need
 * one.
 */
-   disable_IO_APIC();
+   clear_IO_APIC();
+   restore_boot_irq_mode();
 #endif
}
 


[tip:x86/apic] x86/apic: Split out restore_boot_irq_mode() from disable_IO_APIC()

2018-02-17 Thread tip-bot for Baoquan He
Commit-ID:  ce279cdc04aafd5c41ae49f941ee2c3342e35e3e
Gitweb: https://git.kernel.org/tip/ce279cdc04aafd5c41ae49f941ee2c3342e35e3e
Author: Baoquan He 
AuthorDate: Wed, 14 Feb 2018 13:46:51 +0800
Committer:  Ingo Molnar 
CommitDate: Sat, 17 Feb 2018 11:47:29 +0100

x86/apic: Split out restore_boot_irq_mode() from disable_IO_APIC()

This is a preparation patch. Split out the code which restores boot
irq mode from disable_IO_APIC() into the new restore_boot_irq_mode()
function.

No functional changes.

Signed-off-by: Baoquan He 
Reviewed-by: Eric W. Biederman 
Cc: Linus Torvalds 
Cc: Peter Zijlstra 
Cc: Thomas Gleixner 
Cc: douly.f...@cn.fujitsu.com
Cc: j...@8bytes.org
Cc: pra...@redhat.com
Cc: uober...@redhat.com
Link: http://lkml.kernel.org/r/20180214054656.3780-2-...@redhat.com
[ Build fix for !CONFIG_IO_APIC and rewrote the changelog. ]
Signed-off-by: Ingo Molnar 
---
 arch/x86/include/asm/io_apic.h | 2 ++
 arch/x86/kernel/apic/io_apic.c | 5 +
 2 files changed, 7 insertions(+)

diff --git a/arch/x86/include/asm/io_apic.h b/arch/x86/include/asm/io_apic.h
index a8834dd..4e3bb13 100644
--- a/arch/x86/include/asm/io_apic.h
+++ b/arch/x86/include/asm/io_apic.h
@@ -193,6 +193,7 @@ static inline unsigned int io_apic_read(unsigned int apic, unsigned int reg)
 extern void setup_IO_APIC(void);
 extern void enable_IO_APIC(void);
 extern void disable_IO_APIC(void);
+extern void restore_boot_irq_mode(void);
 extern int IO_APIC_get_PCI_irq_vector(int bus, int devfn, int pin);
 extern void print_IO_APICs(void);
 #else  /* !CONFIG_X86_IO_APIC */
@@ -232,6 +233,7 @@ static inline void io_apic_init_mappings(void) { }
 
 static inline void setup_IO_APIC(void) { }
 static inline void enable_IO_APIC(void) { }
+static inline void restore_boot_irq_mode(void) { }
 
 #endif
 
diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c
index 8ad2e41..7b73b6b 100644
--- a/arch/x86/kernel/apic/io_apic.c
+++ b/arch/x86/kernel/apic/io_apic.c
@@ -1448,6 +1448,11 @@ void disable_IO_APIC(void)
 */
clear_IO_APIC();
 
+   restore_boot_irq_mode();
+}
+
+void restore_boot_irq_mode(void)
+{
if (!nr_legacy_irqs())
return;
 


[tip:x86/mm] x86/mm/64: Rename the register_page_bootmem_memmap() 'size' parameter to 'nr_pages'

2017-10-30 Thread tip-bot for Baoquan He
Commit-ID:  15670bfe19905b1dcbb63137f40d718b59d84479
Gitweb: https://git.kernel.org/tip/15670bfe19905b1dcbb63137f40d718b59d84479
Author: Baoquan He 
AuthorDate: Sat, 28 Oct 2017 09:30:38 +0800
Committer:  Ingo Molnar 
CommitDate: Mon, 30 Oct 2017 10:30:23 +0100

x86/mm/64: Rename the register_page_bootmem_memmap() 'size' parameter to 'nr_pages'

register_page_bootmem_memmap()'s 3rd 'size' parameter is named
in a somewhat misleading fashion - rename it to 'nr_pages' which
makes the units of it much clearer.

Meanwhile rename the existing local variable 'nr_pages' to
'nr_pmd_pages', a more expressive name, to avoid conflict with
new function parameter 'nr_pages'.

(Also clean up the unnecessary parentheses in which get_order() is called.)

Signed-off-by: Baoquan He 
Acked-by: Thomas Gleixner 
Cc: Linus Torvalds 
Cc: Peter Zijlstra 
Cc: a...@linux-foundation.org
Link: http://lkml.kernel.org/r/1509154238-23250-1-git-send-email-...@redhat.com
Signed-off-by: Ingo Molnar 
---
 arch/x86/mm/init_64.c | 10 +-
 include/linux/mm.h|  2 +-
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index 048fbe8..adcea90 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -1426,16 +1426,16 @@ int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node)
 
 #if defined(CONFIG_MEMORY_HOTPLUG_SPARSE) && defined(CONFIG_HAVE_BOOTMEM_INFO_NODE)
 void register_page_bootmem_memmap(unsigned long section_nr,
- struct page *start_page, unsigned long size)
+ struct page *start_page, unsigned long nr_pages)
 {
unsigned long addr = (unsigned long)start_page;
-   unsigned long end = (unsigned long)(start_page + size);
+   unsigned long end = (unsigned long)(start_page + nr_pages);
unsigned long next;
pgd_t *pgd;
p4d_t *p4d;
pud_t *pud;
pmd_t *pmd;
-   unsigned int nr_pages;
+   unsigned int nr_pmd_pages;
struct page *page;
 
for (; addr < end; addr = next) {
@@ -1482,9 +1482,9 @@ void register_page_bootmem_memmap(unsigned long section_nr,
if (pmd_none(*pmd))
continue;
 
-   nr_pages = 1 << (get_order(PMD_SIZE));
+   nr_pmd_pages = 1 << get_order(PMD_SIZE);
page = pmd_page(*pmd);
-   while (nr_pages--)
+   while (nr_pmd_pages--)
get_page_bootmem(section_nr, page++,
 SECTION_INFO);
}
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 065d99d..b2c7045 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2495,7 +2495,7 @@ void vmemmap_populate_print_last(void);
 void vmemmap_free(unsigned long start, unsigned long end);
 #endif
 void register_page_bootmem_memmap(unsigned long section_nr, struct page *map,
- unsigned long size);
+ unsigned long nr_pages);
 
 enum mf_flags {
MF_COUNT_INCREASED = 1 << 0,
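
As a quick sanity check of the two units involved, under the common x86-64
configuration the renamed local variable works out as follows (a worked
example, assuming 4 KiB base pages and 2 MiB PMDs):

  /*
   * get_order(PMD_SIZE) = log2(PMD_SIZE / PAGE_SIZE) = log2(512) = 9
   * nr_pmd_pages        = 1 << 9 = 512
   *
   * i.e. one PMD-mapped vmemmap block covers 512 base pages, while the
   * 'nr_pages' parameter counts the struct pages of the section being
   * registered -- two different quantities, hence the rename.
   */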


[tip:x86/boot] efi: Introduce efi_early_memdesc_ptr to get pointer to memmap descriptor

2017-08-17 Thread tip-bot for Baoquan He
Commit-ID:  02e43c2dcd3b3cf7244f6dda65a07e8dacadaf8d
Gitweb: http://git.kernel.org/tip/02e43c2dcd3b3cf7244f6dda65a07e8dacadaf8d
Author: Baoquan He 
AuthorDate: Wed, 16 Aug 2017 21:46:51 +0800
Committer:  Ingo Molnar 
CommitDate: Thu, 17 Aug 2017 10:50:57 +0200

efi: Introduce efi_early_memdesc_ptr to get pointer to memmap descriptor

The existing map iteration helper for_each_efi_memory_desc_in_map can
only be used after the kernel initializes the EFI subsystem to set up
struct efi_memory_map.

Before that we also need to iterate over map descriptors, which are stored
in several intermediate structures, like struct efi_boot_memmap for
arch-independent usage and struct efi_info for x86 only.

Introduce efi_early_memdesc_ptr() to get pointer to a map descriptor, and
replace several places where that primitive is open coded.

Signed-off-by: Baoquan He 
[ Various improvements to the text. ]
Acked-by: Matt Fleming 
Cc: Linus Torvalds 
Cc: Peter Zijlstra 
Cc: Thomas Gleixner 
Cc: ard.biesheu...@linaro.org
Cc: fanc.f...@cn.fujitsu.com
Cc: izumi.t...@jp.fujitsu.com
Cc: keesc...@chromium.org
Cc: linux-...@vger.kernel.org
Cc: n-horigu...@ah.jp.nec.com
Cc: thgar...@google.com
Link: http://lkml.kernel.org/r/20170816134651.GF21273@x1
Signed-off-by: Ingo Molnar 
---
 arch/x86/boot/compressed/eboot.c   |  2 +-
 drivers/firmware/efi/libstub/efi-stub-helper.c |  4 ++--
 include/linux/efi.h| 22 ++
 3 files changed, 25 insertions(+), 3 deletions(-)

diff --git a/arch/x86/boot/compressed/eboot.c b/arch/x86/boot/compressed/eboot.c
index c3e869e..e007887 100644
--- a/arch/x86/boot/compressed/eboot.c
+++ b/arch/x86/boot/compressed/eboot.c
@@ -767,7 +767,7 @@ static efi_status_t setup_e820(struct boot_params *params,
m |= (u64)efi->efi_memmap_hi << 32;
 #endif
 
-   d = (efi_memory_desc_t *)(m + (i * efi->efi_memdesc_size));
+   d = efi_early_memdesc_ptr(m, efi->efi_memdesc_size, i);
switch (d->type) {
case EFI_RESERVED_TYPE:
case EFI_RUNTIME_SERVICES_CODE:
diff --git a/drivers/firmware/efi/libstub/efi-stub-helper.c b/drivers/firmware/efi/libstub/efi-stub-helper.c
index b018436..50a9cab 100644
--- a/drivers/firmware/efi/libstub/efi-stub-helper.c
+++ b/drivers/firmware/efi/libstub/efi-stub-helper.c
@@ -205,7 +205,7 @@ again:
unsigned long m = (unsigned long)map;
u64 start, end;
 
-   desc = (efi_memory_desc_t *)(m + (i * desc_size));
+   desc = efi_early_memdesc_ptr(m, desc_size, i);
if (desc->type != EFI_CONVENTIONAL_MEMORY)
continue;
 
@@ -298,7 +298,7 @@ efi_status_t efi_low_alloc(efi_system_table_t *sys_table_arg,
unsigned long m = (unsigned long)map;
u64 start, end;
 
-   desc = (efi_memory_desc_t *)(m + (i * desc_size));
+   desc = efi_early_memdesc_ptr(m, desc_size, i);
 
if (desc->type != EFI_CONVENTIONAL_MEMORY)
continue;
diff --git a/include/linux/efi.h b/include/linux/efi.h
index 8269bcb..a686ca9 100644
--- a/include/linux/efi.h
+++ b/include/linux/efi.h
@@ -1020,6 +1020,28 @@ extern int efi_memattr_init(void);
 extern int efi_memattr_apply_permissions(struct mm_struct *mm,
 efi_memattr_perm_setter fn);
 
+/*
+ * efi_early_memdesc_ptr - get the n-th EFI memmap descriptor
+ * @map: the start of efi memmap
+ * @desc_size: the size of space for each EFI memmap descriptor
+ * @n: the index of efi memmap descriptor
+ *
+ * EFI boot service provides the GetMemoryMap() function to get a copy of the
+ * current memory map which is an array of memory descriptors, each of
+ * which describes a contiguous block of memory. It also gets the size of the
+ * map, and the size of each descriptor, etc.
+ *
+ * Note that per section 6.2 of UEFI Spec 2.6 Errata A, the returned size of
+ * each descriptor might not be equal to sizeof(efi_memory_desc_t),
+ * since efi_memory_desc_t may be extended in the future. Thus the OS
+ * MUST use the returned size of the descriptor to find the start of each
+ * efi_memory_desc_t in the memory map array. This should only be used
+ * during bootup since for_each_efi_memory_desc_xxx() is available after the
+ * kernel initializes the EFI subsystem to set up struct efi_memory_map.
+ */
+#define efi_early_memdesc_ptr(map, desc_size, n)   \
+   (efi_memory_desc_t *)((void *)(map) + ((n) * (desc_size)))
+
 /* Iterate through an efi_memory_map */
 #define for_each_efi_memory_desc_in_map(m, md)\
for ((md) = (m)->map;  \
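
Putting the helper to work: a minimal sketch of walking a raw memmap buffer
with it, assuming the surrounding kernel context (efi_memory_desc_t from
<linux/efi.h>); walk_memmap() is an illustrative name, not an existing
function:

  static void walk_memmap(unsigned long map, unsigned long map_size,
                          unsigned long desc_size)
  {
          unsigned long i, nr_desc = map_size / desc_size;

          for (i = 0; i < nr_desc; i++) {
                  efi_memory_desc_t *md =
                          efi_early_memdesc_ptr(map, desc_size, i);
                  /* inspect md->type, md->phys_addr, md->num_pages... */
          }
  }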


[tip:x86/boot] x86/boot/KASLR: Prefer mirrored memory regions for the kernel physical address

2017-08-17 Thread tip-bot for Baoquan He
Commit-ID:  c05cd79750fbe5415cda896bb99350603cc995ed
Gitweb: http://git.kernel.org/tip/c05cd79750fbe5415cda896bb99350603cc995ed
Author: Baoquan He 
AuthorDate: Mon, 14 Aug 2017 22:54:24 +0800
Committer:  Ingo Molnar 
CommitDate: Thu, 17 Aug 2017 10:51:35 +0200

x86/boot/KASLR: Prefer mirrored memory regions for the kernel physical address

Currently KASLR will parse all e820 entries of RAM type and add all
candidate positions into the slots array. After that we choose one slot
randomly as the new position which the kernel will be decompressed into
and run at.

On systems with EFI enabled, e820 memory regions are coming from EFI
memory regions by combining adjacent regions.

These EFI memory regions have various attributes, and the "mirrored"
attribute is one of them. Physical memory regions whose descriptors in
the EFI memory map have the EFI_MEMORY_MORE_RELIABLE attribute (bit 16)
are mirrored. The address range mirroring feature of the kernel arranges
such mirrored regions into normal zones and other regions into movable
zones.

With the mirroring feature enabled, the code and data of the kernel can only
be located in the more reliable mirrored regions. However, the current KASLR
code doesn't check EFI memory entries, and could choose a new kernel position
in non-mirrored regions. This will break the intended functionality of the
address range mirroring feature.

To fix this, if EFI is detected, iterate over the EFI memory map and process
only the mirrored regions when adding randomization slot candidates. If EFI
is disabled or no mirrored region is found, fall back to processing the e820
memory map.

Signed-off-by: Baoquan He 
Cc: Linus Torvalds 
Cc: Peter Zijlstra 
Cc: Thomas Gleixner 
Cc: ard.biesheu...@linaro.org
Cc: fanc.f...@cn.fujitsu.com
Cc: izumi.t...@jp.fujitsu.com
Cc: keesc...@chromium.org
Cc: linux-...@vger.kernel.org
Cc: m...@codeblueprint.co.uk
Cc: n-horigu...@ah.jp.nec.com
Cc: thgar...@google.com
Link: http://lkml.kernel.org/r/1502722464-20614-3-git-send-email-...@redhat.com
[ Rewrote most of the text. ]
Signed-off-by: Ingo Molnar 
---
 arch/x86/boot/compressed/kaslr.c | 68 ++--
 1 file changed, 66 insertions(+), 2 deletions(-)

diff --git a/arch/x86/boot/compressed/kaslr.c b/arch/x86/boot/compressed/kaslr.c
index 99c7194f..7de23bb 100644
--- a/arch/x86/boot/compressed/kaslr.c
+++ b/arch/x86/boot/compressed/kaslr.c
@@ -37,7 +37,9 @@
 #include 
 #include 
 #include 
+#include 
 #include 
+#include 
 
 /* Macros used by the included decompressor code below. */
 #define STATIC
@@ -558,6 +560,65 @@ static void process_mem_region(struct mem_vector *entry,
}
 }
 
+#ifdef CONFIG_EFI
+/*
+ * Returns true if mirror region found (and must have been processed
+ * for slots adding)
+ */
+static bool
+process_efi_entries(unsigned long minimum, unsigned long image_size)
+{
+   struct efi_info *e = &boot_params->efi_info;
+   bool efi_mirror_found = false;
+   struct mem_vector region;
+   efi_memory_desc_t *md;
+   unsigned long pmap;
+   char *signature;
+   u32 nr_desc;
+   int i;
+
+   signature = (char *)&e->efi_loader_signature;
+   if (strncmp(signature, EFI32_LOADER_SIGNATURE, 4) &&
+   strncmp(signature, EFI64_LOADER_SIGNATURE, 4))
+   return false;
+
+#ifdef CONFIG_X86_32
+   /* Can't handle data above 4GB at this time */
+   if (e->efi_memmap_hi) {
+   warn("EFI memmap is above 4GB, can't be handled now on x86_32. 
EFI should be disabled.\n");
+   return false;
+   }
+   pmap =  e->efi_memmap;
+#else
+   pmap = (e->efi_memmap | ((__u64)e->efi_memmap_hi << 32));
+#endif
+
+   nr_desc = e->efi_memmap_size / e->efi_memdesc_size;
+   for (i = 0; i < nr_desc; i++) {
+   md = efi_early_memdesc_ptr(pmap, e->efi_memdesc_size, i);
+   if (md->attribute & EFI_MEMORY_MORE_RELIABLE) {
+   region.start = md->phys_addr;
+   region.size = md->num_pages << EFI_PAGE_SHIFT;
+   process_mem_region(&region, minimum, image_size);
+   efi_mirror_found = true;
+
+   if (slot_area_index == MAX_SLOT_AREA) {
+   debug_putstr("Aborted EFI scan (slot_areas 
full)!\n");
+   break;
+   }
+   }
+   }
+
+   return efi_mirror_found;
+}
+#else
+static inline bool
+process_efi_entries(unsigned long minimum, unsigned long image_size)
+{
+   return false;
+}
+#endif
+
 static void process_e820_entries(unsigned long minimum,
 unsigned long image_size)
 {
@@ -586,13 +647,16 @@ static unsigned long find_random_phys_addr(unsigned long minimum,
 {
/* Check if we had too many memmaps. */
if 
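
One unit detail worth noting in the region setup above: EFI descriptors count
4 KiB EFI pages regardless of the kernel page size, so the byte length of a
region is num_pages << EFI_PAGE_SHIFT (a shift of 12). A worked example:

  /*
   * md->num_pages = 0x40000 EFI pages
   * region.size   = 0x40000 << 12 = 0x40000000 bytes = 1 GiB of mirrored RAM
   */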

[tip:x86/boot] x86/boot/KASLR: Switch to pass struct mem_vector to process_e820_entry()

2017-07-18 Thread tip-bot for Baoquan He
Commit-ID:  87891b01b54210763117f0a67b022cd94de6cd13
Gitweb: http://git.kernel.org/tip/87891b01b54210763117f0a67b022cd94de6cd13
Author: Baoquan He 
AuthorDate: Sun, 9 Jul 2017 20:37:40 +0800
Committer:  Ingo Molnar 
CommitDate: Tue, 18 Jul 2017 11:11:11 +0200

x86/boot/KASLR: Switch to pass struct mem_vector to process_e820_entry()

This makes process_e820_entry() able to process any kind of memory
region.

Signed-off-by: Baoquan He 
Acked-by: Kees Cook 
Cc: Linus Torvalds 
Cc: Peter Zijlstra 
Cc: Thomas Gleixner 
Cc: fanc.f...@cn.fujitsu.com
Cc: izumi.t...@jp.fujitsu.com
Cc: m...@codeblueprint.co.uk
Cc: thgar...@google.com
Link: http://lkml.kernel.org/r/1499603862-11516-3-git-send-email-...@redhat.com
Signed-off-by: Ingo Molnar 
---
 arch/x86/boot/compressed/kaslr.c | 25 ++---
 1 file changed, 14 insertions(+), 11 deletions(-)

diff --git a/arch/x86/boot/compressed/kaslr.c b/arch/x86/boot/compressed/kaslr.c
index 1485f48..36ff9f7 100644
--- a/arch/x86/boot/compressed/kaslr.c
+++ b/arch/x86/boot/compressed/kaslr.c
@@ -479,31 +479,31 @@ static unsigned long slots_fetch_random(void)
return 0;
 }
 
-static void process_e820_entry(struct boot_e820_entry *entry,
+static void process_e820_entry(struct mem_vector *entry,
   unsigned long minimum,
   unsigned long image_size)
 {
struct mem_vector region, overlap;
struct slot_area slot_area;
unsigned long start_orig, end;
-   struct boot_e820_entry cur_entry;
+   struct mem_vector cur_entry;
 
/* On 32-bit, ignore entries entirely above our maximum. */
-   if (IS_ENABLED(CONFIG_X86_32) && entry->addr >= KERNEL_IMAGE_SIZE)
+   if (IS_ENABLED(CONFIG_X86_32) && entry->start >= KERNEL_IMAGE_SIZE)
return;
 
/* Ignore entries entirely below our minimum. */
-   if (entry->addr + entry->size < minimum)
+   if (entry->start + entry->size < minimum)
return;
 
/* Ignore entries above memory limit */
-   end = min(entry->size + entry->addr, mem_limit);
-   if (entry->addr >= end)
+   end = min(entry->size + entry->start, mem_limit);
+   if (entry->start >= end)
return;
-   cur_entry.addr = entry->addr;
-   cur_entry.size = end - entry->addr;
+   cur_entry.start = entry->start;
+   cur_entry.size = end - entry->start;
 
-   region.start = cur_entry.addr;
+   region.start = cur_entry.start;
region.size = cur_entry.size;
 
/* Give up if slot area array is full. */
@@ -518,7 +518,7 @@ static void process_e820_entry(struct boot_e820_entry *entry,
region.start = ALIGN(region.start, CONFIG_PHYSICAL_ALIGN);
 
/* Did we raise the address above this e820 region? */
-   if (region.start > cur_entry.addr + cur_entry.size)
+   if (region.start > cur_entry.start + cur_entry.size)
return;
 
/* Reduce size by any delta from the original address. */
@@ -562,6 +562,7 @@ static void process_e820_entries(unsigned long minimum,
 unsigned long image_size)
 {
int i;
+   struct mem_vector region;
struct boot_e820_entry *entry;
 
/* Verify potential e820 positions, appending to slots list. */
@@ -570,7 +571,9 @@ static void process_e820_entries(unsigned long minimum,
/* Skip non-RAM entries. */
if (entry->type != E820_TYPE_RAM)
continue;
-   process_e820_entry(entry, minimum, image_size);
+   region.start = entry->addr;
+   region.size = entry->size;
+   process_e820_entry(&region, minimum, image_size);
if (slot_area_index == MAX_SLOT_AREA) {
debug_putstr("Aborted e820 scan (slot_areas full)!\n");
break;


[tip:x86/boot] x86/boot/KASLR: Rename process_e820_entry() into process_mem_region()

2017-07-18 Thread tip-bot for Baoquan He
Commit-ID:  27aac20574110abfd594175a668dc58b23b2b14a
Gitweb: http://git.kernel.org/tip/27aac20574110abfd594175a668dc58b23b2b14a
Author: Baoquan He 
AuthorDate: Sun, 9 Jul 2017 20:37:41 +0800
Committer:  Ingo Molnar 
CommitDate: Tue, 18 Jul 2017 11:11:12 +0200

x86/boot/KASLR: Rename process_e820_entry() into process_mem_region()

Now that process_e820_entry() is not limited to processing e820 entries,
rename it to process_mem_region() and adjust the code comments accordingly.

Signed-off-by: Baoquan He 
Acked-by: Kees Cook 
Cc: Linus Torvalds 
Cc: Peter Zijlstra 
Cc: Thomas Gleixner 
Cc: fanc.f...@cn.fujitsu.com
Cc: izumi.t...@jp.fujitsu.com
Cc: m...@codeblueprint.co.uk
Cc: thgar...@google.com
Link: http://lkml.kernel.org/r/1499603862-11516-4-git-send-email-...@redhat.com
Signed-off-by: Ingo Molnar 
---
 arch/x86/boot/compressed/kaslr.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/x86/boot/compressed/kaslr.c b/arch/x86/boot/compressed/kaslr.c
index 36ff9f7..99c7194f 100644
--- a/arch/x86/boot/compressed/kaslr.c
+++ b/arch/x86/boot/compressed/kaslr.c
@@ -479,7 +479,7 @@ static unsigned long slots_fetch_random(void)
return 0;
 }
 
-static void process_e820_entry(struct mem_vector *entry,
+static void process_mem_region(struct mem_vector *entry,
   unsigned long minimum,
   unsigned long image_size)
 {
@@ -517,7 +517,7 @@ static void process_e820_entry(struct mem_vector *entry,
/* Potentially raise address to meet alignment needs. */
region.start = ALIGN(region.start, CONFIG_PHYSICAL_ALIGN);
 
-   /* Did we raise the address above this e820 region? */
+   /* Did we raise the address above the passed in memory entry? */
if (region.start > cur_entry.start + cur_entry.size)
return;
 
@@ -573,7 +573,7 @@ static void process_e820_entries(unsigned long minimum,
continue;
region.start = entry->addr;
region.size = entry->size;
-   process_e820_entry(&region, minimum, image_size);
+   process_mem_region(&region, minimum, image_size);
if (slot_area_index == MAX_SLOT_AREA) {
debug_putstr("Aborted e820 scan (slot_areas full)!\n");
break;


[tip:x86/boot] x86/boot/KASLR: Wrap e820 entries walking code into new function process_e820_entries()

2017-07-18 Thread tip-bot for Baoquan He
Commit-ID:  f62995c92a29e4d9331382b8b2461eef3b9c7c6b
Gitweb: http://git.kernel.org/tip/f62995c92a29e4d9331382b8b2461eef3b9c7c6b
Author: Baoquan He 
AuthorDate: Sun, 9 Jul 2017 20:37:39 +0800
Committer:  Ingo Molnar 
CommitDate: Tue, 18 Jul 2017 11:11:11 +0200

x86/boot/KASLR: Wrap e820 entries walking code into new function process_e820_entries()

The original function process_e820_entry() only takes care of a single
e820 entry passed in, so wrap the loop that walks all e820 entries into
a new function, process_e820_entries(), and move the E820_TYPE_RAM
checking logic into it.

Also remove the redundant local variable 'addr' definition in
find_random_phys_addr().

Signed-off-by: Baoquan He 
Acked-by: Kees Cook 
Cc: Linus Torvalds 
Cc: Peter Zijlstra 
Cc: Thomas Gleixner 
Cc: fanc.f...@cn.fujitsu.com
Cc: izumi.t...@jp.fujitsu.com
Cc: m...@codeblueprint.co.uk
Cc: thgar...@google.com
Link: http://lkml.kernel.org/r/1499603862-11516-2-git-send-email-...@redhat.com
Signed-off-by: Ingo Molnar 
---
 arch/x86/boot/compressed/kaslr.c | 38 +-
 1 file changed, 21 insertions(+), 17 deletions(-)

diff --git a/arch/x86/boot/compressed/kaslr.c b/arch/x86/boot/compressed/kaslr.c
index 91f27ab..1485f48 100644
--- a/arch/x86/boot/compressed/kaslr.c
+++ b/arch/x86/boot/compressed/kaslr.c
@@ -488,10 +488,6 @@ static void process_e820_entry(struct boot_e820_entry *entry,
unsigned long start_orig, end;
struct boot_e820_entry cur_entry;
 
-   /* Skip non-RAM entries. */
-   if (entry->type != E820_TYPE_RAM)
-   return;
-
/* On 32-bit, ignore entries entirely above our maximum. */
if (IS_ENABLED(CONFIG_X86_32) && entry->addr >= KERNEL_IMAGE_SIZE)
return;
@@ -562,12 +558,29 @@ static void process_e820_entry(struct boot_e820_entry *entry,
}
 }
 
-static unsigned long find_random_phys_addr(unsigned long minimum,
-  unsigned long image_size)
+static void process_e820_entries(unsigned long minimum,
+unsigned long image_size)
 {
int i;
-   unsigned long addr;
+   struct boot_e820_entry *entry;
+
+   /* Verify potential e820 positions, appending to slots list. */
+   for (i = 0; i < boot_params->e820_entries; i++) {
+   entry = &boot_params->e820_table[i];
+   /* Skip non-RAM entries. */
+   if (entry->type != E820_TYPE_RAM)
+   continue;
+   process_e820_entry(entry, minimum, image_size);
+   if (slot_area_index == MAX_SLOT_AREA) {
+   debug_putstr("Aborted e820 scan (slot_areas full)!\n");
+   break;
+   }
+   }
+}
 
+static unsigned long find_random_phys_addr(unsigned long minimum,
+  unsigned long image_size)
+{
/* Check if we had too many memmaps. */
if (memmap_too_large) {
debug_putstr("Aborted e820 scan (more than 4 memmap= args)!\n");
@@ -577,16 +590,7 @@ static unsigned long find_random_phys_addr(unsigned long minimum,
/* Make sure minimum is aligned. */
minimum = ALIGN(minimum, CONFIG_PHYSICAL_ALIGN);
 
-   /* Verify potential e820 positions, appending to slots list. */
-   for (i = 0; i < boot_params->e820_entries; i++) {
-   process_e820_entry(&boot_params->e820_table[i], minimum,
-  image_size);
-   if (slot_area_index == MAX_SLOT_AREA) {
-   debug_putstr("Aborted e820 scan (slot_areas full)!\n");
-   break;
-   }
-   }
-
+   process_e820_entries(minimum, image_size);
return slots_fetch_random();
 }
 


[tip:x86/urgent] x86/boot/KASLR: Add checking for the offset of kernel virtual address randomization

2017-06-30 Thread tip-bot for Baoquan He
Commit-ID:  b892cb873ced2af57dc5a018557d128c53ed6ae0
Gitweb: http://git.kernel.org/tip/b892cb873ced2af57dc5a018557d128c53ed6ae0
Author: Baoquan He 
AuthorDate: Tue, 27 Jun 2017 20:39:05 +0800
Committer:  Ingo Molnar 
CommitDate: Fri, 30 Jun 2017 08:53:14 +0200

x86/boot/KASLR: Add checking for the offset of kernel virtual address randomization

For kernel text KASLR, the virtual address is confined to an area of 1G,
[0xffffffff80000000, 0xffffffffc0000000). For the implementation of
virtual address randomization, we only randomize to get an offset
between 16M and 1G, then add this offset to the starting address,
0xffffffff80000000. Here 16M is the offset which is decided at linking
stage. So the value of the local variable 'virt_addr', which represents
the offset, plus the kernel output size can not exceed KERNEL_IMAGE_SIZE.

Add a debug check for the offset. If it is out of bounds, print an error
message and hang there.
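
As a worked example of the bound (all sizes here are made up for
illustration; only the shape of the check matches the patch below):

#include <stdio.h>

#define KERNEL_IMAGE_SIZE (1UL << 30)              /* 1G text mapping area */
#define MAX(a, b) ((a) > (b) ? (a) : (b))

int main(void)
{
    unsigned long kernel_total_size = 800UL << 20; /* decompressed size */
    unsigned long output_len        = 200UL << 20; /* compressed size   */
    unsigned long virt_addr         = 300UL << 20; /* randomized offset */

    /* 300M + max(200M, 800M) = 1100M > 1024M: must be rejected. */
    if (virt_addr + MAX(output_len, kernel_total_size) > KERNEL_IMAGE_SIZE)
        printf("out of bounds: print an error and hang\n");
    else
        printf("fits below the 1G boundary\n");
    return 0;
}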

Suggested-by: Ingo Molnar 
Signed-off-by: Baoquan He 
Cc: Linus Torvalds 
Cc: Peter Zijlstra 
Cc: Thomas Gleixner 
Link: http://lkml.kernel.org/r/1498567146-11990-2-git-send-email-...@redhat.com
Signed-off-by: Ingo Molnar 
---
 arch/x86/boot/compressed/misc.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/x86/boot/compressed/misc.c b/arch/x86/boot/compressed/misc.c
index b3c5a5f0..6008fa9 100644
--- a/arch/x86/boot/compressed/misc.c
+++ b/arch/x86/boot/compressed/misc.c
@@ -390,6 +390,8 @@ asmlinkage __visible void *extract_kernel(void *rmode, memptr heap,
 #ifdef CONFIG_X86_64
if (heap > 0x3fffffffffUL)
error("Destination address too large");
+   if (virt_addr + max(output_len, kernel_total_size) > KERNEL_IMAGE_SIZE)
+   error("Destination virtual address is beyond the kernel mapping area");
 #else
if (heap > ((-__PAGE_OFFSET-(128<<20)-1) & 0x7fffffff))
error("Destination address too large");


[tip:x86/urgent] x86/boot/KASLR: Fix kexec crash due to 'virt_addr' calculation bug

2017-06-30 Thread tip-bot for Baoquan He
Commit-ID:  8eabf42ae5237e6b699aeac687b5b629e3537c8d
Gitweb: http://git.kernel.org/tip/8eabf42ae5237e6b699aeac687b5b629e3537c8d
Author: Baoquan He 
AuthorDate: Tue, 27 Jun 2017 20:39:06 +0800
Committer:  Ingo Molnar 
CommitDate: Fri, 30 Jun 2017 08:53:14 +0200

x86/boot/KASLR: Fix kexec crash due to 'virt_addr' calculation bug

Kernel text KASLR is separated into physical address and virtual
address randomization. For virtual address randomization, we only
randomize to get an offset between 16M and KERNEL_IMAGE_SIZE, so the
initial value of 'virt_addr' should be LOAD_PHYSICAL_ADDR, not the
original kernel loading address 'output'.

The bug causes a kernel boot failure if the kernel is loaded at a
position different from the address, 16M, which is decided at compile
time. Kexec/kdump is one such practical case.

To fix it, just assign LOAD_PHYSICAL_ADDR to 'virt_addr' as the initial
value.
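
A small sketch with assumed constants of why the initial value matters;
the virtual relocation delta is later derived as
virt_addr - LOAD_PHYSICAL_ADDR, so seeding virt_addr from the physical
load address ties the delta to wherever kexec loaded the kernel:

#include <stdio.h>

#define LOAD_PHYSICAL_ADDR (16UL << 20)  /* 16M, decided at compile time */

int main(void)
{
    unsigned long output = 0x13000000UL; /* e.g. a kexec/kdump load addr */

    /* Old code: virt_addr = output, giving a bogus nonzero delta. */
    unsigned long buggy_delta = output - LOAD_PHYSICAL_ADDR;
    /* Fixed code: virt_addr = LOAD_PHYSICAL_ADDR, delta stays 0 until
     * the virtual randomization really picks an offset. */
    unsigned long fixed_delta = LOAD_PHYSICAL_ADDR - LOAD_PHYSICAL_ADDR;

    printf("buggy delta %#lx, fixed delta %#lx\n", buggy_delta, fixed_delta);
    return 0;
}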

Tested-by: Dave Young 
Signed-off-by: Baoquan He 
Cc: Linus Torvalds 
Cc: Peter Zijlstra 
Cc: Thomas Gleixner 
Fixes: 8391c73 ("x86/KASLR: Randomize virtual address separately")
Link: http://lkml.kernel.org/r/1498567146-11990-3-git-send-email-...@redhat.com
Signed-off-by: Ingo Molnar 
---
 arch/x86/boot/compressed/kaslr.c | 3 ---
 arch/x86/boot/compressed/misc.c  | 4 ++--
 arch/x86/boot/compressed/misc.h  | 2 --
 3 files changed, 2 insertions(+), 7 deletions(-)

diff --git a/arch/x86/boot/compressed/kaslr.c b/arch/x86/boot/compressed/kaslr.c
index 54c24f0..56a7e92 100644
--- a/arch/x86/boot/compressed/kaslr.c
+++ b/arch/x86/boot/compressed/kaslr.c
@@ -564,9 +564,6 @@ void choose_random_location(unsigned long input,
 {
unsigned long random_addr, min_addr;
 
-   /* By default, keep output position unchanged. */
-   *virt_addr = *output;
-
if (cmdline_find_option_bool("nokaslr")) {
warn("KASLR disabled: 'nokaslr' on cmdline.");
return;
diff --git a/arch/x86/boot/compressed/misc.c b/arch/x86/boot/compressed/misc.c
index 6008fa9..00241c8 100644
--- a/arch/x86/boot/compressed/misc.c
+++ b/arch/x86/boot/compressed/misc.c
@@ -338,7 +338,7 @@ asmlinkage __visible void *extract_kernel(void *rmode, memptr heap,
  unsigned long output_len)
 {
const unsigned long kernel_total_size = VO__end - VO__text;
-   unsigned long virt_addr = (unsigned long)output;
+   unsigned long virt_addr = LOAD_PHYSICAL_ADDR;
 
/* Retain x86 boot parameters pointer passed from startup_32/64. */
boot_params = rmode;
@@ -399,7 +399,7 @@ asmlinkage __visible void *extract_kernel(void *rmode, memptr heap,
 #ifndef CONFIG_RELOCATABLE
if ((unsigned long)output != LOAD_PHYSICAL_ADDR)
error("Destination address does not match LOAD_PHYSICAL_ADDR");
-   if ((unsigned long)output != virt_addr)
+   if (virt_addr != LOAD_PHYSICAL_ADDR)
error("Destination virtual address changed when not relocatable");
 #endif
 
diff --git a/arch/x86/boot/compressed/misc.h b/arch/x86/boot/compressed/misc.h
index 1c8355e..766a521 100644
--- a/arch/x86/boot/compressed/misc.h
+++ b/arch/x86/boot/compressed/misc.h
@@ -81,8 +81,6 @@ static inline void choose_random_location(unsigned long input,
  unsigned long output_size,
  unsigned long *virt_addr)
 {
-   /* No change from existing output location. */
-   *virt_addr = *output;
 }
 #endif
 


[tip:efi/urgent] x86/efi: Correct EFI identity mapping under 'efi=old_map' when KASLR is enabled

2017-05-28 Thread tip-bot for Baoquan He
Commit-ID:  94133e46a0f5ca3f138479806104ab4a8cb0455e
Gitweb: http://git.kernel.org/tip/94133e46a0f5ca3f138479806104ab4a8cb0455e
Author: Baoquan He 
AuthorDate: Fri, 26 May 2017 12:36:50 +0100
Committer:  Ingo Molnar 
CommitDate: Sun, 28 May 2017 11:06:16 +0200

x86/efi: Correct EFI identity mapping under 'efi=old_map' when KASLR is enabled

For EFI with the 'efi=old_map' kernel option specified, the kernel will panic
when KASLR is enabled:

  BUG: unable to handle kernel paging request at 7febd57e
  IP: 0x7febd57e
  PGD 1025a067
  PUD 0

  Oops: 0010 [#1] SMP
  Call Trace:
   efi_enter_virtual_mode()
   start_kernel()
   x86_64_start_reservations()
   x86_64_start_kernel()
   start_cpu()

The root cause is that the identity mapping is not built correctly
in the 'efi=old_map' case.

On 'nokaslr' kernels, PAGE_OFFSET is 0xffff880000000000 which is PGDIR_SIZE
aligned. We can borrow the PUD table from the direct mappings safely. Given a
physical address X, we have pud_index(X) == pud_index(__va(X)).

However, on KASLR kernels, PAGE_OFFSET is PUD_SIZE aligned. For a given physical
address X, pud_index(X) != pud_index(__va(X)). We can't just copy the PGD entry
from direct mapping to build identity mapping, instead we need to copy the
PUD entries one by one from the direct mapping.
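
To make the alignment argument concrete, a standalone sketch with assumed
4-level x86-64 constants and example PAGE_OFFSET values (pud_index() is
re-derived here for illustration, not the kernel macro):

#include <stdio.h>

#define PUD_SHIFT    30
#define PTRS_PER_PUD 512UL
#define pud_index(va) (((va) >> PUD_SHIFT) & (PTRS_PER_PUD - 1))

int main(void)
{
    unsigned long x = 0x12345678000UL;              /* a physical address */
    unsigned long off_pgdir = 0xffff880000000000UL; /* PGDIR_SIZE aligned */
    unsigned long off_pud   = 0xffff884040000000UL; /* only PUD_SIZE aligned */

    /* PGDIR_SIZE-aligned offset: indexes match, PUD table can be shared. */
    printf("%lu == %lu\n", pud_index(x), pud_index(off_pgdir + x));
    /* PUD_SIZE-aligned offset: indexes diverge, entries must be copied. */
    printf("%lu != %lu\n", pud_index(x), pud_index(off_pud + x));
    return 0;
}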

Fix it.

Signed-off-by: Baoquan He 
Signed-off-by: Matt Fleming 
Cc: Ard Biesheuvel 
Cc: Bhupesh Sharma 
Cc: Borislav Petkov 
Cc: Dave Young 
Cc: Frank Ramsay 
Cc: Kees Cook 
Cc: Linus Torvalds 
Cc: Peter Zijlstra 
Cc: Russ Anderson 
Cc: Thomas Garnier 
Cc: Thomas Gleixner 
Cc: linux-...@vger.kernel.org
Link: http://lkml.kernel.org/r/20170526113652.21339-5-m...@codeblueprint.co.uk
[ Fixed and reworded the changelog and code comments to be more readable. ]
Signed-off-by: Ingo Molnar 
---
 arch/x86/platform/efi/efi_64.c | 79 +-
 1 file changed, 71 insertions(+), 8 deletions(-)

diff --git a/arch/x86/platform/efi/efi_64.c b/arch/x86/platform/efi/efi_64.c
index c488625..eb8dff1 100644
--- a/arch/x86/platform/efi/efi_64.c
+++ b/arch/x86/platform/efi/efi_64.c
@@ -71,11 +71,13 @@ static void __init early_code_mapping_set_exec(int executable)
 
 pgd_t * __init efi_call_phys_prolog(void)
 {
-   unsigned long vaddress;
-   pgd_t *save_pgd;
+   unsigned long vaddr, addr_pgd, addr_p4d, addr_pud;
+   pgd_t *save_pgd, *pgd_k, *pgd_efi;
+   p4d_t *p4d, *p4d_k, *p4d_efi;
+   pud_t *pud;
 
int pgd;
-   int n_pgds;
+   int n_pgds, i, j;
 
if (!efi_enabled(EFI_OLD_MEMMAP)) {
save_pgd = (pgd_t *)read_cr3();
@@ -88,10 +90,49 @@ pgd_t * __init efi_call_phys_prolog(void)
n_pgds = DIV_ROUND_UP((max_pfn << PAGE_SHIFT), PGDIR_SIZE);
save_pgd = kmalloc_array(n_pgds, sizeof(*save_pgd), GFP_KERNEL);
 
+   /*
+* Build 1:1 identity mapping for efi=old_map usage. Note that
+* PAGE_OFFSET is PGDIR_SIZE aligned when KASLR is disabled, while
+* it is PUD_SIZE ALIGNED with KASLR enabled. So for a given physical
+* address X, the pud_index(X) != pud_index(__va(X)), we can only copy
+* PUD entry of __va(X) to fill in pud entry of X to build 1:1 mapping.
+* This means here we can only reuse the PMD tables of the direct mapping.
+*/
for (pgd = 0; pgd < n_pgds; pgd++) {
-   save_pgd[pgd] = *pgd_offset_k(pgd * PGDIR_SIZE);
-   vaddress = (unsigned long)__va(pgd * PGDIR_SIZE);
-   set_pgd(pgd_offset_k(pgd * PGDIR_SIZE), *pgd_offset_k(vaddress));
+   addr_pgd = (unsigned long)(pgd * PGDIR_SIZE);
+   vaddr = (unsigned long)__va(pgd * PGDIR_SIZE);
+   pgd_efi = pgd_offset_k(addr_pgd);
+   save_pgd[pgd] = *pgd_efi;
+
+   p4d = p4d_alloc(&init_mm, pgd_efi, addr_pgd);
+   if (!p4d) {
+   pr_err("Failed to allocate p4d table!\n");
+   goto out;
+   }
+
+   for (i = 0; i < PTRS_PER_P4D; i++) {
+   addr_p4d = addr_pgd + i * P4D_SIZE;
+   p4d_efi = p4d + p4d_index(addr_p4d);
+
+   pud = pud_alloc(&init_mm, p4d_efi, addr_p4d);
+   if (!pud) {
+   pr_err("Failed to allocate pud table!\n");
+   goto out;
+   }
+
+   for (j = 0; j < PTRS_PER_PUD; j++) {
+   addr_pud = addr_p4d + j * PUD_SIZE;
+
+   if (addr_pud > (max_pfn << PAGE_SHIFT))
+   break;
+
+   vaddr = (unsigned long)__va(addr_pud);
+
+   pgd_k = pgd_offset_k(vaddr);
+   p4d_k = p4d_offset(pgd_k, vaddr);
+   pud[j] = *pud_offset(p4d_k, vaddr);
+   }
+   }

[tip:x86/boot] Documentation/kernel-parameters.txt: Update 'memmap=' boot option description

2017-05-24 Thread tip-bot for Baoquan He
Commit-ID:  8fcc9bc3eaa2ef8345e2b4b22e3a88804ac46337
Gitweb: http://git.kernel.org/tip/8fcc9bc3eaa2ef8345e2b4b22e3a88804ac46337
Author: Baoquan He 
AuthorDate: Sat, 13 May 2017 13:46:30 +0800
Committer:  Ingo Molnar 
CommitDate: Wed, 24 May 2017 09:50:27 +0200

Documentation/kernel-parameters.txt: Update 'memmap=' boot option description

In commit:

  9710f581bb4c ("x86, mm: Let "memmap=" take more entries one time")

... 'memmap=' was changed to adopt multiple, comma delimited values in a
single entry, so update the related description.

In the special case of only specifying size value without an offset,
like memmap=nn[KMG], memmap behaves similarly to mem=nn[KMG], so update
it too here.

Furthermore, for memmap=nn[KMG]$ss[KMG], an escape character needs to be added
before '$' for some bootloaders. E.g. in grub2, if we specify memmap=100M$5G
as suggested by the documentation, "memmap=100MG" gets passed to the kernel.

Clarify all this.

Signed-off-by: Baoquan He 
Acked-by: Kees Cook 
Cc: Linus Torvalds 
Cc: Peter Zijlstra 
Cc: Thomas Gleixner 
Cc: dan.j.willi...@intel.com
Cc: douly.f...@cn.fujitsu.com
Cc: dyo...@redhat.com
Cc: m.miz...@jp.fujitsu.com
Link: http://lkml.kernel.org/r/1494654390-23861-4-git-send-email-...@redhat.com
[ Various spelling fixes. ]
Signed-off-by: Ingo Molnar 
---
 Documentation/admin-guide/kernel-parameters.txt | 9 +
 1 file changed, 9 insertions(+)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 15f79c2..4e4c340 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -2127,6 +2127,12 @@
memmap=nn[KMG]@ss[KMG]
[KNL] Force usage of a specific region of memory.
Region of memory to be used is from ss to ss+nn.
+   If @ss[KMG] is omitted, it is equivalent to mem=nn[KMG],
+   which limits max address to nn[KMG].
+   Multiple different regions can be specified,
+   comma delimited.
+   Example:
+   memmap=100M@2G,100M#3G,1G!1024G
 
memmap=nn[KMG]#ss[KMG]
[KNL,ACPI] Mark specific memory as ACPI data.
@@ -2139,6 +2145,9 @@
 memmap=64K$0x1869
 or
 memmap=0x1$0x1869
+   Some bootloaders may need an escape character before '$',
+   like Grub2, otherwise '$' and the following number
+   will be eaten.
 
memmap=nn[KMG]!ss[KMG]
[KNL,X86] Mark specific memory as protected.


[tip:x86/boot] x86/KASLR: Handle the memory limit specified by the 'memmap=' and 'mem=' boot options

2017-05-24 Thread tip-bot for Baoquan He
Commit-ID:  4cdba14f84c9102c4434384731cd61018b970d59
Gitweb: http://git.kernel.org/tip/4cdba14f84c9102c4434384731cd61018b970d59
Author: Baoquan He 
AuthorDate: Sat, 13 May 2017 13:46:29 +0800
Committer:  Ingo Molnar 
CommitDate: Wed, 24 May 2017 09:50:27 +0200

x86/KASLR: Handle the memory limit specified by the 'memmap=' and 'mem=' boot options

The 'mem=' boot option limits the max address a system can use - any memory
region above the limit will be removed.

Furthermore, the 'memmap=nn[KMG]' variant (with no offset specified) has the same
behaviour as 'mem='.
behaviour as 'mem='.

KASLR needs to consider this when choosing the random position for
decompressing the kernel. Do it.
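
A minimal sketch of the clipping rule (simplified types, assumed values;
the patch below applies the equivalent limit when processing each e820
entry):

#include <stdio.h>

static unsigned long long mem_limit = 4ULL << 30;  /* e.g. mem=4G */

struct mem_vector {
    unsigned long long start;
    unsigned long long size;
};

/* Clip a candidate region against the limit; returns 0 if nothing is left. */
static int clamp_region(struct mem_vector *r)
{
    unsigned long long end = r->start + r->size;

    if (end > mem_limit)
        end = mem_limit;
    if (r->start >= end)
        return 0;               /* entirely above the limit: skip */
    r->size = end - r->start;
    return 1;
}

int main(void)
{
    struct mem_vector r = { 3ULL << 30, 2ULL << 30 };  /* 3G..5G */

    if (clamp_region(&r))                              /* becomes 3G..4G */
        printf("usable: %#llx + %#llx\n", r.start, r.size);
    return 0;
}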

Tested-by: Masayoshi Mizuma 
Signed-off-by: Baoquan He 
Acked-by: Kees Cook 
Cc: Linus Torvalds 
Cc: Peter Zijlstra 
Cc: Thomas Gleixner 
Cc: dan.j.willi...@intel.com
Cc: douly.f...@cn.fujitsu.com
Cc: dyo...@redhat.com
Link: http://lkml.kernel.org/r/1494654390-23861-3-git-send-email-...@redhat.com
Signed-off-by: Ingo Molnar 
---
 arch/x86/boot/compressed/kaslr.c | 68 +---
 1 file changed, 50 insertions(+), 18 deletions(-)

diff --git a/arch/x86/boot/compressed/kaslr.c b/arch/x86/boot/compressed/kaslr.c
index 106e13b..e0eba12 100644
--- a/arch/x86/boot/compressed/kaslr.c
+++ b/arch/x86/boot/compressed/kaslr.c
@@ -88,6 +88,10 @@ struct mem_vector {
 static bool memmap_too_large;
 
 
+/* Store memory limit specified by "mem=nn[KMG]" or "memmap=nn[KMG]" */
+unsigned long long mem_limit = ULLONG_MAX;
+
+
 enum mem_avoid_index {
MEM_AVOID_ZO_RANGE = 0,
MEM_AVOID_INITRD,
@@ -138,16 +142,23 @@ parse_memmap(char *p, unsigned long long *start, unsigned long long *size)
return -EINVAL;
 
switch (*p) {
-   case '@':
-   /* Skip this region, usable */
-   *start = 0;
-   *size = 0;
-   return 0;
case '#':
case '$':
case '!':
*start = memparse(p + 1, &p);
return 0;
+   case '@':
+   /* memmap=nn@ss specifies usable region, should be skipped */
+   *size = 0;
+   /* Fall through */
+   default:
+   /*
+* If w/o offset, only size specified, memmap=nn[KMG] has the
+* same behaviour as mem=nn[KMG]. It limits the max address
+* system can use. Region above the limit should be avoided.
+*/
+   *start = 0;
+   return 0;
}
 
return -EINVAL;
@@ -173,9 +184,14 @@ static void mem_avoid_memmap(char *str)
if (rc < 0)
break;
str = k;
-   /* A usable region that should not be skipped */
-   if (size == 0)
+
+   if (start == 0) {
+   /* Store the specified memory limit if size > 0 */
+   if (size > 0)
+   mem_limit = size;
+
continue;
+   }
 
mem_avoid[MEM_AVOID_MEMMAP_BEGIN + i].start = start;
mem_avoid[MEM_AVOID_MEMMAP_BEGIN + i].size = size;
@@ -187,19 +203,15 @@ static void mem_avoid_memmap(char *str)
memmap_too_large = true;
 }
 
-
-/*
- * handle_mem_memmap will also cover 'mem=' issue in next patch. Will remove
- * this note later.
- */
 static int handle_mem_memmap(void)
 {
char *args = (char *)get_cmd_line_ptr();
size_t len = strlen((char *)args);
char *tmp_cmdline;
char *param, *val;
+   u64 mem_size;
 
-   if (!strstr(args, "memmap="))
+   if (!strstr(args, "memmap=") && !strstr(args, "mem="))
return 0;
 
tmp_cmdline = malloc(len + 1);
@@ -222,8 +234,20 @@ static int handle_mem_memmap(void)
return -1;
}
 
-   if (!strcmp(param, "memmap"))
+   if (!strcmp(param, "memmap")) {
mem_avoid_memmap(val);
+   } else if (!strcmp(param, "mem")) {
+   char *p = val;
+
+   if (!strcmp(p, "nopentium"))
+   continue;
+   mem_size = memparse(p, &p);
+   if (mem_size == 0) {
+   free(tmp_cmdline);
+   return -EINVAL;
+   }
+   mem_limit = mem_size;
+   }
}
 
free(tmp_cmdline);
@@ -460,7 +484,8 @@ static void process_e820_entry(struct boot_e820_entry *entry,
 {
struct mem_vector region, overlap;
struct slot_area slot_area;
-   unsigned long start_orig;
+   unsigned long start_orig, end;
+   struct boot_e820_entry cur_entry;
 
/* Skip non-RAM entries. */
if (entry->type != E820_TYPE_RAM)
@@ -474,8 +499,15 @@ static void process_e820_entry(struct boot_e820_entry *entry,

[tip:x86/boot] x86/KASLR: Parse all 'memmap=' boot option entries

2017-05-24 Thread tip-bot for Baoquan He
Commit-ID:  d52e7d5a952c5e35783f96e8c5b7fcffbb0d7c60
Gitweb: http://git.kernel.org/tip/d52e7d5a952c5e35783f96e8c5b7fcffbb0d7c60
Author: Baoquan He 
AuthorDate: Sat, 13 May 2017 13:46:28 +0800
Committer:  Ingo Molnar 
CommitDate: Wed, 24 May 2017 09:50:27 +0200

x86/KASLR: Parse all 'memmap=' boot option entries

In commit:

  f28442497b5c ("x86/boot: Fix KASLR and memmap= collision")

... the memmap= option is parsed so that KASLR can avoid those reserved
regions. It uses cmdline_find_option() to get the value if memmap=
is specified, however the problem is that cmdline_find_option() can only
find the last entry if multiple memmap entries are provided. This
is not correct.

Address this by checking each command line token for a "memmap=" match
and parse each instance instead of using cmdline_find_option().
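
A standalone sketch of the token-walking idea (strtok() stands in for the
kernel's next_arg() helper purely for illustration; the values are made up):

#include <stdio.h>
#include <string.h>

int main(void)
{
    /* cmdline_find_option() would report only the last memmap= value;
     * walking token by token handles every instance. */
    char cmdline[] = "ro memmap=64K$0x18690000 quiet memmap=64K$0x20000000";

    for (char *tok = strtok(cmdline, " "); tok; tok = strtok(NULL, " ")) {
        if (!strncmp(tok, "memmap=", 7))
            printf("parse instance: %s\n", tok + 7);
    }
    return 0;
}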

Signed-off-by: Baoquan He 
Acked-by: Kees Cook 
Cc: Linus Torvalds 
Cc: Peter Zijlstra 
Cc: Thomas Gleixner 
Cc: dan.j.willi...@intel.com
Cc: douly.f...@cn.fujitsu.com
Cc: dyo...@redhat.com
Cc: m.miz...@jp.fujitsu.com
Link: http://lkml.kernel.org/r/1494654390-23861-2-git-send-email-...@redhat.com
Signed-off-by: Ingo Molnar 
---
 arch/x86/boot/compressed/cmdline.c |   2 +-
 arch/x86/boot/compressed/kaslr.c   | 136 ++---
 arch/x86/boot/string.c |   8 +++
 3 files changed, 91 insertions(+), 55 deletions(-)

diff --git a/arch/x86/boot/compressed/cmdline.c b/arch/x86/boot/compressed/cmdline.c
index 73ccf63..9dc1ce6 100644
--- a/arch/x86/boot/compressed/cmdline.c
+++ b/arch/x86/boot/compressed/cmdline.c
@@ -13,7 +13,7 @@ static inline char rdfs8(addr_t addr)
return *((char *)(fs + addr));
 }
 #include "../cmdline.c"
-static unsigned long get_cmd_line_ptr(void)
+unsigned long get_cmd_line_ptr(void)
 {
unsigned long cmd_line_ptr = boot_params->hdr.cmd_line_ptr;
 
diff --git a/arch/x86/boot/compressed/kaslr.c b/arch/x86/boot/compressed/kaslr.c
index 54c24f0..106e13b 100644
--- a/arch/x86/boot/compressed/kaslr.c
+++ b/arch/x86/boot/compressed/kaslr.c
@@ -9,16 +9,41 @@
  * contain the entire properly aligned running kernel image.
  *
  */
+
+/*
+ * isspace() in linux/ctype.h is expected by next_args() to filter
+ * out "space/lf/tab". While boot/ctype.h conflicts with linux/ctype.h,
+ * since isdigit() is implemented in both of them. Hence disable it
+ * here.
+ */
+#define BOOT_CTYPE_H
+
+/*
+ * _ctype[] in lib/ctype.c is needed by isspace() of linux/ctype.h.
+ * While both lib/ctype.c and lib/cmdline.c will bring EXPORT_SYMBOL
+ * which is meaningless and will cause compiling error in some cases.
+ * So do not include linux/export.h and define EXPORT_SYMBOL(sym)
+ * as empty.
+ */
+#define _LINUX_EXPORT_H
+#define EXPORT_SYMBOL(sym)
+
 #include "misc.h"
 #include "error.h"
-#include "../boot.h"
 
 #include 
 #include 
 #include 
 #include 
+#include 
 #include 
 
+/* Macros used by the included decompressor code below. */
+#define STATIC
+#include 
+
+extern unsigned long get_cmd_line_ptr(void);
+
 /* Simplified build-specific string for starting entropy. */
 static const char build_str[] = UTS_RELEASE " (" LINUX_COMPILE_BY "@"
LINUX_COMPILE_HOST ") (" LINUX_COMPILER ") " UTS_VERSION;
@@ -62,6 +87,7 @@ struct mem_vector {
 
 static bool memmap_too_large;
 
+
 enum mem_avoid_index {
MEM_AVOID_ZO_RANGE = 0,
MEM_AVOID_INITRD,
@@ -85,49 +111,14 @@ static bool mem_overlaps(struct mem_vector *one, struct 
mem_vector *two)
return true;
 }
 
-/**
- * _memparse - Parse a string with mem suffixes into a number
- * @ptr: Where parse begins
- * @retptr: (output) Optional pointer to next char after parse completes
- *
- * Parses a string into a number.  The number stored at @ptr is
- * potentially suffixed with K, M, G, T, P, E.
- */
-static unsigned long long _memparse(const char *ptr, char **retptr)
+char *skip_spaces(const char *str)
 {
-   char *endptr;   /* Local pointer to end of parsed string */
-
-   unsigned long long ret = simple_strtoull(ptr, , 0);
-
-   switch (*endptr) {
-   case 'E':
-   case 'e':
-   ret <<= 10;
-   case 'P':
-   case 'p':
-   ret <<= 10;
-   case 'T':
-   case 't':
-   ret <<= 10;
-   case 'G':
-   case 'g':
-   ret <<= 10;
-   case 'M':
-   case 'm':
-   ret <<= 10;
-   case 'K':
-   case 'k':
-   ret <<= 10;
-   endptr++;
-   default:
-   break;
-   }
-
-   if (retptr)
-   *retptr = endptr;
-
-   return ret;
+   while (isspace(*str))
+   ++str;
+   return (char *)str;
 }
+#include "../../../../lib/ctype.c"
+#include "../../../../lib/cmdline.c"
 
 static int
 parse_memmap(char *p, unsigned long long 

[tip:x86/boot] x86/KASLR: Parse all 'memmap=' boot option entries

2017-05-24 Thread tip-bot for Baoquan He
Commit-ID:  d52e7d5a952c5e35783f96e8c5b7fcffbb0d7c60
Gitweb: http://git.kernel.org/tip/d52e7d5a952c5e35783f96e8c5b7fcffbb0d7c60
Author: Baoquan He 
AuthorDate: Sat, 13 May 2017 13:46:28 +0800
Committer:  Ingo Molnar 
CommitDate: Wed, 24 May 2017 09:50:27 +0200

x86/KASLR: Parse all 'memmap=' boot option entries

In commit:

  f28442497b5c ("x86/boot: Fix KASLR and memmap= collision")

... the memmap= option is parsed so that KASLR can avoid those reserved
regions. It uses cmdline_find_option() to get the value when memmap=
is specified; the problem is that cmdline_find_option() only returns the
last entry when multiple memmap= entries are provided, so all earlier
entries are silently ignored.

Address this by checking each command line token for a "memmap=" match
and parse each instance instead of using cmdline_find_option().
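
A minimal sketch of the resulting token walk (parse_all_memmap() is a
hypothetical wrapper; the real loop and mem_avoid_memmap() live in
arch/x86/boot/compressed/kaslr.c):

	static void parse_all_memmap(char *args)
	{
		char *param, *val;

		args = skip_spaces(args);
		while (*args) {
			/* next_arg() consumes one "key=value" token. */
			args = next_arg(args, &param, &val);
			if (!strcmp(param, "memmap") && val)
				mem_avoid_memmap(val);	/* record every entry */
		}
	}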

Signed-off-by: Baoquan He 
Acked-by: Kees Cook 
Cc: Linus Torvalds 
Cc: Peter Zijlstra 
Cc: Thomas Gleixner 
Cc: dan.j.willi...@intel.com
Cc: douly.f...@cn.fujitsu.com
Cc: dyo...@redhat.com
Cc: m.miz...@jp.fujitsu.com
Link: http://lkml.kernel.org/r/1494654390-23861-2-git-send-email-...@redhat.com
Signed-off-by: Ingo Molnar 
---
 arch/x86/boot/compressed/cmdline.c |   2 +-
 arch/x86/boot/compressed/kaslr.c   | 136 ++---
 arch/x86/boot/string.c |   8 +++
 3 files changed, 91 insertions(+), 55 deletions(-)

diff --git a/arch/x86/boot/compressed/cmdline.c b/arch/x86/boot/compressed/cmdline.c
index 73ccf63..9dc1ce6 100644
--- a/arch/x86/boot/compressed/cmdline.c
+++ b/arch/x86/boot/compressed/cmdline.c
@@ -13,7 +13,7 @@ static inline char rdfs8(addr_t addr)
return *((char *)(fs + addr));
 }
 #include "../cmdline.c"
-static unsigned long get_cmd_line_ptr(void)
+unsigned long get_cmd_line_ptr(void)
 {
unsigned long cmd_line_ptr = boot_params->hdr.cmd_line_ptr;
 
diff --git a/arch/x86/boot/compressed/kaslr.c b/arch/x86/boot/compressed/kaslr.c
index 54c24f0..106e13b 100644
--- a/arch/x86/boot/compressed/kaslr.c
+++ b/arch/x86/boot/compressed/kaslr.c
@@ -9,16 +9,41 @@
  * contain the entire properly aligned running kernel image.
  *
  */
+
+/*
+ * isspace() in linux/ctype.h is expected by next_args() to filter
+ * out "space/lf/tab". While boot/ctype.h conflicts with linux/ctype.h,
+ * since isdigit() is implemented in both of them. Hence disable it
+ * here.
+ */
+#define BOOT_CTYPE_H
+
+/*
+ * _ctype[] in lib/ctype.c is needed by isspace() of linux/ctype.h.
+ * While both lib/ctype.c and lib/cmdline.c will bring EXPORT_SYMBOL
+ * which is meaningless and will cause compiling error in some cases.
+ * So do not include linux/export.h and define EXPORT_SYMBOL(sym)
+ * as empty.
+ */
+#define _LINUX_EXPORT_H
+#define EXPORT_SYMBOL(sym)
+
 #include "misc.h"
 #include "error.h"
-#include "../boot.h"
 
 #include <generated/compile.h>
 #include <linux/module.h>
 #include <linux/uts.h>
 #include <linux/utsname.h>
+#include <linux/ctype.h>
 #include <generated/utsrelease.h>
 
+/* Macros used by the included decompressor code below. */
+#define STATIC
+#include <linux/decompress/mm.h>
+
+extern unsigned long get_cmd_line_ptr(void);
+
 /* Simplified build-specific string for starting entropy. */
 static const char build_str[] = UTS_RELEASE " (" LINUX_COMPILE_BY "@"
LINUX_COMPILE_HOST ") (" LINUX_COMPILER ") " UTS_VERSION;
@@ -62,6 +87,7 @@ struct mem_vector {
 
 static bool memmap_too_large;
 
+
 enum mem_avoid_index {
MEM_AVOID_ZO_RANGE = 0,
MEM_AVOID_INITRD,
@@ -85,49 +111,14 @@ static bool mem_overlaps(struct mem_vector *one, struct mem_vector *two)
return true;
 }
 
-/**
- * _memparse - Parse a string with mem suffixes into a number
- * @ptr: Where parse begins
- * @retptr: (output) Optional pointer to next char after parse completes
- *
- * Parses a string into a number.  The number stored at @ptr is
- * potentially suffixed with K, M, G, T, P, E.
- */
-static unsigned long long _memparse(const char *ptr, char **retptr)
+char *skip_spaces(const char *str)
 {
-   char *endptr;   /* Local pointer to end of parsed string */
-
-   unsigned long long ret = simple_strtoull(ptr, &endptr, 0);
-
-   switch (*endptr) {
-   case 'E':
-   case 'e':
-   ret <<= 10;
-   case 'P':
-   case 'p':
-   ret <<= 10;
-   case 'T':
-   case 't':
-   ret <<= 10;
-   case 'G':
-   case 'g':
-   ret <<= 10;
-   case 'M':
-   case 'm':
-   ret <<= 10;
-   case 'K':
-   case 'k':
-   ret <<= 10;
-   endptr++;
-   default:
-   break;
-   }
-
-   if (retptr)
-   *retptr = endptr;
-
-   return ret;
+   while (isspace(*str))
+   ++str;
+   return (char *)str;
 }
+#include "../../../../lib/ctype.c"
+#include "../../../../lib/cmdline.c"
 
 static int
 parse_memmap(char *p, unsigned long long *start, unsigned long long *size)
@@ -142,7 +133,7 @@ parse_memmap(char *p, unsigned long long *start, unsigned long long *size)
return -EINVAL;
 

[tip:x86/urgent] x86/mm: Fix boot crash caused by incorrect loop count calculation in sync_global_pgds()

2017-05-05 Thread tip-bot for Baoquan He
Commit-ID:  fc5f9d5f151c9fff21d3d1d2907b888a5aec3ff7
Gitweb: http://git.kernel.org/tip/fc5f9d5f151c9fff21d3d1d2907b888a5aec3ff7
Author: Baoquan He 
AuthorDate: Thu, 4 May 2017 10:25:47 +0800
Committer:  Ingo Molnar 
CommitDate: Fri, 5 May 2017 08:21:24 +0200

x86/mm: Fix boot crash caused by incorrect loop count calculation in sync_global_pgds()

Jeff Moyer reported that on his system with two memory regions, 0~64G and
1T~1T+192G, and the kernel option "memmap=192G!1024G" added, enabling KASLR
makes the system hang intermittently during boot, while booting with
'nokaslr' does not.

The back trace is:

 Oops:  [#1] SMP

 RIP: memcpy_erms()
 [  ]
 Call Trace:
  pmem_rw_page()
  bdev_read_page()
  do_mpage_readpage()
  mpage_readpages()
  blkdev_readpages()
  __do_page_cache_readahead()
  force_page_cache_readahead()
  page_cache_sync_readahead()
  generic_file_read_iter()
  blkdev_read_iter()
  __vfs_read()
  vfs_read()
  SyS_read()
  entry_SYSCALL_64_fastpath()

This crash happens because the loop iterator calculation in sync_global_pgds()
is not correct. When a mapping area crosses PGD entries, we should
calculate the starting address of the region that the next PGD covers and
use that as the next iterator value, not simply add PGDIR_SIZE. The old
code works only if the mapping area is an exact multiple of PGDIR_SIZE;
otherwise the end region can be skipped, so it is never synchronized from
the kernel PGD init_mm.pgd to all other processes.

In Jeff's system, the emulated pmem area [1024G, 1216G) is smaller than
PGDIR_SIZE. 'nokaslr' works because PAGE_OFFSET is 1T-aligned, which maps
this area inside a single PGD entry. With KASLR enabled, the area can
cross two PGD entries, and then the second PGD entry is not synced to
all other processes. That is why we saw an empty PGD.

Fix it.
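
To see the arithmetic, a standalone sketch with illustrative addresses
(PGDIR_SIZE is 512 GB with 4-level paging; ALIGN() rounds up as in the
kernel):

	#include <stdio.h>

	#define PGDIR_SIZE	(1UL << 39)	/* 512 GB per PGD entry */
	#define ALIGN(x, a)	(((x) + (a) - 1) & ~((a) - 1))

	int main(void)
	{
		unsigned long start = 0x14000000000UL;	/* 1.25 TB */
		unsigned long end   = 0x1A000000000UL;	/* 1.625 TB */
		unsigned long addr;

		/* Old loop: stepping by PGDIR_SIZE from the unaligned start
		 * jumps straight past 'end' and never visits the PGD entry
		 * covering [1.5T, 2T). Prints one address. */
		for (addr = start; addr <= end; addr += PGDIR_SIZE)
			printf("old: pgd for %#lx\n", addr);

		/* Fixed loop: advance to the start of the next PGD-covered
		 * region, so every PGD entry the range touches is visited.
		 * Prints two addresses. */
		for (addr = start; addr <= end; addr = ALIGN(addr + 1, PGDIR_SIZE))
			printf("new: pgd for %#lx\n", addr);

		return 0;
	}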

Reported-by: Jeff Moyer 
Signed-off-by: Baoquan He 
Cc: Andrew Morton 
Cc: Andy Lutomirski 
Cc: Borislav Petkov 
Cc: Brian Gerst 
Cc: Dan Williams 
Cc: Dave Hansen 
Cc: Dave Young 
Cc: Denys Vlasenko 
Cc: H. Peter Anvin 
Cc: Jinbum Park 
Cc: Josh Poimboeuf 
Cc: Kees Cook 
Cc: Kirill A. Shutemov 
Cc: Linus Torvalds 
Cc: Peter Zijlstra 
Cc: Thomas Garnier 
Cc: Thomas Gleixner 
Cc: Yasuaki Ishimatsu 
Cc: Yinghai Lu 
Link: http://lkml.kernel.org/r/1493864747-8506-1-git-send-email-...@redhat.com
Signed-off-by: Ingo Molnar 
---
 arch/x86/mm/init_64.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index 745e5e1..97fe887 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -94,10 +94,10 @@ __setup("noexec32=", nonx32_setup);
  */
 void sync_global_pgds(unsigned long start, unsigned long end)
 {
-   unsigned long address;
+   unsigned long addr;
 
-   for (address = start; address <= end; address += PGDIR_SIZE) {
-   pgd_t *pgd_ref = pgd_offset_k(address);
+   for (addr = start; addr <= end; addr = ALIGN(addr + 1, PGDIR_SIZE)) {
+   pgd_t *pgd_ref = pgd_offset_k(addr);
const p4d_t *p4d_ref;
struct page *page;
 
@@ -106,7 +106,7 @@ void sync_global_pgds(unsigned long start, unsigned long end)
 * handle synchonization on p4d level.
 */
BUILD_BUG_ON(pgd_none(*pgd_ref));
-   p4d_ref = p4d_offset(pgd_ref, address);
+   p4d_ref = p4d_offset(pgd_ref, addr);
 
if (p4d_none(*p4d_ref))
continue;
@@ -117,8 +117,8 @@ void sync_global_pgds(unsigned long start, unsigned long end)
p4d_t *p4d;
spinlock_t *pgt_lock;
 
-   pgd = (pgd_t *)page_address(page) + pgd_index(address);
-   p4d = p4d_offset(pgd, address);
+   pgd = (pgd_t *)page_address(page) + pgd_index(addr);
+   p4d = p4d_offset(pgd, addr);
/* the pgt_lock only for Xen */
pgt_lock = &pgd_page_get_mm(page)->page_table_lock;
spin_lock(pgt_lock);


[tip:x86/boot] x86/KASLR: Fix kexec kernel boot crash when KASLR randomization fails

2017-04-28 Thread tip-bot for Baoquan He
Commit-ID:  da63b6b20077469bd6bd96e07991ce145fc4fbc4
Gitweb: http://git.kernel.org/tip/da63b6b20077469bd6bd96e07991ce145fc4fbc4
Author: Baoquan He 
AuthorDate: Thu, 27 Apr 2017 15:42:20 +0800
Committer:  Ingo Molnar 
CommitDate: Fri, 28 Apr 2017 08:31:15 +0200

x86/KASLR: Fix kexec kernel boot crash when KASLR randomization fails

Dave found that a kdump kernel with KASLR enabled will reset to the BIOS
immediately if physical randomization failed to find a new position for
the kernel. A kernel with the 'nokaslr' option works in this case.

The reason is that KASLR will install a new page table for the identity
mapping, while it missed building it for the original kernel location
if KASLR physical randomization fails.

This only happens in the kexec/kdump kernel, because for kexec/kdump the
identity mapping has been built in the 1st kernel for the whole of memory
by calling init_pgtable(). Here, if physical randomization fails, the code
does not build the identity mapping for the original area of the kernel,
yet still switches to the new page table '_pgtable'. The kernel then triple
faults immediately because its own location has no identity mapping.

The normal kernel won't see this bug, because it comes here via startup_32()
and CR3 is already set to _pgtable. In startup_32() the identity mapping is
built for the 0~4G area, and KASLR only appends on-demand identity mappings
to this existing table instead of overwriting it. So the identity mapping
for the original area of the kernel is still there.

To fix it we just switch to the new identity mapping page table when physical
KASLR succeeds. Otherwise we keep the old page table unchanged just like
"nokaslr" does.

Signed-off-by: Baoquan He 
Signed-off-by: Dave Young 
Acked-by: Kees Cook 
Cc: Borislav Petkov 
Cc: Dave Jiang 
Cc: Linus Torvalds 
Cc: Peter Zijlstra 
Cc: Thomas Garnier 
Cc: Thomas Gleixner 
Cc: Yinghai Lu 
Link: http://lkml.kernel.org/r/1493278940-5885-1-git-send-email-...@redhat.com
Signed-off-by: Ingo Molnar 
---
 arch/x86/boot/compressed/kaslr.c | 11 +--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/arch/x86/boot/compressed/kaslr.c b/arch/x86/boot/compressed/kaslr.c
index 6d9a546..54c24f0 100644
--- a/arch/x86/boot/compressed/kaslr.c
+++ b/arch/x86/boot/compressed/kaslr.c
@@ -597,10 +597,17 @@ void choose_random_location(unsigned long input,
add_identity_map(random_addr, output_size);
*output = random_addr;
}
+
+   /*
+* This loads the identity mapping page table.
+* This should only be done if a new physical address
+* is found for the kernel, otherwise we should keep
+* the old page table to make it be like the "nokaslr"
+* case.
+*/
+   finalize_identity_maps();
}
 
-   /* This actually loads the identity pagetable on x86_64. */
-   finalize_identity_maps();
 
/* Pick random virtual address starting from LOAD_PHYSICAL_ADDR. */
if (IS_ENABLED(CONFIG_X86_64))


[tip:x86/boot] boot/param: Move next_arg() function to lib/cmdline.c for later reuse

2017-04-18 Thread tip-bot for Baoquan He
Commit-ID:  f51b17c8d90f85456579c3192ab59ee031835634
Gitweb: http://git.kernel.org/tip/f51b17c8d90f85456579c3192ab59ee031835634
Author: Baoquan He 
AuthorDate: Mon, 17 Apr 2017 21:34:56 +0800
Committer:  Ingo Molnar 
CommitDate: Tue, 18 Apr 2017 10:37:13 +0200

boot/param: Move next_arg() function to lib/cmdline.c for later reuse

next_arg() will be used to parse boot parameters in the x86/boot/compressed
code, so move it to lib/cmdline.c for better code reuse.

No change in functionality.
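
A usage sketch (hypothetical caller; the token results follow the quoting
rules of the function shown in the diff below):

	char cmdline[] = "memmap=4G!4G console=\"ttyS0,115200\" quiet";
	char *args = cmdline, *param, *val;

	while (*args) {
		args = next_arg(args, &param, &val);
		/*
		 * 1st token: param = "memmap",  val = "4G!4G"
		 * 2nd token: param = "console", val = "ttyS0,115200" (quotes stripped)
		 * 3rd token: param = "quiet",   val = NULL
		 */
	}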

Signed-off-by: Baoquan He 
Cc: Andrew Morton 
Cc: Gustavo Padovan 
Cc: Jens Axboe 
Cc: Jessica Yu 
Cc: Johannes Berg 
Cc: Josh Triplett 
Cc: Larry Finger 
Cc: Linus Torvalds 
Cc: Niklas Söderlund 
Cc: Peter Zijlstra 
Cc: Petr Mladek 
Cc: Rasmus Villemoes 
Cc: Thomas Gleixner 
Cc: dan.j.willi...@intel.com
Cc: dave.ji...@intel.com
Cc: dyo...@redhat.com
Cc: keesc...@chromium.org
Cc: zijun_hu 
Link: http://lkml.kernel.org/r/1492436099-4017-2-git-send-email-...@redhat.com
Signed-off-by: Ingo Molnar 
---
 include/linux/kernel.h |  1 +
 kernel/params.c| 52 -
 lib/cmdline.c  | 57 ++
 3 files changed, 58 insertions(+), 52 deletions(-)

diff --git a/include/linux/kernel.h b/include/linux/kernel.h
index 4c26dc3..7ae2567 100644
--- a/include/linux/kernel.h
+++ b/include/linux/kernel.h
@@ -438,6 +438,7 @@ extern int get_option(char **str, int *pint);
 extern char *get_options(const char *str, int nints, int *ints);
 extern unsigned long long memparse(const char *ptr, char **retptr);
 extern bool parse_option_str(const char *str, const char *option);
+extern char *next_arg(char *args, char **param, char **val);
 
 extern int core_kernel_text(unsigned long addr);
 extern int core_kernel_data(unsigned long addr);
diff --git a/kernel/params.c b/kernel/params.c
index a6d6149..60b2d81 100644
--- a/kernel/params.c
+++ b/kernel/params.c
@@ -160,58 +160,6 @@ static int parse_one(char *param,
return -ENOENT;
 }
 
-/* You can use " around spaces, but can't escape ". */
-/* Hyphens and underscores equivalent in parameter names. */
-static char *next_arg(char *args, char **param, char **val)
-{
-   unsigned int i, equals = 0;
-   int in_quote = 0, quoted = 0;
-   char *next;
-
-   if (*args == '"') {
-   args++;
-   in_quote = 1;
-   quoted = 1;
-   }
-
-   for (i = 0; args[i]; i++) {
-   if (isspace(args[i]) && !in_quote)
-   break;
-   if (equals == 0) {
-   if (args[i] == '=')
-   equals = i;
-   }
-   if (args[i] == '"')
-   in_quote = !in_quote;
-   }
-
-   *param = args;
-   if (!equals)
-   *val = NULL;
-   else {
-   args[equals] = '\0';
-   *val = args + equals + 1;
-
-   /* Don't include quotes in value. */
-   if (**val == '"') {
-   (*val)++;
-   if (args[i-1] == '"')
-   args[i-1] = '\0';
-   }
-   }
-   if (quoted && args[i-1] == '"')
-   args[i-1] = '\0';
-
-   if (args[i]) {
-   args[i] = '\0';
-   next = args + i + 1;
-   } else
-   next = args + i;
-
-   /* Chew up trailing spaces. */
-   return skip_spaces(next);
-}
-
 /* Args looks like "foo=bar,bar2 baz=fuz wiz". */
 char *parse_args(const char *doing,
 char *args,
diff --git a/lib/cmdline.c b/lib/cmdline.c
index 8f13cf7..3c6432df 100644
--- a/lib/cmdline.c
+++ b/lib/cmdline.c
@@ -15,6 +15,7 @@
 #include <linux/export.h>
 #include <linux/kernel.h>
 #include <linux/string.h>
+#include <linux/ctype.h>
 
 /*
  * If a hyphen was found in get_option, this will handle the
@@ -189,3 +190,59 @@ bool parse_option_str(const char *str, const char *option)
 
return false;
 }
+
+/*
+ * Parse a string to get a param value pair.
+ * You can use " around spaces, but can't escape ".
+ * Hyphens and underscores equivalent in parameter names.
+ */
+char *next_arg(char *args, char **param, char **val)
+{
+   unsigned int i, equals = 0;
+   int in_quote = 0, quoted = 0;
+   char *next;
+
+   if (*args == '"') {
+   args++;
+   in_quote = 1;
+   quoted = 1;
+   }
+
+   for (i = 0; args[i]; i++) {
+   if (isspace(args[i]) && !in_quote)
+   break;
+   if (equals == 0) {
+   if (args[i] == '=')
+   equals = i;
+   }
+   if (args[i] == '"')
+   in_quote = !in_quote;
+   }
+
+   *param = args;
+   if (!equals)
+   *val = NULL;
+   else {
+   args[equals] = '\0';
+   *val = args + equals + 1;
+
+   /* Don't include quotes in value. */
+   if (**val == '"') {
+   (*val)++;
+   if (args[i-1] == '"')
+   args[i-1] = '\0';
+   }
+   }
+   if (quoted && args[i-1] == '"')
+   args[i-1] = '\0';
+
+   if (args[i]) {
+   args[i] = '\0';
+   next = args + i + 1;
+   } else
+   next = args + i;
+
+   /* Chew up trailing spaces. */
+   return skip_spaces(next);
+}

[tip:efi/core] x86/efi: Clean up a minor mistake in comment

2017-04-05 Thread tip-bot for Baoquan He
Commit-ID:  b1d1776139698d7522dfd46aa81a056f030ddaf7
Gitweb: http://git.kernel.org/tip/b1d1776139698d7522dfd46aa81a056f030ddaf7
Author: Baoquan He 
AuthorDate: Tue, 4 Apr 2017 17:02:43 +0100
Committer:  Ingo Molnar 
CommitDate: Wed, 5 Apr 2017 12:27:26 +0200

x86/efi: Clean up a minor mistake in comment

EFI allocates runtime services regions top-down, from EFI_VA_START (-4G)
down to EFI_VA_END (-68G), 64G of space altogether.

The mechanism was introduced in commit:

  d2f7cbe7b26a7 ("x86/efi: Runtime services virtual mapping")

Fix the comment that still says bottom-up.
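
For reference, the constants involved (definitions as in
arch/x86/include/asm/efi.h at the time):

	#define EFI_VA_START	( -4 * (__AC(1, UL) << 30))	/* 0xffffffff00000000 */
	#define EFI_VA_END	(-68 * (__AC(1, UL) << 30))	/* 0xffffffef00000000 */

efi_va starts at EFI_VA_START and each mapping moves it downward toward
EFI_VA_END, so numerically EFI_VA_START > EFI_VA_END.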

Signed-off-by: Baoquan He 
Signed-off-by: Ard Biesheuvel 
Cc: Linus Torvalds 
Cc: Matt Fleming 
Cc: Peter Zijlstra 
Cc: Thomas Gleixner 
Cc: linux-...@vger.kernel.org
Link: http://lkml.kernel.org/r/20170404160245.27812-10-ard.biesheu...@linaro.org
Signed-off-by: Ingo Molnar 
---
 arch/x86/platform/efi/efi_64.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/platform/efi/efi_64.c b/arch/x86/platform/efi/efi_64.c
index a4695da..6cbf9e0 100644
--- a/arch/x86/platform/efi/efi_64.c
+++ b/arch/x86/platform/efi/efi_64.c
@@ -47,7 +47,7 @@
 #include 
 
 /*
- * We allocate runtime services regions bottom-up, starting from -4G, i.e.
+ * We allocate runtime services regions top-down, starting from -4G, i.e.
 * 0xffffffff00000000 and limit EFI VA mapping space to 64G.
  */
 static u64 efi_va = EFI_VA_START;


[tip:efi/core] x86/efi: Clean up a minor mistake in comment

2017-04-05 Thread tip-bot for Baoquan He
Commit-ID:  b22c3d7d98ec76870ac4cdeb7cc1593f2d371f5a
Gitweb: http://git.kernel.org/tip/b22c3d7d98ec76870ac4cdeb7cc1593f2d371f5a
Author: Baoquan He 
AuthorDate: Tue, 4 Apr 2017 17:02:43 +0100
Committer:  Ingo Molnar 
CommitDate: Wed, 5 Apr 2017 09:27:52 +0200

x86/efi: Clean up a minor mistake in comment

EFI allocates runtime services regions top-down, from EFI_VA_START (-4G)
down to EFI_VA_END (-68G), 64G of space altogether.

The mechanism was introduced in commit:

  d2f7cbe7b26a7 ("x86/efi: Runtime services virtual mapping")

Fix the comment that still says bottom-up.

Signed-off-by: Baoquan He 
Signed-off-by: Ard Biesheuvel 
Cc: Linus Torvalds 
Cc: Matt Fleming 
Cc: Peter Zijlstra 
Cc: Thomas Gleixner 
Cc: linux-...@vger.kernel.org
Link: http://lkml.kernel.org/r/20170404160245.27812-10-ard.biesheu...@linaro.org
Signed-off-by: Ingo Molnar 
---
 arch/x86/platform/efi/efi_64.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/platform/efi/efi_64.c b/arch/x86/platform/efi/efi_64.c
index d56dd864..4e043a8 100644
--- a/arch/x86/platform/efi/efi_64.c
+++ b/arch/x86/platform/efi/efi_64.c
@@ -47,7 +47,7 @@
 #include 
 
 /*
- * We allocate runtime services regions bottom-up, starting from -4G, i.e.
+ * We allocate runtime services regions top-down, starting from -4G, i.e.
 * 0xffffffff00000000 and limit EFI VA mapping space to 64G.
  */
 static u64 efi_va = EFI_VA_START;


[tip:x86/urgent] x86/mm/KASLR: Exclude EFI region from KASLR VA space randomization

2017-03-24 Thread tip-bot for Baoquan He
Commit-ID:  a46f60d76004965e5669dbf3fc21ef3bc3632eb4
Gitweb: http://git.kernel.org/tip/a46f60d76004965e5669dbf3fc21ef3bc3632eb4
Author: Baoquan He 
AuthorDate: Fri, 24 Mar 2017 12:59:52 +0800
Committer:  Ingo Molnar 
CommitDate: Fri, 24 Mar 2017 09:04:27 +0100

x86/mm/KASLR: Exclude EFI region from KASLR VA space randomization

Currently KASLR is enabled on three regions: the direct mapping of physical
memory, vmalloc and vmemmap. However, the EFI region is also mistakenly
included in VA space randomization because the EFI_VA_START macro is misused
under the assumption that EFI_VA_START < EFI_VA_END.

(This breaks kexec and possibly other things that rely on stable addresses.)

The EFI region is reserved for EFI runtime services virtual mapping which
should not be included in KASLR ranges. In Documentation/x86/x86_64/mm.txt,
we can see:

  ffffffef00000000 - fffffffeffffffff (=64 GB) EFI region mapping space

EFI uses the space from -4G down to -68G, thus EFI_VA_START > EFI_VA_END;
here EFI_VA_START = -4G and EFI_VA_END = -68G.

Changing EFI_VA_START to EFI_VA_END in mm/kaslr.c fixes this problem.
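
Spelled out as a sketch (values as in the efi_64.c entries above):

	/*
	 * EFI region: [EFI_VA_END, EFI_VA_START) = [-68G, -4G), allocated
	 * top-down, so EFI_VA_START is the numerically larger bound.  The
	 * KASLR ceiling must therefore be the lower bound:
	 */
	static const unsigned long vaddr_end = EFI_VA_END;	/* -68G */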

Signed-off-by: Baoquan He 
Reviewed-by: Bhupesh Sharma 
Acked-by: Dave Young 
Acked-by: Thomas Garnier 
Cc:  #4.8+
Cc: Andrew Morton 
Cc: Andy Lutomirski 
Cc: Ard Biesheuvel 
Cc: Borislav Petkov 
Cc: Brian Gerst 
Cc: Denys Vlasenko 
Cc: H. Peter Anvin 
Cc: Josh Poimboeuf 
Cc: Kees Cook 
Cc: Linus Torvalds 
Cc: Masahiro Yamada 
Cc: Matt Fleming 
Cc: Peter Zijlstra 
Cc: Thomas Gleixner 
Cc: linux-...@vger.kernel.org
Link: http://lkml.kernel.org/r/1490331592-31860-1-git-send-email-...@redhat.com
Signed-off-by: Ingo Molnar 
---
 arch/x86/mm/kaslr.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/mm/kaslr.c b/arch/x86/mm/kaslr.c
index 887e571..aed2064 100644
--- a/arch/x86/mm/kaslr.c
+++ b/arch/x86/mm/kaslr.c
@@ -48,7 +48,7 @@ static const unsigned long vaddr_start = __PAGE_OFFSET_BASE;
 #if defined(CONFIG_X86_ESPFIX64)
 static const unsigned long vaddr_end = ESPFIX_BASE_ADDR;
 #elif defined(CONFIG_EFI)
-static const unsigned long vaddr_end = EFI_VA_START;
+static const unsigned long vaddr_end = EFI_VA_END;
 #else
 static const unsigned long vaddr_end = __START_KERNEL_map;
 #endif
@@ -105,7 +105,7 @@ void __init kernel_randomize_memory(void)
 */
BUILD_BUG_ON(vaddr_start >= vaddr_end);
BUILD_BUG_ON(IS_ENABLED(CONFIG_X86_ESPFIX64) &&
-vaddr_end >= EFI_VA_START);
+vaddr_end >= EFI_VA_END);
BUILD_BUG_ON((IS_ENABLED(CONFIG_X86_ESPFIX64) ||
  IS_ENABLED(CONFIG_EFI)) &&
 vaddr_end >= __START_KERNEL_map);


[tip:x86/apic] x86/apic, ACPI: Fix incorrect assignment when handling apic/x2apic entries

2016-08-15 Thread tip-bot for Baoquan He
Commit-ID:  31b02dd718712f4c45afbeea7fbd187ecb1b202c
Gitweb: http://git.kernel.org/tip/31b02dd718712f4c45afbeea7fbd187ecb1b202c
Author: Baoquan He 
AuthorDate: Fri, 12 Aug 2016 15:21:47 +0800
Committer:  Ingo Molnar 
CommitDate: Mon, 15 Aug 2016 08:53:44 +0200

x86/apic, ACPI: Fix incorrect assignment when handling apic/x2apic entries

By pure accident the bug makes no functional difference, because the only
expression where we are using these values is (!count && !x2count), in which
the variables are interchangeable, but it makes sense to fix the bug
nevertheless.

Signed-off-by: Baoquan He 
Cc: Andy Lutomirski 
Cc: Borislav Petkov 
Cc: Brian Gerst 
Cc: Denys Vlasenko 
Cc: H. Peter Anvin 
Cc: Josh Poimboeuf 
Cc: Linus Torvalds 
Cc: Peter Zijlstra 
Cc: Thomas Gleixner 
Cc: linux-a...@vger.kernel.org
Cc: r...@rjwysocki.net
Link: http://lkml.kernel.org/r/1470986507-24191-1-git-send-email-...@redhat.com
Signed-off-by: Ingo Molnar 
---
 arch/x86/kernel/acpi/boot.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/acpi/boot.c b/arch/x86/kernel/acpi/boot.c
index 2087bea..1ad5fe2 100644
--- a/arch/x86/kernel/acpi/boot.c
+++ b/arch/x86/kernel/acpi/boot.c
@@ -1018,8 +1018,8 @@ static int __init acpi_parse_madt_lapic_entries(void)
return ret;
}
 
-   x2count = madt_proc[0].count;
-   count = madt_proc[1].count;
+   count = madt_proc[0].count;
+   x2count = madt_proc[1].count;
}
if (!count && !x2count) {
printk(KERN_ERR PREFIX "No LAPIC entries present\n");


[tip:x86/apic] x86/apic, ACPI: Remove the repeated lapic address override entry parsing

2016-08-15 Thread tip-bot for Baoquan He
Commit-ID:  6de421198c75d95088331e6a480e952292b0e121
Gitweb: http://git.kernel.org/tip/6de421198c75d95088331e6a480e952292b0e121
Author: Baoquan He 
AuthorDate: Fri, 12 Aug 2016 14:57:13 +0800
Committer:  Ingo Molnar 
CommitDate: Mon, 15 Aug 2016 08:53:37 +0200

x86/apic, ACPI: Remove the repeated lapic address override entry parsing

The ACPI MADT has a 32-bit field providing the lapic address at which
each processor can access its lapic information. The MADT also contains
an optional entry providing a 64-bit address that overrides the 32-bit
one. However, the current code parses the lapic address override entry
twice: once in early_acpi_boot_init(), because AMD NUMA needs
boot_cpu_id early, and again in acpi_boot_init(), which parses all MADT
entries.

So in this patch we remove the repeated code at the second call site.

Meanwhile, print the lapic override entry information like other MADT
entries, so that it shows up in the boot log.

This patch is not supposed to change any runtime behavior, other than
improving kernel messages.
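
For reference, the override entry being parsed (struct as in
include/acpi/actbl1.h; MADT subtable type 5, 12 bytes):

	struct acpi_madt_local_apic_override {
		struct acpi_subtable_header header;	/* type and length */
		u16 reserved;				/* must be zero */
		u64 address;				/* 64-bit local APIC address */
	};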

Signed-off-by: Baoquan He 
Cc: Andy Lutomirski 
Cc: Borislav Petkov 
Cc: Brian Gerst 
Cc: Denys Vlasenko 
Cc: H. Peter Anvin 
Cc: Josh Poimboeuf 
Cc: Linus Torvalds 
Cc: Peter Zijlstra 
Cc: Thomas Gleixner 
Cc: linux-a...@vger.kernel.org
Cc: r...@rjwysocki.net
Link: http://lkml.kernel.org/r/1470985033-22493-2-git-send-email-...@redhat.com
Signed-off-by: Ingo Molnar 
---
 arch/x86/kernel/acpi/boot.c | 17 ++---
 arch/x86/kernel/apic/apic.c |  2 +-
 2 files changed, 3 insertions(+), 16 deletions(-)

diff --git a/arch/x86/kernel/acpi/boot.c b/arch/x86/kernel/acpi/boot.c
index 90d84c3..2087bea 100644
--- a/arch/x86/kernel/acpi/boot.c
+++ b/arch/x86/kernel/acpi/boot.c
@@ -282,6 +282,8 @@ acpi_parse_lapic_addr_ovr(struct acpi_subtable_header * header,
if (BAD_MADT_ENTRY(lapic_addr_ovr, end))
return -EINVAL;
 
+   acpi_table_print_madt_entry(header);
+
acpi_lapic_addr = lapic_addr_ovr->address;
 
return 0;
@@ -998,21 +1000,6 @@ static int __init acpi_parse_madt_lapic_entries(void)
if (!boot_cpu_has(X86_FEATURE_APIC))
return -ENODEV;
 
-   /*
-* Note that the LAPIC address is obtained from the MADT (32-bit value)
-* and (optionally) overridden by a LAPIC_ADDR_OVR entry (64-bit value).
-*/
-
-   count = acpi_table_parse_madt(ACPI_MADT_TYPE_LOCAL_APIC_OVERRIDE,
- acpi_parse_lapic_addr_ovr, 0);
-   if (count < 0) {
-   printk(KERN_ERR PREFIX
-  "Error parsing LAPIC address override entry\n");
-   return count;
-   }
-
-   register_lapic_address(acpi_lapic_addr);
-
count = acpi_table_parse_madt(ACPI_MADT_TYPE_LOCAL_SAPIC,
  acpi_parse_sapic, MAX_LOCAL_APIC);
 
diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
index cea4fc1..63b7484 100644
--- a/arch/x86/kernel/apic/apic.c
+++ b/arch/x86/kernel/apic/apic.c
@@ -1825,7 +1825,7 @@ void __init register_lapic_address(unsigned long address)
if (!x2apic_mode) {
set_fixmap_nocache(FIX_APIC_BASE, address);
apic_printk(APIC_VERBOSE, "mapped APIC to %16lx (%16lx)\n",
-   APIC_BASE, mp_lapic_addr);
+   APIC_BASE, address);
}
if (boot_cpu_physical_apicid == -1U) {
boot_cpu_physical_apicid  = read_apic_id();


[tip:x86/apic] x86/mm/numa: Open code function early_get_boot_cpu_id()

2016-08-15 Thread tip-bot for Baoquan He
Commit-ID:  a91bf718dbc993ea582cd53c0cb711a0839b4603
Gitweb: http://git.kernel.org/tip/a91bf718dbc993ea582cd53c0cb711a0839b4603
Author: Baoquan He 
AuthorDate: Fri, 12 Aug 2016 14:57:12 +0800
Committer:  Ingo Molnar 
CommitDate: Mon, 15 Aug 2016 08:51:54 +0200

x86/mm/numa: Open code function early_get_boot_cpu_id()

Previously early_acpi_boot_init() was called in early_get_boot_cpu_id()
to get the value for boot_cpu_physical_apicid. Now that
early_acpi_boot_init() has been taken out and moved to setup_arch(), the
name of early_get_boot_cpu_id() no longer matches its implementation, and
only the code that gets the boot-time SMP configuration is left.

So in this patch we open code it.

Also move the smp_found_config check into default_get_smp_config to
simplify code, because both early_get_smp_config() and get_smp_config()
call x86_init.mpparse.get_smp_config().

Also remove the redundant CONFIG_X86_MPPARSE #ifdef check when we call
early_get_smp_config().

Signed-off-by: Baoquan He 
Cc: Andy Lutomirski 
Cc: Borislav Petkov 
Cc: Brian Gerst 
Cc: Denys Vlasenko 
Cc: H. Peter Anvin 
Cc: Josh Poimboeuf 
Cc: Linus Torvalds 
Cc: Peter Zijlstra 
Cc: Thomas Gleixner 
Cc: linux-a...@vger.kernel.org
Cc: r...@rjwysocki.net
Link: http://lkml.kernel.org/r/1470985033-22493-1-git-send-email-...@redhat.com
Signed-off-by: Ingo Molnar 
---
 arch/x86/kernel/mpparse.c |  3 +++
 arch/x86/kernel/setup.c   |  3 +--
 arch/x86/mm/amdtopology.c | 22 +-
 3 files changed, 9 insertions(+), 19 deletions(-)

diff --git a/arch/x86/kernel/mpparse.c b/arch/x86/kernel/mpparse.c
index 068c4a9..0f8d204 100644
--- a/arch/x86/kernel/mpparse.c
+++ b/arch/x86/kernel/mpparse.c
@@ -499,6 +499,9 @@ void __init default_get_smp_config(unsigned int early)
 {
struct mpf_intel *mpf = mpf_found;
 
+   if (!smp_found_config)
+   return;
+
if (!mpf)
return;
 
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 0fa60f5..cbf5634 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -1221,8 +1221,7 @@ void __init setup_arch(char **cmdline_p)
/*
 * get boot-time SMP configuration:
 */
-   if (smp_found_config)
-   get_smp_config();
+   get_smp_config();
 
prefill_possible_map();
 
diff --git a/arch/x86/mm/amdtopology.c b/arch/x86/mm/amdtopology.c
index ba47524..d1c7de0 100644
--- a/arch/x86/mm/amdtopology.c
+++ b/arch/x86/mm/amdtopology.c
@@ -52,21 +52,6 @@ static __init int find_northbridge(void)
return -ENOENT;
 }
 
-static __init void early_get_boot_cpu_id(void)
-{
-   /*
-* need to get the APIC ID of the BSP so can use that to
-* create apicid_to_node in amd_scan_nodes()
-*/
-#ifdef CONFIG_X86_MPPARSE
-   /*
-* get boot-time SMP configuration:
-*/
-   if (smp_found_config)
-   early_get_smp_config();
-#endif
-}
-
 int __init amd_numa_init(void)
 {
u64 start = PFN_PHYS(0);
@@ -180,8 +165,11 @@ int __init amd_numa_init(void)
cores = 1 << bits;
apicid_base = 0;
 
-   /* get the APIC ID of the BSP early for systems with apicid lifting */
-   early_get_boot_cpu_id();
+   /*
+* get boot-time SMP configuration:
+*/
+   early_get_smp_config();
+
if (boot_cpu_physical_apicid > 0) {
pr_info("BSP APIC ID: %02x\n", boot_cpu_physical_apicid);
apicid_base = boot_cpu_physical_apicid;


[tip:x86/boot] x86/KASLR: Randomize virtual address separately

2016-06-26 Thread tip-bot for Baoquan He
Commit-ID:  8391c73c96f28d4e8c40fd401fd0c9c04391b44a
Gitweb: http://git.kernel.org/tip/8391c73c96f28d4e8c40fd401fd0c9c04391b44a
Author: Baoquan He 
AuthorDate: Wed, 25 May 2016 15:45:32 -0700
Committer:  Ingo Molnar 
CommitDate: Sun, 26 Jun 2016 12:32:04 +0200

x86/KASLR: Randomize virtual address separately

The current KASLR implementation randomizes the physical and virtual
addresses of the kernel together (both are offset by the same amount). It
calculates the delta between the physical address where vmlinux was linked
to load and where it is finally loaded. If the delta is not equal to 0
(i.e. the kernel was relocated), relocation handling needs to be done.

On 64-bit, this patch randomizes both the physical address where the
kernel is decompressed and the virtual address where the kernel text is
mapped and will execute from. We now have two values being chosen, so the
function arguments are reorganized to pass by pointer so they can be
directly updated. Since relocation handling only depends on the virtual
address, we must check the virtual delta, not the physical delta, when
processing kernel relocations. This also populates the page table for the
new virtual address range. 32-bit does not support a separate virtual
address, so it continues to use the physical offset for its virtual offset.

Additionally, this updates the sanity checks done on the resulting kernel
addresses, since they are potentially separate now.

[kees: rewrote changelog, limited virtual split to 64-bit only, update checks]
[kees: fix CONFIG_RANDOMIZE_BASE=n boot failure]
Signed-off-by: Baoquan He 
Signed-off-by: Kees Cook 
Cc: Andrew Morton 
Cc: Andrey Ryabinin 
Cc: Andy Lutomirski 
Cc: Borislav Petkov 
Cc: Brian Gerst 
Cc: Denys Vlasenko 
Cc: Dmitry Vyukov 
Cc: H. Peter Anvin 
Cc: H.J. Lu 
Cc: Josh Poimboeuf 
Cc: Linus Torvalds 
Cc: Peter Zijlstra 
Cc: Thomas Gleixner 
Cc: Yinghai Lu 
Link: http://lkml.kernel.org/r/1464216334-17200-4-git-send-email-keesc...@chromium.org
Signed-off-by: Ingo Molnar 
---
 arch/x86/boot/compressed/kaslr.c | 41 +
 arch/x86/boot/compressed/misc.c  | 49 
 arch/x86/boot/compressed/misc.h  | 22 ++
 3 files changed, 64 insertions(+), 48 deletions(-)

diff --git a/arch/x86/boot/compressed/kaslr.c b/arch/x86/boot/compressed/kaslr.c
index 54037c9..5550546 100644
--- a/arch/x86/boot/compressed/kaslr.c
+++ b/arch/x86/boot/compressed/kaslr.c
@@ -463,17 +463,20 @@ static unsigned long find_random_virt_addr(unsigned long minimum,
  * Since this function examines addresses much more numerically,
  * it takes the input and output pointers as 'unsigned long'.
  */
-unsigned char *choose_random_location(unsigned long input,
- unsigned long input_size,
- unsigned long output,
- unsigned long output_size)
+void choose_random_location(unsigned long input,
+   unsigned long input_size,
+   unsigned long *output,
+   unsigned long output_size,
+   unsigned long *virt_addr)
 {
-   unsigned long choice = output;
unsigned long random_addr;
 
+   /* By default, keep output position unchanged. */
+   *virt_addr = *output;
+
if (cmdline_find_option_bool("nokaslr")) {
warn("KASLR disabled: 'nokaslr' on cmdline.");
-   goto out;
+   return;
}
 
boot_params->hdr.loadflags |= KASLR_FLAG;
@@ -482,25 +485,25 @@ unsigned char *choose_random_location(unsigned long input,
initialize_identity_maps();
 
/* Record the various known unsafe memory ranges. */
-   mem_avoid_init(input, input_size, output);
+   mem_avoid_init(input, input_size, *output);
 
/* Walk e820 and find a random address. */
-   random_addr = find_random_phys_addr(output, output_size);
+   random_addr = find_random_phys_addr(*output, output_size);
if (!random_addr) {
warn("KASLR disabled: could not find suitable E820 region!");
-   goto out;
+   } else {
+   /* Update the new physical address location. */
+   if (*output != random_addr) {
+   add_identity_map(random_addr, output_size);
+   *output = random_addr;
+   }
}
 
-   /* Always enforce the minimum. */
-   if (random_addr < choice)
-   goto out;
-
-   choice = random_addr;
-
-   add_identity_map(choice, output_size);
-
/* This actually loads the identity pagetable on x86_64. */
finalize_identity_maps();
-out:
-   return (unsigned char *)choice;
+
+   /* Pick random virtual address starting from LOAD_PHYSICAL_ADDR. */
+   if (IS_ENABLED(CONFIG_X86_64))
+   random_addr = find_random_virt_addr(LOAD_PHYSICAL_ADDR, output_size);
+ 
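
As a toy model of the split described above (this is not the decompressor
code; the ranges, helper, and link-time base below are assumptions for
illustration), the physical and virtual slots are drawn independently and
only the virtual choice feeds the relocation delta:

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define ALIGN_2M  0x200000UL
#define LINK_ADDR 0x1000000UL           /* assumed 16MB link-time base */

static unsigned long pick(unsigned long min, unsigned long max,
			  unsigned long size)
{
	unsigned long slots = (max - min - size) / ALIGN_2M + 1;
	return min + ((unsigned long)rand() % slots) * ALIGN_2M;
}

int main(void)
{
	unsigned long image = 16UL << 20;
	srand((unsigned)time(NULL));

	unsigned long phys = pick(LINK_ADDR, 1UL << 32, image);
	unsigned long virt = pick(LINK_ADDR, 1UL << 30, image);

	/* Only the virtual delta matters for relocation processing. */
	printf("phys=%#lx virt=%#lx virt-delta=%#lx\n",
	       phys, virt, virt - LINK_ADDR);
	return 0;
}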

[tip:x86/boot] x86/KASLR: Randomize virtual address separately

2016-06-17 Thread tip-bot for Baoquan He
Commit-ID:  ad908dc080e2d8ab26391d0013d2c8157ca0e2da
Gitweb: http://git.kernel.org/tip/ad908dc080e2d8ab26391d0013d2c8157ca0e2da
Author: Baoquan He 
AuthorDate: Wed, 25 May 2016 15:45:32 -0700
Committer:  Ingo Molnar 
CommitDate: Fri, 17 Jun 2016 11:03:47 +0200

x86/KASLR: Randomize virtual address separately

The current KASLR implementation randomizes the physical and virtual
addresses of the kernel together (both are offset by the same amount). It
calculates the delta between the physical address where vmlinux was linked
to load and where it is finally loaded. If the delta is not equal to 0
(i.e. the kernel was relocated), relocation handling needs to be done.

On 64-bit, this patch randomizes both the physical address where the
kernel is decompressed and the virtual address where the kernel text is
mapped and will execute from. We now have two values being chosen, so the
function arguments are reorganized to pass by pointer so they can be
directly updated. Since relocation handling only depends on the virtual
address, we must check the virtual delta, not the physical delta, when
processing kernel relocations. This also populates the page table for the
new virtual address range. 32-bit does not support a separate virtual
address, so it continues to use the physical offset for its virtual offset.

Additionally, this updates the sanity checks done on the resulting kernel
addresses, since they are potentially separate now.

[kees: rewrote changelog, limited virtual split to 64-bit only, update checks]
[kees: fix CONFIG_RANDOMIZE_BASE=n boot failure]
Signed-off-by: Baoquan He 
Signed-off-by: Kees Cook 
Cc: Andrew Morton 
Cc: Andrey Ryabinin 
Cc: Andy Lutomirski 
Cc: Borislav Petkov 
Cc: Brian Gerst 
Cc: Denys Vlasenko 
Cc: Dmitry Vyukov 
Cc: H. Peter Anvin 
Cc: H.J. Lu 
Cc: Josh Poimboeuf 
Cc: Linus Torvalds 
Cc: Peter Zijlstra 
Cc: Thomas Gleixner 
Cc: Yinghai Lu 
Link: http://lkml.kernel.org/r/1464216334-17200-4-git-send-email-keesc...@chromium.org
Signed-off-by: Ingo Molnar 
---
 arch/x86/boot/compressed/kaslr.c | 41 +
 arch/x86/boot/compressed/misc.c  | 49 
 arch/x86/boot/compressed/misc.h  | 22 ++
 3 files changed, 64 insertions(+), 48 deletions(-)

diff --git a/arch/x86/boot/compressed/kaslr.c b/arch/x86/boot/compressed/kaslr.c
index 54037c9..5550546 100644
--- a/arch/x86/boot/compressed/kaslr.c
+++ b/arch/x86/boot/compressed/kaslr.c
@@ -463,17 +463,20 @@ static unsigned long find_random_virt_addr(unsigned long minimum,
  * Since this function examines addresses much more numerically,
  * it takes the input and output pointers as 'unsigned long'.
  */
-unsigned char *choose_random_location(unsigned long input,
- unsigned long input_size,
- unsigned long output,
- unsigned long output_size)
+void choose_random_location(unsigned long input,
+   unsigned long input_size,
+   unsigned long *output,
+   unsigned long output_size,
+   unsigned long *virt_addr)
 {
-   unsigned long choice = output;
unsigned long random_addr;
 
+   /* By default, keep output position unchanged. */
+   *virt_addr = *output;
+
if (cmdline_find_option_bool("nokaslr")) {
warn("KASLR disabled: 'nokaslr' on cmdline.");
-   goto out;
+   return;
}
 
boot_params->hdr.loadflags |= KASLR_FLAG;
@@ -482,25 +485,25 @@ unsigned char *choose_random_location(unsigned long input,
initialize_identity_maps();
 
/* Record the various known unsafe memory ranges. */
-   mem_avoid_init(input, input_size, output);
+   mem_avoid_init(input, input_size, *output);
 
/* Walk e820 and find a random address. */
-   random_addr = find_random_phys_addr(output, output_size);
+   random_addr = find_random_phys_addr(*output, output_size);
if (!random_addr) {
warn("KASLR disabled: could not find suitable E820 region!");
-   goto out;
+   } else {
+   /* Update the new physical address location. */
+   if (*output != random_addr) {
+   add_identity_map(random_addr, output_size);
+   *output = random_addr;
+   }
}
 
-   /* Always enforce the minimum. */
-   if (random_addr < choice)
-   goto out;
-
-   choice = random_addr;
-
-   add_identity_map(choice, output_size);
-
/* This actually loads the identity pagetable on x86_64. */
finalize_identity_maps();
-out:
-   return (unsigned char *)choice;
+
+   /* Pick random virtual address starting from LOAD_PHYSICAL_ADDR. */
+   if (IS_ENABLED(CONFIG_X86_64))
+   random_addr = find_random_virt_addr(LOAD_PHYSICAL_ADDR, output_size);
+ 
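
A compressed sketch of the delta rule stated in the changelog; the stub
function and the sample addresses are invented for illustration:

#include <stdio.h>

static void apply_relocations(unsigned long delta)
{
	printf("patching absolute references by %#lx\n", delta);
}

/* Relocations depend on where the kernel will run (virtual), not on
 * where it was decompressed (physical). */
static void maybe_relocate(unsigned long virt, unsigned long link_addr)
{
	if (virt == link_addr)
		return;                 /* zero virtual delta: nothing to do */
	apply_relocations(virt - link_addr);
}

int main(void)
{
	maybe_relocate(0x1000000UL, 0x1000000UL);   /* skipped */
	maybe_relocate(0x3600000UL, 0x1000000UL);   /* patched */
	return 0;
}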

[tip:x86/boot] x86/KASLR: Add virtual address choosing function

2016-05-10 Thread tip-bot for Baoquan He
Commit-ID:  071a74930e60d1fa51207d71f00a35b4f9d4d179
Gitweb: http://git.kernel.org/tip/071a74930e60d1fa51207d71f00a35b4f9d4d179
Author: Baoquan He 
AuthorDate: Mon, 9 May 2016 13:22:08 -0700
Committer:  Ingo Molnar 
CommitDate: Tue, 10 May 2016 10:12:06 +0200

x86/KASLR: Add virtual address choosing function

To support randomizing the kernel virtual address separately from the
physical address, this patch adds find_random_virt_addr() to choose
a slot anywhere between LOAD_PHYSICAL_ADDR and KERNEL_IMAGE_SIZE.
Since this address is virtual, not physical, we can place the kernel
anywhere in this region, as long as it is aligned and (in the case of
the kernel being larger than the slot size) placed with enough room to load
the entire kernel image.

For clarity and readability, find_random_addr() is renamed to
find_random_phys_addr() and has "size" renamed to "image_size" to match
find_random_virt_addr().

Signed-off-by: Baoquan He 
[ Rewrote changelog, refactored slot calculation for readability. ]
[ Renamed find_random_phys_addr() and size argument. ]
Signed-off-by: Kees Cook 
Cc: Andrew Morton 
Cc: Andy Lutomirski 
Cc: Andy Lutomirski 
Cc: Borislav Petkov 
Cc: Borislav Petkov 
Cc: Brian Gerst 
Cc: Dave Young 
Cc: Denys Vlasenko 
Cc: H. Peter Anvin 
Cc: Linus Torvalds 
Cc: Peter Zijlstra 
Cc: Thomas Gleixner 
Cc: Vivek Goyal 
Cc: Yinghai Lu 
Cc: kernel-harden...@lists.openwall.com
Cc: lasse.col...@tukaani.org
Link: http://lkml.kernel.org/r/1462825332-10505-6-git-send-email-keesc...@chromium.org
Signed-off-by: Ingo Molnar 
---
 arch/x86/boot/compressed/kaslr.c | 32 
 1 file changed, 28 insertions(+), 4 deletions(-)

diff --git a/arch/x86/boot/compressed/kaslr.c b/arch/x86/boot/compressed/kaslr.c
index e55ebcb..016a4f4 100644
--- a/arch/x86/boot/compressed/kaslr.c
+++ b/arch/x86/boot/compressed/kaslr.c
@@ -417,8 +417,8 @@ static void process_e820_entry(struct e820entry *entry,
}
 }
 
-static unsigned long find_random_addr(unsigned long minimum,
- unsigned long size)
+static unsigned long find_random_phys_addr(unsigned long minimum,
+  unsigned long image_size)
 {
int i;
unsigned long addr;
@@ -428,12 +428,36 @@ static unsigned long find_random_addr(unsigned long minimum,
 
/* Verify potential e820 positions, appending to slots list. */
for (i = 0; i < boot_params->e820_entries; i++) {
-   process_e820_entry(&boot_params->e820_map[i], minimum, size);
+   process_e820_entry(&boot_params->e820_map[i], minimum,
+  image_size);
}
 
return slots_fetch_random();
 }
 
+static unsigned long find_random_virt_addr(unsigned long minimum,
+  unsigned long image_size)
+{
+   unsigned long slots, random_addr;
+
+   /* Make sure minimum is aligned. */
+   minimum = ALIGN(minimum, CONFIG_PHYSICAL_ALIGN);
+   /* Align image_size for easy slot calculations. */
+   image_size = ALIGN(image_size, CONFIG_PHYSICAL_ALIGN);
+
+   /*
+* There are how many CONFIG_PHYSICAL_ALIGN-sized slots
+* that can hold image_size within the range of minimum to
+* KERNEL_IMAGE_SIZE?
+*/
+   slots = (KERNEL_IMAGE_SIZE - minimum - image_size) /
+CONFIG_PHYSICAL_ALIGN + 1;
+
+   random_addr = get_random_long() % slots;
+
+   return random_addr * CONFIG_PHYSICAL_ALIGN + minimum;
+}
+
 /*
  * Since this function examines addresses much more numerically,
  * it takes the input and output pointers as 'unsigned long'.
@@ -464,7 +488,7 @@ unsigned char *choose_random_location(unsigned long input,
mem_avoid_init(input, input_size, output);
 
/* Walk e820 and find a random address. */
-   random_addr = find_random_addr(output, output_size);
+   random_addr = find_random_phys_addr(output, output_size);
if (!random_addr) {
warn("KASLR disabled: could not find suitable E820 region!");
goto out;
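
The slot arithmetic in find_random_virt_addr() is easy to sanity-check
numerically. A small sketch, assuming the common x86_64 defaults of a 2MB
CONFIG_PHYSICAL_ALIGN and a 1GB KERNEL_IMAGE_SIZE, plus a hypothetical
30MB image:

#include <stdio.h>

#define PHYS_ALIGN        0x200000UL          /* assumed 2MB default */
#define KERNEL_IMAGE_SIZE (1UL << 30)         /* assumed 1GB default */
#define ALIGN_UP(x, a)    (((x) + (a) - 1) & ~((a) - 1))

int main(void)
{
	unsigned long minimum    = ALIGN_UP(0x1000000UL, PHYS_ALIGN);
	unsigned long image_size = ALIGN_UP(30UL << 20, PHYS_ALIGN);
	unsigned long slots = (KERNEL_IMAGE_SIZE - minimum - image_size) /
			      PHYS_ALIGN + 1;

	printf("%lu candidate virtual slots\n", slots);   /* prints 490 */
	return 0;
}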



[tip:x86/boot] x86/KASLR: Add 'struct slot_area' to manage random_addr slots

2016-05-10 Thread tip-bot for Baoquan He
Commit-ID:  c401cf1524153f9c2ede7ab8ece403513925770a
Gitweb: http://git.kernel.org/tip/c401cf1524153f9c2ede7ab8ece403513925770a
Author: Baoquan He 
AuthorDate: Mon, 9 May 2016 13:22:06 -0700
Committer:  Ingo Molnar 
CommitDate: Tue, 10 May 2016 10:12:04 +0200

x86/KASLR: Add 'struct slot_area' to manage random_addr slots

In order to support KASLR moving the kernel anywhere in physical memory
(which could be up to 64TB), we need to handle counting the potential
randomization locations in a more efficient manner.

In the worst case with 64TB, there could be roughly 32 * 1024 * 1024
randomization slots if CONFIG_PHYSICAL_ALIGN is 0x200000. Currently
the starting address of each candidate position is stored into the slots[]
array, one at a time. This method costs too much memory, and it is also
very inefficient to fetch and save the slot information into the slot
array one entry at a time.

This patch introduces 'struct slot_area' to manage each contiguous region
of randomization slots. Each slot_area will contain the starting address
and how many available slots are in this area. As with the original code,
the slot_areas[] will avoid the mem_avoid[] regions.

Since setup_data is a linked list, it could contain an unknown number
of memory regions to be avoided, which could cause us to fragment
the contiguous memory that the slot_area array is tracking. In normal
operation this level of fragmentation will be extremely rare, but we
choose a suitably large value (100) for the array. If setup_data forces
the slot_area array to become highly fragmented and there are more
slots available beyond the first 100 found, the rest will be ignored
for KASLR selection.

The function store_slot_info() calculates the number of slots available
in the passed-in memory region and stores it into slot_areas[]
after adjusting for alignment and size requirements.

Signed-off-by: Baoquan He 
[ Rewrote changelog, squashed with new functions. ]
Signed-off-by: Kees Cook 
Cc: Andrew Morton 
Cc: Andy Lutomirski 
Cc: Andy Lutomirski 
Cc: Borislav Petkov 
Cc: Borislav Petkov 
Cc: Brian Gerst 
Cc: Dave Young 
Cc: Denys Vlasenko 
Cc: H. Peter Anvin 
Cc: Linus Torvalds 
Cc: Peter Zijlstra 
Cc: Thomas Gleixner 
Cc: Vivek Goyal 
Cc: Yinghai Lu 
Cc: kernel-harden...@lists.openwall.com
Cc: lasse.col...@tukaani.org
Link: http://lkml.kernel.org/r/1462825332-10505-4-git-send-email-keesc...@chromium.org
Signed-off-by: Ingo Molnar 
---
 arch/x86/boot/compressed/kaslr.c | 29 +
 1 file changed, 29 insertions(+)

diff --git a/arch/x86/boot/compressed/kaslr.c b/arch/x86/boot/compressed/kaslr.c
index f15d7b8..81edf99 100644
--- a/arch/x86/boot/compressed/kaslr.c
+++ b/arch/x86/boot/compressed/kaslr.c
@@ -308,8 +308,37 @@ static bool mem_avoid_overlap(struct mem_vector *img)
 }
 
 static unsigned long slots[KERNEL_IMAGE_SIZE / CONFIG_PHYSICAL_ALIGN];
+
+struct slot_area {
+   unsigned long addr;
+   int num;
+};
+
+#define MAX_SLOT_AREA 100
+
+static struct slot_area slot_areas[MAX_SLOT_AREA];
+
 static unsigned long slot_max;
 
+static unsigned long slot_area_index;
+
+static void store_slot_info(struct mem_vector *region, unsigned long image_size)
+{
+   struct slot_area slot_area;
+
+   if (slot_area_index == MAX_SLOT_AREA)
+   return;
+
+   slot_area.addr = region->start;
+   slot_area.num = (region->size - image_size) /
+   CONFIG_PHYSICAL_ALIGN + 1;
+
+   if (slot_area.num > 0) {
+   slot_areas[slot_area_index++] = slot_area;
+   slot_max += slot_area.num;
+   }
+}
+
 static void slots_append(unsigned long addr)
 {
/* Overflowing the slots list should be impossible. */
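
To see why (start, count) runs scale better than one array entry per slot,
here is a hedged sketch (not kernel code) of the mapping a
slots_fetch_random()-style helper needs: turning a global slot index back
into an address without materializing every slot.

#include <stdio.h>

struct slot_area {
	unsigned long addr;     /* first candidate address in the run */
	int num;                /* number of aligned slots in the run */
};

#define ALIGN_2M 0x200000UL

/* Map a global slot index to an address; memory use is per run,
 * not per slot. */
static unsigned long slot_at(struct slot_area *areas, int n_areas,
			     unsigned long index)
{
	for (int i = 0; i < n_areas; i++) {
		if (index < (unsigned long)areas[i].num)
			return areas[i].addr + index * ALIGN_2M;
		index -= areas[i].num;
	}
	return 0;               /* index out of range */
}

int main(void)
{
	struct slot_area areas[] = {
		{ 0x01000000UL, 8 },    /* 8 slots starting at 16MB */
		{ 0x40000000UL, 4 },    /* 4 slots starting at 1GB */
	};

	printf("%#lx\n", slot_at(areas, 2, 9));  /* 2nd slot of area 1 */
	return 0;
}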



[tip:x86/boot] x86/KASLR: Handle kernel relocations above 2G correctly

2016-04-29 Thread tip-bot for Baoquan He
Commit-ID:  6f9af75faa1df61e1ee5bea8a787a90605bb528d
Gitweb: http://git.kernel.org/tip/6f9af75faa1df61e1ee5bea8a787a90605bb528d
Author: Baoquan He 
AuthorDate: Thu, 28 Apr 2016 17:09:03 -0700
Committer:  Ingo Molnar 
CommitDate: Fri, 29 Apr 2016 09:58:26 +0200

x86/KASLR: Handle kernel relocations above 2G correctly

When processing the relocation table, the offset used to calculate the
relocation is an 'int'. This is sufficient for calculating the physical
address of the relocs entry on 32-bit systems and on 64-bit systems when
the relocation is under 2G.

To handle relocations above 2G (seen in situations like kexec, netboot, etc.),
this offset needs to be calculated using a 'long' to avoid wrapping and
miscalculating the relocation.

Signed-off-by: Baoquan He 
[ Rewrote the changelog. ]
Signed-off-by: Kees Cook 
Cc: Andrew Morton 
Cc: Andy Lutomirski 
Cc: Andy Lutomirski 
Cc: Borislav Petkov 
Cc: Brian Gerst 
Cc: Dave Young 
Cc: Denys Vlasenko 
Cc: H. Peter Anvin 
Cc: Linus Torvalds 
Cc: Peter Zijlstra 
Cc: Thomas Gleixner 
Cc: Vivek Goyal 
Cc: Yinghai Lu 
Cc: lasse.col...@tukaani.org
Link: http://lkml.kernel.org/r/1461888548-32439-2-git-send-email-keesc...@chromium.org
Signed-off-by: Ingo Molnar 
---
 arch/x86/boot/compressed/misc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/boot/compressed/misc.c b/arch/x86/boot/compressed/misc.c
index 6dde6cc..4514514 100644
--- a/arch/x86/boot/compressed/misc.c
+++ b/arch/x86/boot/compressed/misc.c
@@ -232,7 +232,7 @@ static void handle_relocations(void *output, unsigned long output_len)
 * So we work backwards from the end of the decompressed image.
 */
for (reloc = output + output_len - sizeof(*reloc); *reloc; reloc--) {
-   int extended = *reloc;
+   long extended = *reloc;
extended += map;
 
ptr = (unsigned long)extended;
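
The failure mode is ordinary integer truncation. A standalone sketch with
synthetic values (not taken from a real relocation table) shows the 'int'
arithmetic wrapping once the load address passes 32 bits:

#include <stdio.h>

int main(void)
{
	unsigned long map = 0x100000000UL;   /* image loaded at 4GB */
	int reloc = 0x500000;                /* sample reloc-table entry */

	int extended32 = reloc;
	extended32 += map;                   /* truncated to 32 bits */

	long extended64 = reloc;
	extended64 += map;                   /* full-width arithmetic */

	/* prints "int: 0x500000  long: 0x100500000" */
	printf("int: %#lx  long: %#lx\n",
	       (unsigned long)extended32, (unsigned long)extended64);
	return 0;
}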



[tip:x86/boot] x86/KASLR: Update description for decompressor worst case size

2016-04-22 Thread tip-bot for Baoquan He
Commit-ID:  4252db10559fc3d1efc1e43613254fdd220b014b
Gitweb: http://git.kernel.org/tip/4252db10559fc3d1efc1e43613254fdd220b014b
Author: Baoquan He 
AuthorDate: Wed, 20 Apr 2016 13:55:42 -0700
Committer:  Ingo Molnar 
CommitDate: Fri, 22 Apr 2016 10:00:50 +0200

x86/KASLR: Update description for decompressor worst case size

The comment that describes the analysis for the size of the decompressor
code only took gzip into account (there are currently 6 other decompressors
that could be used). The actual z_extract_offset calculation in code was
already handling the correct maximum size, but this documentation hadn't
been updated. This updates the documentation, fixes several typos, moves
the comment to header.S, updates references, and adds a note at the end
of the decompressor include list to remind us about updating the comment
in the future.

(Instead of moving the comment to mkpiggy.c, where the calculation
is currently happening, it is being moved to header.S because
the calculations in mkpiggy.c will be removed in favor of header.S
calculations in a following patch, and it seemed like overkill to move
the giant comment twice, especially when there's already reference to
z_extract_offset in header.S.)

Signed-off-by: Baoquan He 
[ Rewrote changelog, cleaned up comment style, moved comments around. ]
Signed-off-by: Kees Cook 
Cc: Andrew Morton 
Cc: Andrey Ryabinin 
Cc: Andy Lutomirski 
Cc: Andy Lutomirski 
Cc: Borislav Petkov 
Cc: Borislav Petkov 
Cc: Brian Gerst 
Cc: Denys Vlasenko 
Cc: Dmitry Vyukov 
Cc: H. Peter Anvin 
Cc: H.J. Lu 
Cc: Josh Poimboeuf 
Cc: Linus Torvalds 
Cc: Peter Zijlstra 
Cc: Thomas Gleixner 
Cc: Yinghai Lu 
Link: http://lkml.kernel.org/r/1461185746-8017-2-git-send-email-keesc...@chromium.org
Signed-off-by: Ingo Molnar 
---
 arch/x86/boot/compressed/kaslr.c |  2 +-
 arch/x86/boot/compressed/misc.c  | 89 
 arch/x86/boot/header.S   | 88 +++
 3 files changed, 97 insertions(+), 82 deletions(-)

diff --git a/arch/x86/boot/compressed/kaslr.c b/arch/x86/boot/compressed/kaslr.c
index 9c29e78..7d86c5d 100644
--- a/arch/x86/boot/compressed/kaslr.c
+++ b/arch/x86/boot/compressed/kaslr.c
@@ -155,7 +155,7 @@ static void __init mem_avoid_init(unsigned long input, unsigned long input_size,
 
/*
 * Avoid the region that is unsafe to overlap during
-* decompression (see calculations at top of misc.c).
+* decompression (see calculations in ../header.S).
 */
unsafe_len = (output_size >> 12) + 32768 + 18;
unsafe = (unsigned long)input + input_size - unsafe_len;
diff --git a/arch/x86/boot/compressed/misc.c b/arch/x86/boot/compressed/misc.c
index ad8c01a..e96829b 100644
--- a/arch/x86/boot/compressed/misc.c
+++ b/arch/x86/boot/compressed/misc.c
@@ -14,90 +14,13 @@
 #include "misc.h"
 #include "../string.h"
 
-/* WARNING!!
- * This code is compiled with -fPIC and it is relocated dynamically
- * at run time, but no relocation processing is performed.
- * This means that it is not safe to place pointers in static structures.
- */
-
 /*
- * Getting to provable safe in place decompression is hard.
- * Worst case behaviours need to be analyzed.
- * Background information:
- *
- * The file layout is:
- *magic[2]
- *method[1]
- *flags[1]
- *timestamp[4]
- *extraflags[1]
- *os[1]
- *compressed data blocks[N]
- *crc[4] orig_len[4]
- *
- * resulting in 18 bytes of non compressed data overhead.
- *
- * Files divided into blocks
- * 1 bit (last block flag)
- * 2 bits (block type)
- *
- * 1 block occurs every 32K -1 bytes or when there 50% compression
- * has been achieved. The smallest block type encoding is always used.
- *
- * stored:
- *32 bits length in bytes.
- *
- * fixed:
- *magic fixed tree.
- *symbols.
- *
- * dynamic:
- *dynamic tree encoding.
- *symbols.
- *
- *
- * The buffer for decompression in place is the length of the
- * uncompressed data, plus a small amount extra to keep the algorithm safe.
- * The compressed data is placed at the end of the buffer.  The output
- * pointer is placed at the start of the buffer and the input pointer
- * is placed where the compressed data starts.  Problems will occur
- * when the output pointer overruns the input pointer.
- *
- * The output pointer can only overrun the input pointer if the input
- * pointer is moving faster than the output pointer.  A condition only
- * triggered by data whose compressed form is larger than the uncompressed
- * form.
- *
- * The worst case at the block level is a growth of the compressed data
- * of 5 bytes per 32767 bytes.
- *
- * The worst case internal to a compressed block is very hard to figure.
- * The worst case can at least be boundined by having one bit that represents
- * 32764 bytes and then all of the rest of the bytes representing the very
- * very last byte.
- *
- * All of which is enough to compute an amount of extra data that 
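
The bound this analysis produces is the one visible in the kaslr.c hunk
above: 18 bytes of gzip overhead, one 32KB window, and per-block growth
over-approximated as output_size >> 12. A quick numeric check, assuming a
64MB decompressed image:

#include <stdio.h>

int main(void)
{
	unsigned long output_size = 64UL << 20;   /* assumed 64MB image */

	/* Mirrors unsafe_len above: >>12 over-approximates the 5 bytes
	 * per 32767-byte block of worst-case compressed-data growth. */
	unsigned long unsafe_len = (output_size >> 12) + 32768 + 18;

	printf("bytes to keep clear: %lu\n", unsafe_len);  /* 49170 */
	return 0;
}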

[tip:x86/boot] x86/KASLR: Drop CONFIG_RANDOMIZE_BASE_MAX_OFFSET

2016-04-22 Thread tip-bot for Baoquan He
Commit-ID:  e8581e3d67788b6b29d055fa42c6cb5b258fee64
Gitweb: http://git.kernel.org/tip/e8581e3d67788b6b29d055fa42c6cb5b258fee64
Author: Baoquan He 
AuthorDate: Wed, 20 Apr 2016 13:55:43 -0700
Committer:  Ingo Molnar 
CommitDate: Fri, 22 Apr 2016 10:00:50 +0200

x86/KASLR: Drop CONFIG_RANDOMIZE_BASE_MAX_OFFSET

Currently CONFIG_RANDOMIZE_BASE_MAX_OFFSET is used to limit the maximum
offset for kernel randomization. This limit doesn't need to be a CONFIG
since it is tied completely to KERNEL_IMAGE_SIZE, and will make no sense
once physical and virtual offsets are randomized separately. This patch
removes CONFIG_RANDOMIZE_BASE_MAX_OFFSET and consolidates the Kconfig
help text.

[kees: rewrote changelog, dropped KERNEL_IMAGE_SIZE_DEFAULT, rewrote help]
Signed-off-by: Baoquan He 
Signed-off-by: Kees Cook 
Cc: Andrew Morton 
Cc: Andrey Ryabinin 
Cc: Andy Lutomirski 
Cc: Andy Lutomirski 
Cc: Borislav Petkov 
Cc: Borislav Petkov 
Cc: Brian Gerst 
Cc: Denys Vlasenko 
Cc: Dmitry Vyukov 
Cc: H. Peter Anvin 
Cc: H.J. Lu 
Cc: Josh Poimboeuf 
Cc: Linus Torvalds 
Cc: Peter Zijlstra 
Cc: Thomas Gleixner 
Cc: Yinghai Lu 
Link: http://lkml.kernel.org/r/1461185746-8017-3-git-send-email-keesc...@chromium.org
Signed-off-by: Ingo Molnar 
---
 arch/x86/Kconfig | 72 ++--
 arch/x86/boot/compressed/kaslr.c | 12 +++---
 arch/x86/include/asm/page_64_types.h |  8 ++--
 arch/x86/mm/init_32.c|  3 --
 4 files changed, 36 insertions(+), 59 deletions(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 2dc18605..5892d54 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1932,54 +1932,38 @@ config RELOCATABLE
  (CONFIG_PHYSICAL_START) is used as the minimum location.
 
 config RANDOMIZE_BASE
-   bool "Randomize the address of the kernel image"
+   bool "Randomize the address of the kernel image (KASLR)"
depends on RELOCATABLE
default n
---help---
-  Randomizes the physical and virtual address at which the
-  kernel image is decompressed, as a security feature that
-  deters exploit attempts relying on knowledge of the location
-  of kernel internals.
+ In support of Kernel Address Space Layout Randomization (KASLR),
+ this randomizes the physical address at which the kernel image
+ is decompressed and the virtual address where the kernel
+ image is mapped, as a security feature that deters exploit
+ attempts relying on knowledge of the location of kernel
+ code internals.
+
+ The kernel physical and virtual address can be randomized
+ from 16MB up to 1GB on 64-bit and 512MB on 32-bit. (Note that
+ using RANDOMIZE_BASE reduces the memory space available to
+ kernel modules from 1.5GB to 1GB.)
+
+ Entropy is generated using the RDRAND instruction if it is
+ supported. If RDTSC is supported, its value is mixed into
+ the entropy pool as well. If neither RDRAND nor RDTSC are
+ supported, then entropy is read from the i8254 timer.
+
+ Since the kernel is built using 2GB addressing, and
+ PHYSICAL_ALIGN must be at a minimum of 2MB, only 10 bits of
+ entropy is theoretically possible. Currently, with the
+ default value for PHYSICAL_ALIGN and due to page table
+ layouts, 64-bit uses 9 bits of entropy and 32-bit uses 8 bits.
+
+ If CONFIG_HIBERNATE is also enabled, KASLR is disabled at boot
+ time. To enable it, boot with "kaslr" on the kernel command
+ line (which will also disable hibernation).
 
-  Entropy is generated using the RDRAND instruction if it is
-  supported. If RDTSC is supported, it is used as well. If
-  neither RDRAND nor RDTSC are supported, then randomness is
-  read from the i8254 timer.
-
-  The kernel will be offset by up to RANDOMIZE_BASE_MAX_OFFSET,
-  and aligned according to PHYSICAL_ALIGN. Since the kernel is
-  built using 2GiB addressing, and PHYSICAL_ALGIN must be at a
-  minimum of 2MiB, only 10 bits of entropy is theoretically
-  possible. At best, due to page table layouts, 64-bit can use
-  9 bits of entropy and 32-bit uses 8 bits.
-
-  If unsure, say N.
-
-config RANDOMIZE_BASE_MAX_OFFSET
-   hex "Maximum kASLR offset allowed" if EXPERT
-   depends on RANDOMIZE_BASE
-   range 0x0 0x20000000 if X86_32
-   default "0x20000000" if X86_32
-   range 0x0 0x40000000 if X86_64
-   default "0x40000000" if X86_64
-   ---help---
- The lesser of RANDOMIZE_BASE_MAX_OFFSET and available physical
- memory is used to determine the maximal offset in bytes that will
- be applied to the kernel when kernel Address Space Layout
- Randomization (kASLR) is active. This must be a multiple of
- PHYSICAL_ALIGN.
-
- On 32-bit this is limited to 
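
The entropy figures in the new help text follow directly from the stated
ranges. A quick check, assuming the default 2MB PHYSICAL_ALIGN:

#include <stdio.h>

int main(void)
{
	unsigned long align = 0x200000UL;        /* default PHYSICAL_ALIGN */

	/* 1GB/2MB = 512 slots (9 bits); 512MB/2MB = 256 slots (8 bits) */
	printf("64-bit: %lu slots\n", (1UL << 30) / align);
	printf("32-bit: %lu slots\n", (512UL << 20) / align);
	return 0;
}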

[tip:ras/core] x86/setup: Do not reserve crashkernel high memory if low reservation failed

2015-10-21 Thread tip-bot for Baoquan He
Commit-ID:  eb6db83d105914c246ac5875be76fd4b944833d5
Gitweb: http://git.kernel.org/tip/eb6db83d105914c246ac5875be76fd4b944833d5
Author: Baoquan He 
AuthorDate: Mon, 19 Oct 2015 11:17:41 +0200
Committer:  Ingo Molnar 
CommitDate: Wed, 21 Oct 2015 11:10:55 +0200

x86/setup: Do not reserve crashkernel high memory if low reservation failed

People reported that when allocating crashkernel memory using
the ",high" and ",low" syntax, there were cases where the
reservation of the high portion succeeded but the reservation of
the low portion failed.

Then kexec can load the kdump kernel successfully, but booting
the kdump kernel fails as there's no low memory.

The low memory allocation for the kdump kernel can fail on large
systems for a couple of reasons. For example, the manually
specified crashkernel low memory can be too large and thus no
adequate memblock region would be found.

Therefore, we try to reserve low memory for the crash kernel
*after* the high memory portion has been allocated. If that
fails, we free crashkernel high memory too and return. The user
can then take measures accordingly.

Tested-by: Joerg Roedel 
Signed-off-by: Baoquan He 
[ Massage text. ]
Signed-off-by: Borislav Petkov 
Reviewed-by: Joerg Roedel 
Cc: Andrew Morton 
Cc: Andy Lutomirski 
Cc: Dave Young 
Cc: H. Peter Anvin 
Cc: Jiri Kosina 
Cc: Juergen Gross 
Cc: Linus Torvalds 
Cc: Mark Salter 
Cc: Peter Zijlstra 
Cc: Thomas Gleixner 
Cc: WANG Chao 
Cc: jerry_hoem...@hp.com
Cc: ying...@kernel.org
Link: http://lkml.kernel.org/r/1445246268-26285-2-git-send-email...@alien8.de
Signed-off-by: Ingo Molnar 
---
 arch/x86/kernel/setup.c | 20 +++-
 1 file changed, 11 insertions(+), 9 deletions(-)

diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index fdb7f2a..1b36839 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -493,7 +493,7 @@ static void __init memblock_x86_reserve_range_setup_data(void)
 # define CRASH_KERNEL_ADDR_HIGH_MAXMAXMEM
 #endif
 
-static void __init reserve_crashkernel_low(void)
+static int __init reserve_crashkernel_low(void)
 {
 #ifdef CONFIG_X86_64
const unsigned long long alignment = 16<<20;/* 16M */
@@ -522,17 +522,16 @@ static void __init reserve_crashkernel_low(void)
} else {
/* passed with crashkernel=0,low ? */
if (!low_size)
-   return;
+   return 0;
}
 
low_base = memblock_find_in_range(low_size, (1ULL<<32),
low_size, alignment);
 
if (!low_base) {
-   if (!auto_set)
-   pr_info("crashkernel low reservation failed - No 
suitable area found.\n");
-
-   return;
+   pr_err("Cannot reserve %ldMB crashkernel low memory, please try 
smaller size.\n",
+  (unsigned long)(low_size >> 20));
+   return -ENOMEM;
}
 
memblock_reserve(low_base, low_size);
@@ -544,6 +543,7 @@ static void __init reserve_crashkernel_low(void)
crashk_low_res.end   = low_base + low_size - 1;
	insert_resource(&iomem_resource, &crashk_low_res);
 #endif
+   return 0;
 }
 
 static void __init reserve_crashkernel(void)
@@ -595,6 +595,11 @@ static void __init reserve_crashkernel(void)
}
memblock_reserve(crash_base, crash_size);
 
+   if (crash_base >= (1ULL << 32) && reserve_crashkernel_low()) {
+   memblock_free(crash_base, crash_size);
+   return;
+   }
+
printk(KERN_INFO "Reserving %ldMB of memory at %ldMB "
"for crashkernel (System RAM: %ldMB)\n",
(unsigned long)(crash_size >> 20),
@@ -604,9 +609,6 @@ static void __init reserve_crashkernel(void)
crashk_res.start = crash_base;
crashk_res.end   = crash_base + crash_size - 1;
	insert_resource(&iomem_resource, &crashk_res);
-
-   if (crash_base >= (1ULL<<32))
-   reserve_crashkernel_low();
 }
 #else
 static void __init reserve_crashkernel(void)
--