Re: 4.17.x won't boot due to "x86/boot/compressed/64: Handle 5-level paging boot if kernel is above 4G"

2018-07-27 Thread Kirill A. Shutemov
On Thu, Jul 26, 2018 at 04:21:11PM +0000, Dmitry Malkin wrote:
> On 07/26/2018 04:50 PM, Kirill A. Shutemov wrote:
> > > > > 2. Reading from memory which may be reserved on EFI systems:
> > > > > > >      ebda_start = *(unsigned short *)0x40e << 4;
> > > > > > >      bios_start = *(unsigned short *)0x413 << 10;
> > > > > Also, on an EFI system without CSM these reads will return all zeros,
> > > > > which will place trampoline_start at 0x9d000. That may also be reserved
> > > > > memory. In fact, I have such a system and it causes an instant reboot
> > > > > (when the code starts copying to "trampoline_start").
> > > > Could you show dmesg from such a system?
> > > Sure, here it is (please note that not both pages are reserved, only the
> > > second one: 0x9e000-0x9ffff):
> > Well. That's bad.
> > 
> > I don't see many options other than parsing e820 in the decompression code.
> > I had hoped to avoid this.
> > 
> > Let me see what I can do there.
> Just for the UEFI case (I don't know much about BIOS and kexec):
> register RSI (right before the call to paging_prepare) will contain a pointer
> to "struct boot_params" (returned by the function efi_main() in eboot.c).
> It has the fields e820_table and e820_entries.
> 

Could you check if this makes a difference for you?

diff --git a/arch/x86/boot/compressed/pgtable_64.c b/arch/x86/boot/compressed/pgtable_64.c
index 8c5107545251..9e2157371491 100644
--- a/arch/x86/boot/compressed/pgtable_64.c
+++ b/arch/x86/boot/compressed/pgtable_64.c
@@ -1,3 +1,4 @@
+#include <asm/e820/types.h>
 #include <asm/processor.h>
 #include "pgtable.h"
 #include "../string.h"
@@ -34,10 +35,62 @@ unsigned long *trampoline_32bit __section(.data);
 extern struct boot_params *boot_params;
 int cmdline_find_option_bool(const char *option);
 
+static unsigned long find_trampoline_placement(void)
+{
+	unsigned long bios_start, ebda_start;
+	unsigned long trampoline_start;
+	struct boot_e820_entry *entry;
+	int i;
+
+	/*
+	 * Find a suitable spot for the trampoline.
+	 * This code is based on reserve_bios_regions().
+	 */
+
+	ebda_start = *(unsigned short *)0x40e << 4;
+	bios_start = *(unsigned short *)0x413 << 10;
+
+	if (bios_start < BIOS_START_MIN || bios_start > BIOS_START_MAX)
+		bios_start = BIOS_START_MAX;
+
+	if (ebda_start > BIOS_START_MIN && ebda_start < bios_start)
+		bios_start = ebda_start;
+
+	bios_start = round_down(bios_start, PAGE_SIZE);
+
+	/* Find the first usable memory region under bios_start. */
+	for (i = boot_params->e820_entries - 1; i >= 0; i--) {
+		entry = &boot_params->e820_table[i];
+
+		/* Skip all entries above bios_start. */
+		if (bios_start <= entry->addr)
+			continue;
+
+		/* Skip non-RAM entries. */
+		if (entry->type != E820_TYPE_RAM)
+			continue;
+
+		/* Adjust bios_start to the end of the entry if needed. */
+		if (bios_start > entry->addr + entry->size)
+			bios_start = entry->addr + entry->size;
+
+		/* Keep bios_start page-aligned. */
+		bios_start = round_down(bios_start, PAGE_SIZE);
+
+		/* Skip the entry if it's too small. */
+		if (bios_start - TRAMPOLINE_32BIT_SIZE < entry->addr)
+			continue;
+
+		break;
+	}
+
+	/* Place the trampoline just below the end of low memory */
+	return bios_start - TRAMPOLINE_32BIT_SIZE;
+}
+
 struct paging_config paging_prepare(void *rmode)
 {
 	struct paging_config paging_config = {};
-	unsigned long bios_start, ebda_start;
 
 	/* Initialize boot_params. Required for cmdline_find_option_bool(). */
 	boot_params = rmode;
@@ -61,23 +114,7 @@ struct paging_config paging_prepare(void *rmode)
 		paging_config.l5_required = 1;
 	}
 
-	/*
-	 * Find a suitable spot for the trampoline.
-	 * This code is based on reserve_bios_regions().
-	 */
-
-	ebda_start = *(unsigned short *)0x40e << 4;
-	bios_start = *(unsigned short *)0x413 << 10;
-
-	if (bios_start < BIOS_START_MIN || bios_start > BIOS_START_MAX)
-		bios_start = BIOS_START_MAX;
-
-	if (ebda_start > BIOS_START_MIN && ebda_start < bios_start)
-		bios_start = ebda_start;
-
-	/* Place the trampoline just below the end of low memory, aligned to 4k */
-	paging_config.trampoline_start = bios_start - TRAMPOLINE_32BIT_SIZE;
-	paging_config.trampoline_start = round_down(paging_config.trampoline_start, PAGE_SIZE);
+	paging_config.trampoline_start = find_trampoline_placement();
 
 	trampoline_32bit = (unsigned long *)paging_config.trampoline_start;
 
-- 
 Kirill A. Shutemov
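
The placement loop in the patch can be exercised outside the kernel. The following is a minimal user-space sketch, not the kernel code: the `demo_` names, the structure layout, and the constant values (0x20000, 0x9f000, two 4 KiB pages) are assumptions chosen to be consistent with the 0x9d000 figure mentioned in the thread, and the two BIOS data area words are passed in as parameters because user space cannot read 0x40e and 0x413. The table holds only the low-memory entries from the posted dmesg, with the end of the reserved region assumed to be 0x9ffff.

```c
#include <stdint.h>

#define DEMO_PAGE_SIZE       0x1000UL
#define DEMO_BIOS_START_MIN  0x20000UL               /* assumed value */
#define DEMO_BIOS_START_MAX  0x9f000UL               /* assumed value */
#define DEMO_TRAMPOLINE_SIZE (2 * DEMO_PAGE_SIZE)    /* assumed: two pages */
#define DEMO_E820_RAM        1
#define DEMO_E820_RESERVED   2

/* Hypothetical stand-in for the kernel's struct boot_e820_entry. */
struct demo_e820_entry {
	uint64_t addr;
	uint64_t size;
	uint32_t type;
};

uint64_t demo_round_down_page(uint64_t x)
{
	return x & ~(uint64_t)(DEMO_PAGE_SIZE - 1);
}

/*
 * Same steps as the patch: clamp bios_start, then walk the table from the
 * last entry down and settle on the highest RAM entry below bios_start
 * that still has room for the trampoline.
 */
uint64_t demo_find_trampoline_placement(uint16_t ebda_word,
					uint16_t base_mem_kb,
					const struct demo_e820_entry *table,
					int entries)
{
	uint64_t ebda_start = (uint64_t)ebda_word << 4;
	uint64_t bios_start = (uint64_t)base_mem_kb << 10;
	int i;

	if (bios_start < DEMO_BIOS_START_MIN || bios_start > DEMO_BIOS_START_MAX)
		bios_start = DEMO_BIOS_START_MAX;
	if (ebda_start > DEMO_BIOS_START_MIN && ebda_start < bios_start)
		bios_start = ebda_start;
	bios_start = demo_round_down_page(bios_start);

	for (i = entries - 1; i >= 0; i--) {
		const struct demo_e820_entry *entry = &table[i];

		if (bios_start <= entry->addr)          /* entry is above bios_start */
			continue;
		if (entry->type != DEMO_E820_RAM)       /* not usable RAM */
			continue;
		if (bios_start > entry->addr + entry->size)
			bios_start = entry->addr + entry->size;
		bios_start = demo_round_down_page(bios_start);
		if (bios_start - DEMO_TRAMPOLINE_SIZE < entry->addr)
			continue;                       /* entry is too small */
		break;
	}
	return bios_start - DEMO_TRAMPOLINE_SIZE;
}

/* Low-memory entries from the posted dmesg (reserved end assumed 0x9ffff). */
const struct demo_e820_entry demo_low_memory[] = {
	{ 0x00000, 0x58000, DEMO_E820_RAM },      /* 0x00000-0x57fff usable   */
	{ 0x58000, 0x01000, DEMO_E820_RESERVED }, /* 0x58000-0x58fff reserved */
	{ 0x59000, 0x45000, DEMO_E820_RAM },      /* 0x59000-0x9dfff usable   */
	{ 0x9e000, 0x02000, DEMO_E820_RESERVED }, /* 0x9e000-0x9ffff reserved */
};
```

With both words read as zero, as reported for an EFI system without CSM, `demo_find_trampoline_placement(0, 0, demo_low_memory, 4)` evaluates to 0x9c000, so under these assumptions both trampoline pages (0x9c000-0x9dfff) land inside the usable 0x59000-0x9dfff region instead of touching the reserved page at 0x9e000.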


Re: 4.17.x won't boot due to "x86/boot/compressed/64: Handle 5-level paging boot if kernel is above 4G"

2018-07-26 Thread Dmitry Malkin

On 07/26/2018 04:50 PM, Kirill A. Shutemov wrote:
> > > > 2. Reading from memory which may be reserved on EFI systems:
> > > > > >      ebda_start = *(unsigned short *)0x40e << 4;
> > > > > >      bios_start = *(unsigned short *)0x413 << 10;
> > > > Also, on an EFI system without CSM these reads will return all zeros,
> > > > which will place trampoline_start at 0x9d000. That may also be reserved
> > > > memory. In fact, I have such a system and it causes an instant reboot
> > > > (when the code starts copying to "trampoline_start").
> > > Could you show dmesg from such a system?
> > Sure, here it is (please note that not both pages are reserved, only the
> > second one: 0x9e000-0x9ffff):
> Well. That's bad.
> 
> I don't see many options other than parsing e820 in the decompression code.
> I had hoped to avoid this.
> 
> Let me see what I can do there.

Just for the UEFI case (I don't know much about BIOS and kexec):
register RSI (right before the call to paging_prepare) will contain a pointer
to "struct boot_params" (returned by the function efi_main() in eboot.c).

It has the fields e820_table and e820_entries.



Re: 4.17.x won't boot due to "x86/boot/compressed/64: Handle 5-level paging boot if kernel is above 4G"

2018-07-26 Thread Kirill A. Shutemov
On Thu, Jul 26, 2018 at 08:10:42AM +0000, Dmitry Malkin wrote:
> 
> 
> On 07/25/2018 11:21 PM, Kirill A. Shutemov wrote:
> > On Wed, Jul 25, 2018 at 05:26:02PM +0000, Dmitry Malkin wrote:
> > > There may be other reasons that could cause undefined behavior (a reboot,
> > > for example):
> > > 
> > > in arch/x86/boot/compressed/pgtable_64.c in function paging_prepare():
> > > 
> > > 1. The structure "paging_config" is allocated on the stack without setting
> > > a default value for the flag "l5_required":
> > > > > struct paging_config paging_config = {};
> > > l5_required is set only if CONFIG_X86_5LEVEL is defined
> > Hm? C99 initializer zeros the structure.
> https://elixir.bootlin.com/linux/latest/source/Makefile#L366
> Here I only see std=gnu89.

gnu89 supports C99-style initializers. The syntax above clears all fields
that are not initialized explicitly, in this case all of them.
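
A minimal sketch of the initializer point, assuming a two-field structure shaped like paging_config; the names `demo_paging_config` and `demo_paging_prepare` are hypothetical. The empty-brace form is accepted by GCC and Clang as an extension (and is standard in C23); strictly portable ISO C would spell it `{ 0 }`.

```c
/* Hypothetical mirror of the two-field structure discussed above. */
struct demo_paging_config {
	unsigned long trampoline_start;
	unsigned long l5_required;
};

/*
 * The initializer zeroes every member that is not named explicitly,
 * so l5_required is 0 unless the branch below sets it.
 */
struct demo_paging_config demo_paging_prepare(int want_l5)
{
	struct demo_paging_config cfg = {};

	if (want_l5)
		cfg.l5_required = 1;
	return cfg;
}
```

Calling `demo_paging_prepare(0)` yields a structure whose two members are both zero, regardless of what was previously on the stack.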

> > > 2. Reading from memory which may be reserved on EFI systems:
> > > > >     ebda_start = *(unsigned short *)0x40e << 4;
> > > > >     bios_start = *(unsigned short *)0x413 << 10;
> > > Also, on an EFI system without CSM these reads will return all zeros,
> > > which will place trampoline_start at 0x9d000. That may also be reserved
> > > memory. In fact, I have such a system and it causes an instant reboot
> > > (when the code starts copying to "trampoline_start").
> > Could you show dmesg from such a system?
> Sure, here it is (please note that not both pages are reserved, only the
> second one: 0x9e000-0x9ffff):

Well. That's bad.

I don't see many options other than parsing e820 in the decompression code.
I had hoped to avoid this.

Let me see what I can do there.

-- 
 Kirill A. Shutemov


Re: 4.17.x won't boot due to "x86/boot/compressed/64: Handle 5-level paging boot if kernel is above 4G"

2018-07-26 Thread Dmitry Malkin



On 07/25/2018 11:21 PM, Kirill A. Shutemov wrote:
> On Wed, Jul 25, 2018 at 05:26:02PM +0000, Dmitry Malkin wrote:
> > There may be other reasons that could cause undefined behavior (a reboot,
> > for example):
> > 
> > in arch/x86/boot/compressed/pgtable_64.c in function paging_prepare():
> > 
> > 1. The structure "paging_config" is allocated on the stack without setting
> > a default value for the flag "l5_required":
> > > > struct paging_config paging_config = {};
> > l5_required is set only if CONFIG_X86_5LEVEL is defined
> Hm? C99 initializer zeros the structure.

https://elixir.bootlin.com/linux/latest/source/Makefile#L366
Here I only see std=gnu89.

> > 2. Reading from memory which may be reserved on EFI systems:
> > > >     ebda_start = *(unsigned short *)0x40e << 4;
> > > >     bios_start = *(unsigned short *)0x413 << 10;
> > Also, on an EFI system without CSM these reads will return all zeros,
> > which will place trampoline_start at 0x9d000. That may also be reserved
> > memory. In fact, I have such a system and it causes an instant reboot
> > (when the code starts copying to "trampoline_start").
> Could you show dmesg from such a system?

Sure, here it is (please note that not both pages are reserved, only the
second one: 0x9e000-0x9ffff):


[    0.000000] Linux version 4.17.9-1.el7.elrepo.x86_64 (mockbuild@Build64R7) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-28) (GCC)) #1 SMP Sun Jul 22 11:57:51 EDT 2018
[    0.000000] Command line: BOOT_IMAGE=/boot/vmlinuz-4.17.9-1.el7.elrepo.x86_64 root=UUID=51cc5f87-2bb2-45b5-a0ee-691970f9cf06 ro crashkernel=auto rhgb quiet
[    0.000000] x86/fpu: Supporting XSAVE feature 0x001: 'x87 floating point registers'
[    0.000000] x86/fpu: Supporting XSAVE feature 0x002: 'SSE registers'
[    0.000000] x86/fpu: Supporting XSAVE feature 0x004: 'AVX registers'
[    0.000000] x86/fpu: Supporting XSAVE feature 0x008: 'MPX bounds registers'
[    0.000000] x86/fpu: Supporting XSAVE feature 0x010: 'MPX CSR'
[    0.000000] x86/fpu: xstate_offset[2]:  576, xstate_sizes[2]: 256
[    0.000000] x86/fpu: xstate_offset[3]:  832, xstate_sizes[3]: 64
[    0.000000] x86/fpu: xstate_offset[4]:  896, xstate_sizes[4]: 64
[    0.000000] x86/fpu: Enabled xstate features 0x1f, context size is 960 bytes, using 'compacted' format.
[    0.000000] e820: BIOS-provided physical RAM map:
[    0.000000] BIOS-e820: [mem 0x0000000000000000-0x0000000000057fff] usable
[    0.000000] BIOS-e820: [mem 0x0000000000058000-0x0000000000058fff] reserved
[    0.000000] BIOS-e820: [mem 0x0000000000059000-0x000000000009dfff] usable
[    0.000000] BIOS-e820: [mem 0x000000000009e000-0x000000000009ffff] reserved
[    0.000000] BIOS-e820: [mem 0x00000000000e0000-0x00000000000e0fff] reserved
[    0.000000] BIOS-e820: [mem 0x0000000000100000-0x00000000c4a14fff] usable
[    0.000000] BIOS-e820: [mem 0x00000000c4a15000-0x00000000c4a15fff] ACPI NVS
[    0.000000] BIOS-e820: [mem 0x00000000c4a16000-0x00000000c4a3ffff] reserved
[    0.000000] BIOS-e820: [mem 0x00000000c4a40000-0x00000000c91acfff] usable
[    0.000000] BIOS-e820: [mem 0x00000000c91ad000-0x00000000c9749fff] reserved
[    0.000000] BIOS-e820: [mem 0x00000000c974a000-0x00000000c9776fff] ACPI data
[    0.000000] BIOS-e820: [mem 0x00000000c9777000-0x00000000cba86fff] ACPI NVS
[    0.000000] BIOS-e820: [mem 0x00000000cba87000-0x00000000cbefdfff] reserved
[    0.000000] BIOS-e820: [mem 0x00000000cbefe000-0x00000000cbefefff] usable
[    0.000000] BIOS-e820: [mem 0x00000000cbf00000-0x00000000cbffffff] reserved
[    0.000000] BIOS-e820: [mem 0x00000000f8000000-0x00000000fbffffff] reserved
[    0.000000] BIOS-e820: [mem 0x00000000fe000000-0x00000000fe010fff] reserved
[    0.000000] BIOS-e820: [mem 0x00000000fec00000-0x00000000fec00fff] reserved
[    0.000000] BIOS-e820: [mem 0x00000000fee00000-0x00000000fee00fff] reserved
[    0.000000] BIOS-e820: [mem 0x00000000ff000000-0x00000000ffffffff] reserved
[    0.000000] BIOS-e820: [mem 0x0000000100000000-0x000000022f7fffff] usable
[    0.000000] NX (Execute Disable) protection: active
[    0.000000] e820: update [mem 0xc42c9018-0xc4321057] usable ==> usable
[    0.000000] e820: update [mem 0xc42c9018-0xc4321057] usable ==> usable
[    0.000000] e820: update [mem 0xc42b9018-0xc42c8c57] usable ==> usable
[    0.000000] e820: update [mem 0xc42b9018-0xc42c8c57] usable ==> usable
[    0.000000] e820: update [mem 0xc42a8018-0xc42b8257] usable ==> usable
[    0.000000] e820: update [mem 0xc42a8018-0xc42b8257] usable ==> usable
[    0.000000] extended physical RAM map:
[    0.000000] reserve setup_data: [mem 0x0000000000000000-0x0000000000057fff] usable
[    0.000000] reserve setup_data: [mem 0x0000000000058000-0x0000000000058fff] reserved
[    0.000000] reserve setup_data: [mem 0x0000000000059000-0x000000000009dfff] usable
[    0.000000] reserve setup_data: [mem 0x000000000009e000-0x000000000009ffff] reserved
[    0.000000] reserve setup_data: [mem 0x00000000000e0000-0x00000000000e0fff] reserved
[    0.000000] reserve setup_data: [mem 0x0000000000100000-0x00000000c42a8017] usable
[ 


Re: 4.17.x won't boot due to "x86/boot/compressed/64: Handle 5-level paging boot if kernel is above 4G"

2018-07-25 Thread Kirill A. Shutemov
On Wed, Jul 25, 2018 at 05:26:02PM +0000, Dmitry Malkin wrote:
> There may be other reasons that could cause undefined behavior (a reboot,
> for example):
> 
> in arch/x86/boot/compressed/pgtable_64.c in function paging_prepare():
> 
> 1. The structure "paging_config" is allocated on the stack without setting
> a default value for the flag "l5_required":
> >>struct paging_config paging_config = {};
> l5_required is set only if CONFIG_X86_5LEVEL is defined

Hm? C99 initializer zeros the structure.

> 2. Reading from memory which may be reserved on EFI systems:
> >>    ebda_start = *(unsigned short *)0x40e << 4;
> >>    bios_start = *(unsigned short *)0x413 << 10;
> Also, on an EFI system without CSM these reads will return all zeros,
> which will place trampoline_start at 0x9d000. That may also be reserved
> memory. In fact, I have such a system and it causes an instant reboot
> (when the code starts copying to "trampoline_start").

Could you show dmesg from such a system?

> 3. paging_prepare(void) returns "struct paging_config" by value. Is it
> really specified by the ABI or by GCC itself that the second field (the
> flag "l5_required") will go into the RDX register?

https://github.com/hjl-tools/x86-psABI/wiki/x86-64-psABI-1.0.pdf

3.2.3 Parameter Passing

...

Returning of Values
The returning of values is done according to the following algorithm:

...

3.  If the class is INTEGER, the next available register of the sequence
%rax, %rdx is used.
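
The quoted rule can be applied to the structure in question. This is only a sketch of the classification arithmetic, not compiler output: `demo_paging_config` is a hypothetical mirror of struct paging_config, it uses `uint64_t` where the kernel uses `unsigned long` (64 bits on x86-64), and the helper names are made up for illustration. The structure is 16 bytes, so it splits into two eightbytes, each holding one integer member and therefore classified INTEGER.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of struct paging_config (two 64-bit integer members). */
struct demo_paging_config {
	uint64_t trampoline_start;
	uint64_t l5_required;
};

/* Index of the eightbyte (8-byte chunk) that contains the given offset. */
size_t demo_eightbyte_index(size_t offset)
{
	return offset / 8;
}

/*
 * For an aggregate of at most two eightbytes that are both class INTEGER,
 * the psABI return sequence is %rax for the first eightbyte and %rdx for
 * the second. Returns NULL for an index outside that sequence.
 */
const char *demo_return_register(size_t eightbyte)
{
	static const char *const regs[] = { "%rax", "%rdx" };

	return eightbyte < 2 ? regs[eightbyte] : NULL;
}
```

Under these assumptions `trampoline_start` sits at offset 0 (eightbyte 0, `%rax`) and `l5_required` at offset 8 (eightbyte 1, `%rdx`), which is the ABI guarantee the assembly caller relies on.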

-- 
 Kirill A. Shutemov


Re: 4.17.x won't boot due to "x86/boot/compressed/64: Handle 5-level paging boot if kernel is above 4G"

2018-07-25 Thread Dmitry Malkin
There may be other reasons that could cause undefined behavior (a reboot,
for example):


in arch/x86/boot/compressed/pgtable_64.c in function paging_prepare():

1. The structure "paging_config" is allocated on the stack without setting
a default value for the flag "l5_required":

>>struct paging_config paging_config = {};
l5_required is set only if CONFIG_X86_5LEVEL is defined

2. Reading from memory which may be reserved on EFI systems:
>>    ebda_start = *(unsigned short *)0x40e << 4;
>>    bios_start = *(unsigned short *)0x413 << 10;
Also, on an EFI system without CSM these reads will return all zeros,
which will place trampoline_start at 0x9d000. That may also be reserved
memory. In fact, I have such a system and it causes an instant reboot
(when the code starts copying to "trampoline_start").


3. paging_prepare(void) returns "struct paging_config" by value. Is it
really specified by the ABI or by GCC itself that the second field (the
flag "l5_required") will go into the RDX register?
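
The arithmetic behind point 2 can be reproduced in user space. This is a sketch under assumptions: the clamping steps follow the placement code in paging_prepare(), the constant values (0x20000, 0x9f000, two 4 KiB pages) are assumed because they are consistent with the 0x9d000 figure above, all `demo_` names are hypothetical, and the reserved range used in the usage note (starting at 0x9e000, assumed to end just below 0xa0000) is the one reported elsewhere in this thread.

```c
#include <stdint.h>

#define DEMO_PAGE_SIZE       0x1000UL
#define DEMO_BIOS_START_MIN  0x20000UL               /* assumed value */
#define DEMO_BIOS_START_MAX  0x9f000UL               /* assumed value */
#define DEMO_TRAMPOLINE_SIZE (2 * DEMO_PAGE_SIZE)    /* assumed: two pages */

/*
 * Placement without any e820 check. The two BIOS data area words are
 * parameters because user space cannot read 0x40e and 0x413.
 */
uint64_t demo_old_trampoline_start(uint16_t ebda_word, uint16_t base_mem_kb)
{
	uint64_t ebda_start = (uint64_t)ebda_word << 4;     /* word at 0x40e */
	uint64_t bios_start = (uint64_t)base_mem_kb << 10;  /* word at 0x413 */
	uint64_t start;

	/* Out-of-range values (including zero) fall back to the maximum. */
	if (bios_start < DEMO_BIOS_START_MIN || bios_start > DEMO_BIOS_START_MAX)
		bios_start = DEMO_BIOS_START_MAX;
	if (ebda_start > DEMO_BIOS_START_MIN && ebda_start < bios_start)
		bios_start = ebda_start;

	start = bios_start - DEMO_TRAMPOLINE_SIZE;
	return start & ~(uint64_t)(DEMO_PAGE_SIZE - 1);     /* align down to 4k */
}

/* True if half-open ranges [a_start, a_end) and [b_start, b_end) intersect. */
int demo_ranges_overlap(uint64_t a_start, uint64_t a_end,
			uint64_t b_start, uint64_t b_end)
{
	return a_start < b_end && b_start < a_end;
}
```

With both words zero, `demo_old_trampoline_start(0, 0)` gives 0x9d000, so the trampoline would occupy 0x9d000-0x9efff. `demo_ranges_overlap(0x9d000, 0x9f000, 0x9e000, 0xa0000)` is true: the second trampoline page collides with the reserved region.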






Re: 4.17.x won't boot due to "x86/boot/compressed/64: Handle 5-level paging boot if kernel is above 4G"

2018-07-09 Thread Masahiro Yamada
2018-07-09 19:10 GMT+09:00 Kirill A. Shutemov:
> On Sat, Jul 07, 2018 at 10:21:47AM +0900, Masahiro Yamada wrote:
>> 2018-07-07 1:29 GMT+09:00 Kirill A. Shutemov:
>> > On Fri, Jul 06, 2018 at 11:13:02PM +0900, Masahiro Yamada wrote:
>> >> >> LDFLAGS is for internal use.
>> >> >> Please do not override it from the command line.
>> >> >
>> >> > Can we generate a build error if a user tries to override LDFLAGS,
>> >> > CFLAGS or other critical internal-use-only variables?
>> >>
>> >> Yes, Make can check where variables came from.
>> >
>> > I think we should do this.
>> >
>> >> >> make LDFLAGS_KERNEL=...  LDFLAGS_MODULE=...
>> >> >> will allow you to append linker flags.
>> >> >
>> >> > Okay. It makes me wonder if we should taint the kernel in such cases?
>> >> > Custom compiler/linker flags are risky and can lead to weird bugs.
>> >>
>> >> OK.
>> >> So, what problem are we discussing?
>> >
>> > Users set custom LDFLAGS/CFLAGS and break the kernel. Then they report a
>> > bug that is hard to debug. See
>> >
>> > https://bugzilla.kernel.org/show_bug.cgi?id=200385
>>
>>
>> CFLAGS is only used under tools/.
>> Passing CFLAGS probably has no effect on the kernel.
>>
>> In Linux makefiles,
>> KBUILD_ prefixed variables are used internally.
>>
>> KBUILD_CFLAGS, KBUILD_CPPFLAGS, KBUILD_AFLAGS, etc.
>>
>>
>> LDFLAGS is an exception.  I do not know why.
>> Renaming LDFLAGS to KBUILD_LDFLAGS
>> will make the code consistent.
>>
>> At least, it will avoid overriding flags by accident.
>>
>> Of course, users still can change KBUILD_LDFLAGS
>> if they really want.
>>
>> The build system could add belt and braces checks for that,
>> but it is arguable since
>> there are lots and lots of internal variables.
>
> I think renaming LDFLAGS to KBUILD_LDFLAGS is a good idea.
> Would you prepare a patch?


Yes, targeting 4.19-rc1.


-- 
Best Regards
Masahiro Yamada


Re: 4.17.x won't boot due to "x86/boot/compressed/64: Handle 5-level paging boot if kernel is above 4G"

2018-07-09 Thread Kirill A. Shutemov
On Sat, Jul 07, 2018 at 10:21:47AM +0900, Masahiro Yamada wrote:
> 2018-07-07 1:29 GMT+09:00 Kirill A. Shutemov :
> > On Fri, Jul 06, 2018 at 11:13:02PM +0900, Masahiro Yamada wrote:
> >> >> LDFLAGS is for internal-use.
> >> >> Please do not override it from the command line.
> >> >
> >> > Can we generate a build error if a user try to override LDFLAGS, CFLAGS or
> >> > other critical internal-use-only variables?
> >>
> >> Yes, Make can check where variables came from.
> >
> > I think we should do this.
> >
> >> >> make LDFLAGS_KERNEL=...  LDFLAGS_MODULE=...
> >> >> will allow you to append linker flags.
> >> >
> >> > Okay. It makes me wonder if we should taint kernel in such cases?
> >> > Custom compiler/linker flags are risky and can lead to weird bugs.
> >>
> >> OK.
> >> So, what problem are we discussing?
> >
> > Users set custom LDFLAGS/CFLAGS and break kernel. Then report bug that
> > hard to debug. See
> >
> > https://bugzilla.kernel.org/show_bug.cgi?id=200385
> 
> 
> CFLAGS is only used under tools/.
> Passing CFLAGS probably has no effect on the kernel.
> 
> In Linux makefiles,
> KBUILD_ prefixed variables are used internally.
> 
> KBUILD_CFLAGS, KBUILD_CPPFLAGS, KBUILD_AFLAGS, etc.
> 
> 
> LDFLAGS is an exception.  I do not know why.
> Renaming LDFLAGS to KBUILD_LDFLAGS
> will make the code consistent.
> 
> At least, it will avoid overriding flags by accident.
> 
> Of course, users still can change KBUILD_LDFLAGS
> if they really want.
> 
> The build system could add belt and braces checks for that,
> but it is arguable since
> there are lots and lots of internal variables.

I think renaming LDFLAGS to KBUILD_LDFLAGS is a good idea.
Would you prepare a patch?

-- 
 Kirill A. Shutemov
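
Note: the rename discussed above can be pictured with a small sketch. This is
not the actual kernel Makefile; the -pie value and the use of LDFLAGS_KERNEL
as the user-facing variable are assumptions made for illustration. The idea is
that required flags live in a KBUILD_-prefixed variable that nobody sets by
accident, while user additions arrive through a separate variable.

```make
# Illustrative fragment only; not taken from the kernel tree.
# Flags the build depends on go into an internal, prefixed variable.
KBUILD_LDFLAGS := -pie

# User additions are appended rather than replacing the internal set.
# (LDFLAGS_KERNEL as the user-facing knob is an assumption here.)
KBUILD_LDFLAGS += $(LDFLAGS_KERNEL)

# "make LDFLAGS=..." no longer touches KBUILD_LDFLAGS at all.
$(info link flags: $(KBUILD_LDFLAGS))
all: ;
```

As pointed out in the quoted text, "make KBUILD_LDFLAGS=..." would still
replace everything, so a rename of this kind guards against accidents rather
than deliberate overrides.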


Re: 4.17.x won't boot due to "x86/boot/compressed/64: Handle 5-level paging boot if kernel is above 4G"

2018-07-06 Thread Masahiro Yamada
2018-07-07 1:29 GMT+09:00 Kirill A. Shutemov :
> On Fri, Jul 06, 2018 at 11:13:02PM +0900, Masahiro Yamada wrote:
>> >> LDFLAGS is for internal-use.
>> >> Please do not override it from the command line.
>> >
>> > Can we generate a build error if a user try to override LDFLAGS, CFLAGS or
>> > other critical internal-use-only variables?
>>
>> Yes, Make can check where variables came from.
>
> I think we should do this.
>
>> >> make LDFLAGS_KERNEL=...  LDFLAGS_MODULE=...
>> >> will allow you to append linker flags.
>> >
>> > Okay. It makes me wonder if we should taint kernel in such cases?
>> > Custom compiler/linker flags are risky and can lead to weird bugs.
>>
>> OK.
>> So, what problem are we discussing?
>
> Users set custom LDFLAGS/CFLAGS and break kernel. Then report bug that
> hard to debug. See
>
> https://bugzilla.kernel.org/show_bug.cgi?id=200385


CFLAGS is only used under tools/.
Passing CFLAGS probably has no effect on the kernel.

In Linux makefiles,
KBUILD_ prefixed variables are used internally.

KBUILD_CFLAGS, KBUILD_CPPFLAGS, KBUILD_AFLAGS, etc.


LDFLAGS is an exception.  I do not know why.
Renaming LDFLAGS to KBUILD_LDFLAGS
will make the code consistent.

At least, it will avoid overriding flags by accident.

Of course, users still can change KBUILD_LDFLAGS
if they really want.

The build system could add belt and braces checks for that,
but it is arguable since
there are lots and lots of internal variables.




> and start of this thread:
>
> https://lore.kernel.org/lkml/20180701213243.ga20...@trogon.sfo.coreos.systems/
>
> It took me a while to track down the issue. I blamed linker for a while.
>
>> > I've got it wrong. *Any* LDFLAGS option passed to make this way:
>> >
>> >  make LDFLAGS="..."
>>
>> In your previous mail, I thought you were asking me how to pass
>> custom linker flags.
>>
>> If not, we do not need to think about that case.
>> Just say "Do not do that".
>
> At least we need to make user aware about risk of setting custom flags.
>
> --
>  Kirill A. Shutemov



-- 
Best Regards
Masahiro Yamada


Re: 4.17.x won't boot due to "x86/boot/compressed/64: Handle 5-level paging boot if kernel is above 4G"

2018-07-06 Thread Masahiro Yamada
2018-07-06 23:39 GMT+09:00 Gabriel C :
> 2018-07-06 16:13 GMT+02:00 Masahiro Yamada :
>> Hi.
>>
>> 2018-07-06 19:41 GMT+09:00 Kirill A. Shutemov :
>>> On Fri, Jul 06, 2018 at 03:37:58PM +0900, Masahiro Yamada wrote:
>>>> >> > > Also see https://bugzilla.kernel.org/show_bug.cgi?id=200385 ,
>>>> >> > >
>>>> >> > > 0a1756bd2897951c03c1cb671bdfd40729ac2177 is acting up
>>>> >> > > too with the same symptoms
>>>> >> >
>>>> >> > I tracked it down to -flto in LDFLAGS. I'll look more into this.
>>>> >>
>>>> >> -flto in LDFLAGS screws up this part of paging_prepare():
>>>> >
>>>> > +Masahiro, Michal.
>>>> >
>>>> > I've got it wrong. *Any* LDFLAGS option passed to make this way:
>>>> >
>>>> >   make LDFLAGS="..."
>>>> >
>>>> > would cause an issue. Even empty.
>>>> >
>>>> > It overrides all assignments to the variable in the makefile.
>>>> > As a result the image is built without -pie and the linker doesn't generate
>>>> > position-independent code.
>>>> >
>>>> > Looks like the patch below helps, but my make-fu is poor.
>>>> > I don't see many override directives in kernel makefiles.
>>>> > It makes me think that there's a better way to fix this.
>>>> >
>>>> > Hm?
>>>>
>>>>
>>>> LDFLAGS is for internal-use.
>>>> Please do not override it from the command line.
>>>
>>> Can we generate a build error if a user try to override LDFLAGS, CFLAGS or
>>> other critical internal-use-only variables?
>>
>> Yes, Make can check where variables came from.
>>
>>
>>> This breakage was rather hard to debug. We need to have some kind of
>>> fail-safe for the future.
>>>
>>>> You want to pass your own linker flags
>>>> for building vmlinux and modules,
>>>> but do not want to pass them to
>>>> the decompressor (arch/x86/boot/compressed).
>>>>
>>>> Correct?
>>>
>>> I personally don't think that changing compiler/linker options for kernel
>>> build is good idea in general.
>>>
>>>> Kbuild provides a way for users
>>>> to pass additional linker flags to modules.
>>>> (LDFLAGS_MODULE)
>>>>
>>>>
>>>> But, there is no way to do that for vmlinux.
>>>>
>>>> It is easy to support it, though.
>>>>
>>>> https://patchwork.kernel.org/patch/10510833/
>>>>
>>>> If this is the one you want, I can merge this.
>>>>
>>>>
>>>> make LDFLAGS_KERNEL=...  LDFLAGS_MODULE=...
>>>> will allow you to append linker flags.
>>>
>>> Okay. It makes me wonder if we should taint kernel in such cases?
>>> Custom compiler/linker flags are risky and can lead to weird bugs.
>>
>> OK.
>> So, what problem are we discussing?
>>
>>
>>> I've got it wrong. *Any* LDFLAGS option passed to make this way:
>>>
>>>  make LDFLAGS="..."
>>
>> In your previous mail, I thought you were asking me how to pass
>> custom linker flags.
>>
>> If not, we do not need to think about that case.
>> Just say "Do not do that".
>
> I am sorry but I have a hard time to get your logic here.
>
> You are saying: the *env* variable LDFLAGS, as well as passing
> LDFLAGS to make, which your build allows, should not be used
> because it is for 'internal usage'?
>
> Well that logic you have here is wrong and wrong for any project
> not just for the kernel,


Why 'my logic'?

LDFLAGS has been long used internally since the old days,
before I ever worked on the kernel.


I shared my knowledge about the kernel build system.

The current situation is not nice,
but why are you blaming me for the code I did not add ?


Note:
I have never said 'the *env* variable LDFLAGS'


> If you know 'parts' need to have particular flags then 'you' have to
> ensure nothing overrides these or nothing at all can change these.
>
> So swap your logic and append LDFLAGS to your private
> 'call_it_whatever_you_wish_KERNEL_NEED_BE_THERE_ANY_KIND_FLAGS'
> and don't allow these to be changed at all, when you
> *know* they have to be there.
>
>
> Telling users not to use LD/C/CXX flags is not really going to work, right?
>
>
> BR



-- 
Best Regards
Masahiro Yamada


Re: 4.17.x won't boot due to "x86/boot/compressed/64: Handle 5-level paging boot if kernel is above 4G"

2018-07-06 Thread Benjamin Gilbert
On Fri, Jul 06, 2018 at 11:11:10AM -0700, Andi Kleen wrote:
> There are valid use cases to override the flags. I use it sometimes too,
> and know some other people do too.
> 
> But you need to know what you're doing. 
> 
> Perhaps a warning during build would be reasonable. So if you ask
> for a build log you would see it.

In our case, the package is presumably passing LDFLAGS="" to override the
LDFLAGS environment variable already set by the packaging system.  This has
worked for years without a problem.

--Benjamin Gilbert


Re: 4.17.x won't boot due to "x86/boot/compressed/64: Handle 5-level paging boot if kernel is above 4G"

2018-07-06 Thread Andi Kleen
> At least we need to make user aware about risk of setting custom flags.

There are valid use cases to override the flags. I use it sometimes too,
and know some other people do too.

But you need to know what you're doing. 

Perhaps a warning during build would be reasonable. So if you ask
for a build log you would see it.

-Andi


Re: 4.17.x won't boot due to "x86/boot/compressed/64: Handle 5-level paging boot if kernel is above 4G"

2018-07-06 Thread Gabriel C
2018-07-06 18:33 GMT+02:00 Kirill A. Shutemov :
> On Fri, Jul 06, 2018 at 04:39:28PM +0200, Gabriel C wrote:
>> > If not, we do not need to think about that case.
>> > Just say "Do not do that".
>>
>> I am sorry but I have a hard time to get your logic here.
>>
>> You are saying: the *env* variable LDFLAGS, as well as passing
>> LDFLAGS to make, which your build allows, should not be used
>> because it is for 'internal usage'?
>
> Environment variables do not override make variables. Only passing a variable
> assignment as a make argument will do this.
>

Still, when the build system allows 'make FOO=bar' and you know
that using it will break things, the build system should be fixed.


Re: 4.17.x won't boot due to "x86/boot/compressed/64: Handle 5-level paging boot if kernel is above 4G"

2018-07-06 Thread Kirill A. Shutemov
On Fri, Jul 06, 2018 at 04:39:28PM +0200, Gabriel C wrote:
> > If not, we do not need to think about that case.
> > Just say "Do not do that".
> 
> I am sorry but I have a hard time to get your logic here.
> 
> You are saying: the *env* variable LDFLAGS, as well as passing
> LDFLAGS to make, which your build allows, should not be used
> because it is for 'internal usage'?

Environment variables do not override make variables. Only passing a variable
assignment as a make argument will do this.

-- 
 Kirill A. Shutemov
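
Note: the distinction drawn above can be checked with a throwaway makefile.
A minimal sketch, assuming GNU make; the file name demo.mk and the -pie value
are made up for illustration and are not taken from the kernel tree.

```make
# demo.mk (hypothetical file name)
# The makefile sets the flag the build depends on.
LDFLAGS := -pie

# Print the final value and where make thinks it came from.
$(info LDFLAGS='$(LDFLAGS)' origin=$(origin LDFLAGS))

# Empty rule so that make has a target and does nothing else.
all: ;

# make -f demo.mk                -> LDFLAGS='-pie' origin=file
# LDFLAGS=-flto make -f demo.mk  -> LDFLAGS='-pie' origin=file
# make -f demo.mk LDFLAGS=       -> LDFLAGS='' origin=command line
```

The last case is the one reported in this thread: even an empty LDFLAGS=""
given as a make argument replaces the makefile's assignment, so -pie is lost.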


Re: 4.17.x won't boot due to "x86/boot/compressed/64: Handle 5-level paging boot if kernel is above 4G"

2018-07-06 Thread Kirill A. Shutemov
On Fri, Jul 06, 2018 at 11:13:02PM +0900, Masahiro Yamada wrote:
> >> LDFLAGS is for internal-use.
> >> Please do not override it from the command line.
> >
> > Can we generate a build error if a user try to override LDFLAGS, CFLAGS or
> > other critical internal-use-only variables?
> 
> Yes, Make can check where variables came from.

I think we should do this.

> >> make LDFLAGS_KERNEL=...  LDFLAGS_MODULE=...
> >> will allow you to append linker flags.
> >
> > Okay. It makes me wonder if we should taint kernel in such cases?
> > Custom compiler/linker flags are risky and can lead to weird bugs.
> 
> OK.
> So, what problem are we discussing?

Users set custom LDFLAGS/CFLAGS and break kernel. Then report bug that
hard to debug. See

https://bugzilla.kernel.org/show_bug.cgi?id=200385

and start of this thread:

https://lore.kernel.org/lkml/20180701213243.ga20...@trogon.sfo.coreos.systems/

It took me a while to track down the issue. I blamed linker for a while.

> > I've got it wrong. *Any* LDFLAGS option passed to make this way:
> >
> >  make LDFLAGS="..."
> 
> In your previous mail, I thought you were asking me how to pass
> custom linker flags.
> 
> If not, we do not need to think about that case.
> Just say "Do not do that".

At least we need to make user aware about risk of setting custom flags.

-- 
 Kirill A. Shutemov
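
Note: a guard of the kind discussed above could look roughly like this.
A sketch only, assuming GNU make; it is not claimed to be what Kbuild does or
later adopted, and the message text is made up.

```make
# Refuse to build if LDFLAGS was given as "make LDFLAGS=...".
# $(origin VAR) reports where a variable's current value came from;
# for a make argument it expands to the two words "command line".
ifeq ("$(origin LDFLAGS)", "command line")
$(error LDFLAGS is reserved for the build itself; do not set it on the command line)
endif
```

Replacing $(error ...) with $(warning ...) would turn the hard failure into a
message in the build log, which is the softer variant also suggested in this
thread.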


Re: 4.17.x won't boot due to "x86/boot/compressed/64: Handle 5-level paging boot if kernel is above 4G"

2018-07-06 Thread Gabriel C
2018-07-06 16:13 GMT+02:00 Masahiro Yamada :
> Hi.
>
> 2018-07-06 19:41 GMT+09:00 Kirill A. Shutemov :
>> On Fri, Jul 06, 2018 at 03:37:58PM +0900, Masahiro Yamada wrote:
>>> >> > > Also see https://bugzilla.kernel.org/show_bug.cgi?id=200385 ,
>>> >> > >
>>> >> > > 0a1756bd2897951c03c1cb671bdfd40729ac2177 is acting up
>>> >> > > too with the same symptoms
>>> >> >
>>> >> > I tracked it down to -flto in LDFLAGS. I'll look more into this.
>>> >>
>>> >> -flto in LDFLAGS screws up this part of paging_prepare():
>>> >
>>> > +Masahiro, Michal.
>>> >
>>> > I've got it wrong. *Any* LDFLAGS option passed to make this way:
>>> >
>>> >   make LDFLAGS="..."
>>> >
>>> > would cause an issue. Even empty.
>>> >
>>> > It overrides all assignments to the variable in the makefile.
>>> > As a result the image is built without -pie and the linker doesn't generate
>>> > position-independent code.
>>> >
>>> > Looks like the patch below helps, but my make-fu is poor.
>>> > I don't see many override directives in kernel makefiles.
>>> > It makes me think that there's a better way to fix this.
>>> >
>>> > Hm?
>>>
>>>
>>> LDFLAGS is for internal-use.
>>> Please do not override it from the command line.
>>
>> Can we generate a build error if a user try to override LDFLAGS, CFLAGS or
>> other critical internal-use-only variables?
>
> Yes, Make can check where variables came from.
>
>
>> This breakage was rather hard to debug. We need to have some kind of
>> fail-safe for the future.
>>
>>> You want to pass your own linker flags
>>> for building vmlinux and modules,
>>> but do not want to pass them to
>>> the decompressor (arch/x86/boot/compressed).
>>>
>>> Correct?
>>
>> I personally don't think that changing compiler/linker options for kernel
>> build is good idea in general.
>>
>>> Kbuild provides a way for users
>>> to pass additional linker flags to modules.
>>> (LDFLAGS_MODULE)
>>>
>>>
>>> But, there is no way to do that for vmlinux.
>>>
>>> It is easy to support it, though.
>>>
>>> https://patchwork.kernel.org/patch/10510833/
>>>
>>> If this is the one you want, I can merge this.
>>>
>>>
>>> make LDFLAGS_KERNEL=...  LDFLAGS_MODULE=...
>>> will allow you to append linker flags.
>>
>> Okay. It makes me wonder if we should taint kernel in such cases?
>> Custom compiler/linker flags are risky and can lead to weird bugs.
>
> OK.
> So, what problem are we discussing?
>
>
>> I've got it wrong. *Any* LDFLAGS option passed to make this way:
>>
>>  make LDFLAGS="..."
>
> In your previous mail, I thought you were asking me how to pass
> custom linker flags.
>
> If not, we do not need to think about that case.
> Just say "Do not do that".

I am sorry but I have a hard time to get your logic here.

You are saying: the *env* variable LDFLAGS, as well as passing
LDFLAGS to make, which your build allows, should not be used
because it is for 'internal usage'?

Well that logic you have here is wrong and wrong for any project
not just for the kernel,

If you know 'parts' need to have particular flags then 'you' have to
ensure nothing overrides these or nothing at all can change these.

So swap your logic and append LDFLAGS to your private
'call_it_whatever_you_wish_KERNEL_NEED_BE_THERE_ANY_KIND_FLAGS'
and don't allow these to be changed at all, when you
*know* they have to be there.


Telling users not to use LD/C/CXX flags is not really going to work, right?


BR
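
Note: one way make itself can keep required flags in place is the override
directive mentioned in the quoted text. A minimal sketch, assuming GNU make;
it is not claimed to be the patch that was posted, and -pie stands in for
whatever the decompressor link actually needs.

```make
# Without "override", a "make LDFLAGS=..." argument would cause make to
# ignore this assignment entirely. With it, the flag is appended to
# whatever the user passed.
override LDFLAGS += -pie

$(info LDFLAGS='$(LDFLAGS)')
all: ;

# make LDFLAGS=-flto  -> LDFLAGS='-flto -pie'
```

This keeps the required flag but still lets arbitrary user flags reach the
link, so it addresses only the "flag silently dropped" half of the problem.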


Re: 4.17.x won't boot due to "x86/boot/compressed/64: Handle 5-level paging boot if kernel is above 4G"

2018-07-06 Thread Gabriel C
2018-07-06 16:13 GMT+02:00 Masahiro Yamada :
> Hi.
>
> 2018-07-06 19:41 GMT+09:00 Kirill A. Shutemov :
>> On Fri, Jul 06, 2018 at 03:37:58PM +0900, Masahiro Yamada wrote:
>>> >> > > Also see https://bugzilla.kernel.org/show_bug.cgi?id=200385 ,
>>> >> > >
>>> >> > > 0a1756bd2897951c03c1cb671bdfd40729ac2177 is acting up
>>> >> > > too with the same symptoms
>>> >> >
>>> >> > I tracked it down to -flto in LDFLAGS. I'll look more into this.
>>> >>
>>> >> -flto in LDFLAGS screws up this part of paging_prepare():
>>> >
>>> > +Masahiro, Michal.
>>> >
>>> > I've got it wrong. *Any* LDFLAGS option passed to make this way:
>>> >
>>> >   make LDFLAGS="..."
>>> >
>>> > would cause a issue. Even empty.
>>> >
>>> > It overrides all assignments to the variable in the makefile.
>>> > As result the image is built without -pie and linker doesn't generate
>>> > position independed code.
>>> >
>>> > Looks like the patch below helps, but my make-fu is poor.
>>> > I don't see many override directives in kernel makefiles.
>>> > It makes me think that there's a better way to fix this.
>>> >
>>> > Hm?
>>>
>>>
>>> LDFLAGS is for internal-use.
>>> Please do not override it from the command line.
>>
>> Can we generate a build error if a user try to override LDFLAGS, CFLAGS or
>> other critical internal-use-only variables?
>
> Yes, Make can check where variables came from.
>
>
>> This breakage was rather hard to debug. We need to have some kind of
>> fail-safe for the future.
>>
>>> You want to pass your own linker flags
>>> for building vmlinux and modules,
>>> but do not want to pass them to
>>> the decompressor (arch/x86/boot/compressed).
>>>
>>> Correct?
>>
>> I personally don't think that changing compiler/linker options for kernel
>> build is good idea in general.
>>
>>> Kbuild provides a way for users
>>> to pass additional linker flags to modules.
>>> (LDFLAGS_MODULE)
>>>
>>>
>>> But, there is no way to do that for vmlinux.
>>>
>>> It is easy to support it, though.
>>>
>>> https://patchwork.kernel.org/patch/10510833/
>>>
>>> If this is the one you want, I can merge this.
>>>
>>>
>>> make LDFLAGS_KERNEL=...  LDFLAGS_MODULE=...
>>> will allow you to append linker flags.
>>
>> Okay. It makes me wounder if we should taint kernel in such cases?
>> Custom compiler/linker flags are risky and can lead to weird bugs.
>
> OK.
> So, what problem are we discussing?
>
>
>> I've got it wrong. *Any* LDFLAGS option passed to make this way:
>>
>>  make LDFLAGS="..."
>
> In your previous mail, I thought you were asking me how to pass
> custom linker flags.
>
> If not, we do not need to think about that case.
> Just say "Do not do that".

I am sorry, but I have a hard time following your logic here.

You are saying that the *env* variable LDFLAGS, as well as passing
LDFLAGS to make, which your build allows, should not be used
because it is for 'internal usage'?

Well, that logic is wrong, and wrong for any project,
not just for the kernel.

If you know 'parts' need to have particular flags, then 'you' have to
ensure nothing overrides them, or that nothing at all can change them.

So swap your logic and append LDFLAGS to your private
'call_it_whatever_you_wish_KERNEL_NEED_BE_THERE_ANY_KIND_FLAGS'
and don't allow these to be changed at all, when you
*know* they have to be there.


Telling users not to use LD/C/CXX flags is not really going to work, right?


BR
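
One way to read the suggestion above, as a sketch only: keep the mandatory
flags in a variable of their own and append whatever the user supplied.
REQUIRED_LDFLAGS and ALL_LDFLAGS are hypothetical names, not Kbuild
variables, and GNU make is assumed to be installed.

```shell
# Scratch Makefile; nothing here is taken from the kernel tree.
dir=$(mktemp -d)
cat > "$dir/Makefile" <<'EOF'
# Mandatory flags: 'override' keeps a command-line assignment from replacing them.
override REQUIRED_LDFLAGS := -pie
# User-supplied LDFLAGS are appended, never substituted for the required ones.
override ALL_LDFLAGS := $(strip $(REQUIRED_LDFLAGS) $(LDFLAGS))
all: ; @echo "link flags: $(ALL_LDFLAGS)"
EOF

empty_out=$(make -s -C "$dir" LDFLAGS="")
extra_out=$(make -s -C "$dir" LDFLAGS="-O1")
forced_out=$(make -s -C "$dir" LDFLAGS="-O1" REQUIRED_LDFLAGS="")

echo "$empty_out"    # link flags: -pie
echo "$extra_out"    # link flags: -pie -O1
echo "$forced_out"   # link flags: -pie -O1
rm -rf "$dir"
```

In all three runs -pie survives, including the one that tries to blank the
required flags from the command line.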


Re: 4.17.x won't boot due to "x86/boot/compressed/64: Handle 5-level paging boot if kernel is above 4G"

2018-07-06 Thread Masahiro Yamada
Hi.

2018-07-06 19:41 GMT+09:00 Kirill A. Shutemov :
> On Fri, Jul 06, 2018 at 03:37:58PM +0900, Masahiro Yamada wrote:
>> >> > > Also see https://bugzilla.kernel.org/show_bug.cgi?id=200385 ,
>> >> > >
>> >> > > 0a1756bd2897951c03c1cb671bdfd40729ac2177 is acting up
>> >> > > too with the same symptoms
>> >> >
>> >> > I tracked it down to -flto in LDFLAGS. I'll look more into this.
>> >>
>> >> -flto in LDFLAGS screws up this part of paging_prepare():
>> >
>> > +Masahiro, Michal.
>> >
>> > I've got it wrong. *Any* LDFLAGS option passed to make this way:
>> >
>> >   make LDFLAGS="..."
>> >
>> > would cause an issue. Even empty.
>> >
>> > It overrides all assignments to the variable in the makefile.
>> > As a result the image is built without -pie and the linker doesn't generate
>> > position-independent code.
>> >
>> > Looks like the patch below helps, but my make-fu is poor.
>> > I don't see many override directives in kernel makefiles.
>> > It makes me think that there's a better way to fix this.
>> >
>> > Hm?
>>
>>
>> LDFLAGS is for internal-use.
>> Please do not override it from the command line.
>
> Can we generate a build error if a user tries to override LDFLAGS, CFLAGS or
> other critical internal-use-only variables?

Yes, Make can check where variables came from.


> This breakage was rather hard to debug. We need to have some kind of
> fail-safe for the future.
>
>> You want to pass your own linker flags
>> for building vmlinux and modules,
>> but do not want to pass them to
>> the decompressor (arch/x86/boot/compressed).
>>
>> Correct?
>
> I personally don't think that changing compiler/linker options for kernel
> build is good idea in general.
>
>> Kbuild provides a way for users
>> to pass additional linker flags to modules.
>> (LDFLAGS_MODULE)
>>
>>
>> But, there is no way to do that for vmlinux.
>>
>> It is easy to support it, though.
>>
>> https://patchwork.kernel.org/patch/10510833/
>>
>> If this is the one you want, I can merge this.
>>
>>
>> make LDFLAGS_KERNEL=...  LDFLAGS_MODULE=...
>> will allow you to append linker flags.
>
> Okay. It makes me wonder if we should taint the kernel in such cases?
> Custom compiler/linker flags are risky and can lead to weird bugs.

OK.
So, what problem are we discussing?


> I've got it wrong. *Any* LDFLAGS option passed to make this way:
>
>  make LDFLAGS="..."

In your previous mail, I thought you were asking me how to pass
custom linker flags.

If not, we do not need to think about that case.
Just say "Do not do that".



-- 
Best Regards
Masahiro Yamada
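
On the remark that Make can check where variables came from: GNU make's
$(origin ...) function reports how a variable was defined, and the kernel's
top-level Makefile already uses it for V=. Below is a minimal sketch of a
check that rejects a command-line LDFLAGS, assuming GNU make is installed.
The scratch Makefile and its error text are illustrative only and are not
the check Kbuild adopted.

```shell
# Illustrative only: a scratch Makefile that refuses LDFLAGS from the command line.
dir=$(mktemp -d)
cat > "$dir/Makefile" <<'EOF'
ifeq ("$(origin LDFLAGS)", "command line")
$(error LDFLAGS is for internal use and must not be set on the command line)
endif
all: ; @echo "build ok"
EOF

# Normal invocation: the check stays quiet.
ok_out=$(make -s -C "$dir" 2>&1) && ok_status=0 || ok_status=$?

# Command-line override, even an empty one: make stops with the error above.
bad_out=$(make -s -C "$dir" LDFLAGS="" 2>&1) && bad_status=0 || bad_status=$?

echo "without LDFLAGS: status=$ok_status output=$ok_out"
echo "with LDFLAGS=\"\": status=$bad_status output=$bad_out"
rm -rf "$dir"
```

An LDFLAGS inherited from the environment has origin "environment" and is
not caught by this particular check.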


Re: 4.17.x won't boot due to "x86/boot/compressed/64: Handle 5-level paging boot if kernel is above 4G"

2018-07-06 Thread Kirill A. Shutemov
On Fri, Jul 06, 2018 at 03:37:58PM +0900, Masahiro Yamada wrote:
> >> > > Also see https://bugzilla.kernel.org/show_bug.cgi?id=200385 ,
> >> > >
> >> > > 0a1756bd2897951c03c1cb671bdfd40729ac2177 is acting up
> >> > > too with the same symptoms
> >> >
> >> > I tracked it down to -flto in LDFLAGS. I'll look more into this.
> >>
> >> -flto in LDFLAGS screws up this part of paging_prepare():
> >
> > +Masahiro, Michal.
> >
> > I've got it wrong. *Any* LDFLAGS option passed to make this way:
> >
> >   make LDFLAGS="..."
> >
> > would cause an issue. Even empty.
> >
> > It overrides all assignments to the variable in the makefile.
> > As a result the image is built without -pie and the linker doesn't generate
> > position-independent code.
> >
> > Looks like the patch below helps, but my make-fu is poor.
> > I don't see many override directives in kernel makefiles.
> > It makes me think that there's a better way to fix this.
> >
> > Hm?
> 
> 
> LDFLAGS is for internal-use.
> Please do not override it from the command line.

Can we generate a build error if a user tries to override LDFLAGS, CFLAGS or
other critical internal-use-only variables?

This breakage was rather hard to debug. We need to have some kind of
fail-safe for the future.

> You want to pass your own linker flags
> for building vmlinux and modules,
> but do not want to pass them to
> the decompressor (arch/x86/boot/compressed).
> 
> Correct?

I personally don't think that changing compiler/linker options for the kernel
build is a good idea in general.

> Kbuild provides a way for users
> to pass additional linker flags to modules.
> (LDFLAGS_MODULE)
> 
> 
> But, there is no way to do that for vmlinux.
> 
> It is easy to support it, though.
> 
> https://patchwork.kernel.org/patch/10510833/
> 
> If this is the one you want, I can merge this.
> 
> 
> make LDFLAGS_KERNEL=...  LDFLAGS_MODULE=...
> will allow you to append linker flags.

Okay. It makes me wonder if we should taint the kernel in such cases?
Custom compiler/linker flags are risky and can lead to weird bugs.

-- 
 Kirill A. Shutemov


Re: 4.17.x won't boot due to "x86/boot/compressed/64: Handle 5-level paging boot if kernel is above 4G"

2018-07-06 Thread Masahiro Yamada
Hi.


2018-07-05 0:08 GMT+09:00 Kirill A. Shutemov :
> On Tue, Jul 03, 2018 at 05:21:50PM +0300, Kirill A. Shutemov wrote:
>> On Tue, Jul 03, 2018 at 03:44:03PM +0300, Kirill A. Shutemov wrote:
>> > On Tue, Jul 03, 2018 at 01:24:49PM +0200, Gabriel C wrote:
>> > > 2018-07-01 23:32 GMT+02:00 Benjamin Gilbert :
>> > > > On Sun, Jul 01, 2018 at 05:15:59PM -0400, Benjamin Gilbert wrote:
>> > > >> 4.17 kernels built with the CoreOS Container Linux toolchain and kconfig,
>> > > >> up to and including 4.17.3, fail to boot on AMD64 running in (at least)
>> > > >> QEMU/KVM.  No messages are shown post-GRUB; the VM instantly reboots.
>> > > >> Reverting commit 194a9749c73d ("x86/boot/compressed/64: Handle 5-level
>> > > >> paging boot if kernel is above 4G") fixes it.  I've attached our kernel
>> > > >> config for reference, and am happy to test patches, provide sample QCOW
>> > > >> images, etc.
>> > > >
>> > >
>> > > Also see https://bugzilla.kernel.org/show_bug.cgi?id=200385 ,
>> > >
>> > > 0a1756bd2897951c03c1cb671bdfd40729ac2177 is acting up
>> > > too with the same symptoms
>> >
>> > I tracked it down to -flto in LDFLAGS. I'll look more into this.
>>
>> -flto in LDFLAGS screws up this part of paging_prepare():
>
> +Masahiro, Michal.
>
> I've got it wrong. *Any* LDFLAGS option passed to make this way:
>
>   make LDFLAGS="..."
>
> would cause an issue. Even empty.
>
> It overrides all assignments to the variable in the makefile.
> As a result the image is built without -pie and the linker doesn't generate
> position-independent code.
>
> Looks like the patch below helps, but my make-fu is poor.
> I don't see many override directives in kernel makefiles.
> It makes me think that there's a better way to fix this.
>
> Hm?


LDFLAGS is for internal-use.
Please do not override it from the command line.


You want to pass your own linker flags
for building vmlinux and modules,
but do not want to pass them to
the decompressor (arch/x86/boot/compressed).

Correct?



Kbuild provides a way for users
to pass additional linker flags to modules.
(LDFLAGS_MODULE)


But, there is no way to do that for vmlinux.

It is easy to support it, though.

https://patchwork.kernel.org/patch/10510833/

If this is the one you want, I can merge this.


make LDFLAGS_KERNEL=...  LDFLAGS_MODULE=...
will allow you to append linker flags.





> diff --git a/arch/x86/boot/compressed/Makefile b/arch/x86/boot/compressed/Makefile
> index fa42f895fdde..4f24baa8cdeb 100644
> --- a/arch/x86/boot/compressed/Makefile
> +++ b/arch/x86/boot/compressed/Makefile
> @@ -42,16 +42,16 @@ KBUILD_AFLAGS  := $(KBUILD_CFLAGS) -D__ASSEMBLY__
>  GCOV_PROFILE := n
>  UBSAN_SANITIZE :=n
>
> -LDFLAGS := -m elf_$(UTS_MACHINE)
> +override LDFLAGS := -m elf_$(UTS_MACHINE)
>  # Compressed kernel should be built as PIE since it may be loaded at any
>  # address by the bootloader.
>  ifeq ($(CONFIG_X86_32),y)
> -LDFLAGS += $(call ld-option, -pie) $(call ld-option, --no-dynamic-linker)
> +override LDFLAGS += $(call ld-option, -pie) $(call ld-option, --no-dynamic-linker)
>  else
>  # To build 64-bit compressed kernel as PIE, we disable relocation
>  # overflow check to avoid relocation overflow error with a new linker
>  # command-line option, -z noreloc-overflow.
> -LDFLAGS += $(shell $(LD) --help 2>&1 | grep -q "\-z noreloc-overflow" \
> +override LDFLAGS += $(shell $(LD) --help 2>&1 | grep -q "\-z noreloc-overflow" \
> && echo "-z noreloc-overflow -pie --no-dynamic-linker")
>  endif
>  LDFLAGS_vmlinux := -T
> --
>  Kirill A. Shutemov



-- 
Best Regards
Masahiro Yamada


Re: 4.17.x won't boot due to "x86/boot/compressed/64: Handle 5-level paging boot if kernel is above 4G"

2018-07-04 Thread Benjamin Gilbert
On Wed, Jul 04, 2018 at 06:08:57PM +0300, Kirill A. Shutemov wrote:
> On Tue, Jul 03, 2018 at 05:21:50PM +0300, Kirill A. Shutemov wrote:
> > On Tue, Jul 03, 2018 at 03:44:03PM +0300, Kirill A. Shutemov wrote:
> > > On Tue, Jul 03, 2018 at 01:24:49PM +0200, Gabriel C wrote:
> > > > 2018-07-01 23:32 GMT+02:00 Benjamin Gilbert :
> > > > > On Sun, Jul 01, 2018 at 05:15:59PM -0400, Benjamin Gilbert wrote:
> > > > >> 4.17 kernels built with the CoreOS Container Linux toolchain and kconfig,
> > > > >> up to and including 4.17.3, fail to boot on AMD64 running in (at least)
> > > > >> QEMU/KVM.  No messages are shown post-GRUB; the VM instantly reboots.
> > > > >> Reverting commit 194a9749c73d ("x86/boot/compressed/64: Handle 5-level
> > > > >> paging boot if kernel is above 4G") fixes it.  I've attached our kernel
> > > > >> config for reference, and am happy to test patches, provide sample QCOW
> > > > >> images, etc.
> > > > >
> > > > 
> > > > Also see https://bugzilla.kernel.org/show_bug.cgi?id=200385 ,
> > > > 
> > > > 0a1756bd2897951c03c1cb671bdfd40729ac2177 is acting up
> > > > too with the same symptoms
> > > 
> > > I tracked it down to -flto in LDFLAGS. I'll look more into this.
> > 
> > -flto in LDFLAGS screws up this part of paging_prepare():
> 
> I've got it wrong. *Any* LDFLAGS option passed to make this way:
> 
>   make LDFLAGS="..."
> 
> would cause an issue. Even empty.
> 
> It overrides all assignments to the variable in the makefile.
> As a result the image is built without -pie and the linker doesn't generate
> position-independent code.
> 
> Looks like the patch below helps, but my make-fu is poor.

Sure enough, we're passing LDFLAGS="" to make.  Your patch fixes the boot
failure for me.

--Benjamin Gilbert


Re: 4.17.x won't boot due to "x86/boot/compressed/64: Handle 5-level paging boot if kernel is above 4G"

2018-07-04 Thread Kirill A. Shutemov
On Tue, Jul 03, 2018 at 05:21:50PM +0300, Kirill A. Shutemov wrote:
> On Tue, Jul 03, 2018 at 03:44:03PM +0300, Kirill A. Shutemov wrote:
> > On Tue, Jul 03, 2018 at 01:24:49PM +0200, Gabriel C wrote:
> > > 2018-07-01 23:32 GMT+02:00 Benjamin Gilbert :
> > > > On Sun, Jul 01, 2018 at 05:15:59PM -0400, Benjamin Gilbert wrote:
> > > >> 4.17 kernels built with the CoreOS Container Linux toolchain and kconfig,
> > > >> up to and including 4.17.3, fail to boot on AMD64 running in (at least)
> > > >> QEMU/KVM.  No messages are shown post-GRUB; the VM instantly reboots.
> > > >> Reverting commit 194a9749c73d ("x86/boot/compressed/64: Handle 5-level
> > > >> paging boot if kernel is above 4G") fixes it.  I've attached our kernel
> > > >> config for reference, and am happy to test patches, provide sample QCOW
> > > >> images, etc.
> > > >
> > > 
> > > Also see https://bugzilla.kernel.org/show_bug.cgi?id=200385 ,
> > > 
> > > 0a1756bd2897951c03c1cb671bdfd40729ac2177 is acting up
> > > too with the same symptoms
> > 
> > I tracked it down to -flto in LDFLAGS. I'll look more into this.
> 
> -flto in LDFLAGS screws up this part of paging_prepare():

+Masahiro, Michal.

I've got it wrong. *Any* LDFLAGS option passed to make this way:

  make LDFLAGS="..."

would cause an issue. Even empty.

It overrides all assignments to the variable in the makefile.
As a result the image is built without -pie and the linker doesn't generate
position-independent code.

Looks like the patch below helps, but my make-fu is poor.
I don't see many override directives in kernel makefiles.
It makes me think that there's a better way to fix this.

Hm?
 
diff --git a/arch/x86/boot/compressed/Makefile b/arch/x86/boot/compressed/Makefile
index fa42f895fdde..4f24baa8cdeb 100644
--- a/arch/x86/boot/compressed/Makefile
+++ b/arch/x86/boot/compressed/Makefile
@@ -42,16 +42,16 @@ KBUILD_AFLAGS  := $(KBUILD_CFLAGS) -D__ASSEMBLY__
 GCOV_PROFILE := n
 UBSAN_SANITIZE :=n
 
-LDFLAGS := -m elf_$(UTS_MACHINE)
+override LDFLAGS := -m elf_$(UTS_MACHINE)
 # Compressed kernel should be built as PIE since it may be loaded at any
 # address by the bootloader.
 ifeq ($(CONFIG_X86_32),y)
-LDFLAGS += $(call ld-option, -pie) $(call ld-option, --no-dynamic-linker)
+override LDFLAGS += $(call ld-option, -pie) $(call ld-option, --no-dynamic-linker)
 else
 # To build 64-bit compressed kernel as PIE, we disable relocation
 # overflow check to avoid relocation overflow error with a new linker
 # command-line option, -z noreloc-overflow.
-LDFLAGS += $(shell $(LD) --help 2>&1 | grep -q "\-z noreloc-overflow" \
+override LDFLAGS += $(shell $(LD) --help 2>&1 | grep -q "\-z noreloc-overflow" \
 	&& echo "-z noreloc-overflow -pie --no-dynamic-linker")
 endif
 LDFLAGS_vmlinux := -T
-- 
 Kirill A. Shutemov
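
The behaviour described in this message can be reproduced outside the
kernel tree. A minimal sketch, assuming GNU make is installed; PLAIN and
FORCED are made-up variable names standing in for LDFLAGS before and after
the patch:

```shell
# Scratch Makefile: one ordinary assignment, one protected by 'override'.
dir=$(mktemp -d)
cat > "$dir/Makefile" <<'EOF'
PLAIN := -pie
override FORCED := -pie
all: ; @echo "PLAIN=[$(PLAIN)] FORCED=[$(FORCED)]"
EOF

# No command-line variables: both assignments in the makefile take effect.
default_out=$(make -s -C "$dir")

# Empty command-line values: the ordinary assignment is ignored,
# the 'override' one still wins.
override_out=$(make -s -C "$dir" PLAIN="" FORCED="")

echo "$default_out"    # PLAIN=[-pie] FORCED=[-pie]
echo "$override_out"   # PLAIN=[] FORCED=[-pie]
rm -rf "$dir"
```

The second run mirrors the LDFLAGS="" case: the flag set by the ordinary
assignment is lost, even though the command-line value is empty.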


Re: 4.17.x won't boot due to "x86/boot/compressed/64: Handle 5-level paging boot if kernel is above 4G"

2018-07-04 Thread Kirill A. Shutemov
On Tue, Jul 03, 2018 at 11:10:27PM -0400, Benjamin Gilbert wrote:
> On Tue, Jul 03, 2018 at 05:21:50PM +0300, Kirill A. Shutemov wrote:
> > I don't know how to solve it. As far as I know we don't support compiling
> > kernel with LTO in mainline.
> > 
> > Any suggestions?
> > 
> > Benjamin, do you change LDFLAGS or CFLAGS when compiling the kernel?
> 
> We're using the standard build flags as far as I can tell.  In particular,
> we don't enable LTO, and I've verified that -flto isn't in the build logs.
> 
> Here's a sample image:
> 
> https://users.developer.core-os.net/bgilbert/4.17/vmlinuz-4.17.3-coreos
> https://users.developer.core-os.net/bgilbert/4.17/vmlinux-4.17.3-coreos
> https://users.developer.core-os.net/bgilbert/4.17/System.map

It's basically the same issue. We have an immediate load instead of
a RIP-relative address load.

You can make the vmlinuz bootable with this binary patch:

echo -en "\x8d\x05\xa9\xa9\xff\xff" | dd of=vmlinuz-4.17.3-coreos seek=$((0x005d1fc1)) bs=1 conv=notrunc

Now we need to find out how the linker gets it wrong.

Please, *after* a complete build of the kernel with your toolchain, do this:

touch arch/x86/boot/compressed/pgtable_64.c
make V=1

And share your build log.

-- 
 Kirill A. Shutemov
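
For readers unfamiliar with the dd invocation above: with bs=1, seek= places
the write at a byte offset, and conv=notrunc leaves the rest of the file
intact. The sketch below writes the same six bytes into a scratch file (not
a real vmlinuz, and at an arbitrary offset of 4) and decodes the
displacement. The bytes 8d 05 are an LEA opcode plus a ModRM byte that
selects RIP-relative addressing in 64-bit mode; the remaining four bytes are
a 32-bit little-endian displacement. Octal escapes are used so that the
printf call does not depend on bash.

```shell
# Scratch file standing in for the image: 16 zero bytes.
img=$(mktemp)
head -c 16 /dev/zero > "$img"

# Overwrite 6 bytes at byte offset 4 without truncating the file.
# Octal \215 \005 \251 \251 \377 \377 == hex 8d 05 a9 a9 ff ff.
printf '\215\005\251\251\377\377' | dd of="$img" seek=4 bs=1 conv=notrunc 2>/dev/null

size=$(wc -c < "$img")
patched=$(od -An -tx1 -j4 -N6 "$img" | tr -d ' \n')

# Displacement bytes a9 a9 ff ff read little-endian give 0xffffa9a9;
# interpret that as a signed 32-bit value.
disp=$(( 0xffffa9a9 - 0x100000000 ))

echo "size=$size patched=$patched disp=$disp"
rm -f "$img"
```

The file size stays at 16 bytes, and the displacement decodes to a negative
offset relative to the next instruction.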


Re: 4.17.x won't boot due to "x86/boot/compressed/64: Handle 5-level paging boot if kernel is above 4G"

2018-07-03 Thread Benjamin Gilbert
On Tue, Jul 03, 2018 at 05:21:50PM +0300, Kirill A. Shutemov wrote:
> I don't know how to solve it. As far as I know we don't support compiling
> kernel with LTO in mainline.
> 
> Any suggestions?
> 
> Benjamin, do you change LDFLAGS or CFLAGS when compiling the kernel?

We're using the standard build flags as far as I can tell.  In particular,
we don't enable LTO, and I've verified that -flto isn't in the build logs.

Here's a sample image:

https://users.developer.core-os.net/bgilbert/4.17/vmlinuz-4.17.3-coreos
https://users.developer.core-os.net/bgilbert/4.17/vmlinux-4.17.3-coreos
https://users.developer.core-os.net/bgilbert/4.17/System.map

--Benjamin Gilbert


Re: 4.17.x won't boot due to "x86/boot/compressed/64: Handle 5-level paging boot if kernel is above 4G"

2018-07-03 Thread Andi Kleen
On Tue, Jul 03, 2018 at 11:26:09PM +0300, Kirill A. Shutemov wrote:
> On Tue, Jul 03, 2018 at 11:03:07AM -0700, Andi Kleen wrote:
> > On Tue, Jul 03, 2018 at 05:21:50PM +0300, Kirill A. Shutemov wrote:
> > > On Tue, Jul 03, 2018 at 03:44:03PM +0300, Kirill A. Shutemov wrote:
> > > > On Tue, Jul 03, 2018 at 01:24:49PM +0200, Gabriel C wrote:
> > > > > 2018-07-01 23:32 GMT+02:00 Benjamin Gilbert :
> > > > > > On Sun, Jul 01, 2018 at 05:15:59PM -0400, Benjamin Gilbert wrote:
> > > > > >> 4.17 kernels built with the CoreOS Container Linux toolchain and kconfig,
> > > > > >> up to and including 4.17.3, fail to boot on AMD64 running in (at least)
> > > > > >> QEMU/KVM.  No messages are shown post-GRUB; the VM instantly reboots.
> > > > > >> Reverting commit 194a9749c73d ("x86/boot/compressed/64: Handle 5-level
> > > > > >> paging boot if kernel is above 4G") fixes it.  I've attached our kernel
> > > > > >> config for reference, and am happy to test patches, provide sample QCOW
> > > > > >> images, etc.
> > > > > >
> > > > > 
> > > > > Also see https://bugzilla.kernel.org/show_bug.cgi?id=200385 ,
> > > > > 
> > > > > 0a1756bd2897951c03c1cb671bdfd40729ac2177 is acting up
> > > > > too with the same symptoms
> > > > 
> > > > I tracked it down to -flto in LDFLAGS. I'll look more into this.
> > > 
> > > -flto in LDFLAGS screws up this part of paging_prepare():
> > 
> > Where is that coming from? The LTO patches are not upstream.
> > 
> > And I don't see any LTO usage in the main line.
> 
> Apparently some distros try to hack it around:
> 
> https://bugzilla.kernel.org/show_bug.cgi?id=200385
> 
> I'm amazed that it kinda worked for them.

I think it only works on older gccs that don't default to
thin LTO, but always generate a fallback non-LTO object.
The kernel directly uses ld in the link step (without my patches), so LTO
shouldn't be able to ever generate code.

The early boot code may be an exception to this, and it's likely
the only code that actually uses LTO in such a setup.

The -fPIC is actually scarier than the -flto. The generated code 
must create quite a mess and I'm not sure why you even would want that
because the kernel can be relocatable without it.

BTW I hope to eventually resend the full LTO patches.
They seem to get more and more users recently, mainly for smaller
code size.

> 
> 
> > >   /* Copy trampoline code in place */
> > >   memcpy(trampoline_32bit + TRAMPOLINE_32BIT_CODE_OFFSET / sizeof(unsigned long),
> > >          &trampoline_32bit_src, TRAMPOLINE_32BIT_CODE_SIZE);
> > 
> > 
> > > In particular, relocation for trampoline_32bit_src solved in the wrong
> > > way. Without -flto, we have rip-relative address load:
> > > 
> > >   982d30: 48 8d 35 09 cc ff ff    lea    -0x33f7(%rip),%rsi    # 97f940 <trampoline_32bit_src>
> > > 
> > > With -flto we have immediate load:
> > > 
> > >   982cf0: 48 c7 c6 f0 f8 97 00    mov    $0x97f8f0,%rsi
> > 
> > Strange.
> > 
> > Can you add some RELOC_HIDE()s and see if that helps?
> 
> Nope. No difference in generated code.

OK, I will need to put together a self-contained test case for the
compiler people. I'll try to take a look.

-Andi


Re: 4.17.x won't boot due to "x86/boot/compressed/64: Handle 5-level paging boot if kernel is above 4G"

2018-07-03 Thread Kirill A. Shutemov
On Tue, Jul 03, 2018 at 11:03:07AM -0700, Andi Kleen wrote:
> On Tue, Jul 03, 2018 at 05:21:50PM +0300, Kirill A. Shutemov wrote:
> > On Tue, Jul 03, 2018 at 03:44:03PM +0300, Kirill A. Shutemov wrote:
> > > On Tue, Jul 03, 2018 at 01:24:49PM +0200, Gabriel C wrote:
> > > > 2018-07-01 23:32 GMT+02:00 Benjamin Gilbert :
> > > > > On Sun, Jul 01, 2018 at 05:15:59PM -0400, Benjamin Gilbert wrote:
> > > > >> 4.17 kernels built with the CoreOS Container Linux toolchain and kconfig,
> > > > >> up to and including 4.17.3, fail to boot on AMD64 running in (at least)
> > > > >> QEMU/KVM.  No messages are shown post-GRUB; the VM instantly reboots.
> > > > >> Reverting commit 194a9749c73d ("x86/boot/compressed/64: Handle 5-level
> > > > >> paging boot if kernel is above 4G") fixes it.  I've attached our kernel
> > > > >> config for reference, and am happy to test patches, provide sample QCOW
> > > > >> images, etc.
> > > > >
> > > > 
> > > > Also see https://bugzilla.kernel.org/show_bug.cgi?id=200385 ,
> > > > 
> > > > 0a1756bd2897951c03c1cb671bdfd40729ac2177 is acting up
> > > > too with the same symptoms
> > > 
> > > I tracked it down to -flto in LDFLAGS. I'll look more into this.
> > 
> > -flto in LDFLAGS screws up this part of paging_prepare():
> 
> Where is that coming from? The LTO patches are not upstream.
> 
> And I don't see any LTO usage in the main line.

Apparently some distros try to hack it around:

https://bugzilla.kernel.org/show_bug.cgi?id=200385

I'm amazed that it kinda worked for them.


> >   /* Copy trampoline code in place */
> >   memcpy(trampoline_32bit + TRAMPOLINE_32BIT_CODE_OFFSET / sizeof(unsigned long),
> >          &trampoline_32bit_src, TRAMPOLINE_32BIT_CODE_SIZE);
> 
> 
> > In particular, relocation for trampoline_32bit_src solved in the wrong
> > way. Without -flto, we have rip-relative address load:
> > 
> >   982d30: 48 8d 35 09 cc ff ff    lea    -0x33f7(%rip),%rsi    # 97f940 <trampoline_32bit_src>
> > 
> > With -flto we have immediate load:
> > 
> >   982cf0: 48 c7 c6 f0 f8 97 00    mov    $0x97f8f0,%rsi
> 
> Strange.
> 
> Can you add some RELOC_HIDE()s and see if that helps?

Nope. No difference in generated code.

-- 
 Kirill A. Shutemov


Re: 4.17.x won't boot due to "x86/boot/compressed/64: Handle 5-level paging boot if kernel is above 4G"

2018-07-03 Thread Andi Kleen
On Tue, Jul 03, 2018 at 05:21:50PM +0300, Kirill A. Shutemov wrote:
> On Tue, Jul 03, 2018 at 03:44:03PM +0300, Kirill A. Shutemov wrote:
> > On Tue, Jul 03, 2018 at 01:24:49PM +0200, Gabriel C wrote:
> > > 2018-07-01 23:32 GMT+02:00 Benjamin Gilbert :
> > > > On Sun, Jul 01, 2018 at 05:15:59PM -0400, Benjamin Gilbert wrote:
> > > >> 4.17 kernels built with the CoreOS Container Linux toolchain and kconfig,
> > > >> up to and including 4.17.3, fail to boot on AMD64 running in (at least)
> > > >> QEMU/KVM.  No messages are shown post-GRUB; the VM instantly reboots.
> > > >> Reverting commit 194a9749c73d ("x86/boot/compressed/64: Handle 5-level
> > > >> paging boot if kernel is above 4G") fixes it.  I've attached our kernel
> > > >> config for reference, and am happy to test patches, provide sample QCOW
> > > >> images, etc.
> > > >
> > > 
> > > Also see https://bugzilla.kernel.org/show_bug.cgi?id=200385 ,
> > > 
> > > 0a1756bd2897951c03c1cb671bdfd40729ac2177 is acting up
> > > too with the same symptoms
> > 
> > I tracked it down to -flto in LDFLAGS. I'll look more into this.
> 
> -flto in LDFLAGS screws up this part of paging_prepare():

Where is that coming from? The LTO patches are not upstream.

And I don't see any LTO usage in the main line.

> 
>   /* Copy trampoline code in place */
>   memcpy(trampoline_32bit + TRAMPOLINE_32BIT_CODE_OFFSET / sizeof(unsigned long),
>          &trampoline_32bit_src, TRAMPOLINE_32BIT_CODE_SIZE);


> In particular, relocation for trampoline_32bit_src solved in the wrong
> way. Without -flto, we have rip-relative address load:
> 
>   982d30: 48 8d 35 09 cc ff ff    lea    -0x33f7(%rip),%rsi    # 97f940 <trampoline_32bit_src>
> 
> With -flto we have immediate load:
> 
>   982cf0: 48 c7 c6 f0 f8 97 00    mov    $0x97f8f0,%rsi

Strange.

Can you add some RELOC_HIDE()s and see if that helps?

> It only would be okay if bootloader loads kernel at the address we compile
> it for. But it's not usually the case.
> 
> As result we copy garbage into trampoline and crash when trying to execute
> it.
> 
> I don't know how to solve it. As far as I know we don't support compiling
> kernel with LTO in mainline.

Right.


-Andi


Re: 4.17.x won't boot due to "x86/boot/compressed/64: Handle 5-level paging boot if kernel is above 4G"

2018-07-03 Thread Thomas Gleixner
On Tue, 3 Jul 2018, Kirill A. Shutemov wrote:
> On Tue, Jul 03, 2018 at 03:44:03PM +0300, Kirill A. Shutemov wrote:
> > On Tue, Jul 03, 2018 at 01:24:49PM +0200, Gabriel C wrote:
> > > 2018-07-01 23:32 GMT+02:00 Benjamin Gilbert :
> > > > On Sun, Jul 01, 2018 at 05:15:59PM -0400, Benjamin Gilbert wrote:
> > > >> 4.17 kernels built with the CoreOS Container Linux toolchain and kconfig,
> > > >> up to and including 4.17.3, fail to boot on AMD64 running in (at least)
> > > >> QEMU/KVM.  No messages are shown post-GRUB; the VM instantly reboots.
> > > >> Reverting commit 194a9749c73d ("x86/boot/compressed/64: Handle 5-level
> > > >> paging boot if kernel is above 4G") fixes it.  I've attached our kernel
> > > >> config for reference, and am happy to test patches, provide sample QCOW
> > > >> images, etc.
> > > >
> > > 
> > > Also see https://bugzilla.kernel.org/show_bug.cgi?id=200385 ,
> > > 
> > > 0a1756bd2897951c03c1cb671bdfd40729ac2177 is acting up
> > > too with the same symptoms
> > 
> > I tracked it down to -flto in LDFLAGS. I'll look more into this.
> 
> -flto in LDFLAGS screws up this part of paging_prepare():
> 
>   /* Copy trampoline code in place */
>   memcpy(trampoline_32bit + TRAMPOLINE_32BIT_CODE_OFFSET / sizeof(unsigned long),
>          &trampoline_32bit_src, TRAMPOLINE_32BIT_CODE_SIZE);
> 
> In particular, relocation for trampoline_32bit_src solved in the wrong
> way. Without -flto, we have rip-relative address load:
> 
>   982d30: 48 8d 35 09 cc ff ff    lea    -0x33f7(%rip),%rsi    # 97f940 <trampoline_32bit_src>
>
> With -flto we have immediate load:
> 
>   982cf0: 48 c7 c6 f0 f8 97 00    mov    $0x97f8f0,%rsi
> 
> It only would be okay if bootloader loads kernel at the address we compile
> it for. But it's not usually the case.
> 
> As result we copy garbage into trampoline and crash when trying to execute
> it.
> 
> I don't know how to solve it. As far as I know we don't support compiling
> kernel with LTO in mainline.
> 
> Any suggestions?

LTO is broken. The boot code is compiled as position independent, so this
'optimization' is pure garbage.

Thanks,

tglx



Re: 4.17.x won't boot due to "x86/boot/compressed/64: Handle 5-level paging boot if kernel is above 4G"

2018-07-03 Thread Kirill A. Shutemov
On Tue, Jul 03, 2018 at 03:44:03PM +0300, Kirill A. Shutemov wrote:
> On Tue, Jul 03, 2018 at 01:24:49PM +0200, Gabriel C wrote:
> > 2018-07-01 23:32 GMT+02:00 Benjamin Gilbert :
> > > On Sun, Jul 01, 2018 at 05:15:59PM -0400, Benjamin Gilbert wrote:
> > >> 4.17 kernels built with the CoreOS Container Linux toolchain and kconfig,
> > >> up to and including 4.17.3, fail to boot on AMD64 running in (at least)
> > >> QEMU/KVM.  No messages are shown post-GRUB; the VM instantly reboots.
> > >> Reverting commit 194a9749c73d ("x86/boot/compressed/64: Handle 5-level
> > >> paging boot if kernel is above 4G") fixes it.  I've attached our kernel
> > >> config for reference, and am happy to test patches, provide sample QCOW
> > >> images, etc.
> > >
> > 
> > Also see https://bugzilla.kernel.org/show_bug.cgi?id=200385 ,
> > 
> > 0a1756bd2897951c03c1cb671bdfd40729ac2177 is acting up
> > too with the same symptoms
> 
> I tracked it down to -flto in LDFLAGS. I'll look more into this.

-flto in LDFLAGS screws up this part of paging_prepare():

        /* Copy trampoline code in place */
        memcpy(trampoline_32bit + TRAMPOLINE_32BIT_CODE_OFFSET / sizeof(unsigned long),
               &trampoline_32bit_src, TRAMPOLINE_32BIT_CODE_SIZE);

In particular, the relocation for trampoline_32bit_src is resolved the wrong
way. Without -flto, we have a RIP-relative address load:

  982d30:   48 8d 35 09 cc ff ff    lea    -0x33f7(%rip),%rsi    # 97f940 <trampoline_32bit_src>

With -flto we have an immediate load:

  982cf0:   48 c7 c6 f0 f8 97 00    mov    $0x97f8f0,%rsi

That would only be okay if the bootloader loaded the kernel at the address we
compiled it for. But that's usually not the case.

As a result we copy garbage into the trampoline and crash when trying to
execute it.

I don't know how to solve it. As far as I know we don't support compiling
kernel with LTO in mainline.

Any suggestions?

Benjamin, do you change LDFLAGS or CFLAGS when compiling the kernel?

-- 
 Kirill A. Shutemov


Re: 4.17.x won't boot due to "x86/boot/compressed/64: Handle 5-level paging boot if kernel is above 4G"

2018-07-03 Thread Thomas Gleixner
On Tue, 3 Jul 2018, Bernhard Rosenkraenzer wrote:
> On Tuesday, July 03, 2018 16:02 CEST, Thomas Gleixner wrote:
> > On Tue, 3 Jul 2018, Kirill A. Shutemov wrote:
> > > On Tue, Jul 03, 2018 at 01:24:49PM +0200, Gabriel C wrote:
> > > > 2018-07-01 23:32 GMT+02:00 Benjamin Gilbert :
> > > > > On Sun, Jul 01, 2018 at 05:15:59PM -0400, Benjamin Gilbert wrote:
> > > > >> 4.17 kernels built with the CoreOS Container Linux toolchain and kconfig,
> > > > >> up to and including 4.17.3, fail to boot on AMD64 running in (at least)
> > > > >> QEMU/KVM.  No messages are shown post-GRUB; the VM instantly reboots.
> > > > >> Reverting commit 194a9749c73d ("x86/boot/compressed/64: Handle 5-level
> > > > >> paging boot if kernel is above 4G") fixes it.  I've attached our kernel
> > > > >> config for reference, and am happy to test patches, provide sample QCOW
> > > > >> images, etc.
> > > > >
> > > >
> > > > Also see https://bugzilla.kernel.org/show_bug.cgi?id=200385 ,
> > > >
> > > > 0a1756bd2897951c03c1cb671bdfd40729ac2177 is acting up
> > > > too with the same symptoms
> > >
> > > I tracked it down to -flto in LDFLAGS. I'll look more into this.
> >
> > And what sets -flto in LDFLAGS? I can't find anything in the kernel
> > build/Makefiles.
> 
> The kernel doesn't use -flto by default. Some people (and distros) set
> -flto in CFLAGS and LDFLAGS manually hoping to get a few extra
> optimizations.  This never caused any problems before 0a1756bd - would be
> nice to keep it working.

Maybe it never caused any obvious problems, but there is a reason why LTO
is not supported by the kernel, GCC and its LTO history being one of them.

Thanks,

tglx


Re: 4.17.x won't boot due to "x86/boot/compressed/64: Handle 5-level paging boot if kernel is above 4G"

2018-07-03 Thread Bernhard Rosenkraenzer
On Tuesday, July 03, 2018 16:02 CEST, Thomas Gleixner wrote:
> On Tue, 3 Jul 2018, Kirill A. Shutemov wrote:
> > On Tue, Jul 03, 2018 at 01:24:49PM +0200, Gabriel C wrote:
> > > 2018-07-01 23:32 GMT+02:00 Benjamin Gilbert :
> > > > On Sun, Jul 01, 2018 at 05:15:59PM -0400, Benjamin Gilbert wrote:
> > > >> 4.17 kernels built with the CoreOS Container Linux toolchain and kconfig,
> > > >> up to and including 4.17.3, fail to boot on AMD64 running in (at least)
> > > >> QEMU/KVM.  No messages are shown post-GRUB; the VM instantly reboots.
> > > >> Reverting commit 194a9749c73d ("x86/boot/compressed/64: Handle 5-level
> > > >> paging boot if kernel is above 4G") fixes it.  I've attached our kernel
> > > >> config for reference, and am happy to test patches, provide sample QCOW
> > > >> images, etc.
> > > >
> > > 
> > > Also see https://bugzilla.kernel.org/show_bug.cgi?id=200385 ,
> > > 
> > > 0a1756bd2897951c03c1cb671bdfd40729ac2177 is acting up
> > > too with the same symptoms
> > 
> > I tracked it down to -flto in LDFLAGS. I'll look more into this.
> 
> And what sets -flto in LDFLAGS? I can't find anything in the kernel
> build/Makefiles.

The kernel doesn't use -flto by default. Some people (and distros) set
-flto in CFLAGS and LDFLAGS manually hoping to get a few extra
optimizations.  This never caused any problems before 0a1756bd - would be
nice to keep it working.

ttyl
bero



Re: 4.17.x won't boot due to "x86/boot/compressed/64: Handle 5-level paging boot if kernel is above 4G"

2018-07-03 Thread Thomas Gleixner
On Tue, 3 Jul 2018, Kirill A. Shutemov wrote:
> On Tue, Jul 03, 2018 at 01:24:49PM +0200, Gabriel C wrote:
> > 2018-07-01 23:32 GMT+02:00 Benjamin Gilbert :
> > > On Sun, Jul 01, 2018 at 05:15:59PM -0400, Benjamin Gilbert wrote:
> > >> 4.17 kernels built with the CoreOS Container Linux toolchain and kconfig,
> > >> up to and including 4.17.3, fail to boot on AMD64 running in (at least)
> > >> QEMU/KVM.  No messages are shown post-GRUB; the VM instantly reboots.
> > >> Reverting commit 194a9749c73d ("x86/boot/compressed/64: Handle 5-level
> > >> paging boot if kernel is above 4G") fixes it.  I've attached our kernel
> > >> config for reference, and am happy to test patches, provide sample QCOW
> > >> images, etc.
> > >
> > 
> > Also see https://bugzilla.kernel.org/show_bug.cgi?id=200385 ,
> > 
> > 0a1756bd2897951c03c1cb671bdfd40729ac2177 is acting up
> > too with the same symptoms
> 
> I tracked it down to -flto in LDFLAGS. I'll look more into this.

And what sets -flto in LDFLAGS? I can't find anything in the kernel
build/Makefiles.

Thanks,

tglx


Re: 4.17.x won't boot due to "x86/boot/compressed/64: Handle 5-level paging boot if kernel is above 4G"

2018-07-03 Thread Kirill A. Shutemov
On Tue, Jul 03, 2018 at 01:24:49PM +0200, Gabriel C wrote:
> 2018-07-01 23:32 GMT+02:00 Benjamin Gilbert :
> > On Sun, Jul 01, 2018 at 05:15:59PM -0400, Benjamin Gilbert wrote:
> >> 4.17 kernels built with the CoreOS Container Linux toolchain and kconfig,
> >> up to and including 4.17.3, fail to boot on AMD64 running in (at least)
> >> QEMU/KVM.  No messages are shown post-GRUB; the VM instantly reboots.
> >> Reverting commit 194a9749c73d ("x86/boot/compressed/64: Handle 5-level
> >> paging boot if kernel is above 4G") fixes it.  I've attached our kernel
> >> config for reference, and am happy to test patches, provide sample QCOW
> >> images, etc.
> >
> 
> Also see https://bugzilla.kernel.org/show_bug.cgi?id=200385 ,
> 
> 0a1756bd2897951c03c1cb671bdfd40729ac2177 is acting up
> too with the same symptoms

I tracked it down to -flto in LDFLAGS. I'll look more into this.

-- 
 Kirill A. Shutemov


Re: 4.17.x won't boot due to "x86/boot/compressed/64: Handle 5-level paging boot if kernel is above 4G"

2018-07-03 Thread Gabriel C
2018-07-01 23:32 GMT+02:00 Benjamin Gilbert :
> On Sun, Jul 01, 2018 at 05:15:59PM -0400, Benjamin Gilbert wrote:
>> 4.17 kernels built with the CoreOS Container Linux toolchain and kconfig,
>> up to and including 4.17.3, fail to boot on AMD64 running in (at least)
>> QEMU/KVM.  No messages are shown post-GRUB; the VM instantly reboots.
>> Reverting commit 194a9749c73d ("x86/boot/compressed/64: Handle 5-level
>> paging boot if kernel is above 4G") fixes it.  I've attached our kernel
>> config for reference, and am happy to test patches, provide sample QCOW
>> images, etc.
>

Also see https://bugzilla.kernel.org/show_bug.cgi?id=200385 ,

0a1756bd2897951c03c1cb671bdfd40729ac2177 is acting up
too with the same symptoms

BR


Re: 4.17.x won't boot due to "x86/boot/compressed/64: Handle 5-level paging boot if kernel is above 4G"

2018-07-03 Thread Kirill A. Shutemov
On Tue, Jul 03, 2018 at 10:59:48AM +0200, Thomas Gleixner wrote:
> On Tue, 3 Jul 2018, Kirill A. Shutemov wrote:
> 
> > On Mon, Jul 02, 2018 at 07:01:28PM +, Benjamin Gilbert wrote:
> > > On Mon, Jul 02, 2018 at 12:34:50PM +0300, Kirill A. Shutemov wrote:
> > > > > Could you check if you can trigger the issue with my changes to config and
> > > > > the way I run KVM?
> > > 
> > > Yes, the issue still triggers in that case.  I've also verified that the
> > > kernel boots normally with your qemu command if the commit is reverted.
> > 
> > Hm. What toolchain do you use?
> 
> On Sun, Jul 01, 2018 at 05:15:59PM -0400, Benjamin Gilbert wrote:
> > 4.17 kernels built with the CoreOS Container Linux toolchain and kconfig, 

I've built the kernel with toolchain from CoreOS alpha (gcc-7.3.0,
binutils-2.29.1). Still cannot trigger the problem.

Benjamin, could you share the kernel image?

-- 
 Kirill A. Shutemov


Re: 4.17.x won't boot due to "x86/boot/compressed/64: Handle 5-level paging boot if kernel is above 4G"

2018-07-03 Thread Thomas Gleixner
On Tue, 3 Jul 2018, Kirill A. Shutemov wrote:

> On Mon, Jul 02, 2018 at 07:01:28PM +, Benjamin Gilbert wrote:
> > On Mon, Jul 02, 2018 at 12:34:50PM +0300, Kirill A. Shutemov wrote:
> > > Could you check if you can trigger the issue with my changes to config and
> > > the way I run KVM?
> > 
> > Yes, the issue still triggers in that case.  I've also verified that the
> > kernel boots normally with your qemu command if the commit is reverted.
> 
> Hm. What toolchain do you use?

On Sun, Jul 01, 2018 at 05:15:59PM -0400, Benjamin Gilbert wrote:
> 4.17 kernels built with the CoreOS Container Linux toolchain and kconfig, 


Re: 4.17.x won't boot due to "x86/boot/compressed/64: Handle 5-level paging boot if kernel is above 4G"

2018-07-03 Thread Kirill A. Shutemov
On Mon, Jul 02, 2018 at 07:01:28PM +, Benjamin Gilbert wrote:
> On Mon, Jul 02, 2018 at 12:34:50PM +0300, Kirill A. Shutemov wrote:
> > Could you check if you can trigger the issue with my changes to config and
> > the way I run KVM?
> 
> Yes, the issue still triggers in that case.  I've also verified that the
> kernel boots normally with your qemu command if the commit is reverted.

Hm. What toolchain do you use?

-- 
 Kirill A. Shutemov


Re: 4.17.x won't boot due to "x86/boot/compressed/64: Handle 5-level paging boot if kernel is above 4G"

2018-07-02 Thread Benjamin Gilbert
On Mon, Jul 02, 2018 at 12:34:50PM +0300, Kirill A. Shutemov wrote:
> Could you check if you can trigger the issue with my changes to config and
> the way I run KVM?

Yes, the issue still triggers in that case.  I've also verified that the
kernel boots normally with your qemu command if the commit is reverted.

--Benjamin Gilbert


Re: 4.17.x won't boot due to "x86/boot/compressed/64: Handle 5-level paging boot if kernel is above 4G"

2018-07-01 Thread Benjamin Gilbert
On Sun, Jul 01, 2018 at 05:15:59PM -0400, Benjamin Gilbert wrote:
> 4.17 kernels built with the CoreOS Container Linux toolchain and kconfig, 
> up to and including 4.17.3, fail to boot on AMD64 running in (at least) 
> QEMU/KVM.  No messages are shown post-GRUB; the VM instantly reboots.  
> Reverting commit 194a9749c73d ("x86/boot/compressed/64: Handle 5-level 
> paging boot if kernel is above 4G") fixes it.  I've attached our kernel 
> config for reference, and am happy to test patches, provide sample QCOW 
> images, etc.

Adding linux-x86_64, LKML.

--Benjamin Gilbert


config.gz
Description: application/gzip

