Re: Boot failure due to: x86/boot: Save fields explicitly, zero out everything else

2019-08-21 Thread John Hubbard

On 8/21/19 12:49 PM, Neil MacLeod wrote:

The fix looks good - many thanks for the quick turnaround!



Great news, and thanks for the bug report!

thanks,
--
John Hubbard
NVIDIA



On Wed, 21 Aug 2019 at 19:56, John Hubbard  wrote:


On 8/21/19 11:51 AM, Thomas Gleixner wrote:

On Wed, 21 Aug 2019, John Hubbard wrote:

On 8/21/19 10:05 AM, Neil MacLeod wrote:
static void sanitize_boot_params(struct boot_params *boot_params)
{
...
  const struct boot_params_to_save to_save[] = {
  BOOT_PARAM_PRESERVE(screen_info),
  BOOT_PARAM_PRESERVE(apm_bios_info),
  BOOT_PARAM_PRESERVE(tboot_addr),
  BOOT_PARAM_PRESERVE(ist_info),
  BOOT_PARAM_PRESERVE(acpi_rsdp_addr),
  BOOT_PARAM_PRESERVE(hd0_info),
  BOOT_PARAM_PRESERVE(hd1_info),
  BOOT_PARAM_PRESERVE(sys_desc_table),
  BOOT_PARAM_PRESERVE(olpc_ofw_header),
  BOOT_PARAM_PRESERVE(efi_info),
  BOOT_PARAM_PRESERVE(alt_mem_k),
  BOOT_PARAM_PRESERVE(scratch),
  BOOT_PARAM_PRESERVE(e820_entries),
  BOOT_PARAM_PRESERVE(eddbuf_entries),
  BOOT_PARAM_PRESERVE(edd_mbr_sig_buf_entries),
  BOOT_PARAM_PRESERVE(edd_mbr_sig_buffer),
  BOOT_PARAM_PRESERVE(e820_table),
  BOOT_PARAM_PRESERVE(eddbuf),
  };


I think I spotted it:

-   boot_params->acpi_rsdp_addr = 0;

+ BOOT_PARAM_PRESERVE(acpi_rsdp_addr),

And it does not preserve 'hdr'

Grr. I surely was too tired when staring at this last time.



ohhh man, that's embarrassing. Especially hdr, which was the center of
the whole thing...sigh. Patch coming shortly.


thanks,
--
John Hubbard
NVIDIA


Re: Boot failure due to: x86/boot: Save fields explicitly, zero out everything else

2019-08-21 Thread Neil MacLeod
The fix looks good - many thanks for the quick turnaround!

Neil

On Wed, 21 Aug 2019 at 19:56, John Hubbard  wrote:
>
> On 8/21/19 11:51 AM, Thomas Gleixner wrote:
> > On Wed, 21 Aug 2019, John Hubbard wrote:
> >> On 8/21/19 10:05 AM, Neil MacLeod wrote:
> >> static void sanitize_boot_params(struct boot_params *boot_params)
> >> {
> >> ...
> >>  const struct boot_params_to_save to_save[] = {
> >>  BOOT_PARAM_PRESERVE(screen_info),
> >>  BOOT_PARAM_PRESERVE(apm_bios_info),
> >>  BOOT_PARAM_PRESERVE(tboot_addr),
> >>  BOOT_PARAM_PRESERVE(ist_info),
> >>  BOOT_PARAM_PRESERVE(acpi_rsdp_addr),
> >>  BOOT_PARAM_PRESERVE(hd0_info),
> >>  BOOT_PARAM_PRESERVE(hd1_info),
> >>  BOOT_PARAM_PRESERVE(sys_desc_table),
> >>  BOOT_PARAM_PRESERVE(olpc_ofw_header),
> >>  BOOT_PARAM_PRESERVE(efi_info),
> >>  BOOT_PARAM_PRESERVE(alt_mem_k),
> >>  BOOT_PARAM_PRESERVE(scratch),
> >>  BOOT_PARAM_PRESERVE(e820_entries),
> >>  BOOT_PARAM_PRESERVE(eddbuf_entries),
> >>  BOOT_PARAM_PRESERVE(edd_mbr_sig_buf_entries),
> >>  BOOT_PARAM_PRESERVE(edd_mbr_sig_buffer),
> >>  BOOT_PARAM_PRESERVE(e820_table),
> >>  BOOT_PARAM_PRESERVE(eddbuf),
> >>  };
> >
> > I think I spotted it:
> >
> > -   boot_params->acpi_rsdp_addr = 0;
> >
> > + BOOT_PARAM_PRESERVE(acpi_rsdp_addr),
> >
> > And it does not preserve 'hdr'
> >
> > Grr. I surely was too tired when staring at this last time.
> >
>
> ohhh man, that's embarrassing. Especially hdr, which was the center of
> the whole thing...sigh. Patch coming shortly.
>
>
> thanks,
> --
> John Hubbard
> NVIDIA


Re: Boot failure due to: x86/boot: Save fields explicitly, zero out everything else

2019-08-21 Thread John Hubbard

On 8/21/19 11:51 AM, Thomas Gleixner wrote:

On Wed, 21 Aug 2019, John Hubbard wrote:

On 8/21/19 10:05 AM, Neil MacLeod wrote:
static void sanitize_boot_params(struct boot_params *boot_params)
{
...
const struct boot_params_to_save to_save[] = {
BOOT_PARAM_PRESERVE(screen_info),
BOOT_PARAM_PRESERVE(apm_bios_info),
BOOT_PARAM_PRESERVE(tboot_addr),
BOOT_PARAM_PRESERVE(ist_info),
BOOT_PARAM_PRESERVE(acpi_rsdp_addr),
BOOT_PARAM_PRESERVE(hd0_info),
BOOT_PARAM_PRESERVE(hd1_info),
BOOT_PARAM_PRESERVE(sys_desc_table),
BOOT_PARAM_PRESERVE(olpc_ofw_header),
BOOT_PARAM_PRESERVE(efi_info),
BOOT_PARAM_PRESERVE(alt_mem_k),
BOOT_PARAM_PRESERVE(scratch),
BOOT_PARAM_PRESERVE(e820_entries),
BOOT_PARAM_PRESERVE(eddbuf_entries),
BOOT_PARAM_PRESERVE(edd_mbr_sig_buf_entries),
BOOT_PARAM_PRESERVE(edd_mbr_sig_buffer),
BOOT_PARAM_PRESERVE(e820_table),
BOOT_PARAM_PRESERVE(eddbuf),
};


I think I spotted it:

-   boot_params->acpi_rsdp_addr = 0;

+   BOOT_PARAM_PRESERVE(acpi_rsdp_addr),

And it does not preserve 'hdr'

Grr. I surely was too tired when staring at this last time.



ohhh man, that's embarrassing. Especially hdr, which was the center of
the whole thing...sigh. Patch coming shortly.


thanks,
--
John Hubbard
NVIDIA


Re: Boot failure due to: x86/boot: Save fields explicitly, zero out everything else

2019-08-21 Thread Neil MacLeod
John & Thomas

Thanks both - if you can ping me a suitable patch I'll test it and let
you all know ASAP!

Neil

On Wed, 21 Aug 2019 at 19:51, Thomas Gleixner  wrote:
>
> On Wed, 21 Aug 2019, John Hubbard wrote:
> > On 8/21/19 10:05 AM, Neil MacLeod wrote:
> > static void sanitize_boot_params(struct boot_params *boot_params)
> > {
> > ...
> >   const struct boot_params_to_save to_save[] = {
> >   BOOT_PARAM_PRESERVE(screen_info),
> >   BOOT_PARAM_PRESERVE(apm_bios_info),
> >   BOOT_PARAM_PRESERVE(tboot_addr),
> >   BOOT_PARAM_PRESERVE(ist_info),
> >   BOOT_PARAM_PRESERVE(acpi_rsdp_addr),
> >   BOOT_PARAM_PRESERVE(hd0_info),
> >   BOOT_PARAM_PRESERVE(hd1_info),
> >   BOOT_PARAM_PRESERVE(sys_desc_table),
> >   BOOT_PARAM_PRESERVE(olpc_ofw_header),
> >   BOOT_PARAM_PRESERVE(efi_info),
> >   BOOT_PARAM_PRESERVE(alt_mem_k),
> >   BOOT_PARAM_PRESERVE(scratch),
> >   BOOT_PARAM_PRESERVE(e820_entries),
> >   BOOT_PARAM_PRESERVE(eddbuf_entries),
> >   BOOT_PARAM_PRESERVE(edd_mbr_sig_buf_entries),
> >   BOOT_PARAM_PRESERVE(edd_mbr_sig_buffer),
> >   BOOT_PARAM_PRESERVE(e820_table),
> >   BOOT_PARAM_PRESERVE(eddbuf),
> >   };
>
> I think I spotted it:
>
> -   boot_params->acpi_rsdp_addr = 0;
>
> +   BOOT_PARAM_PRESERVE(acpi_rsdp_addr),
>
> And it does not preserve 'hdr'
>
> Grr. I surely was too tired when staring at this last time.
>
> Thanks,
>
> tglx


Re: Boot failure due to: x86/boot: Save fields explicitly, zero out everything else

2019-08-21 Thread Thomas Gleixner
On Wed, 21 Aug 2019, John Hubbard wrote:
> On 8/21/19 10:05 AM, Neil MacLeod wrote:
> static void sanitize_boot_params(struct boot_params *boot_params)
> {
> ...
>   const struct boot_params_to_save to_save[] = {
>   BOOT_PARAM_PRESERVE(screen_info),
>   BOOT_PARAM_PRESERVE(apm_bios_info),
>   BOOT_PARAM_PRESERVE(tboot_addr),
>   BOOT_PARAM_PRESERVE(ist_info),
>   BOOT_PARAM_PRESERVE(acpi_rsdp_addr),
>   BOOT_PARAM_PRESERVE(hd0_info),
>   BOOT_PARAM_PRESERVE(hd1_info),
>   BOOT_PARAM_PRESERVE(sys_desc_table),
>   BOOT_PARAM_PRESERVE(olpc_ofw_header),
>   BOOT_PARAM_PRESERVE(efi_info),
>   BOOT_PARAM_PRESERVE(alt_mem_k),
>   BOOT_PARAM_PRESERVE(scratch),
>   BOOT_PARAM_PRESERVE(e820_entries),
>   BOOT_PARAM_PRESERVE(eddbuf_entries),
>   BOOT_PARAM_PRESERVE(edd_mbr_sig_buf_entries),
>   BOOT_PARAM_PRESERVE(edd_mbr_sig_buffer),
>   BOOT_PARAM_PRESERVE(e820_table),
>   BOOT_PARAM_PRESERVE(eddbuf),
>   };

I think I spotted it:

-   boot_params->acpi_rsdp_addr = 0;

+   BOOT_PARAM_PRESERVE(acpi_rsdp_addr),

And it does not preserve 'hdr'

Grr. I surely was too tired when staring at this last time.

Thanks,

tglx


Re: Boot failure due to: x86/boot: Save fields explicitly, zero out everything else

2019-08-21 Thread John Hubbard

On 8/21/19 11:32 AM, Neil MacLeod wrote:

Hi John

I can test any patches given a link to the diff, and am happy to do so.

If I've understood your suggestion correctly, I will try commenting
out all of the entries, then add them back one-by-one until I get a
non-booting situation. I'll let you know how I get on.



I was actually going the other direction. Note that the BOOT_PARAM_PRESERVE
entries indicate what *not* to zero out. So if you remove them all, then
everything gets zeroed, including lots of critical fields, and you
definitely won't boot up. (The tricky part is we don't yet know whether
fields are missing, need to be added--or both.)

So I was heading toward adding most (but not all--important) of these fields,
as BOOT_PARAM_PRESERVE entries, as a first bisect step. Let me work that up
and post a patch for that.

struct boot_params {
struct screen_info screen_info; /* 0x000 */
struct apm_bios_info apm_bios_info; /* 0x040 */
__u8  _pad2[4]; /* 0x054 */
__u64  tboot_addr;  /* 0x058 */
struct ist_info ist_info;   /* 0x060 */
__u64 acpi_rsdp_addr;   /* 0x070 */
__u8  _pad3[8]; /* 0x078 */
__u8  hd0_info[16]; /* obsolete! */ /* 0x080 */
__u8  hd1_info[16]; /* obsolete! */ /* 0x090 */
struct sys_desc_table sys_desc_table; /* obsolete! */   /* 0x0a0 */
struct olpc_ofw_header olpc_ofw_header; /* 0x0b0 */
__u32 ext_ramdisk_image;/* 0x0c0 */
__u32 ext_ramdisk_size; /* 0x0c4 */
__u32 ext_cmd_line_ptr; /* 0x0c8 */
__u8  _pad4[116];   /* 0x0cc */
struct edid_info edid_info; /* 0x140 */
struct efi_info efi_info;   /* 0x1c0 */
__u32 alt_mem_k;/* 0x1e0 */
__u32 scratch;  /* Scratch field! *//* 0x1e4 */
__u8  e820_entries; /* 0x1e8 */
__u8  eddbuf_entries;   /* 0x1e9 */
__u8  edd_mbr_sig_buf_entries;  /* 0x1ea */
__u8  kbd_status;   /* 0x1eb */
__u8  secure_boot;  /* 0x1ec */
__u8  _pad5[2]; /* 0x1ed */
/*
 * The sentinel is set to a nonzero value (0xff) in header.S.
 *
 * A bootloader is supposed to only take setup_header and put
 * it into a clean boot_params buffer. If it turns out that
 * it is clumsy or too generous with the buffer, it most
 * probably will pick up the sentinel variable too. The fact
 * that this variable then is still 0xff will let kernel
 * know that some variables in boot_params are invalid and
 * kernel should zero out certain portions of boot_params.
 */
__u8  sentinel; /* 0x1ef */
__u8  _pad6[1]; /* 0x1f0 */
struct setup_header hdr;/* setup header */  /* 0x1f1 */
__u8  _pad7[0x290-0x1f1-sizeof(struct setup_header)];
__u32 edd_mbr_sig_buffer[EDD_MBR_SIG_MAX];  /* 0x290 */
struct boot_e820_entry e820_table[E820_MAX_ENTRIES_ZEROPAGE]; /* 0x2d0 
*/
__u8  _pad8[48];/* 0xcd0 */
struct edd_info eddbuf[EDDMAXNR];   /* 0xd00 */
__u8  _pad9[276];   /* 0xeec */
} __attribute__((packed));



thanks,
--
John Hubbard
NVIDIA



The OS I'm testing is LibreELEC, which is a custom x86_64 build, and
this hasn't shown any problems on this Skylake i5 NUC in all the years
I've been using it as a test system (since at least 4.15.y). So far
5.3-rc has been trouble-free until 5.3-rc5. As I mentioned in the bug,
I've been able to boot 5.3-rc5 from a USB memory stick, but can't boot
5.3-rc5 from the internal M2 drive unless I revert this commit.



Re: Boot failure due to: x86/boot: Save fields explicitly, zero out everything else

2019-08-21 Thread Neil MacLeod
Hi John

I can test any patches given a link to the diff, and am happy to do so.

If I've understood your suggestion correctly, I will try commenting
out all of the entries, then add them back one-by-one until I get a
non-booting situation. I'll let you know how I get on.

The OS I'm testing is LibreELEC, which is a custom x86_64 build, and
this hasn't shown any problems on this Skylake i5 NUC in all the years
I've been using it as a test system (since at least 4.15.y). So far
5.3-rc has been trouble-free until 5.3-rc5. As I mentioned in the bug,
I've been able to boot 5.3-rc5 from a USB memory stick, but can't boot
5.3-rc5 from the internal M2 drive unless I revert this commit.

Regards
Neil

On Wed, 21 Aug 2019 at 19:20, John Hubbard  wrote:
>
> On 8/21/19 10:05 AM, Neil MacLeod wrote:
> > Hi John
> >
> > The following bug might be of interest:
> >
> > https://bugzilla.kernel.org/show_bug.cgi?id=204645
> >
> > Let me know if I can be of any help.
> >
>
> Hi Neil,
>
> First of all, I'm pasting in the bug information so that it's directly 
> available
> here:
>
> ===
> Description: Neil MacLeod 2019-08-21 16:29:19 UTC
> System: Intel i5 Skylake NUC (NUC6i5SYH)
>
> This system boots fine from internal M2 (128GB) drive with 5.3-rc4.
>
> With 5.3-rc5, it does not boot from M2 and is stuck on the Intel splash 
> screen (no other text is displayed, no panic etc.). It will boot 5.3-rc5 from 
> a USB flash memory stick (via the F10 boot menu), but not from the internal 
> M2.
>
> Bisecting between 5.3-rc4 and 5.3-rc5, the bad commit is:
>
> neil@nm-linux:~/projects/pullrequest_repos/torvalds-linux$ git bisect bad
> a90118c445cc7f07781de26a9684d4ec58bfcfd1 is the first bad commit
> commit a90118c445cc7f07781de26a9684d4ec58bfcfd1
> Author: John Hubbard 
> Date:   Tue Jul 30 22:46:27 2019 -0700
>
>  x86/boot: Save fields explicitly, zero out everything else
>
>  Recent gcc compilers (gcc 9.1) generate warnings about an out of bounds
>  memset, if the memset goes accross several fields of a struct. This
>  generated a couple of warnings on x86_64 builds in 
> sanitize_boot_params().
>
>  Fix this by explicitly saving the fields in struct boot_params
>  that are intended to be preserved, and zeroing all the rest.
>
>  [ tglx: Tagged for stable as it breaks the warning free build there as 
> well ]
>
>  Suggested-by: Thomas Gleixner 
>  Suggested-by: H. Peter Anvin 
>  Signed-off-by: John Hubbard 
>  Signed-off-by: Thomas Gleixner 
>  Cc: sta...@vger.kernel.org
>  Link: https://lkml.kernel.org/r/20190731054627.5627-2-jhubb...@nvidia.com
>
> :04 04 e0963edca990540dd759765a3d765af4698df892 
> d07e645eb3a500c31bd65526205e286ff6941187 M  arch
>
> Comment 1 Neil MacLeod 2019-08-21 16:31:35 UTC
> The kernel is built with gcc-9.2.0.
>
> Comment 2 Neil MacLeod 2019-08-21 16:55:48 UTC
>
> 5.3-rc5 with "x86/boot: Save fields explicitly, zero out everything else" 
> reverted will build with gcc-9.2.0, and boot from M2.
> ===
>
> I'm also CC'ing the lists, so they know that the patch has caused a 
> regression, and
> also out of hope that they can help me choose the shortest path forward to 
> debugging
> this. My first reaction is that the list of BOOT_PARAM_PRESERVE() fields is
> flawed--by which I include cases of  "the system improperly relied on a field 
> that the spec said
> should be zeroed". (After all, the boot_params->sentinel is set, which 
> already means
> the system is not really doing it right.)
>
> So I'm going to cheat and ask right now if anyone notices either ommissions
> or wrong entries here:
>
> static void sanitize_boot_params(struct boot_params *boot_params)
> {
> ...
> const struct boot_params_to_save to_save[] = {
> BOOT_PARAM_PRESERVE(screen_info),
> BOOT_PARAM_PRESERVE(apm_bios_info),
> BOOT_PARAM_PRESERVE(tboot_addr),
> BOOT_PARAM_PRESERVE(ist_info),
> BOOT_PARAM_PRESERVE(acpi_rsdp_addr),
> BOOT_PARAM_PRESERVE(hd0_info),
> BOOT_PARAM_PRESERVE(hd1_info),
> BOOT_PARAM_PRESERVE(sys_desc_table),
> BOOT_PARAM_PRESERVE(olpc_ofw_header),
> BOOT_PARAM_PRESERVE(efi_info),
> BOOT_PARAM_PRESERVE(alt_mem_k),
> BOOT_PARAM_PRESERVE(scratch),
> BOOT_PARAM_PRESERVE(e820_entries),
> BOOT_PARAM_PRESERVE(eddbuf_entries),
> BOOT_PARAM_PRESERVE(edd_mbr_sig_buf_entries),
> BOOT_PARAM_PRESERVE(edd_mbr_sig_buffer),
> BOOT_PARAM_PRESERVE(e820_table),
> BOOT_PARAM_PRESERVE(eddbuf),
>