Re: [PULL 10/56] x86: don't let decompressed kernel image clobber setup_data
On January 31, 2023 1:22:43 PM PST, "Jason A. Donenfeld" wrote: >On Tue, Jan 31, 2023, 15:55 H. Peter Anvin wrote: > >> On January 30, 2023 12:19:14 PM PST, "Michael S. Tsirkin" >> wrote: >> >From: "Jason A. Donenfeld" >> > >> >The setup_data links are appended to the compressed kernel image. Since >> >the kernel image is typically loaded at 0x10, setup_data lives at >> >`0x10 + compressed_size`, which does not get relocated during the >> >kernel's boot process. >> > >> >The kernel typically decompresses the image starting at address >> >0x100 (note: there's one more zero there than the compressed image >> >above). This usually is fine for most kernels. >> > >> >However, if the compressed image is actually quite large, then >> >setup_data will live at a `0x10 + compressed_size` that extends into >> >the decompressed zone at 0x100. In other words, if compressed_size >> >is larger than `0x100 - 0x10`, then the decompression step will >> >clobber setup_data, resulting in crashes. >> > >> >Visually, what happens now is that QEMU appends setup_data to the kernel >> >image: >> > >> > kernel imagesetup_data >> > |--||| >> >0x10 0x10+l1 0x10+l1+l2 >> > >> >The problem is that this decompresses to 0x100 (one more zero). So >> >if l1 is > (0x100-0x10), then this winds up looking like: >> > >> > kernel imagesetup_data >> > |--||| >> >0x10 0x10+l1 0x10+l1+l2 >> > >> > d e c o m p r e s s e d k e r n e l >> > >> |-| >> >0x100 >> 0x100+l3 >> > >> >The decompressed kernel seemingly overwriting the compressed kernel >> >image isn't a problem, because that gets relocated to a higher address >> >early on in the boot process, at the end of startup_64. setup_data, >> >however, stays in the same place, since those links are self referential >> >and nothing fixes them up. So the decompressed kernel clobbers it. >> > >> >Fix this by appending setup_data to the cmdline blob rather than the >> >kernel image blob, which remains at a lower address that won't get >> >clobbered. >> > >> >This could have been done by overwriting the initrd blob instead, but >> >that poses big difficulties, such as no longer being able to use memory >> >mapped files for initrd, hurting performance, and, more importantly, the >> >initrd address calculation is hard coded in qboot, and it always grows >> >down rather than up, which means lots of brittle semantics would have to >> >be changed around, incurring more complexity. In contrast, using cmdline >> >is simple and doesn't interfere with anything. >> > >> >The microvm machine has a gross hack where it fiddles with fw_cfg data >> >after the fact. So this hack is updated to account for this appending, >> >by reserving some bytes. >> > >> >Fixup-by: Michael S. Tsirkin >> >Cc: x...@kernel.org >> >Cc: Philippe Mathieu-Daudé >> >Cc: H. Peter Anvin >> >Cc: Borislav Petkov >> >Cc: Eric Biggers >> >Signed-off-by: Jason A. Donenfeld >> >Message-Id: <20221230220725.618763-1-ja...@zx2c4.com> >> >Message-ID: <20230128061015-mutt-send-email-...@kernel.org> >> >Reviewed-by: Michael S. Tsirkin >> >Signed-off-by: Michael S. Tsirkin >> >Tested-by: Eric Biggers >> >Tested-by: Mathias Krause >> >--- >> > include/hw/i386/microvm.h | 5 ++-- >> > include/hw/nvram/fw_cfg.h | 9 +++ >> > hw/i386/microvm.c | 15 +++ >> > hw/i386/x86.c | 52 +-- >> > hw/nvram/fw_cfg.c | 9 +++ >> > 5 files changed, 59 insertions(+), 31 deletions(-) >> > >> >diff --git a/include/hw/i386/microvm.h b/include/hw/i386/microvm.h >> >index fad97a891d..e8af61f194 100644 >> >--- a/include/hw/i386/microvm.h >> >+++ b/include/hw/i386/microvm.h >> >@@ -50,8 +50,9 @@ >> > */ >> > >> > /* Platform virtio definitions */ >> >-#define VIRTIO_MMIO_BASE 0xfeb0 >> >-#define VIRTIO_CMDLINE_MAXLEN 64 >> >+#define VIRTIO_MMIO_BASE0xfeb0 >> >+#define VIRTIO_CMDLINE_MAXLEN 64 >> >+#define VIRTIO_CMDLINE_TOTAL_MAX_LEN((VIRTIO_CMDLINE_MAXLEN + 1) * >> 16) >> > >> > #define GED_MMIO_BASE 0xfea0 >> > #define GED_MMIO_BASE_MEMHP (GED_MMIO_BASE + 0x100) >> >diff --git a/include/hw/nvram/fw_cfg.h b/include/hw/nvram/fw_cfg.h >> >index 2e503904dc..990dcdbb2e 100644 >> >--- a/include/hw/nvram/fw_cfg.h >> >+++ b/include/hw/nvram/fw_cfg.h >> >@@ -139,6 +139,15 @@ void fw_cfg_add_bytes_callback(FWCfgState *s, >> uint16_t key, >> >void *data, size_t len, >> >bool read_only); >> > >> >+/** >> >+ * fw_cfg_read_bytes_ptr: >> >+ * @s: fw_cfg device being modified >> >+ * @key: selector key value for new fw_cfg item >> >+ *
Re: [PULL 10/56] x86: don't let decompressed kernel image clobber setup_data
On Mon, Jan 30, 2023 at 03:19:59PM -0500, Michael S. Tsirkin wrote: > From: "Jason A. Donenfeld" > > The setup_data links are appended to the compressed kernel image. Since > the kernel image is typically loaded at 0x10, setup_data lives at > `0x10 + compressed_size`, which does not get relocated during the > kernel's boot process. > > The kernel typically decompresses the image starting at address > 0x100 (note: there's one more zero there than the compressed image > above). This usually is fine for most kernels. > > However, if the compressed image is actually quite large, then > setup_data will live at a `0x10 + compressed_size` that extends into > the decompressed zone at 0x100. In other words, if compressed_size > is larger than `0x100 - 0x10`, then the decompression step will > clobber setup_data, resulting in crashes. > > Visually, what happens now is that QEMU appends setup_data to the kernel > image: > > kernel imagesetup_data >|--||| > 0x10 0x10+l1 0x10+l1+l2 > > The problem is that this decompresses to 0x100 (one more zero). So > if l1 is > (0x100-0x10), then this winds up looking like: > > kernel imagesetup_data >|--||| > 0x10 0x10+l1 0x10+l1+l2 > > d e c o m p r e s s e d k e r n e l > > |-| > 0x100 > 0x100+l3 > > The decompressed kernel seemingly overwriting the compressed kernel > image isn't a problem, because that gets relocated to a higher address > early on in the boot process, at the end of startup_64. setup_data, > however, stays in the same place, since those links are self referential > and nothing fixes them up. So the decompressed kernel clobbers it. > > Fix this by appending setup_data to the cmdline blob rather than the > kernel image blob, which remains at a lower address that won't get > clobbered. > > This could have been done by overwriting the initrd blob instead, but > that poses big difficulties, such as no longer being able to use memory > mapped files for initrd, hurting performance, and, more importantly, the > initrd address calculation is hard coded in qboot, and it always grows > down rather than up, which means lots of brittle semantics would have to > be changed around, incurring more complexity. In contrast, using cmdline > is simple and doesn't interfere with anything. > > The microvm machine has a gross hack where it fiddles with fw_cfg data > after the fact. So this hack is updated to account for this appending, > by reserving some bytes. > > Fixup-by: Michael S. Tsirkin > Cc: x...@kernel.org > Cc: Philippe Mathieu-Daudé > Cc: H. Peter Anvin > Cc: Borislav Petkov > Cc: Eric Biggers > Signed-off-by: Jason A. Donenfeld > Message-Id: <20221230220725.618763-1-ja...@zx2c4.com> > Message-ID: <20230128061015-mutt-send-email-...@kernel.org> > Reviewed-by: Michael S. Tsirkin > Signed-off-by: Michael S. Tsirkin > Tested-by: Eric Biggers > Tested-by: Mathias Krause > --- > include/hw/i386/microvm.h | 5 ++-- > include/hw/nvram/fw_cfg.h | 9 +++ > hw/i386/microvm.c | 15 +++ > hw/i386/x86.c | 52 +-- > hw/nvram/fw_cfg.c | 9 +++ > 5 files changed, 59 insertions(+), 31 deletions(-) Cc: qemu-sta...@nongnu.org Fixes: 67f7e426 ("hw/i386: pass RNG seed via setup_data entry")
Re: [PULL 10/56] x86: don't let decompressed kernel image clobber setup_data
On Tue, Jan 31, 2023 at 08:39:08PM +0100, Jason A. Donenfeld wrote: > On Mon, Jan 30, 2023 at 03:19:59PM -0500, Michael S. Tsirkin wrote: > > From: "Jason A. Donenfeld" > > > > The setup_data links are appended to the compressed kernel image. Since > > the kernel image is typically loaded at 0x10, setup_data lives at > > `0x10 + compressed_size`, which does not get relocated during the > > kernel's boot process. > > > > The kernel typically decompresses the image starting at address > > 0x100 (note: there's one more zero there than the compressed image > > above). This usually is fine for most kernels. > > > > However, if the compressed image is actually quite large, then > > setup_data will live at a `0x10 + compressed_size` that extends into > > the decompressed zone at 0x100. In other words, if compressed_size > > is larger than `0x100 - 0x10`, then the decompression step will > > clobber setup_data, resulting in crashes. > > > > Visually, what happens now is that QEMU appends setup_data to the kernel > > image: > > > > kernel imagesetup_data > >|--||| > > 0x10 0x10+l1 0x10+l1+l2 > > > > The problem is that this decompresses to 0x100 (one more zero). So > > if l1 is > (0x100-0x10), then this winds up looking like: > > > > kernel imagesetup_data > >|--||| > > 0x10 0x10+l1 0x10+l1+l2 > > > > d e c o m p r e s s e d k e r n e l > > > > |-| > > 0x100 > > 0x100+l3 > > > > The decompressed kernel seemingly overwriting the compressed kernel > > image isn't a problem, because that gets relocated to a higher address > > early on in the boot process, at the end of startup_64. setup_data, > > however, stays in the same place, since those links are self referential > > and nothing fixes them up. So the decompressed kernel clobbers it. > > > > Fix this by appending setup_data to the cmdline blob rather than the > > kernel image blob, which remains at a lower address that won't get > > clobbered. > > > > This could have been done by overwriting the initrd blob instead, but > > that poses big difficulties, such as no longer being able to use memory > > mapped files for initrd, hurting performance, and, more importantly, the > > initrd address calculation is hard coded in qboot, and it always grows > > down rather than up, which means lots of brittle semantics would have to > > be changed around, incurring more complexity. In contrast, using cmdline > > is simple and doesn't interfere with anything. > > > > The microvm machine has a gross hack where it fiddles with fw_cfg data > > after the fact. So this hack is updated to account for this appending, > > by reserving some bytes. > > > > Fixup-by: Michael S. Tsirkin > > Cc: x...@kernel.org > > Cc: Philippe Mathieu-Daudé > > Cc: H. Peter Anvin > > Cc: Borislav Petkov > > Cc: Eric Biggers > > Signed-off-by: Jason A. Donenfeld > > Message-Id: <20221230220725.618763-1-ja...@zx2c4.com> > > Message-ID: <20230128061015-mutt-send-email-...@kernel.org> > > Reviewed-by: Michael S. Tsirkin > > Signed-off-by: Michael S. Tsirkin > > Tested-by: Eric Biggers > > Tested-by: Mathias Krause > > This one should wind up in the stable point release too. Dunno what the > procedure for that is. > > Jason If you want that you need to include Cc: qemu-sta...@nongnu.org Fixes: ("subject") you can still reply to the original mail with this. -- MST
Re: [PULL 10/56] x86: don't let decompressed kernel image clobber setup_data
On Tue, Jan 31, 2023, 15:55 H. Peter Anvin wrote: > On January 30, 2023 12:19:14 PM PST, "Michael S. Tsirkin" > wrote: > >From: "Jason A. Donenfeld" > > > >The setup_data links are appended to the compressed kernel image. Since > >the kernel image is typically loaded at 0x10, setup_data lives at > >`0x10 + compressed_size`, which does not get relocated during the > >kernel's boot process. > > > >The kernel typically decompresses the image starting at address > >0x100 (note: there's one more zero there than the compressed image > >above). This usually is fine for most kernels. > > > >However, if the compressed image is actually quite large, then > >setup_data will live at a `0x10 + compressed_size` that extends into > >the decompressed zone at 0x100. In other words, if compressed_size > >is larger than `0x100 - 0x10`, then the decompression step will > >clobber setup_data, resulting in crashes. > > > >Visually, what happens now is that QEMU appends setup_data to the kernel > >image: > > > > kernel imagesetup_data > > |--||| > >0x10 0x10+l1 0x10+l1+l2 > > > >The problem is that this decompresses to 0x100 (one more zero). So > >if l1 is > (0x100-0x10), then this winds up looking like: > > > > kernel imagesetup_data > > |--||| > >0x10 0x10+l1 0x10+l1+l2 > > > > d e c o m p r e s s e d k e r n e l > > > |-| > >0x100 > 0x100+l3 > > > >The decompressed kernel seemingly overwriting the compressed kernel > >image isn't a problem, because that gets relocated to a higher address > >early on in the boot process, at the end of startup_64. setup_data, > >however, stays in the same place, since those links are self referential > >and nothing fixes them up. So the decompressed kernel clobbers it. > > > >Fix this by appending setup_data to the cmdline blob rather than the > >kernel image blob, which remains at a lower address that won't get > >clobbered. > > > >This could have been done by overwriting the initrd blob instead, but > >that poses big difficulties, such as no longer being able to use memory > >mapped files for initrd, hurting performance, and, more importantly, the > >initrd address calculation is hard coded in qboot, and it always grows > >down rather than up, which means lots of brittle semantics would have to > >be changed around, incurring more complexity. In contrast, using cmdline > >is simple and doesn't interfere with anything. > > > >The microvm machine has a gross hack where it fiddles with fw_cfg data > >after the fact. So this hack is updated to account for this appending, > >by reserving some bytes. > > > >Fixup-by: Michael S. Tsirkin > >Cc: x...@kernel.org > >Cc: Philippe Mathieu-Daudé > >Cc: H. Peter Anvin > >Cc: Borislav Petkov > >Cc: Eric Biggers > >Signed-off-by: Jason A. Donenfeld > >Message-Id: <20221230220725.618763-1-ja...@zx2c4.com> > >Message-ID: <20230128061015-mutt-send-email-...@kernel.org> > >Reviewed-by: Michael S. Tsirkin > >Signed-off-by: Michael S. Tsirkin > >Tested-by: Eric Biggers > >Tested-by: Mathias Krause > >--- > > include/hw/i386/microvm.h | 5 ++-- > > include/hw/nvram/fw_cfg.h | 9 +++ > > hw/i386/microvm.c | 15 +++ > > hw/i386/x86.c | 52 +-- > > hw/nvram/fw_cfg.c | 9 +++ > > 5 files changed, 59 insertions(+), 31 deletions(-) > > > >diff --git a/include/hw/i386/microvm.h b/include/hw/i386/microvm.h > >index fad97a891d..e8af61f194 100644 > >--- a/include/hw/i386/microvm.h > >+++ b/include/hw/i386/microvm.h > >@@ -50,8 +50,9 @@ > > */ > > > > /* Platform virtio definitions */ > >-#define VIRTIO_MMIO_BASE 0xfeb0 > >-#define VIRTIO_CMDLINE_MAXLEN 64 > >+#define VIRTIO_MMIO_BASE0xfeb0 > >+#define VIRTIO_CMDLINE_MAXLEN 64 > >+#define VIRTIO_CMDLINE_TOTAL_MAX_LEN((VIRTIO_CMDLINE_MAXLEN + 1) * > 16) > > > > #define GED_MMIO_BASE 0xfea0 > > #define GED_MMIO_BASE_MEMHP (GED_MMIO_BASE + 0x100) > >diff --git a/include/hw/nvram/fw_cfg.h b/include/hw/nvram/fw_cfg.h > >index 2e503904dc..990dcdbb2e 100644 > >--- a/include/hw/nvram/fw_cfg.h > >+++ b/include/hw/nvram/fw_cfg.h > >@@ -139,6 +139,15 @@ void fw_cfg_add_bytes_callback(FWCfgState *s, > uint16_t key, > >void *data, size_t len, > >bool read_only); > > > >+/** > >+ * fw_cfg_read_bytes_ptr: > >+ * @s: fw_cfg device being modified > >+ * @key: selector key value for new fw_cfg item > >+ * > >+ * Reads an existing fw_cfg data pointer. > >+ */ > >+void *fw_cfg_read_bytes_ptr(FWCfgState *s, uint16_t key); > >+ > > /** > > * fw_cfg_add_string: > > * @s: fw_cfg device
Re: [PULL 10/56] x86: don't let decompressed kernel image clobber setup_data
On January 30, 2023 12:19:14 PM PST, "Michael S. Tsirkin" wrote: >From: "Jason A. Donenfeld" > >The setup_data links are appended to the compressed kernel image. Since >the kernel image is typically loaded at 0x10, setup_data lives at >`0x10 + compressed_size`, which does not get relocated during the >kernel's boot process. > >The kernel typically decompresses the image starting at address >0x100 (note: there's one more zero there than the compressed image >above). This usually is fine for most kernels. > >However, if the compressed image is actually quite large, then >setup_data will live at a `0x10 + compressed_size` that extends into >the decompressed zone at 0x100. In other words, if compressed_size >is larger than `0x100 - 0x10`, then the decompression step will >clobber setup_data, resulting in crashes. > >Visually, what happens now is that QEMU appends setup_data to the kernel >image: > > kernel imagesetup_data > |--||| >0x10 0x10+l1 0x10+l1+l2 > >The problem is that this decompresses to 0x100 (one more zero). So >if l1 is > (0x100-0x10), then this winds up looking like: > > kernel imagesetup_data > |--||| >0x10 0x10+l1 0x10+l1+l2 > > d e c o m p r e s s e d k e r n e l > > |-| >0x100 > 0x100+l3 > >The decompressed kernel seemingly overwriting the compressed kernel >image isn't a problem, because that gets relocated to a higher address >early on in the boot process, at the end of startup_64. setup_data, >however, stays in the same place, since those links are self referential >and nothing fixes them up. So the decompressed kernel clobbers it. > >Fix this by appending setup_data to the cmdline blob rather than the >kernel image blob, which remains at a lower address that won't get >clobbered. > >This could have been done by overwriting the initrd blob instead, but >that poses big difficulties, such as no longer being able to use memory >mapped files for initrd, hurting performance, and, more importantly, the >initrd address calculation is hard coded in qboot, and it always grows >down rather than up, which means lots of brittle semantics would have to >be changed around, incurring more complexity. In contrast, using cmdline >is simple and doesn't interfere with anything. > >The microvm machine has a gross hack where it fiddles with fw_cfg data >after the fact. So this hack is updated to account for this appending, >by reserving some bytes. > >Fixup-by: Michael S. Tsirkin >Cc: x...@kernel.org >Cc: Philippe Mathieu-Daudé >Cc: H. Peter Anvin >Cc: Borislav Petkov >Cc: Eric Biggers >Signed-off-by: Jason A. Donenfeld >Message-Id: <20221230220725.618763-1-ja...@zx2c4.com> >Message-ID: <20230128061015-mutt-send-email-...@kernel.org> >Reviewed-by: Michael S. Tsirkin >Signed-off-by: Michael S. Tsirkin >Tested-by: Eric Biggers >Tested-by: Mathias Krause >--- > include/hw/i386/microvm.h | 5 ++-- > include/hw/nvram/fw_cfg.h | 9 +++ > hw/i386/microvm.c | 15 +++ > hw/i386/x86.c | 52 +-- > hw/nvram/fw_cfg.c | 9 +++ > 5 files changed, 59 insertions(+), 31 deletions(-) > >diff --git a/include/hw/i386/microvm.h b/include/hw/i386/microvm.h >index fad97a891d..e8af61f194 100644 >--- a/include/hw/i386/microvm.h >+++ b/include/hw/i386/microvm.h >@@ -50,8 +50,9 @@ > */ > > /* Platform virtio definitions */ >-#define VIRTIO_MMIO_BASE 0xfeb0 >-#define VIRTIO_CMDLINE_MAXLEN 64 >+#define VIRTIO_MMIO_BASE0xfeb0 >+#define VIRTIO_CMDLINE_MAXLEN 64 >+#define VIRTIO_CMDLINE_TOTAL_MAX_LEN((VIRTIO_CMDLINE_MAXLEN + 1) * 16) > > #define GED_MMIO_BASE 0xfea0 > #define GED_MMIO_BASE_MEMHP (GED_MMIO_BASE + 0x100) >diff --git a/include/hw/nvram/fw_cfg.h b/include/hw/nvram/fw_cfg.h >index 2e503904dc..990dcdbb2e 100644 >--- a/include/hw/nvram/fw_cfg.h >+++ b/include/hw/nvram/fw_cfg.h >@@ -139,6 +139,15 @@ void fw_cfg_add_bytes_callback(FWCfgState *s, uint16_t >key, >void *data, size_t len, >bool read_only); > >+/** >+ * fw_cfg_read_bytes_ptr: >+ * @s: fw_cfg device being modified >+ * @key: selector key value for new fw_cfg item >+ * >+ * Reads an existing fw_cfg data pointer. >+ */ >+void *fw_cfg_read_bytes_ptr(FWCfgState *s, uint16_t key); >+ > /** > * fw_cfg_add_string: > * @s: fw_cfg device being modified >diff --git a/hw/i386/microvm.c b/hw/i386/microvm.c >index 170a331e3f..29f30dd6d3 100644 >--- a/hw/i386/microvm.c >+++ b/hw/i386/microvm.c >@@ -378,7 +378,8 @@ static void
Re: [PULL 10/56] x86: don't let decompressed kernel image clobber setup_data
On Mon, Jan 30, 2023 at 03:19:59PM -0500, Michael S. Tsirkin wrote: > From: "Jason A. Donenfeld" > > The setup_data links are appended to the compressed kernel image. Since > the kernel image is typically loaded at 0x10, setup_data lives at > `0x10 + compressed_size`, which does not get relocated during the > kernel's boot process. > > The kernel typically decompresses the image starting at address > 0x100 (note: there's one more zero there than the compressed image > above). This usually is fine for most kernels. > > However, if the compressed image is actually quite large, then > setup_data will live at a `0x10 + compressed_size` that extends into > the decompressed zone at 0x100. In other words, if compressed_size > is larger than `0x100 - 0x10`, then the decompression step will > clobber setup_data, resulting in crashes. > > Visually, what happens now is that QEMU appends setup_data to the kernel > image: > > kernel imagesetup_data >|--||| > 0x10 0x10+l1 0x10+l1+l2 > > The problem is that this decompresses to 0x100 (one more zero). So > if l1 is > (0x100-0x10), then this winds up looking like: > > kernel imagesetup_data >|--||| > 0x10 0x10+l1 0x10+l1+l2 > > d e c o m p r e s s e d k e r n e l > > |-| > 0x100 > 0x100+l3 > > The decompressed kernel seemingly overwriting the compressed kernel > image isn't a problem, because that gets relocated to a higher address > early on in the boot process, at the end of startup_64. setup_data, > however, stays in the same place, since those links are self referential > and nothing fixes them up. So the decompressed kernel clobbers it. > > Fix this by appending setup_data to the cmdline blob rather than the > kernel image blob, which remains at a lower address that won't get > clobbered. > > This could have been done by overwriting the initrd blob instead, but > that poses big difficulties, such as no longer being able to use memory > mapped files for initrd, hurting performance, and, more importantly, the > initrd address calculation is hard coded in qboot, and it always grows > down rather than up, which means lots of brittle semantics would have to > be changed around, incurring more complexity. In contrast, using cmdline > is simple and doesn't interfere with anything. > > The microvm machine has a gross hack where it fiddles with fw_cfg data > after the fact. So this hack is updated to account for this appending, > by reserving some bytes. > > Fixup-by: Michael S. Tsirkin > Cc: x...@kernel.org > Cc: Philippe Mathieu-Daudé > Cc: H. Peter Anvin > Cc: Borislav Petkov > Cc: Eric Biggers > Signed-off-by: Jason A. Donenfeld > Message-Id: <20221230220725.618763-1-ja...@zx2c4.com> > Message-ID: <20230128061015-mutt-send-email-...@kernel.org> > Reviewed-by: Michael S. Tsirkin > Signed-off-by: Michael S. Tsirkin > Tested-by: Eric Biggers > Tested-by: Mathias Krause This one should wind up in the stable point release too. Dunno what the procedure for that is. Jason