Re: [PULL v2 31/82] vhost: Change the sequence of device start

2022-11-07 Thread Christian A. Ehrhardt
On Mon, Nov 07, 2022 at 08:30:32AM -0500, Michael S. Tsirkin wrote:
> On Sun, Nov 06, 2022 at 07:00:33PM +0100, Christian A. Ehrhardt wrote:
> > 
> > Hi,
> > 
> > On Sat, Nov 05, 2022 at 12:43:05PM -0400, Michael S. Tsirkin wrote:
> > > On Sat, Nov 05, 2022 at 05:35:57PM +0100, Bernhard Beschow wrote:
> > > > 
> > > > 
> > > > On Wed, Nov 2, 2022 at 5:24 PM Michael S. Tsirkin  
> > > > wrote:
> > > > 
> > > > From: Yajun Wu 
> > > > 
> > > > This patch is part of adding vhost-user vhost_dev_start support. The
> > > > motivation is to improve backend configuration speed and reduce live
> > > > migration VM downtime.
> > > > 
> > > > Moving the device start routines after finishing all the necessary 
> > > > device
> > > > and VQ configuration, further aligning to the virtio specification 
> > > > for
> > > > "device initialization sequence".
> > > > 
> > > > Following patch will add vhost-user vhost_dev_start support.
> > > > 
> > > > Signed-off-by: Yajun Wu 
> > > > Acked-by: Parav Pandit 
> > > > 
> > > > Message-Id: <20221017064452.1226514-2-yaj...@nvidia.com>
> > > > Reviewed-by: Michael S. Tsirkin 
> > > > Signed-off-by: Michael S. Tsirkin 
> > > > ---
> > > >  hw/block/vhost-user-blk.c | 18 +++---
> > > >  hw/net/vhost_net.c        | 12 ++--
> > > >  2 files changed, 17 insertions(+), 13 deletions(-)
> > > > 
> > > > 
> > > > A git bisect tells me that this is the first bad commit for failing 
> > > > qos-tests
> > > > which only fail when parallel jobs are enabled, e.g. `make check-qtest 
> > > > -j8`:
> > 
> > Parallel test run is not required provided that the test machine is
> > sufficiently busy (load > number of CPU threads). In this case a single
> > invocation of the qos test will fail reliably with this change.
> > 
> > However, the change is not really the root cause of the failures.
> > 
> > > > Summary of Failures:
> > > > 
> > > >  76/541 qemu:qtest+qtest-aarch64 / qtest-aarch64/qos-test               
> > > >        
> > > >   ERROR          18.68s   killed by signal 6 SIGABRT
> > > >  77/541 qemu:qtest+qtest-arm / qtest-arm/qos-test                       
> > > >        
> > > >   ERROR          17.60s   killed by signal 6 SIGABRT
> > > >  93/541 qemu:qtest+qtest-i386 / qtest-i386/qos-test                     
> > > >        
> > > >   ERROR          18.98s   killed by signal 6 SIGABRT
> > > > 108/541 qemu:qtest+qtest-ppc64 / qtest-ppc64/qos-test                   
> > > >        
> > > >   ERROR          16.40s   killed by signal 6 SIGABRT
> > > > 112/541 qemu:qtest+qtest-i386 / qtest-i386/bios-tables-test             
> > > >        
> > > >   ERROR          145.94s   killed by signal 6 SIGABRT
> > > > 130/541 qemu:qtest+qtest-x86_64 / qtest-x86_64/qos-test                 
> > > >        
> > > >   ERROR          17.32s   killed by signal 6 SIGABRT
> > > > 243/541 qemu:qtest+qtest-x86_64 / qtest-x86_64/bios-tables-test         
> > > >        
> > > >   ERROR          127.70s   killed by signal 6 SIGABRT
> > > > 
> > > > Ok:                 500
> > > > Expected Fail:      0  
> > > > Fail:               7  
> > > > Unexpected Pass:    0  
> > > > Skipped:            34  
> > > > Timeout:            0  
> > > > 
> > > > Can anyone else reproduce this?
> > > 
> > > Could you pls try latest for_upstream in my tree?
> > > That should have this fixed.
> > 
> > Your new pull request simply drops this change and this does fix
> > make check-qtest. However, this looks accidental to me and the real
> > bug is there in plain origin/master, too.
> > 
> > What happens is this backtrace a recursive call to vu_gpio_stop
> > via the backtrace below. It is caused by a delayed of the TCP
> > connection (the delayed part only triggers with heavy load on the
> > machine).
> > 
> > You can get the failure back (probably in upstream) if the test is
> > forced to us "use-started=off" which can be set on the command line.
>

Re: [PULL v2 31/82] vhost: Change the sequence of device start

2022-11-06 Thread Christian A. Ehrhardt


Hi,

On Sat, Nov 05, 2022 at 12:43:05PM -0400, Michael S. Tsirkin wrote:
> On Sat, Nov 05, 2022 at 05:35:57PM +0100, Bernhard Beschow wrote:
> > 
> > 
> > On Wed, Nov 2, 2022 at 5:24 PM Michael S. Tsirkin  wrote:
> > 
> > From: Yajun Wu 
> > 
> > This patch is part of adding vhost-user vhost_dev_start support. The
> > motivation is to improve backend configuration speed and reduce live
> > migration VM downtime.
> > 
> > Moving the device start routines after finishing all the necessary 
> > device
> > and VQ configuration, further aligning to the virtio specification for
> > "device initialization sequence".
> > 
> > Following patch will add vhost-user vhost_dev_start support.
> > 
> > Signed-off-by: Yajun Wu 
> > Acked-by: Parav Pandit 
> > 
> > Message-Id: <20221017064452.1226514-2-yaj...@nvidia.com>
> > Reviewed-by: Michael S. Tsirkin 
> > Signed-off-by: Michael S. Tsirkin 
> > ---
> >  hw/block/vhost-user-blk.c | 18 +++---
> >  hw/net/vhost_net.c        | 12 ++--
> >  2 files changed, 17 insertions(+), 13 deletions(-)
> > 
> > 
> > A git bisect tells me that this is the first bad commit for failing 
> > qos-tests
> > which only fail when parallel jobs are enabled, e.g. `make check-qtest -j8`:

Parallel test run is not required provided that the test machine is
sufficiently busy (load > number of CPU threads). In this case a single
invocation of the qos test will fail reliably with this change.

However, the change is not really the root cause of the failures.

> > Summary of Failures:
> > 
> >  76/541 qemu:qtest+qtest-aarch64 / qtest-aarch64/qos-test                   
> >    
> >   ERROR          18.68s   killed by signal 6 SIGABRT
> >  77/541 qemu:qtest+qtest-arm / qtest-arm/qos-test                           
> >    
> >   ERROR          17.60s   killed by signal 6 SIGABRT
> >  93/541 qemu:qtest+qtest-i386 / qtest-i386/qos-test                         
> >    
> >   ERROR          18.98s   killed by signal 6 SIGABRT
> > 108/541 qemu:qtest+qtest-ppc64 / qtest-ppc64/qos-test                       
> >    
> >   ERROR          16.40s   killed by signal 6 SIGABRT
> > 112/541 qemu:qtest+qtest-i386 / qtest-i386/bios-tables-test                 
> >    
> >   ERROR          145.94s   killed by signal 6 SIGABRT
> > 130/541 qemu:qtest+qtest-x86_64 / qtest-x86_64/qos-test                     
> >    
> >   ERROR          17.32s   killed by signal 6 SIGABRT
> > 243/541 qemu:qtest+qtest-x86_64 / qtest-x86_64/bios-tables-test             
> >    
> >   ERROR          127.70s   killed by signal 6 SIGABRT
> > 
> > Ok:                 500
> > Expected Fail:      0  
> > Fail:               7  
> > Unexpected Pass:    0  
> > Skipped:            34  
> > Timeout:            0  
> > 
> > Can anyone else reproduce this?
> 
> Could you pls try latest for_upstream in my tree?
> That should have this fixed.

Your new pull request simply drops this change and this does fix
make check-qtest. However, this looks accidental to me and the real
bug is there in plain origin/master, too.

What happens is this backtrace a recursive call to vu_gpio_stop
via the backtrace below. It is caused by a delayed of the TCP
connection (the delayed part only triggers with heavy load on the
machine).

You can get the failure back (probably in upstream) if the test is
forced to us "use-started=off" which can be set on the command line.
E.g. like this:

diff --git a/tests/qtest/libqos/virtio-gpio.c b/tests/qtest/libqos/virtio-gpio.c
index 762aa6695b..17c6b71e8b 100644
--- a/tests/qtest/libqos/virtio-gpio.c
+++ b/tests/qtest/libqos/virtio-gpio.c
@@ -154,14 +154,14 @@ static void virtio_gpio_register_nodes(void)
 QOSGraphEdgeOptions edge_opts = { };

 /* vhost-user-gpio-device */
-edge_opts.extra_device_opts = "id=gpio0,chardev=chr-vhost-user-test";
+edge_opts.extra_device_opts = 
"id=gpio0,chardev=chr-vhost-user-test,use-started=off";
 qos_node_create_driver("vhost-user-gpio-device",
 virtio_gpio_device_create);
 qos_node_consumes("vhost-user-gpio-device", "virtio-bus", _opts);
 qos_node_produces("vhost-user-gpio-device", "vhost-user-gpio");

 /* virtio-gpio-pci */
-edge_opts.extra_device_opts = 
"id=gpio0,addr=04.0,chardev=chr-vhost-user-test";
+edge_opts.extra_device_opts = 
"id=gpio0,addr=04.0,chardev=chr-vhost-user-test,use-started=on";
 add_qpci_address(_opts, );
 qos_node_create_driver("vhost-user-gpio-pci", virtio_gpio_pci_create);
 qos_node_consumes("vhost-user-gpio-pci", "pci-bus", _opts);


I haven't verified this but from looking at the code other types of
vhost devices seem to have the same problem (e.g. vhost-user-i2c looks
suspicious).

Ok, here's the backtrace:

#0  vu_gpio_stop (vdev=vdev@entry=0x560e0ae449d0) at 
../hw/virtio/vhost-user-gpio.c:143
#1  0x560e0768fb1f in vu_gpio_disconnect (dev=) at 
../hw/virtio/vhost-user-gpio.c:260
#2  vu_gpio_event 

[PATCH v2] hw/acpi/erst.c: Fix memory handling issues

2022-10-24 Thread Christian A. Ehrhardt
- Fix memset argument order: The second argument is
  the value, the length goes last.
- Fix an integer overflow reported by Alexander Bulekov.

Both issues allow the guest to overrun the host buffer
allocated for the ERST memory device.

Cc: Eric DeVolder 
Cc: qemu-sta...@nongnu.org
Fixes: f7e26ffa590 ("ACPI ERST: support for ACPI ERST feature")
Tested-by: Alexander Bulekov 
Signed-off-by: Christian A. Ehrhardt 
---
 hw/acpi/erst.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/hw/acpi/erst.c b/hw/acpi/erst.c
index df856b2669..aefcc03ad6 100644
--- a/hw/acpi/erst.c
+++ b/hw/acpi/erst.c
@@ -635,7 +635,7 @@ static unsigned read_erst_record(ERSTDeviceState *s)
 if (record_length < UEFI_CPER_RECORD_MIN_SIZE) {
 rc = STATUS_FAILED;
 }
-if ((s->record_offset + record_length) > exchange_length) {
+if (record_length > exchange_length - s->record_offset) {
 rc = STATUS_FAILED;
 }
 /* If all is ok, copy the record to the exchange buffer */
@@ -684,7 +684,7 @@ static unsigned write_erst_record(ERSTDeviceState *s)
 if (record_length < UEFI_CPER_RECORD_MIN_SIZE) {
 return STATUS_FAILED;
 }
-if ((s->record_offset + record_length) > exchange_length) {
+if (record_length > exchange_length - s->record_offset) {
 return STATUS_FAILED;
 }
 
@@ -716,7 +716,7 @@ static unsigned write_erst_record(ERSTDeviceState *s)
 if (nvram) {
 /* Write the record into the slot */
 memcpy(nvram, exchange, record_length);
-memset(nvram + record_length, exchange_length - record_length, 0xFF);
+memset(nvram + record_length, 0xFF, exchange_length - record_length);
 /* If a new record, increment the record_count */
 if (!record_found) {
 uint32_t record_count;
-- 
2.34.1




Re: [PATCH] hw/acpi/erst.c: Fix memset argument order

2022-10-23 Thread Christian A. Ehrhardt


Hi,

On Sat, Oct 22, 2022 at 01:37:27AM -0400, Alexander Bulekov wrote:
> On 221021 1505, Alexander Bulekov wrote:
> > On 221019 2115, Christian A. Ehrhardt wrote:
> > > Fix memset argument order: The second argument is
> > > the value, the length goes last.
> > > 
> > > Cc: Eric DeVolder 
> > > Cc: qemu-sta...@nongnu.org
> > > Fixes: f7e26ffa590 ("ACPI ERST: support for ACPI ERST feature")
> > > Signed-off-by: Christian A. Ehrhardt 
> > > ---
> > >  hw/acpi/erst.c | 2 +-
> > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > 
> > > diff --git a/hw/acpi/erst.c b/hw/acpi/erst.c
> > > index df856b2669..26391f93ca 100644
> > > --- a/hw/acpi/erst.c
> > > +++ b/hw/acpi/erst.c
> > > @@ -716,7 +716,7 @@ static unsigned write_erst_record(ERSTDeviceState *s)
> > >  if (nvram) {
> > >  /* Write the record into the slot */
> > >  memcpy(nvram, exchange, record_length);
> > > -memset(nvram + record_length, exchange_length - record_length, 
> > > 0xFF);
> > > +memset(nvram + record_length, 0xFF, exchange_length - 
> > > record_length);
> > 
> > Hi, 
> > I'm running the fuzzer over this code. So far, it hasn't complained
> > about this particular memset call, but it has crashed on the memcpy,
> > directly preceding it. I think the record_length checks might be
> > insufficient. I made an issue/reproducer:
> > https://gitlab.com/qemu-project/qemu/-/issues/1268
> > 
> > In that particular case, record_length is ff00 and passes the
> > checks. I'll keep an eye on the fuzzer to see if it will be able to
> > crash the memset.
> 
> Here's a testcase for the memset issue:

Looks like this check in {read,write}_erst_record():
|   if ((s->record_offset + record_length) > exchange_length) {
|   return STATUS_FAILED;
|   }

has an integer overrun and should be re-written as:
|   if (record_length > exchange_length - s->record_offset) {
|   return STATUS_FAILED;
|   }

   regardsChristian




Re: [PATCH] hw/acpi/erst.c: Fix memset argument order

2022-10-21 Thread Christian A. Ehrhardt
On Fri, Oct 21, 2022 at 06:22:50AM +0200, Markus Armbruster wrote:
> "Christian A. Ehrhardt"  writes:
> 
> > Hi Markus,
> >
> > On Thu, Oct 20, 2022 at 08:14:32AM +0200, Markus Armbruster wrote:
> >> "Christian A. Ehrhardt"  writes:
> >> 
> >> > Fix memset argument order: The second argument is
> >> > the value, the length goes last.
> >> 
> >> Impact of the bug?
> >
> > Well, this is a memory error, i.e. the potential impact is
> > anything from silent data corruption to arbitrary code execution.
> > Phillipe described this accurately as "Ouch".
> >
> >> > Cc: Eric DeVolder 
> >> > Cc: qemu-sta...@nongnu.org
> >> > Fixes: f7e26ffa590 ("ACPI ERST: support for ACPI ERST feature")
> >> > Signed-off-by: Christian A. Ehrhardt 
> >> >  /* Write the record into the slot */
> >
> > [ ... ]
> >
> >> This first copies @record_length bytes into the exchange buffer.
> >> 
> >> > -memset(nvram + record_length, exchange_length - record_length, 
> >> > 0xFF);
> >> > +memset(nvram + record_length, 0xFF, exchange_length - 
> >> > record_length);
> >> 
> >> The new code pads it to the full exchange buffer size.
> >> 
> >> The old code writes 0xFF bytes.
> >> 
> >> If 0xFF < exchange_length - record_length, the padding doesn't extend to
> >> the end of the buffer.  Impact?
> >
> > Incorrect and insufficient data is written.
> >
> >> If 0xFF > exchange_length - record_length, we write beyond the end of
> >> the buffer.  Impact?
> >
> > Buffer overrun with well known potentially catastrophic consequences.
> > Additionally, incorrect data is used for the padding.
> 
> Is record_length controlled by the guest?

Yes, it is taken from the CPER header in the exchange store.

 regards   Christian




Re: [PATCH] hw/acpi/erst.c: Fix memset argument order

2022-10-20 Thread Christian A. Ehrhardt


Hi Markus,

On Thu, Oct 20, 2022 at 08:14:32AM +0200, Markus Armbruster wrote:
> "Christian A. Ehrhardt"  writes:
> 
> > Fix memset argument order: The second argument is
> > the value, the length goes last.
> 
> Impact of the bug?

Well, this is a memory error, i.e. the potential impact is
anything from silent data corruption to arbitrary code execution.
Phillipe described this accurately as "Ouch".

> > Cc: Eric DeVolder 
> > Cc: qemu-sta...@nongnu.org
> > Fixes: f7e26ffa590 ("ACPI ERST: support for ACPI ERST feature")
> > Signed-off-by: Christian A. Ehrhardt 
> >  /* Write the record into the slot */

[ ... ]

> This first copies @record_length bytes into the exchange buffer.
> 
> > -memset(nvram + record_length, exchange_length - record_length, 
> > 0xFF);
> > +memset(nvram + record_length, 0xFF, exchange_length - 
> > record_length);
> 
> The new code pads it to the full exchange buffer size.
> 
> The old code writes 0xFF bytes.
> 
> If 0xFF < exchange_length - record_length, the padding doesn't extend to
> the end of the buffer.  Impact?

Incorrect and insufficient data is written.

> If 0xFF > exchange_length - record_length, we write beyond the end of
> the buffer.  Impact?

Buffer overrun with well known potentially catastrophic consequences.
Additionally, incorrect data is used for the padding.

  regardsChristian




[PATCH] hw/acpi/erst.c: Fix memset argument order

2022-10-19 Thread Christian A. Ehrhardt
Fix memset argument order: The second argument is
the value, the length goes last.

Cc: Eric DeVolder 
Cc: qemu-sta...@nongnu.org
Fixes: f7e26ffa590 ("ACPI ERST: support for ACPI ERST feature")
Signed-off-by: Christian A. Ehrhardt 
---
 hw/acpi/erst.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/acpi/erst.c b/hw/acpi/erst.c
index df856b2669..26391f93ca 100644
--- a/hw/acpi/erst.c
+++ b/hw/acpi/erst.c
@@ -716,7 +716,7 @@ static unsigned write_erst_record(ERSTDeviceState *s)
 if (nvram) {
 /* Write the record into the slot */
 memcpy(nvram, exchange, record_length);
-memset(nvram + record_length, exchange_length - record_length, 0xFF);
+memset(nvram + record_length, 0xFF, exchange_length - record_length);
 /* If a new record, increment the record_count */
 if (!record_found) {
 uint32_t record_count;
-- 
2.34.1




Re: [PATCH] fw_cfg: Don't set callback_opaque NULL in fw_cfg_modify_bytes_read()

2022-08-30 Thread Christian A. Ehrhardt


Hi,

Shameer: Thanks for bringing this to my attention.

Some comments inline.

On Tue, Aug 30, 2022 at 06:43:56AM +, Shameerali Kolothum Thodi wrote:
> 
> 
> > -Original Message-
> > From: Shameerali Kolothum Thodi
> > Sent: 26 August 2022 13:15
> > To: 'Laszlo Ersek' ; qemu-devel@nongnu.org;
> > qemu-...@nongnu.org
> > Cc: imamm...@redhat.com; peter.mayd...@linaro.org; Linuxarm
> > ; chenxiang (M) ; Ard
> > Biesheuvel (kernel.org address) ; Gerd Hoffmann
> > 
> > Subject: RE: [PATCH] fw_cfg: Don't set callback_opaque NULL in
> > fw_cfg_modify_bytes_read()
> > 
> > 
> > 
> > > -Original Message-
> > > From: Laszlo Ersek [mailto:ler...@redhat.com]
> > > Sent: 26 August 2022 13:07
> > > To: Shameerali Kolothum Thodi
> > ;
> > > qemu-devel@nongnu.org; qemu-...@nongnu.org
> > > Cc: imamm...@redhat.com; peter.mayd...@linaro.org; Linuxarm
> > > ; chenxiang (M) ;
> > Ard
> > > Biesheuvel (kernel.org address) ; Gerd Hoffmann
> > > 
> > > Subject: Re: [PATCH] fw_cfg: Don't set callback_opaque NULL in
> > > fw_cfg_modify_bytes_read()
> > >
> > > +Ard +Gerd, one pointer at the bottom
> > >
> > > On 08/26/22 13:59, Laszlo Ersek wrote:
> > > > On 08/25/22 18:18, Shameer Kolothum wrote:
> > > >> Hi
> > > >>
> > > >> On arm/virt platform, Chen Xiang reported a Guest crash while
> > > >> attempting the below steps,
> > > >>
> > > >> 1. Launch the Guest with nvdimm=on
> > > >> 2. Hot-add a NVDIMM dev
> > > >> 3. Reboot
> > > >> 4. Guest boots fine.
> > > >> 5. Reboot again.
> > > >> 6. Guest boot fails.
> > > >>
> > > >> QEMU_EFI reports the below error:
> > > >> ProcessCmdAddPointer: invalid pointer value in "etc/acpi/tables"
> > > >> OnRootBridgesConnected: InstallAcpiTables: Protocol Error
> > > >>
> > > >> Debugging shows that on first reboot(after hot-adding NVDIMM),
> > > >> Qemu updates the etc/table-loader len,
> > > >>
> > > >> qemu_ram_resize()
> > > >>   fw_cfg_modify_file()
> > > >>      fw_cfg_modify_bytes_read()
> > > >>
> > > >> And in fw_cfg_modify_bytes_read() we set the "callback_opaque" for
> > > >> the "key" entry to NULL. Because of this, on the second reboot,
> > > >> virt_acpi_build_update() is called with a NULL "build_state" and
> > > >> returns without updating the ACPI tables. This seems to be
> > > >> upsetting the firmware.
> > > >>
> > > >> To fix this, don't change the callback_opaque in
> > > fw_cfg_modify_bytes_read().
> > > >>
> > > >> Reported-by: chenxiang 
> > > >> Signed-off-by: Shameer Kolothum
> > > 
> > > >> ---
> > > >> I am still not very convinced this is the root cause of the issue.
> > > >> Though it looks like setting callback_opaque to NULL while updating
> > > >> the file size is wrong, what puzzles me is that on the second reboot
> > > >> we don't have any ACPI table size changes and ideally firmware should
> > > >> see the updated tables from the first reboot itself.
> > > >>
> > > >> Please take a look and let me know.
> > > >>
> > > >> Thanks,
> > > >> Shameer
> > > >>
> > > >> ---
> > > >>  hw/nvram/fw_cfg.c | 1 -
> > > >>  1 file changed, 1 deletion(-)
> > > >>
> > > >> diff --git a/hw/nvram/fw_cfg.c b/hw/nvram/fw_cfg.c
> > > >> index d605f3f45a..dfe8404c01 100644
> > > >> --- a/hw/nvram/fw_cfg.c
> > > >> +++ b/hw/nvram/fw_cfg.c
> > > >> @@ -728,7 +728,6 @@ static void
> > > *fw_cfg_modify_bytes_read(FWCfgState *s, uint16_t key,
> > > >>  ptr = s->entries[arch][key].data;
> > > >>  s->entries[arch][key].data = data;
> > > >>  s->entries[arch][key].len = len;
> > > >> -s->entries[arch][key].callback_opaque = NULL;
> > > >>  s->entries[arch][key].allow_write = false;
> > > >>
> > > >>  return ptr;
> > > >>


The code as it stands clears callback_opaque (the data pointer
of the callbacks) while leaving the actual callbacks in place.
I think it is obvious that this cannot be correct.

IMHO the change to allow_write is wrong for similar reasons but
I don't think that this matters in practice.

If this path is hit for the table-loader file the ACPI tables in the
guest will be corrupt.


> > > > I vaguely recall seeing the same issue report years ago (also in
> > > > relation to hot-adding NVDIMM). However, I have no capacity to
> > > > participate in the discussion. Making this remark just for clarity.
> > >
> > > The earlier report I've had in mind was from Shameer as well:
> > >
> > >
> > http://mid.mail-archive.com/5FC3163CFD30C246ABAA99954A238FA83F3F
> > > b...@lhreml524-mbs.china.huawei.com
> > 
> > Right. That was a slightly different issue though. It was basically ACPI 
> > table
> > size not
> > getting updated on the first reboot of Guest after we hot-add NVDIMM dev.
> > The error
> > from firmware was different in that case,
> > 
> > ProcessCmdAddChecksum: invalid checksum range in "etc/acpi/tables"
> > OnRootBridgesConnected: InstallAcpiTables: Protocol Error
> > 
> > And it was fixed with this series here,
> > https://patchwork.kernel.org/project/qemu-devel/cover/20200403101827.3
> > 

fwcfg: Wrong callback behaviour after fw_cfg_modify_bytes_read

2022-04-13 Thread Christian A. Ehrhardt


Hi,

there's a long story behind this (see below). However, I'll start with
the result:

fw_cfg_modify_bytes_read() sets the callback data of an existing
fw_cfg file to NULL but leaves the actual callbacks in place.
Additioanlly, this function sets ->allow_write to false for no
good reason AFAICS.

For most callbacks, the callback will just crash on the NULL pointer
in ->callback_opaque if this path is ever hit. 

I think the following patch is required (I can properly format it
if you agree). I'm not 100% sure about the "allow_write" part, tough:

diff --git a/hw/nvram/fw_cfg.c b/hw/nvram/fw_cfg.c
index e5f3c98184..b8b6d8fe10 100644
--- a/hw/nvram/fw_cfg.c
+++ b/hw/nvram/fw_cfg.c
@@ -742,8 +742,6 @@ static void *fw_cfg_modify_bytes_read(FWCfgState *s, 
uint16_t key,
 ptr = s->entries[arch][key].data;
 s->entries[arch][key].data = data;
 s->entries[arch][key].len = len;
-s->entries[arch][key].callback_opaque = NULL;
-s->entries[arch][key].allow_write = false;
 
 return ptr;
 }

Oppinions?


For those interesed here's the somewhat longer story and the reason
why the diff actually matters:

We are running Windows in a Q35 based machine in UEFI mode with OVMF.
In some situations we saw that the Windows guest would hang in the
Windows boot loader after a guest initiated reboot of the virtual
machine. A hard "system_reset" would trigger the same bug.

The guest was hanging in a loop trying to read from unassigned
I/O port 0xb008. This is the default port used for the ACPI
PM timer on PIIX based machines (but remember that we use Q35 where
the PM timer lives at 0x608 instead).

It turned out that after the reboot OVMF would try to read the
ACPI tables from FWCFG but commands in the table-loader file
could not be executed correctly and OVMF falls back to some hard
coded PIIX based default.

ACPI tables and the table-loader data is initially generated
during setup but this data is re-generated via an FWCFG callback
(acpi_update_build) when the first of these files is accessed.
The tables generated at this later time differ slightly from those
generated during initial setup.

In our case these differences required a resize of the table-loader
romfile. This resize calls fw_cfg_modify_file() via the resize
hook of the memory region that contains the FWCFG file.
As described above this clears the ->callback_opaque data that
points to the build_state.

After a reboot rom_reset will restore the original contents of
the linker-loader file. In theory, this is only temporary. However,
due to the missing callback_opaque data the first call to
acpi_update_build() will do nothing. As a result the OVMF guest
reads an outdated version of the table-loader file. The actual
tables are properly re-generated on the next access to a different
FWCFG file that did not go through a resize. But at this point the
guest has already read the outdated table-loader data and trying to
apply this to the re-generated ACPI tables results in errors.

This results in broken ACPI tables as discussed above.

   regardsChristian