Re: [PATCH v2 09/11] vfio/migration: Reset device if setting recover state fails

2022-10-10 Thread liulongfang via
On 2022/5/31 1:07, Avihai Horon wrote:
> If vfio_migration_set_state() fails to set the device in the requested
> state it tries to put it in a recover state. If setting the device in
> the recover state fails as well, hw_error is triggered and the VM is
> aborted.
> 
> To improve user experience and avoid VM data loss, reset the device with
> VFIO_DEVICE_RESET instead of aborting the VM.
> 
> Signed-off-by: Avihai Horon 
> ---
>  hw/vfio/migration.c | 12 ++--
>  1 file changed, 10 insertions(+), 2 deletions(-)
> 
> diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
> index 852759e6ca..6c34502611 100644
> --- a/hw/vfio/migration.c
> +++ b/hw/vfio/migration.c
> @@ -89,8 +89,16 @@ static int vfio_migration_set_state(VFIODevice *vbasedev,
>          /* Try to put the device in some good state */
>          mig_state->device_state = recover_state;
>          if (ioctl(vbasedev->fd, VFIO_DEVICE_FEATURE, feature)) {
> -            hw_error("%s: Device in error state, can't recover",
> -                     vbasedev->name);
> +            if (ioctl(vbasedev->fd, VFIO_DEVICE_RESET)) {
> +                hw_error("%s: Device in error state, can't recover",
> +                         vbasedev->name);
> +            }
> +
> +            error_report(
> +                "%s: Device was reset due to failure in changing device state to recover state %s",
> +                vbasedev->name, mig_state_to_str(recover_state));
> +
> +            return -1;
>          }
> 

I built qemu 7.1.50 with this patch set applied. When a live migration
fails because the destination VM is disconnected mid-migration, exiting
the source qemu afterwards produces the following error:

[100337.287047] BUG: Bad page state in process qemu-system-aar  pfn:82199518
[100337.295815] page:356de4da refcount:-2 mapcount:0 mapping:0000 index:0x0 pfn:0x82199518
[100337.306403] flags: 0xbfff800(node=0|zone=2|lastcpupid=0x7fff)
[100337.314091] raw: 0bfff800 dead0100 dead0122
[100337.322589] raw:   fffe
[100337.330630] page dumped because: nonzero _refcount
[100337.335840] Modules linked in: hisi_acc_vfio_pci hisi_sec2 hisi_zip hisi_hpre hisi_qm uacce vfio_iommu_type1 vfio_pci vfio_pci_core vfio_virqfd vfio pv680_mii(O) [last unloaded: hisi_sec2]
[100337.354564] CPU: 1 PID: 786 Comm: qemu-system-aar Tainted: GB  O    6.0.0-rc4+ #1
[100337.377378] Call trace:
[100337.380382]  dump_backtrace.part.0+0xc4/0xd0
[100337.385791]  show_stack+0x24/0x40
[100337.389478]  dump_stack_lvl+0x68/0x84
[100337.394155]  dump_stack+0x18/0x34
[100337.398006]  bad_page+0xf0/0x120
[100337.401796]  check_free_page_bad+0x84/0x90
[100337.406404]  free_pcppages_bulk+0x1bc/0x2b0
[100337.411126]  free_unref_page_commit+0x120/0x15c
[100337.416935]  free_unref_page+0x15c/0x254
[100337.421436]  free_compound_page+0x6c/0x100
[100337.425868]  free_transhuge_page+0xd4/0x140
[100337.430535]  destroy_large_folio+0x30/0x40
[100337.434953]  release_pages+0x1bc/0x4d0
[100337.439268]  free_pages_and_swap_cache+0x68/0x80
[100337.444224]  tlb_batch_pages_flush+0x5c/0x94
[100337.448976]  tlb_flush_mmu+0x4c/0xd4
[100337.453062]  unmap_page_range+0x8d0/0xbd0
[100337.457432]  unmap_single_vma+0x90/0x12c
[100337.461673]  unmap_vmas+0x84/0xfc
[100337.465354]  exit_mmap+0x88/0x1b0
[100337.469008]  __mmput+0x48/0x134
[100337.472637]  mmput+0x44/0x50
[100337.475857]  do_exit+0x2b8/0x970
[100337.479641]  do_group_exit+0x40/0xac
[100337.484079]  get_signal+0x8c0/0x934
[100337.488215]  do_notify_resume+0x1d0/0x1570
[100337.492795]  el0_svc+0xa8/0xc0
[100337.496452]  el0t_64_sync_handler+0x1ac/0x1b0
[100337.501187]  el0t_64_sync+0x19c/0x1a0

Can anyone see what is causing this error?

>          error_report("%s: Failed changing device state to %s", vbasedev->name,
> 
Thanks
Longfang.



[PATCH v2 09/11] vfio/migration: Reset device if setting recover state fails

2022-05-30 Thread Avihai Horon
If vfio_migration_set_state() fails to set the device in the requested
state it tries to put it in a recover state. If setting the device in
the recover state fails as well, hw_error is triggered and the VM is
aborted.

To improve user experience and avoid VM data loss, reset the device with
VFIO_DEVICE_RESET instead of aborting the VM.

Signed-off-by: Avihai Horon 
---
 hw/vfio/migration.c | 12 ++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
index 852759e6ca..6c34502611 100644
--- a/hw/vfio/migration.c
+++ b/hw/vfio/migration.c
@@ -89,8 +89,16 @@ static int vfio_migration_set_state(VFIODevice *vbasedev,
         /* Try to put the device in some good state */
         mig_state->device_state = recover_state;
         if (ioctl(vbasedev->fd, VFIO_DEVICE_FEATURE, feature)) {
-            hw_error("%s: Device in error state, can't recover",
-                     vbasedev->name);
+            if (ioctl(vbasedev->fd, VFIO_DEVICE_RESET)) {
+                hw_error("%s: Device in error state, can't recover",
+                         vbasedev->name);
+            }
+
+            error_report(
+                "%s: Device was reset due to failure in changing device state to recover state %s",
+                vbasedev->name, mig_state_to_str(recover_state));
+
+            return -1;
         }
 
         error_report("%s: Failed changing device state to %s", vbasedev->name,
-- 
2.21.3
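
For readers following the thread, the fallback order that the commit message describes (requested state, then recover state, then device reset, and only then abort) can be sketched as a standalone C fragment. This is only an illustration under stated assumptions, not the QEMU code: `struct dev_ops`, `enum outcome`, `set_state_with_fallback()` and the stub functions are hypothetical names that stand in for the `VFIO_DEVICE_FEATURE` and `VFIO_DEVICE_RESET` ioctl() calls in the patch.

```c
#include <stdio.h>

/* Hypothetical outcome codes, used only in this sketch. */
enum outcome {
    STATE_SET,     /* requested state was set */
    RECOVERED,     /* requested state failed, recover state was set */
    DEVICE_RESET,  /* both failed, device was reset (the patch returns -1 here) */
    FATAL          /* reset failed too (the patch calls hw_error() here) */
};

/* Hypothetical stand-ins for the two ioctl() calls; each returns 0 on success. */
struct dev_ops {
    int (*set_state)(int state);
    int (*reset)(void);
};

/* Try the requested state, then the recover state, then a device reset. */
static enum outcome set_state_with_fallback(const struct dev_ops *ops,
                                            int new_state, int recover_state)
{
    if (ops->set_state(new_state) == 0) {
        return STATE_SET;
    }
    if (ops->set_state(recover_state) == 0) {
        return RECOVERED;
    }
    if (ops->reset() == 0) {
        return DEVICE_RESET;
    }
    return FATAL;
}

/* Simulated devices (assumptions for illustration only). */
static int ok_set(int state) { (void)state; return 0; }
static int fail_set(int state) { (void)state; return -1; }
static int only_state_zero(int state) { return state == 0 ? 0 : -1; }
static int ok_reset(void) { return 0; }
static int fail_reset(void) { return -1; }
```

With these stubs, a device whose state changes always fail but whose reset succeeds yields `DEVICE_RESET`, which corresponds to the new path in the patch where the error is reported to the caller instead of aborting the VM.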