Re: [Qemu-arm] [PATCH v18 0/6] Add ARMv8 RAS virtualization support in QEMU
On Fri, 6 Sep 2019 at 09:33, Xiang Zheng wrote:
> Since Dongjiu is too busy to do this work, I will finish the rest work on
> behalf of him.

Thanks for picking up the work on this patchset, and sorry it's taken me a while to get to reviewing it. I've now given review comments on the arm parts of this, which are looking in generally good shape (my comments are all pretty minor stuff I think). I'll have to leave the ACPI parts to somebody else to review as that is definitely not my speciality.

thanks
-- PMM
RE: [PATCH v18 0/6] Add ARMv8 RAS virtualization support in QEMU
Ping.

Hi Peter/Igor/all, could you review these patches? Thanks a lot.

--
耿东久 Geng Dongjiu
Mobile: +86-18221809728
Email: gengdong...@huawei.com

From: zhengxiang (A)
To: pbonzini; mst; imammedo; shannon.zhaosl; peter.maydell; lersek; james.morse; gengdongjiu; mtosatti; rth; ehabkost; Jonathan Cameron; xuwei (O); kvm; qemu-devel; qemu-arm; Linuxarm
Cc: Wanghaibin (D)
Date: 2019-09-17 20:40:21
Subject: Re: [PATCH v18 0/6] Add ARMv8 RAS virtualization support in QEMU
Re: [PATCH v18 0/6] Add ARMv8 RAS virtualization support in QEMU
Thanks to Xiang for continuing to upstream and test this series. I hope the maintainers can review it.

On 2019/9/17 20:39, Xiang Zheng wrote:
> Hi all,
>
> This patch series has been tested for both TCG and KVM scenes.
Re: [Qemu-devel] [PATCH v18 0/6] Add ARMv8 RAS virtualization support in QEMU
Hi all,

This patch series has been tested in both the TCG and the KVM scenarios.

1) Test for TCG:
   - Re-compile QEMU after applying the patch referred to in
     https://patchwork.kernel.org/cover/10942757/#22640271.
   - Use the command line shown below to start QEMU:
       ./qemu-system-aarch64 \
         -name guest=ras \
         -machine virt,gic-version=3,ras=on \
         -cpu cortex-a57 \
         -bios /usr/share/edk2/aarch64/QEMU_EFI.fd \
         -nodefaults \
         -kernel ${GUEST_KERNEL} \
         -initrd ${GUEST_FS} \
         -append "rdinit=init console=ttyAMA0 earlycon=pl011,0x900" \
         -m 8192 \
         -smp 4 \
         -serial stdio

   - Send a signal to one of the VCPU threads:
       kill -s SIGBUS 71571

   - The result of the test is shown below:

[ 41.194753] {1}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 0
[ 41.197329] {1}[Hardware Error]: event severity: recoverable
[ 41.199078] {1}[Hardware Error]: Error 0, type: recoverable
[ 41.200829] {1}[Hardware Error]: section_type: memory error
[ 41.202603] {1}[Hardware Error]: physical_address: 0x400a1000
[ 41.204649] {1}[Hardware Error]: error_type: 0, unknown
[ 41.206328] EDAC MC0: 1 UE Unknown on unknown label ( page:0x400a1 offset:0x0 grain:0)
[ 41.208788] Internal error: synchronous external abort: 96000410 [#1] SMP
[ 41.210879] Modules linked in:
[ 41.211823] CPU: 2 PID: 0 Comm: swapper/2 Not tainted 4.19.0+ #8
[ 41.213698] Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015
[ 41.215812] pstate: 60c00085 (nZCv daIf +PAN +UAO)
[ 41.217296] pc : cpu_do_idle+0x8/0xc
[ 41.218400] lr : arch_cpu_idle+0x2c/0x1b8
[ 41.219629] sp : 09f9bf00
[ 41.220649] x29: 09f9bf00 x28:
[ 41.222310] x27: x26: 8001fe471d80
[ 41.223945] x25: x24: 0937ba38
[ 41.225581] x23: 090b3338 x22: 09379000
[ 41.227220] x21: 0937b000 x20: 0004
[ 41.228871] x19: 090a6000 x18:
[ 41.230517] x17: x16:
[ 41.232165] x15: x14:
[ 41.233810] x13: 089f4da8 x12: 000e
[ 41.235448] x11: 089f4d80 x10: 0af0
[ 41.237101] x9 : 09f9be80 x8 : 8001fe4728d0
[ 41.238738] x7 : 0004 x6 : 8001fffbaf30
[ 41.240380] x5 : 0c43b940 x4 : 8001f6f0c000
[ 41.242030] x3 : 0001 x2 : 09f9bf00
[ 41.243666] x1 : 8001fffb82c8 x0 : 090a6018
[ 41.245306] Process swapper/2 (pid: 0, stack limit = 0x(ptrval))
[ 41.247378] Call trace:
[ 41.248117] cpu_do_idle+0x8/0xc
[ 41.249111] do_idle+0x1dc/0x2a8
[ 41.250111] cpu_startup_entry+0x28/0x30
[ 41.251319] secondary_start_kernel+0x180/0x1c8
[ 41.252725] Code: a8c17bfd d65f03c0 d5033f9f d503207f (d65f03c0)
[ 41.254606] ---[ end trace 221bc8a614fb5a1d ]---
[ 41.256030] Kernel panic - not syncing: Fatal exception
[ 41.257644] SMP: stopping secondary CPUs
[ 41.258912] Kernel Offset: disabled
[ 41.260011] CPU features: 0x0,22a00238
[ 41.261178] Memory Limit: none
[ 41.262122] ---[ end Kernel panic - not syncing: Fatal exception ]---

2) Test for KVM:
   - Use the command line shown below to start QEMU:
       ./qemu-system-aarch64 \
         -name guest=ras \
         -machine virt,accel=kvm,gic-version=3,ras=on \
         -cpu host \
         -bios /usr/share/edk2/aarch64/QEMU_EFI.fd \
         -nodefaults \
         -kernel ${GUEST_KERNEL} \
         -initrd ${GUEST_FS} \
         -append "rdinit=init console=ttyAMA0 earlycon=pl011,0x900" \
         -m 8192 \
         -smp 4 \
         -serial stdio

   - Run mca-recover and get the GPA (IPA) of the allocated page, which will be
     corrupted later.
   - Convert the GPA to an HPA and corrupt this HPA via APEI/EINJ.
   - Go back to the guest and continue to read this page.

   - The result of the test is shown below:

root@genericarmv8:~/tools# ./mca-recover
pagesize: 0x1000
before clear cache
flags for page 0x2317b2: uptodate active mmap anon swapbacked
vtop(0x9c9e8000) = 0x2317b2000
Hit any key to access: before read

after read
Access at Tue Sep 17 01:41:14 2019

flags for page 0x2317b2: uptodate active mmap anon swapbacked
[ 403.298539] {1}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 0
[ 403.301421] {1}[Hardware Error]: event severity: recoverable
[ 403.303217] {1}[Hardware Error]: Error 0, type: recoverable
[ 403.304920] {1}[Hardware Error]: section_type: memory error
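The guest logs above report "synchronous external abort: 96000410". As a rough aid for reading that value, here is a small C sketch that composes and decodes a data-abort syndrome using the ESR field positions from the Arm architecture: EC 0x25 (data abort taken without a change in exception level, consistent with an abort taken at EL1 in cpu_do_idle), IL set, FnV (bit 10) set, and DFSC 0x10 (synchronous external abort, not on a translation table walk). The macro and function names are hypothetical and this is not QEMU's syn_data_abort_* code.

```c
#include <stdint.h>

/* ESR_ELx data-abort fields (Arm ARM); names are this sketch's own. */
#define ESR_EC_SHIFT      26
#define ESR_IL            (1u << 25)   /* 32-bit instruction length */
#define ESR_DABT_FNV      (1u << 10)   /* FAR not valid (meaningful for SEA) */
#define ESR_EC_DABT_LOWER 0x24u        /* data abort from a lower EL */
#define ESR_EC_DABT_CUR   0x25u        /* data abort without a change in EL */
#define DFSC_SEA          0x10u        /* synchronous external abort */

/* Compose the syndrome for a synchronous external abort on a data access.
 * same_el: abort taken at the same EL it happened in (e.g. guest kernel).
 * far_valid: whether the fault address register holds a valid address. */
static uint32_t esr_sea_data_abort(int same_el, int far_valid)
{
    uint32_t ec = same_el ? ESR_EC_DABT_CUR : ESR_EC_DABT_LOWER;
    uint32_t esr = (ec << ESR_EC_SHIFT) | ESR_IL | DFSC_SEA;

    if (!far_valid) {
        esr |= ESR_DABT_FNV;
    }
    return esr;
}

/* Extract the exception class (bits 31..26). */
static uint32_t esr_ec(uint32_t esr)
{
    return esr >> ESR_EC_SHIFT;
}

/* Extract the data fault status code (bits 5..0). */
static uint32_t esr_dfsc(uint32_t esr)
{
    return esr & 0x3fu;
}
```

With same_el = 1 and far_valid = 0 this yields 0x96000410, the value in the logs; an abort from guest user space (lower EL) would instead give EC 0x24, i.e. 0x92000410.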
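The KVM test starts with mca-recover printing the physical address of a page, e.g. "vtop(0x9c9e8000) = 0x2317b2000". A common way to do this on Linux is to read /proc/self/pagemap, where each 64-bit entry holds the PFN in bits 0-54 and a "present" flag in bit 63. The sketch below assumes that approach; it is not necessarily how mca-recover is written, and vtop() / pagemap_entry_to_phys() are hypothetical names. Run inside the guest, the result is a GPA (IPA), not an HPA. Unprivileged processes see the PFN as zero on Linux 4.2+ (and cannot open the file on 4.0/4.1), in which case this vtop() returns 0.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdint.h>
#include <unistd.h>

#define PM_PRESENT  (1ULL << 63)          /* page present in RAM */
#define PM_PFN_MASK ((1ULL << 55) - 1)    /* bits 0..54: page frame number */

/* Turn one pagemap entry into a physical address; 0 if not present or if
 * the PFN is hidden (reads as zero without CAP_SYS_ADMIN). */
static uint64_t pagemap_entry_to_phys(uint64_t entry, uintptr_t vaddr,
                                      uint64_t pagesize)
{
    if (!(entry & PM_PRESENT) || (entry & PM_PFN_MASK) == 0) {
        return 0;
    }
    return (entry & PM_PFN_MASK) * pagesize + (vaddr % pagesize);
}

/* Look up vaddr in this process's pagemap; returns 0 on any failure. */
static uint64_t vtop(uintptr_t vaddr)
{
    long pagesize = sysconf(_SC_PAGESIZE);
    if (pagesize <= 0) {
        return 0;
    }
    int fd = open("/proc/self/pagemap", O_RDONLY);
    if (fd < 0) {
        return 0;  /* e.g. no procfs, or EPERM on Linux 4.0/4.1 */
    }
    uint64_t entry = 0;
    /* One 8-byte entry per virtual page, indexed by virtual page number. */
    off_t off = (off_t)((vaddr / (uintptr_t)pagesize) * sizeof entry);
    ssize_t n = pread(fd, &entry, sizeof entry, off);
    close(fd);
    if (n != (ssize_t)sizeof entry) {
        return 0;
    }
    return pagemap_entry_to_phys(entry, vaddr, (uint64_t)pagesize);
}
```

With a 4 KiB page size and an entry whose PFN is 0x2317b2, a virtual address of 0x9c9e8000 maps to 0x2317b2000, matching the numbers in the mca-recover output.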
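The cover letter's rule for SIGBUS is that BUS_MCEERR_AR is treated as a synchronous exception and reported to the guest with the ARMv8 SEA notification type. A minimal C sketch of that decision, plus installing a handler with sigaction() and SA_SIGINFO so that si_code and si_addr are visible. This is not the code from the series: RasAction, ras_action_for_sigbus(), sigbus_handler() and install_sigbus_handler() are hypothetical names, and the fallback values 4 and 5 are the Linux si_code numbers for BUS_MCEERR_AR/AO. A plain `kill -s SIGBUS`, as in the TCG test above, arrives with si_code SI_USER, which this sketch does not treat as a memory error.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <stddef.h>
#include <string.h>

#ifndef BUS_MCEERR_AR
#define BUS_MCEERR_AR 4   /* Linux: hardware memory error, action required */
#endif
#ifndef BUS_MCEERR_AO
#define BUS_MCEERR_AO 5   /* Linux: hardware memory error, action optional */
#endif

/* Hypothetical result type for this sketch; not a QEMU enum. */
typedef enum {
    RAS_ACTION_NONE,       /* not reported to the guest */
    RAS_ACTION_NOTIFY_SEA  /* record CPER in the GHES buffer, then inject SEA */
} RasAction;

/* Decide how a SIGBUS is reported, following the cover letter's rule. */
static RasAction ras_action_for_sigbus(int si_code)
{
    switch (si_code) {
    case BUS_MCEERR_AR:
        /* Synchronous: the VCPU consumed poisoned memory. */
        return RAS_ACTION_NOTIFY_SEA;
    case BUS_MCEERR_AO:
        /* Asynchronous: handling was dropped in v14 (masked by main thread). */
    default:
        /* Also covers SI_USER from a plain `kill -s SIGBUS`. */
        return RAS_ACTION_NONE;
    }
}

/* Values captured by the handler; acting on them is left to other code. */
static volatile sig_atomic_t pending_action = RAS_ACTION_NONE;
static void *volatile pending_addr;  /* host virtual address of the fault */

static void sigbus_handler(int sig, siginfo_t *info, void *ctx)
{
    (void)sig;
    (void)ctx;
    pending_addr = info->si_addr;
    pending_action = ras_action_for_sigbus(info->si_code);
}

/* Install the handler; SA_SIGINFO is required to receive siginfo_t. */
static int install_sigbus_handler(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = sigbus_handler;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGBUS, &sa, NULL);
}
```

The handler here only records the decision and the faulting address; recording the CPER and injecting the abort are deliberately left out of the sketch.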
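The cover letter says user space records the CPER into the guest GHES buffer, and the changelog mentions composing the error status block with build_append_int_noprefix(). The sketch below builds only the 20-byte ACPI Generic Error Status Block header (Block Status, Raw Data Offset, Raw Data Length, Data Length, Error Severity, 4 bytes each, little-endian) into a fixed-size buffer. ByteBuf and append_le() are hypothetical stand-ins for QEMU's GArray and build_append_int_noprefix(), and filling the entry-count bits (13..4) is this sketch's choice, not necessarily what the series writes. Severity 0 (recoverable) corresponds to the "event severity: recoverable" lines in the guest logs above.

```c
#include <stddef.h>
#include <stdint.h>

/* Fixed-size byte buffer standing in for a growable array (hypothetical). */
typedef struct {
    uint8_t data[256];
    size_t len;
} ByteBuf;

/* Append the low `size` bytes of `value` in little-endian order.
 * Returns 0 on success, -1 on a bad size or a full buffer. */
static int append_le(ByteBuf *b, uint64_t value, int size)
{
    if (size < 0 || size > 8 || b->len + (size_t)size > sizeof b->data) {
        return -1;
    }
    for (int i = 0; i < size; i++) {
        b->data[b->len++] = (uint8_t)(value >> (8 * i));
    }
    return 0;
}

/* Block Status bit 0: uncorrectable error valid. */
#define GHES_BLOCK_STATUS_UE_VALID 0x1u
/* Error Severity values from the ACPI specification. */
#define GHES_SEV_RECOVERABLE 0u
#define GHES_SEV_FATAL       1u
#define GHES_SEV_CORRECTED   2u
#define GHES_SEV_NONE        3u

/* Emit the Generic Error Status Block header. data_length is the total size
 * of the Generic Error Data Entries that would follow it. */
static int ghes_build_status_header(ByteBuf *b, uint32_t entry_count,
                                    uint32_t data_length, uint32_t severity)
{
    /* Bits 13..4 of Block Status hold the error data entry count. */
    uint32_t block_status = GHES_BLOCK_STATUS_UE_VALID |
                            ((entry_count & 0x3ffu) << 4);

    if (append_le(b, block_status, 4) ||
        append_le(b, 0, 4) ||              /* Raw Data Offset: no raw data */
        append_le(b, 0, 4) ||              /* Raw Data Length */
        append_le(b, data_length, 4)) {    /* Data Length */
        return -1;
    }
    return append_le(b, severity, 4);      /* Error Severity */
}
```

Writing each field through a single little-endian append helper, rather than packing a C struct, avoids depending on host endianness and struct padding, which is the general motivation for the build_append_int_noprefix() style mentioned in the changelog.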
[Qemu-devel] [PATCH v18 0/6] Add ARMv8 RAS virtualization support in QEMU
In the ARMv8 platform, the CPU error types are synchronous external abort (SEA)
and SError Interrupt (SEI). If an exception happens in the guest, it is sometimes
better for the guest to perform the recovery, because the host does not know the
guest's detailed state. For example, if an exception happens in a user-space
application within the guest, the host does not know which application
encountered the error.

For the ARMv8 SEA/SEI, KVM or the host kernel delivers SIGBUS to notify user space.
After user space gets the notification, it records the CPER into the guest GHES
buffer and injects an exception or IRQ into the guest.

In the current implementation, if the type of SIGBUS is BUS_MCEERR_AR, we treat it
as a synchronous exception and, after recording the CPER into the guest, notify the
guest with the ARMv8 SEA notification type.

This patch series is based on QEMU 4.1 and includes two parts:
1. Generate the APEI/GHES table.
2. Handle the SIGBUS signal: record the CPER at runtime, fill it into guest
   memory, then notify the guest according to the type of SIGBUS.

The whole solution was suggested by James (james.mo...@arm.com); the solution for
the APEI section was suggested by Laszlo (ler...@redhat.com). Some of the
discussion is shown in [1].

This patch series has already been tested on an ARM64 platform with the RAS
feature enabled:
The APEI part verification result is shown in [2].
The BUS_MCEERR_AR SIGBUS handling verification result is shown in [3].

---
Since Dongjiu is too busy to do this work, I will finish the remaining work on
his behalf.
---

Changes since v17:
1. Improve some commit messages and comments.
2. Fix some code-style problems.
3. Add a *ras* machine option.
4. Move HEST/GHES related structures and macros into "hw/acpi/acpi_ghes.*".
5. Move HWPoison page functions into "include/sysemu/kvm_int.h".
6. Fix some bugs.
7. Improve the design document.

Changes since v16:
1. Check whether the ACPI table is enabled when handling the memory error in the
   SIGBUS handler.

Changes since v15:
1. Add a doc-comment in the proper format for 'include/exec/ram_addr.h'.
2. Remove write_part_cpustate_to_list() because another bug-fix patch,
   "arm: Allow system registers for KVM guests to be changed by QEMU code",
   has been merged.
3. Add some comments for kvm_inject_arm_sea() in 'target/arm/kvm64.c'.
4. Compare the arm_current_el() return value to 0, 1, 2, 3, not to the
   PSTATE_MODE_* constants.
5. Change the text to state that RAS support was not introduced before QEMU 4.1.
6. Move the no_ras flag patch to the beginning of this series.

Changes since v14:
1. Remove the BUS_MCEERR_AO handling logic because this asynchronous signal is
   masked by the main thread.
2. Address some of Igor Mammedov's comments (ACPI part):
   1) Change the comments for the enum AcpiHestNotifyType definition and remove
      "ditto" in patch 1.
   2) Change some patch commit messages and split the "APEI GHES table
      generation" patch into more patches.
3. Address some of Peter's comments (arm64 Synchronous External Abort injection):
   1) Change some code comments.
   2) Use arm_current_el() for the current EL.
   3) Use the helper functions for those (syn_data_abort_*).

Changes since v13:
1. Move the patches that set the guest ESR and inject a virtual SError out of
   this series.
2. Clean up and optimize the APEI part patches.
3. Update the commit messages and add some comments for the code.

Changes since v12:
1. Address Paolo's comments to move the HWPoisonPage definition to
   accel/kvm/kvm-all.c.
2. Only call kvm_cpu_synchronize_state() when receiving the BUS_MCEERR_AR signal.
3. Only add and enable two hardware error sources: GPIO-Signal and ARMv8 SEA.
4. Address Michael's comments to not sync SPDX from the Linux kernel header file.

Changes since v11:
Address James's comments (james.mo...@arm.com):
1. Check whether KVM has the capability to set ESR instead of detecting the host
   CPU RAS capability.
2. For a SIGBUS_MCEERR_AR SIGBUS, use the Synchronous-External-Abort (SEA)
   notification type; for a SIGBUS_MCEERR_AO SIGBUS, use GPIO-Signal notification.

Address Shannon's comments (for the ACPI part):
1. Unify the hest_ghes.c and hest_ghes.h license declarations.
2. Remove the unnecessary inclusion of "qmp-commands.h" in hest_ghes.c.
3. Unconditionally add the guest APEI table, based on James's comments
   (james.mo...@arm.com).
4. Add an option to the virt machine for migration compatibility. On new virt
   machines it is on by default, while it is off for old ones; we enable it
   from 2.12 onward.
5. Refer to the ACPI spec version that first introduced Hardware Error
   Notification.
6. Add the ACPI_HEST_NOTIFY_RESERVED notification type.

Address Igor's comments (for the ACPI part):
1. Add a doc patch first, which describes how it is supposed to work between
   QEMU/firmware/guest OS, with the expected flows.
2. Move the APEI diagrams into the doc/spec patch.
3. Remove the redundant g_malloc in ghes_record_cper().
4. Use the build_append_int_noprefix() API to compose the whole error status
   block and the whole APEI table, and try to get rid of most structures in
   patch 1, as they will be left unused after that.
5. Reuse something like