Re: [PATCH v6 0/7] Add RAS virtualization support for SEA/SEI notification type in KVM
Hi gengdongjiu,

(I've re-ordered some of the hunks here:)

On 04/09/17 12:10, gengdongjiu wrote:
> On 2017/9/1 1:43, James Morse wrote:
>> On 28/08/17 11:38, Dongjiu Geng wrote:
>>> Not call memory_failure() to handle it. Because the error address recorded
>>> by APEI is not accurated, so can not identify the address to hwpoison
>>> memory.
>>
>> This looks like a firmware bug, what address do you get in your CPER
>> records? It should be a physical address.

> No, not firmware bug. At least in the armv8.0 CPU and huawei's armv8.2 CPU,
> the architecture decided it is not accurate, this abort is asynchronous not
> synchronous.

This is going to be a problem. (I'm chasing Achin to find out when this is
allowed to happen and what we're expected to do about it!)
I hope this isn't the default behaviour, but only happens in exceptionally
rare circumstances.

>> To report a memory-error you must have an address.

> maybe we can not get the accurate error address, can you get it in your armv8
> platform?

I only have software-models, they only generate the errors you tell them to.

I think I see why you're taking this approach with the series, the scenario is:
1. Firmware takes an SError due to a bad memory location from guest EL0.
2. The CPU doesn't provide the address of the memory location.

You want to confine this error as much as possible, in particular to the
context it came from (e.g. guest EL0). CPU context isn't something the CPER
records can describe (they describe failures in system components), hence your
hybrid {kernel,firmware}-first code.

I don't think it's safe to kill guest-EL0 and hope this confined the error.

If the affected page of guest memory has never been written to by the guest,
the host will map in the global zero-page (made read-only at stage2). If the
corruption is in this page it affects the host kernel, guests and user-space
processes. Just because the error came from guest-EL0 doesn't mean
kernel/hypervisor memory isn't affected.

This doesn't just affect that one page: KSM may have merged every copy of every
guest user-space's libc, which has subsequently become corrupt. The first
guest-EL0 to step in this triggers the fault, but it affects all the guests.
With the address all the guests can fix this error, and KSM will re-merge the
pages. Without the address every user-space process in every guest will
eventually be killed.

We aren't even guaranteed that the access that caused the fault came from your
guest EL0. The fault may be in the page tables belonging to the guest kernel,
even worse they may belong to the hypervisor's stage2 page tables.

(Thanks to Mark and Robin for these examples)

I think in this scenario your firmware should describe a memory-error with an
unknown address (i.e. don't set the 'physical address valid' bit in CPER's
'Table 275 Memory Error Record').
When Linux gets one of these, it should panic(): We know some memory is
corrupt, we don't know where it is.

>> User-space may be signalled by the memory_failure() helper, and user-space
>> may choose to notify the guest about the memory-failure, but this would be a
>> new error.

> For the SError, it is asynchronous abort. so it is not better to call
> memory_failure() helper, because the error address is not accurate.
> memory_failure() will offline or poison the address, but the address is not
> accurate. so it is dangerous

By 'not accurate' do you mean the CPU provides an address and it's wrong
(surely this is a CPU bug), or just that no address is provided (i.e. the
ERRADDR.AI 'address incorrect' bit is set)?

>>> Because the error was taken from a lower Exception level, if the
>>> exception is SEA/SEI and HCR_EL2.TEA/HCR_EL2.AMO is 1, firmware
>>> sets ESR_EL2/FAR_EL2 to fake a exception trap to EL2, then
>>> transfers to hypervisor.
>>
>> What happens if you took an SError from EL2 and EL2 has PSTATE.A set masking
>> SError? (this is very common today: all kernel code runs like this).

> Firstly, the guest OS usually runs in the El1 or El0, not El2.
> if El2 happens an SError, it will trap to EL3 firmware even though the
> PSTATE.A is set.
> Because the PSTATE.A can not mask it if the SError is trapped to EL3.

Sure, we agree that from the CPU's view when SCR_EL3.EA is set physical-SError
can't be masked when executing any EL below EL3.

My question was about the 'firmware sets ESR_EL2/FAR_EL2 to fake an exception
trap to EL2' step. While EL3 can take the physical-SError at any time the
normal-world is running, it can't always deliver a fake-SError to EL2, because
EL2 believes it has masked physical-SError.

With the SError rework this should only be masked while we are in entry.S
preparing to handle an exception. Receiving an unexpected asynchronous
exception at this point would overwrite ELR/ESR, meaning we could never handle
the original exception.

>> What happens if the hypervisor then executes an ESB with PSTATE.A set? It
>> expects to see any pending SError deferred and its syndrome
Re: [PATCH v6 0/7] Add RAS virtualization support for SEA/SEI notification type in KVM
Hi Peter,

On 2017/9/6 19:19, Peter Maydell wrote:
> On 28 August 2017 at 11:38, Dongjiu Geng wrote:
>> In the firmware-first RAS solution, corrupt data is detected in a
>> memory location when guest OS application software executing at EL0
>> or guest OS kernel El1 software are reading from the memory. The
>> memory node records errors in an error record accessible using
>> system registers.
>
> Hi Dongjiu -- it looks like this patch set is extending
> the API KVM provides to userspace, but it doesn't update
> the documentation in Documentation/virtual/kvm/api.txt.
> I appreciate the API is still somewhat under discussion,
> but if you can include the docs updates it's helpful to
> me for reviewing whether the API makes sense from the
> userspace consumer end of it.

sure, it should. thanks a lot for the reminder.
I will update the related docs in my next patch set version.

>
> thanks
> -- PMM
Re: [PATCH v6 0/7] Add RAS virtualization support for SEA/SEI notification type in KVM
On 28 August 2017 at 11:38, Dongjiu Geng wrote:
> In the firmware-first RAS solution, corrupt data is detected in a
> memory location when guest OS application software executing at EL0
> or guest OS kernel El1 software are reading from the memory. The
> memory node records errors in an error record accessible using
> system registers.

Hi Dongjiu -- it looks like this patch set is extending
the API KVM provides to userspace, but it doesn't update
the documentation in Documentation/virtual/kvm/api.txt.
I appreciate the API is still somewhat under discussion,
but if you can include the docs updates it's helpful to
me for reviewing whether the API makes sense from the
userspace consumer end of it.

thanks
-- PMM
Re: [PATCH v6 0/7] Add RAS virtualization support for SEA/SEI notification type in KVM
Hi James,

On 2017/9/1 1:43, James Morse wrote:
> Hi Dongjiu Geng,
>
> On 28/08/17 11:38, Dongjiu Geng wrote:
>> In the firmware-first RAS solution, corrupt data is detected in a
>> memory location when guest OS application software executing at EL0
>> or guest OS kernel El1 software are reading from the memory. The
>> memory node records errors in an error record accessible using
>> system registers.
>>
>> Because SCR_EL3.EA is 1, then CPU will trap to El3 firmware, EL3
>> firmware records the error to APEI table through reading system
>> register.
>
> Strictly speaking these are CPER records in a memory region pointed to by the
> HEST->GHES ACPI table.

Yes. Here I mean that EL3 firmware reads the RAS error record registers
ERXxxx_EL1, such as ERXADDR_EL1/ERXMISC0_EL1, to get the detailed error info,
then records it in the memory region pointed to by the HEST->GHES ACPI table.

>> Because the error was taken from a lower Exception level, if the
>> exception is SEA/SEI and HCR_EL2.TEA/HCR_EL2.AMO is 1, firmware
>> sets ESR_EL2/FAR_EL2 to fake a exception trap to EL2, then
>> transfers to hypervisor.
>
> What happens if you took an SError from EL2 and EL2 has PSTATE.A set masking
> SError? (this is very common today: all kernel code runs like this).

Firstly, the guest OS usually runs at EL1 or EL0, not EL2.
If an SError happens at EL2, it will trap to EL3 firmware even though PSTATE.A
is set, because PSTATE.A cannot mask the SError when it is routed to EL3.

> What happens if the hypervisor then executes an ESB with PSTATE.A set? It
> expects to see any pending SError deferred and its syndrome written to
> DISR_EL1,
> but this register is RAZ/WI when you set SCR_EL3.EA. '4.4.2' of [0]

From my understanding, if SCR_EL3.EA is set the abort cannot be masked: it
always happens and is taken to EL3, so DISR_EL1 cannot record the syndrome.
DISR_EL1 is only written when the External Abort is masked, but when
SCR_EL3.EA is set, PSTATE.A cannot mask the error.

>> For the synchronous external abort(SEA), Hypervisor calls the
>> ghes_handle_memory_failure() to deal with this error,
>> ghes_handle_memory_failure() function reads the APEI table and
>> callls memory_failure() to decide whether it needs to deliver
>> SIGBUS signal to user space, the advantage of using SIGBUS signal
>> to notify user space is that it can be compatible with Non-Kvm users.
>>
>> For the SError Interrupt(SEI),KVM firstly classified the error.
>
> KVM can't parse the CPER records, nor does it know where to look to find them.
> KVM should call out to the APEI code so the host kernel can handle the error.

KVM does not parse the CPER records. I mean that KVM classifies the error
according to ESR_EL2.AET, as shown below:

AET, bits [12:10], when categorized
Asynchronous Error Type. Describes the state of the PE after taking the SError
interrupt exception. Software might use the information in the syndrome
registers to determine what recovery might be possible. See Architecturally
consumed errors. The possible values of this field are:
  0b000 Uncontainable error (UC).
  0b001 Unrecoverable error (UEU).
  0b010 Restartable error (UEO).
  0b011 Recoverable error (UER).
  0b110 Corrected error (CE).

I pasted the code here:

+static int kvm_handle_guest_sei(struct kvm_vcpu *vcpu)
+{
+	unsigned int esr = kvm_vcpu_get_hsr(vcpu);
+	bool impdef_syndrome = esr & ESR_ELx_ISV; /* aka IDS */
+	unsigned int aet = esr & ESR_ELx_AET;
+
+	/*
+	 * In below three conditions, it will directly inject the virtual SError.
+	 * 1. Not support RAS extension; the Syndrome is IMPLEMENTATION DEFINED;
+	 * AET is RES0 if 'the value returned in the DFSC field is not
+	 * [ESR_ELx_FSC_SERROR]'
+	 */
+	if (!cpus_have_const_cap(ARM64_HAS_RAS_EXTN) || impdef_syndrome ||
+	    ((esr & ESR_ELx_FSC) != ESR_ELx_FSC_SERROR)) {
+		kvm_inject_vabt(vcpu);
+		return 1;
+	}
+
+	switch (aet) {
+	case ESR_ELx_AET_CE:	/* corrected error */
+	case ESR_ELx_AET_UEO:	/* restartable error, not yet consumed */
+		return 0;	/* continue processing the guest exit */
+	case ESR_ELx_AET_UEU:	/* The error has not been propagated */
+		/*
+		 * Only handle the guest user mode SEI if the error has not been propagated
+		 */
+		if ((!vcpu_mode_priv(vcpu)) && !handle_guest_sei(kvm_vcpu_get_hsr(vcpu)))
+			return 1;
+
+		/* If SError handling is failed, continue run */
+	default:
+		/*
+		 * Until now, the CPU supports RAS and SEI is fatal, or user space
+		 * does not support to handle the SError.
+		 */
+		panic("This Asynchronous SError interrupt is dangerous, panic");
+	}
+}
+

I have called the kernel code to handle the SError.

+/*
+ * Handle SError interrupt that occur in guest OS.
+ *
+ * The return value will be
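
For readers following the AET table quoted in this message, here is a minimal stand-alone sketch of extracting and naming the field. It is not part of the posted patch. The names esr_aet(), aet_name() and the AET_* constants are hypothetical and exist only for this illustration; the bit position [12:10] and the encodings are the ones listed in the message. Unlike the patch, which compares the masked value against the kernel's ESR_ELx_AET_* constants, this sketch shifts the field down to a small integer first.

```c
#include <stdint.h>

/* AET sits in bits [12:10] of the syndrome, per the field description above. */
#define AET_SHIFT 10
#define AET_MASK  (UINT32_C(0x7) << AET_SHIFT)

/* Encodings copied from the table in the message; names are illustrative. */
enum aet_value {
	AET_UC  = 0x0,	/* Uncontainable error */
	AET_UEU = 0x1,	/* Unrecoverable error */
	AET_UEO = 0x2,	/* Restartable error */
	AET_UER = 0x3,	/* Recoverable error */
	AET_CE  = 0x6,	/* Corrected error */
};

/* Return the AET field as a small integer in the range 0..7. */
static unsigned int esr_aet(uint32_t esr)
{
	return (esr & AET_MASK) >> AET_SHIFT;
}

/* Map an AET value to its short name; unlisted encodings are reserved. */
static const char *aet_name(unsigned int aet)
{
	switch (aet) {
	case AET_UC:  return "UC";
	case AET_UEU: return "UEU";
	case AET_UEO: return "UEO";
	case AET_UER: return "UER";
	case AET_CE:  return "CE";
	default:      return "reserved";
	}
}
```

A caller would first check that the syndrome is architecturally defined and that DFSC reports an SError, as the quoted patch does, since AET is only meaningful in that case.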
Re: [PATCH v6 0/7] Add RAS virtualization support for SEA/SEI notification type in KVM
Hi Dongjiu Geng,

On 28/08/17 11:38, Dongjiu Geng wrote:
> In the firmware-first RAS solution, corrupt data is detected in a
> memory location when guest OS application software executing at EL0
> or guest OS kernel El1 software are reading from the memory. The
> memory node records errors in an error record accessible using
> system registers.
>
> Because SCR_EL3.EA is 1, then CPU will trap to El3 firmware, EL3
> firmware records the error to APEI table through reading system
> register.

Strictly speaking these are CPER records in a memory region pointed to by the
HEST->GHES ACPI table.

> Because the error was taken from a lower Exception level, if the
> exception is SEA/SEI and HCR_EL2.TEA/HCR_EL2.AMO is 1, firmware
> sets ESR_EL2/FAR_EL2 to fake a exception trap to EL2, then
> transfers to hypervisor.

What happens if you took an SError from EL2 and EL2 has PSTATE.A set masking
SError? (this is very common today: all kernel code runs like this).

What happens if the hypervisor then executes an ESB with PSTATE.A set? It
expects to see any pending SError deferred and its syndrome written to
DISR_EL1, but this register is RAZ/WI when you set SCR_EL3.EA. '4.4.2' of [0]

> For the synchronous external abort(SEA), Hypervisor calls the
> ghes_handle_memory_failure() to deal with this error,
> ghes_handle_memory_failure() function reads the APEI table and
> callls memory_failure() to decide whether it needs to deliver
> SIGBUS signal to user space, the advantage of using SIGBUS signal
> to notify user space is that it can be compatible with Non-Kvm users.
>
> For the SError Interrupt(SEI),KVM firstly classified the error.

KVM can't parse the CPER records, nor does it know where to look to find them.
KVM should call out to the APEI code so the host kernel can handle the error.

User-space may be signalled by the memory_failure() helper, and user-space may
choose to notify the guest about the memory-failure, but this would be a new
error.

> Not call memory_failure() to handle it. Because the error address recorded
> by APEI is not accurated, so can not identify the address to hwpoison
> memory.

This looks like a firmware bug, what address do you get in your CPER records?
It should be a physical address.

To report a memory-error you must have an address. If the error wasn't
detected as a synchronous access then delivering a synchronous-external-abort
is inappropriate (I think we both agree on this), and SError-interrupt doesn't
have a way of specifying an address ... but the CPER records do.

For firmware-first your SError-interrupt is just a notification, it's the CPER
records the OS uses to handle the error.

> If the SError error comes from guest user mode and is not propagated,
> then signal user space to handle it, otherwise, directly injects virtual
> SError, or panic if the error is fatal.

What do you mean by propagated?

I don't think we should ever hand RAS notifications to user-space, the host
kernel should handle them, then describe the symptom (e.g. this region of your
va space is gone) to user-space.

> when user space handles the error,
> it will specify syndrome for the injected virtual SError. This syndrome value
> is set to the VSESR_EL2. VSESR_EL2 is a new ARMv8.2 RAS extensions register
> which provides the syndrome value reported to software on taking a virtual
> SError interrupt exception.

Thanks,

James

[0] https://static.docs.arm.com/ddi0587/a/RAS%20Extension-release%20candidate_march_29.pdf
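
To make the "you must have an address" point concrete, the following is a rough sketch of the decision a host could make once it has a CPER memory-error record in hand. It is only an illustration and not code from this series or from the kernel. The struct, the enum and classify_mem_error() are hypothetical names. The validity flag is assumed to be bit 1 of the record's validation bits (which is how Linux defines CPER_MEM_VALID_PA), and the 4K page shift is likewise an assumption.

```c
#include <stdint.h>

/* Assumption: bit 1 of the validation bits means "physical address valid". */
#define MEM_VALID_PA     (UINT64_C(1) << 1)
/* Assumption: 4K pages, used only to turn the address into a frame number. */
#define SKETCH_PAGE_SHIFT 12

/* Hypothetical, heavily trimmed view of a CPER memory error record. */
struct mem_error_record {
	uint64_t validation_bits;
	uint64_t physical_addr;
};

enum mem_error_action {
	MEM_ERROR_POISON_PAGE,	/* address known: the page can be isolated */
	MEM_ERROR_FATAL,	/* corrupt memory at an unknown location */
};

/*
 * Decide how to react to a memory error record. When the address is valid,
 * the page frame number is returned through pfn_out so the caller can hand
 * it to its page-poisoning path; otherwise the error cannot be contained.
 */
static enum mem_error_action
classify_mem_error(const struct mem_error_record *rec, uint64_t *pfn_out)
{
	if (rec->validation_bits & MEM_VALID_PA) {
		*pfn_out = rec->physical_addr >> SKETCH_PAGE_SHIFT;
		return MEM_ERROR_POISON_PAGE;
	}
	return MEM_ERROR_FATAL;
}
```

In the kernel, the first outcome corresponds to the memory_failure() path discussed in this review. The second outcome has no page to act on, so nothing short of stopping the machine can bound the damage.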