Re: [PATCH v4 3/3] arm64: kvm: inject SError with user space specified syndrome

2017-07-06 Thread James Morse
Hi gengdongjiu,

On 05/07/17 09:14, gengdongjiu wrote:
> On 2017/7/4 18:14, James Morse wrote:
>> Can you give us a specific example of an error you are trying to handle?

> For example:
> guest OS user space accesses device type memory, but happen SError. because 
> the
> SError is asynchronous faults, it does not take immediately. when guest OS 
> call "SVC" to enter guest os
> kernel space, the ESB instruction(Error Synchronization Barrier) will defter 
> this SError. so the SError happen immediately.

Ah, this isn't necessarily a 'RAS notification' SError/SEI, it may be a
'vanilla', v8.0 SError.

You've given a guest access to a physical device (how?), the guest has done
something, which caused the device to respond with SError.

Do you have a specific use-case for this? What is the ESR? What kinds of CPER
records does firmware generate? (if any)


We have to be careful here as devices can still generate asynchronous-interrupts
using SError, these aren't contained by ESB barriers. For these we should fall
back to KVM's v8.0 SError behaviour. KVM can tell them apart as the APEI code
doesn't claim the SError as an SEI notification, and with the RAS extensions the
ESR has the 'IDS' bit set.


>> How would a non-KVM user space process handle the error?

> it is indeed, non-KVM user space can not get the notification from hypervisor 
> or host kernel. thanks for the pointing out
> do you mean still Signal SIGBUS from memory_failure?

No, I was assuming this was a RAS notification SEI, (because your patch 1/3 of
touched the RAS cpu-features) being given to user space to handle.

Instead, can I ask how the host should handle this SError if it had accessed the
device itself?


I agree device pass-through is going to be a special case for KVM, but before
the host can deliver a device RAS error into the guest that was using the
device, it needs to fully understand what the error means:

The error may mean that the careful configuration that makes device-passthrough
safe no longer works, letting the guest continue to access the device may let it
damage another guest or the hyper visor.


We may need a way for the host RAS code to identify the driver responsible, to
handle the device error, or delegate it if that's appropriate.


[...]

>> So (a): a physical-CPU hardware error occurs, and then (c) we tell 
>> Qemu/kvmtool
>> via a KVM-specific API.
>>
>> Don't do this, it doesn't work for non-KVM users. You are exposing 
>> host-specific
>> implementation details to user space. What if I discover the same error via a
>> Polling GHES, or one of the IRQ flavours?

> James, you mainly concern the way that "tell Qemu/kvmtool via a KVM-specific 
> API", right?
> so how about still delivered SIGBUS same as the SEA(Synchronous External 
> Abort)?

> by the way, what is your meaning of below words?
>  >"What if I discover the same error via a Polling GHES, or one of the IRQ 
> flavours?"

This was my mistaken assumption that you were passing an APEI RAS SEI
notification to user space via a KVM specific API. This wouldn't work for
applications not using KVM, or notifications not using SEI.

Here I was asking what happens if the notification used NOTIFY_POLL or
NOTIFY_IRQ (instead of NOTIFY_SEI) in the GHES, but this isn't relevant as it
doesn't look like this is a APEI RAS notification.

[...]


>> If there is another type of CPER record where we should notify userspace, 
>> please
>> do it from mm/memory-failure.c, drivers/acpi/apei/ghes.c or
>> drivers/firmware/efi/cper.c. These should consider all user-space 
>> applications,
>> not just users of KVM, and not just on arm64.
> 
> here I have a question, in the "drivers/acpi/apei/ghes.c" code, it only 
> handle the memory section of CPER.

Yes, we are certainly missing processing for the other record types.


> if the section type of CPER is processor, it will not notify user-space. so 
> only let userspace handle the memory section is reasonable?

I think the only errors that user-space can know more than the kernel are memory
errors. These are the only RAS errors we should expect user space to handle. All
the others fall into either 'corrected by the kernel' or 'fatal for userspace -
SIGKILL'.


>> For memory errors we already have BUS_MCEERR_AR - action-required, and
>> BUS_MCEERR_AO - action-optional.
>>
>> For a TLB error, (Table 250 of UEFI 2.6), what is Qemu expected to do? Linux 
>> has
>> to classify the error and handle it as far as possible. In most cases the 
>> error
>> is either handled (no notification required), or fatal. Memory errors are the
>> only example I've found so far where an application can do additional work to
>> handle the error.

>   James, only memory errors needs application to do additional work. UEFI 
> spec mentioned that?

No, its my observation based on the record types. Memory is the only thing an
application can change. Everything else belongs to the kernel.
For a corrupt page of anonymous memory, there is nothing the kernel can 

Re: [PATCH v4 3/3] arm64: kvm: inject SError with user space specified syndrome

2017-07-04 Thread gengdongjiu
Hi James,
  Thanks for the review. I will read your comments carefully and then reply to 
you.


On 2017/7/4 18:14, James Morse wrote:
> Hi gengdongjiu,
> 
> Can you give us a specific example of an error you are trying to handle?
> How would a non-KVM user space process handle the error?
> 
> KVM-users should be regular user space processes, we should not have a KVM-way
> and everyone-else-way of handling errors.
> 
> 
> On 04/07/17 05:46, gengdongjiu wrote:
>> On 2017/7/3 16:39, Christoffer Dall wrote:
>>> On Mon, Jun 26, 2017 at 08:46:39PM +0800, Dongjiu Geng wrote:
 when SError happen, kvm notifies user space to record the CPER,
 user space specifies and passes the contents of ESR_EL1 on taking
 a virtual SError interrupt to KVM, KVM enables virtual system
 error or asynchronous abort with this specifies syndrome. This
 patch modify the world-switch to restore VSESR_EL2, VSESR_EL2
 saves the virtual SError syndrome, it becomes the ESR_EL1 value when
 HCR_EL2.VSE injects an SError. This register is added by the
 RAS Extensions.
>>>
>>> This commit message is confusing and doesn't help me understand the
>>> patch.
>> (1) what is the rationale for the guest OS SError interrupt(SEI) handling in 
>> the RAS solution?
> 
>>   a). In the firmware-first RAS solution, when guest OS happen a SError 
>> interrupt (SEI), it will firstly trap to EL3(SCR_EL3.EA = 1);
>>   b). The firmware logs, triages, and delegates the error exception to the 
>> hypervisor. As the error came from guest OS  EL1, firmware
>>   does by faking an SError interrupt exception entry to EL2.
>>   c). Control transfers to the hypervisor's delegated error recovery 
>> agent.Because HCR_EL2.AMO is set to 1, the hypervisor can use a
>>   Virtual SError interrupt to delegate an asynchronous abort to EL1, by 
>> setting HCR_EL2.VSE to 1 and using VESR_EL2 to pass syndrome.
> 
> So (a): a physical-CPU hardware error occurs, and then (c) we tell 
> Qemu/kvmtool
> via a KVM-specific API.
> 
> Don't do this, it doesn't work for non-KVM users. You are exposing 
> host-specific
> implementation details to user space. What if I discover the same error via a
> Polling GHES, or one of the IRQ flavours?
> 
> User space should not have to know, or care, how linux is notified about APEI
> RAS errors.
> 
> 
>> (2) what is this patch mainly do?
>>   As mentioned above, the hypervisor needs to enable virtual SError and pass 
>> the virtual syndrome to the guest OS.
>>
>>   a). when Control transfers to the hypervisor from firmware by faking an 
>> SError interrupt, the hypervisor delivered the syndrome_info(esr_el2) and
>>   host VA address( Qemu translate this VA address to the virtual machine 
>> physical address(IPA)) using below new added "serror_intr" struct.
>>  /* KVM_EXIT_SERROR_INTR */
>>  struct {
>>  __u32 syndrome_info;
>>  __u64 address;
>>  } serror_intr;
> 
> This is for a guest exit to host user-space. Here you are telling Qemu that a
> physical CPU hardware error occurred. Qemu/kvmtool should not be expected to
> parse the ESR, this is the job of the operating system.
> 
> When you're using ACPI firmware-first, SError/SEI is just a notification, the
> important data is in the CPER records, which Qemu can't access, (and should be
> processed by Linux APEI code).
> 
> 
> It looks like you've calculated an address from FAR_EL2/HPFAR_EL2. For an
> SError, these are meaningless.
> 
> (These registers hold real values for Synchronous External Abort, but for
>  firmware-first we should prefer the CPER records.)
> 
> 
>>   b). Qemu gets the address(host VA) delivered by KVM, translate this host 
>> VA address to virtual machine physical address(IPA), and runtime record this 
>> virtual
>>  machine physical address(IPA) to the guest OS's APEI table.
> 
> I agree with this step, but you're acting on the wrong data. (You're 
> converting
> fault_ipa -> virtual address -> fault_ipa, something isn't right ...)
> 
> Qemu should react to a signal like BUS_MCEERR_A{R,O} from memory_failure(). 
> This
> mechanism serves all user space processes, not just kvm users. This is where 
> the
> user-space virtual address should come from. Qemu/kvmtool have to generate the
> guest IPA once they discover the affected memory was presented to the guest
> through KVM.
> 
> 
> Your KVM-specific mechanism exposes too much raw information (raw ESR values 
> to
> user space), and only serves applications using KVM.
> 
> If there is another type of CPER record where we should notify userspace, 
> please
> do it from mm/memory-failure.c, drivers/acpi/apei/ghes.c or
> drivers/firmware/efi/cper.c. These should consider all user-space 
> applications,
> not just users of KVM, and not just on arm64.
> 
> 
>>   c). Qemu gets the syndrome_info delivered by KVM, it refers to this 
>> syndrome value(but can be different from it) to specify the virtual SError 
>> 

Re: [PATCH v4 3/3] arm64: kvm: inject SError with user space specified syndrome

2017-07-04 Thread James Morse
Hi gengdongjiu,

Can you give us a specific example of an error you are trying to handle?
How would a non-KVM user space process handle the error?

KVM-users should be regular user space processes, we should not have a KVM-way
and everyone-else-way of handling errors.


On 04/07/17 05:46, gengdongjiu wrote:
> On 2017/7/3 16:39, Christoffer Dall wrote:
>> On Mon, Jun 26, 2017 at 08:46:39PM +0800, Dongjiu Geng wrote:
>>> when SError happen, kvm notifies user space to record the CPER,
>>> user space specifies and passes the contents of ESR_EL1 on taking
>>> a virtual SError interrupt to KVM, KVM enables virtual system
>>> error or asynchronous abort with this specifies syndrome. This
>>> patch modify the world-switch to restore VSESR_EL2, VSESR_EL2
>>> saves the virtual SError syndrome, it becomes the ESR_EL1 value when
>>> HCR_EL2.VSE injects an SError. This register is added by the
>>> RAS Extensions.
>>
>> This commit message is confusing and doesn't help me understand the
>> patch.
> (1) what is the rationale for the guest OS SError interrupt(SEI) handling in 
> the RAS solution?

>   a). In the firmware-first RAS solution, when guest OS happen a SError 
> interrupt (SEI), it will firstly trap to EL3(SCR_EL3.EA = 1);
>   b). The firmware logs, triages, and delegates the error exception to the 
> hypervisor. As the error came from guest OS  EL1, firmware
>   does by faking an SError interrupt exception entry to EL2.
>   c). Control transfers to the hypervisor's delegated error recovery 
> agent.Because HCR_EL2.AMO is set to 1, the hypervisor can use a
>   Virtual SError interrupt to delegate an asynchronous abort to EL1, by 
> setting HCR_EL2.VSE to 1 and using VESR_EL2 to pass syndrome.

So (a): a physical-CPU hardware error occurs, and then (c) we tell Qemu/kvmtool
via a KVM-specific API.

Don't do this, it doesn't work for non-KVM users. You are exposing host-specific
implementation details to user space. What if I discover the same error via a
Polling GHES, or one of the IRQ flavours?

User space should not have to know, or care, how linux is notified about APEI
RAS errors.


> (2) what is this patch mainly do?
>   As mentioned above, the hypervisor needs to enable virtual SError and pass 
> the virtual syndrome to the guest OS.
> 
>   a). when Control transfers to the hypervisor from firmware by faking an 
> SError interrupt, the hypervisor delivered the syndrome_info(esr_el2) and
>   host VA address( Qemu translate this VA address to the virtual machine 
> physical address(IPA)) using below new added "serror_intr" struct.
>   /* KVM_EXIT_SERROR_INTR */
>   struct {
>   __u32 syndrome_info;
>   __u64 address;
>   } serror_intr;

This is for a guest exit to host user-space. Here you are telling Qemu that a
physical CPU hardware error occurred. Qemu/kvmtool should not be expected to
parse the ESR, this is the job of the operating system.

When you're using ACPI firmware-first, SError/SEI is just a notification, the
important data is in the CPER records, which Qemu can't access, (and should be
processed by Linux APEI code).


It looks like you've calculated an address from FAR_EL2/HPFAR_EL2. For an
SError, these are meaningless.

(These registers hold real values for Synchronous External Abort, but for
 firmware-first we should prefer the CPER records.)


>   b). Qemu gets the address(host VA) delivered by KVM, translate this host VA 
> address to virtual machine physical address(IPA), and runtime record this 
> virtual
>  machine physical address(IPA) to the guest OS's APEI table.

I agree with this step, but you're acting on the wrong data. (You're converting
fault_ipa -> virtual address -> fault_ipa, something isn't right ...)

Qemu should react to a signal like BUS_MCEERR_A{R,O} from memory_failure(). This
mechanism serves all user space processes, not just kvm users. This is where the
user-space virtual address should come from. Qemu/kvmtool have to generate the
guest IPA once they discover the affected memory was presented to the guest
through KVM.


Your KVM-specific mechanism exposes too much raw information (raw ESR values to
user space), and only serves applications using KVM.

If there is another type of CPER record where we should notify userspace, please
do it from mm/memory-failure.c, drivers/acpi/apei/ghes.c or
drivers/firmware/efi/cper.c. These should consider all user-space applications,
not just users of KVM, and not just on arm64.


>   c). Qemu gets the syndrome_info delivered by KVM, it refers to this 
> syndrome value(but can be different from it) to specify the virtual SError 
> interrupt's syndrome through setting VESR_EL2.

'but can be different from it' is because a classification step is required, the
operating system should do this. We should only signal Qemu/kvmtool for errors
that can actually be handled. Some APEI notifications may be for corrected
errors, (I would hope these always 

Re: [PATCH v4 3/3] arm64: kvm: inject SError with user space specified syndrome

2017-07-04 Thread Christoffer Dall
Hi Dongjiu,

On Tue, Jul 04, 2017 at 12:46:23PM +0800, gengdongjiu wrote:
> Hi Christoffer,
>   thanks for the review.
> 
> 
> On 2017/7/3 16:39, Christoffer Dall wrote:
> > Hi Dongjiu,
> > 
> > On Mon, Jun 26, 2017 at 08:46:39PM +0800, Dongjiu Geng wrote:
> >> when SError happen, kvm notifies user space to record the CPER,
> >> user space specifies and passes the contents of ESR_EL1 on taking
> >> a virtual SError interrupt to KVM, KVM enables virtual system
> >> error or asynchronous abort with this specifies syndrome. This
> >> patch modify the world-switch to restore VSESR_EL2, VSESR_EL2
> >> saves the virtual SError syndrome, it becomes the ESR_EL1 value when
> >> HCR_EL2.VSE injects an SError. This register is added by the
> >> RAS Extensions.
> > 
> > This commit message is confusing and doesn't help me understand the
> > patch.
> (1) what is the rationale for the guest OS SError interrupt(SEI) handling in 
> the RAS solution?
>   you can refer to document: "RAS_Extension_PRD03-PRDC-010953-32-0, 6.5.3 
> Example software sequences"
>   a). In the firmware-first RAS solution, when guest OS happen a SError 
> interrupt (SEI), it will firstly trap to EL3(SCR_EL3.EA = 1);
>   b). The firmware logs, triages, and delegates the error exception to the 
> hypervisor. As the error came from guest OS  EL1, firmware
>   does by faking an SError interrupt exception entry to EL2.
>   c). Control transfers to the hypervisor's delegated error recovery 
> agent.Because HCR_EL2.AMO is set to 1, the hypervisor can use a
>   Virtual SError interrupt to delegate an asynchronous abort to EL1, by 
> setting HCR_EL2.VSE to 1 and using VESR_EL2 to pass syndrome.
> 
> (2) what is this patch mainly do?
>   As mentioned above, the hypervisor needs to enable virtual SError and pass 
> the virtual syndrome to the guest OS.
> 
>   a). when Control transfers to the hypervisor from firmware by faking an 
> SError interrupt, the hypervisor delivered the syndrome_info(esr_el2) and
>   host VA address( Qemu translate this VA address to the virtual machine 
> physical address(IPA)) using below new added "serror_intr" struct.
>   /* KVM_EXIT_SERROR_INTR */
>   struct {
>   __u32 syndrome_info;
>   __u64 address;
>   } serror_intr;
> 
>   b). Qemu gets the address(host VA) delivered by KVM, translate this host VA 
> address to virtual machine physical address(IPA), and runtime record this 
> virtual
>  machine physical address(IPA) to the guest OS's APEI table.
> 
>   c). Qemu gets the syndrome_info delivered by KVM, it refers to this 
> syndrome value(but can be different from it) to specify the virtual SError 
> interrupt's syndrome through setting VESR_EL2.
> 
> the vsesr_el2 is armv8.2 register, its explanation can be found in 
> "RAS_Extension_PRD03-PRDC-010953-33-0, 5.6.18 VSESR_EL2, Virtual SError 
> Exception Syndrome Register"
> 
>   >>The VSESR_EL2 characteristics are:
>   >>Purpose:
>   >>Provides the syndrome value reported to software on taking a virtual 
> SError interrupt exception:
>   >>  — If the virtual SError interrupt is taken to EL1 using AArch64 
> then VSESR_EL2 provides the
>   >>syndrome value reported in ESR_EL1.
>   >>  — If the virtual SError interrupt is taken to EL1 using AArch32 
> then VSESR_EL2 provides the
>   >>syndrome values reported in DFSR.{AET, ExT} and the remainder 
> of the DFSR is set as
>   >>   defined by VMSAv8-32.
> 
>  so in the KVM, I added a new IOCTL(#define KVM_ARM_SEI  _IO(KVMIO,  
> 0xb8)) to pass the virtual SError syndrome value specified by Qemu and enable 
> a virtual System Error.
> 
> 
>  d). when world switch to guest OS, guest OS will happen virtual SError(this 
> virtual SError can not be route to EL3 firmware), guest OS uses the specified 
> syndrome value to do the recovery and
>  parses the guest OS CPER which is dynamically recorded by the Qemu in 
> the APEI table .
> 
> 

Please format the text nicely, and try to simplify the message when
resubmitting these patches.  I hope it will be easier for you to write
a meaningful commit message if you split these patches into multiple
ones.

One specific thing currently lacking from the commit message is a
discussion of why this API is needed in the first place - why can't we
reuse KVM_SET_ONE_REG, for example.

> 
> > 
> > I think this patch is trying to do too many things.  I suggest you split
> > the patch into (at least) one patch that captures exception information
> > from the world-switch path, one patch that deals with the new exit
> > reason, and finally a patch with the new ioctl.  That way you can write
> > a commit message for each patch describing first what the patch does,
> > and then why this is a good idea.
>   Ok, thanks for the good suggestion.
> 
> > 
> > Neverthess, I added some random comments below.
> > 
> >>
> >> Changes since v3:
> >> (1) Move 

Re: [PATCH v4 3/3] arm64: kvm: inject SError with user space specified syndrome

2017-07-03 Thread gengdongjiu
Hi Christoffer,
  thanks for the review.


On 2017/7/3 16:39, Christoffer Dall wrote:
> Hi Dongjiu,
> 
> On Mon, Jun 26, 2017 at 08:46:39PM +0800, Dongjiu Geng wrote:
>> when SError happen, kvm notifies user space to record the CPER,
>> user space specifies and passes the contents of ESR_EL1 on taking
>> a virtual SError interrupt to KVM, KVM enables virtual system
>> error or asynchronous abort with this specifies syndrome. This
>> patch modify the world-switch to restore VSESR_EL2, VSESR_EL2
>> saves the virtual SError syndrome, it becomes the ESR_EL1 value when
>> HCR_EL2.VSE injects an SError. This register is added by the
>> RAS Extensions.
> 
> This commit message is confusing and doesn't help me understand the
> patch.
(1) what is the rationale for the guest OS SError interrupt(SEI) handling in 
the RAS solution?
  you can refer to document: "RAS_Extension_PRD03-PRDC-010953-32-0, 6.5.3 
Example software sequences"
  a). In the firmware-first RAS solution, when guest OS happen a SError 
interrupt (SEI), it will firstly trap to EL3(SCR_EL3.EA = 1);
  b). The firmware logs, triages, and delegates the error exception to the 
hypervisor. As the error came from guest OS  EL1, firmware
  does by faking an SError interrupt exception entry to EL2.
  c). Control transfers to the hypervisor's delegated error recovery 
agent.Because HCR_EL2.AMO is set to 1, the hypervisor can use a
  Virtual SError interrupt to delegate an asynchronous abort to EL1, by 
setting HCR_EL2.VSE to 1 and using VESR_EL2 to pass syndrome.

(2) what is this patch mainly do?
  As mentioned above, the hypervisor needs to enable virtual SError and pass 
the virtual syndrome to the guest OS.

  a). when Control transfers to the hypervisor from firmware by faking an 
SError interrupt, the hypervisor delivered the syndrome_info(esr_el2) and
  host VA address( Qemu translate this VA address to the virtual machine 
physical address(IPA)) using below new added "serror_intr" struct.
/* KVM_EXIT_SERROR_INTR */
struct {
__u32 syndrome_info;
__u64 address;
} serror_intr;

  b). Qemu gets the address(host VA) delivered by KVM, translate this host VA 
address to virtual machine physical address(IPA), and runtime record this 
virtual
 machine physical address(IPA) to the guest OS's APEI table.

  c). Qemu gets the syndrome_info delivered by KVM, it refers to this syndrome 
value(but can be different from it) to specify the virtual SError interrupt's 
syndrome through setting VESR_EL2.

the vsesr_el2 is armv8.2 register, its explanation can be found in 
"RAS_Extension_PRD03-PRDC-010953-33-0, 5.6.18 VSESR_EL2, Virtual SError 
Exception Syndrome Register"

>>The VSESR_EL2 characteristics are:
>>Purpose:
>>Provides the syndrome value reported to software on taking a virtual 
SError interrupt exception:
>>  — If the virtual SError interrupt is taken to EL1 using AArch64 
then VSESR_EL2 provides the
>>syndrome value reported in ESR_EL1.
>>  — If the virtual SError interrupt is taken to EL1 using AArch32 
then VSESR_EL2 provides the
>>syndrome values reported in DFSR.{AET, ExT} and the remainder 
of the DFSR is set as
>>   defined by VMSAv8-32.

 so in the KVM, I added a new IOCTL(#define KVM_ARM_SEI  _IO(KVMIO,  0xb8)) 
to pass the virtual SError syndrome value specified by Qemu and enable a 
virtual System Error.


 d). when world switch to guest OS, guest OS will happen virtual SError(this 
virtual SError can not be route to EL3 firmware), guest OS uses the specified 
syndrome value to do the recovery and
 parses the guest OS CPER which is dynamically recorded by the Qemu in the 
APEI table .



> 
> I think this patch is trying to do too many things.  I suggest you split
> the patch into (at least) one patch that captures exception information
> from the world-switch path, one patch that deals with the new exit
> reason, and finally a patch with the new ioctl.  That way you can write
> a commit message for each patch describing first what the patch does,
> and then why this is a good idea.
  Ok, thanks for the good suggestion.

> 
> Neverthess, I added some random comments below.
> 
>>
>> Changes since v3:
>> (1) Move restore VSESR_EL2 value logic to a helper method
>> (2) In the world-switch, not save VSESR_EL2, because no one cares the
>> old VSESR_EL2 value
>> (3) Add a new KVM_ARM_SEI ioctl to set the VSESR_EL2 value and pend
>> a virtual system error
>>
>> Signed-off-by: Dongjiu Geng 
>> Signed-off-by: Quanming Wu 
>> ---
>>  Documentation/virtual/kvm/api.txt| 10 ++
>>  arch/arm/include/asm/kvm_host.h  |  1 +
>>  arch/arm/kvm/arm.c   |  7 +++
>>  arch/arm/kvm/guest.c |  5 +
>>  arch/arm64/include/asm/esr.h |  2 ++
>>  

Re: [PATCH v4 3/3] arm64: kvm: inject SError with user space specified syndrome

2017-07-03 Thread Christoffer Dall
Hi Dongjiu,

On Mon, Jun 26, 2017 at 08:46:39PM +0800, Dongjiu Geng wrote:
> when SError happen, kvm notifies user space to record the CPER,
> user space specifies and passes the contents of ESR_EL1 on taking
> a virtual SError interrupt to KVM, KVM enables virtual system
> error or asynchronous abort with this specifies syndrome. This
> patch modify the world-switch to restore VSESR_EL2, VSESR_EL2
> saves the virtual SError syndrome, it becomes the ESR_EL1 value when
> HCR_EL2.VSE injects an SError. This register is added by the
> RAS Extensions.

This commit message is confusing and doesn't help me understand the
patch.

I think this patch is trying to do too many things.  I suggest you split
the patch into (at least) one patch that captures exception information
from the world-switch path, one patch that deals with the new exit
reason, and finally a patch with the new ioctl.  That way you can write
a commit message for each patch describing first what the patch does,
and then why this is a good idea.

Neverthess, I added some random comments below.

> 
> Changes since v3:
> (1) Move restore VSESR_EL2 value logic to a helper method
> (2) In the world-switch, not save VSESR_EL2, because no one cares the
> old VSESR_EL2 value
> (3) Add a new KVM_ARM_SEI ioctl to set the VSESR_EL2 value and pend
> a virtual system error
> 
> Signed-off-by: Dongjiu Geng 
> Signed-off-by: Quanming Wu 
> ---
>  Documentation/virtual/kvm/api.txt| 10 ++
>  arch/arm/include/asm/kvm_host.h  |  1 +
>  arch/arm/kvm/arm.c   |  7 +++
>  arch/arm/kvm/guest.c |  5 +
>  arch/arm64/include/asm/esr.h |  2 ++
>  arch/arm64/include/asm/kvm_emulate.h | 10 ++
>  arch/arm64/include/asm/kvm_host.h|  2 ++
>  arch/arm64/include/asm/sysreg.h  |  3 +++
>  arch/arm64/kvm/guest.c   | 14 ++
>  arch/arm64/kvm/handle_exit.c | 25 +++--
>  arch/arm64/kvm/hyp/switch.c  | 14 ++
>  include/uapi/linux/kvm.h |  8 
>  12 files changed, 95 insertions(+), 6 deletions(-)
> 
> diff --git a/Documentation/virtual/kvm/api.txt 
> b/Documentation/virtual/kvm/api.txt
> index 3c248f7..852ac55 100644
> --- a/Documentation/virtual/kvm/api.txt
> +++ b/Documentation/virtual/kvm/api.txt
> @@ -3377,6 +3377,16 @@ struct kvm_ppc_resize_hpt {
>   __u32 pad;
>  };
>  
> +4.104 KVM_ARM_SEI
> +
> +Capability: KVM_EXIT_SERROR_INTR
> +Architectures: arm/arm64
> +Type: vcpu ioctl
> +Parameters: u64 (syndrome)
> +Returns: 0 in case of success
> +
> +Pend an virtual system error or asynchronous abort with user space specified.
> +

I don't understand enough about what this ioctl does by just reading
this text.  Did you mean to say something like "Record that userspace
wishes to inject a pending virtual system error to the VCPU.  Next time
the VCPU is run, th

>  5. The kvm_run structure
>  
>  
> diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
> index 31ee468..566292a 100644
> --- a/arch/arm/include/asm/kvm_host.h
> +++ b/arch/arm/include/asm/kvm_host.h
> @@ -244,6 +244,7 @@ int kvm_arm_coproc_set_reg(struct kvm_vcpu *vcpu, const 
> struct kvm_one_reg *);
>  
>  int handle_exit(struct kvm_vcpu *vcpu, struct kvm_run *run,
>   int exception_index);
> +int kvm_vcpu_ioctl_sei(struct kvm_vcpu *vcpu, u64 *syndrome);
>  
>  static inline void __cpu_init_hyp_mode(phys_addr_t pgd_ptr,
>  unsigned long hyp_stack_ptr,
> diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
> index 96dba7c..583294f 100644
> --- a/arch/arm/kvm/arm.c
> +++ b/arch/arm/kvm/arm.c
> @@ -987,6 +987,13 @@ long kvm_arch_vcpu_ioctl(struct file *filp,
>   return -EFAULT;
>   return kvm_arm_vcpu_has_attr(vcpu, );
>   }
> + case KVM_ARM_SEI: {
> + u64 syndrome;
> +
> + if (copy_from_user(, argp, sizeof(syndrome)))
> + return -EFAULT;
> + return kvm_vcpu_ioctl_sei(vcpu, );
> + }
>   default:
>   return -EINVAL;
>   }
> diff --git a/arch/arm/kvm/guest.c b/arch/arm/kvm/guest.c
> index fa6182a..72505bf 100644
> --- a/arch/arm/kvm/guest.c
> +++ b/arch/arm/kvm/guest.c
> @@ -248,6 +248,11 @@ int kvm_arch_vcpu_ioctl_set_sregs(struct kvm_vcpu *vcpu,
>   return -EINVAL;
>  }
>  
> +int kvm_vcpu_ioctl_sei(struct kvm_vcpu *vcpu, u64 *syndrome)
> +{
> + return 0;
> +}
> +
>  int __attribute_const__ kvm_target_cpu(void)
>  {
>   switch (read_cpuid_part()) {
> diff --git a/arch/arm64/include/asm/esr.h b/arch/arm64/include/asm/esr.h
> index 22f9c90..d009c99 100644
> --- a/arch/arm64/include/asm/esr.h
> +++ b/arch/arm64/include/asm/esr.h
> @@ -127,6 +127,8 @@
>  #define ESR_ELx_WFx_ISS_WFE  (UL(1) << 0)
>  #define ESR_ELx_xVC_IMM_MASK ((1UL << 16) - 1)
>  
> +#define VSESR_ELx_IDS_ISS_MASK