from:"gengdongjiu"

Re: [PATCH V11 10/10] arm/arm64: KVM: add guest SEA support

2017-02-27 Thread gengdongjiu

@@ -1444,8 +1445,21 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu, struct kvm_run *run)

 	/* Check the stage-2 fault is trans. fault or write fault */
 	fault_status = kvm_vcpu_trap_get_fault_type(vcpu);
-	if (fault_status != FSC_FAULT && fault_status != FSC_PERM &&
-	    fault_status != FSC_ACCESS) {
+
+	/* The host kernel will handle the synchronous external abort. There
+	 * is no need to pass the error into the guest.
+	 */
+	if (fault_status == FSC_EXTABT) {
+		if(handle_guest_sea((unsigned long)fault_ipa,
+				    kvm_vcpu_get_hsr(vcpu))) {
+			kvm_err("Failed to handle guest SEA, FSC: EC=%#x xFSC=%#lx ESR_EL2=%#lx\n",
+				kvm_vcpu_trap_get_class(vcpu),
+				(unsigned long)kvm_vcpu_trap_get_fault(vcpu),
+				(unsigned long)kvm_vcpu_get_hsr(vcpu));
+			return -EFAULT;
+		}
+	} else if (fault_status != FSC_FAULT && fault_status != FSC_PERM &&
+		   fault_status != FSC_ACCESS) {
 		kvm_err("Unsupported FSC: EC=%#x xFSC=%#lx ESR_EL2=%#lx\n",
 			kvm_vcpu_trap_get_class(vcpu),
 			(unsigned long)kvm_vcpu_trap_get_fault(vcpu),



If the error is an SEA and it is handled successfully, should we return
directly once the handling has finished, instead of continuing? As shown
below:

	if (fault_status == FSC_EXTABT) {
		if (handle_guest_sea((unsigned long)fault_ipa,
				     kvm_vcpu_get_hsr(vcpu))) {
			kvm_err("Failed to handle guest SEA, FSC: EC=%#x xFSC=%#lx ESR_EL2=%#lx\n",
				kvm_vcpu_trap_get_class(vcpu),
				(unsigned long)kvm_vcpu_trap_get_fault(vcpu),
				(unsigned long)kvm_vcpu_get_hsr(vcpu));
			return -EFAULT;
		} else
			return 1;
	}






2017-02-24 18:42 GMT+08:00 James Morse :
> Hi Tyler,
>
> On 21/02/17 21:22, Tyler Baicar wrote:
>> Currently external aborts are unsupported by the guest abort
>> handling. Add handling for SEAs so that the host kernel reports
>> SEAs which occur in the guest kernel.
>
>> diff --git a/arch/arm/include/asm/kvm_arm.h b/arch/arm/include/asm/kvm_arm.h
>> index e22089f..33a77509 100644
>> --- a/arch/arm/include/asm/kvm_arm.h
>> +++ b/arch/arm/include/asm/kvm_arm.h
>> @@ -187,6 +187,7 @@
>>  #define FSC_FAULT(0x04)
>>  #define FSC_ACCESS   (0x08)
>>  #define FSC_PERM (0x0c)
>> +#define FSC_EXTABT   (0x10)
>
> arm64 has ESR_ELx_FSC_EXTABT which is used in inject_abt64(), but for matching
> an external abort coming from hardware the range is wider.
>
> Looking at the ARM-ARMs 'ISS encoding for an exception from an Instruction
> Abort' in 'D7.2.27 ESR_ELx, Exception Syndrome Register (ELx)' (page D7-1954
> of version 'k'...iss10775), the ten flavours of Synchronous abort you hooked
> with do_sea() in patch 4 occupy 0x10 to 0x1f...
>
>
>> diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
>> index a5265ed..04f1dd50 100644
>> --- a/arch/arm/kvm/mmu.c
>> +++ b/arch/arm/kvm/mmu.c
>> @@ -29,6 +29,7 @@
>>  #include 
>>  #include 
>>  #include 
>> +#include 
>>
>>  #include "trace.h"
>>
>> @@ -1444,8 +1445,21 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu, struct kvm_run *run)
>>
>>   /* Check the stage-2 fault is trans. fault or write fault */
>>   fault_status = kvm_vcpu_trap_get_fault_type(vcpu);
>
> ... kvm_vcpu_trap_get_fault_type() on both arm and arm64 masks the HSR/ESR_EL2
> with 0x3c ...
>
>
>> - if (fault_status != FSC_FAULT && fault_status != FSC_PERM &&
>> - fault_status != FSC_ACCESS) {
>> +
>> + /* The host kernel will handle the synchronous external abort. There
>> +  * is no need to pass the error into the guest.
>> +  */
>> + if (fault_status == FSC_EXTABT) {
>
> ... but here we only check for 'Synchronous external abort, not on a
> translation table walk'. Are the other types relevant?
>
> If so we need some helper as this range is sparse and 'all other values are
> reserved'. The aarch32 HSR format is slightly different. (G6-4411 ISS encoding
> from an exception from a Data Abort).
>
> If not, can we change patch 4 to check this type too so we don't call out to
> APEI for a fault type we know isn't relevant.
>
>
>> + if(handle_guest_sea((unsigned long)fault_ipa,
>> + kvm_vcpu_get_hsr(vcpu))) {
>> + kvm_err("Failed to handle guest SEA, FSC: EC=%#x xFSC=%#lx ESR_EL2=%#lx\n",
>> + kvm_vcpu_trap_get_class(vcpu),
>> + (unsigned long)kvm_vcpu_trap_get_fault(vcpu),
>> + (unsigned long)kvm_vcpu_get_hsr(vcpu));
>> +
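
James's point about the sparse encoding can be sketched as a small helper. This is only a standalone sketch, not code from the patch: the names fsc_is_sync_external_abort, FSC_FIELD_MASK and FSC_TYPE_MASK are made up for illustration, and the ten values are my reading of the AArch64 ESR_ELx fault status code table (0x10, 0x14-0x17, 0x18, 0x1c-0x1f). The AArch32 HSR encoding differs and is not covered. It also shows why a value masked with 0x3c cannot tell the table-walk levels apart.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names for illustration; not taken from the kernel tree. */
#define FSC_FIELD_MASK 0x3fu	/* full 6-bit fault status code */
#define FSC_TYPE_MASK  0x3cu	/* the mask mentioned above: drops the low two bits */

/* Returns true for the ten synchronous external abort and parity/ECC
 * encodings that sit in the range 0x10..0x1f (AArch64 encoding assumed). */
bool fsc_is_sync_external_abort(uint32_t esr)
{
	switch (esr & FSC_FIELD_MASK) {
	case 0x10:	/* external abort, not on a translation table walk */
	case 0x14: case 0x15: case 0x16: case 0x17:	/* external abort on a walk, level 0-3 */
	case 0x18:	/* parity/ECC error, not on a translation table walk */
	case 0x1c: case 0x1d: case 0x1e: case 0x1f:	/* parity/ECC error on a walk, level 0-3 */
		return true;
	default:
		return false;
	}
}

/* Usage example: a level-1 table-walk abort is matched by the helper,
 * while masking with 0x3c folds it onto the level-0 value. */
void fsc_example(void)
{
	assert(fsc_is_sync_external_abort(0x15));
	assert(!fsc_is_sync_external_abort(0x11));
	assert((0x15u & FSC_TYPE_MASK) == 0x14u);
}
```

A plain comparison of the masked value against 0x10 therefore only catches the "not on a translation table walk" case, which is the gap James describes.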

Fwd: Delivery Status Notification (Failure)

2016-09-28 Thread gengdongjiu

Hi,
  In kernel 4.1, I am confused about runnable_avg_sum, avg_period and
running_avg_sum, for example in the code below.
Does a task's runnable_avg_sum include its running_avg_sum? Does avg_period
include both the task's runnable_avg_sum and the task's sleep time? Thank you.

static inline void __update_task_entity_contrib(struct sched_entity *se)
{
	u32 contrib;

	/* avoid overflowing a 32-bit type w/ SCHED_LOAD_SCALE */
	contrib = se->avg.runnable_avg_sum * scale_load_down(se->load.weight);
	contrib /= (se->avg.avg_period + 1);
	se->avg.load_avg_contrib = scale_load(contrib);
}


static inline void __update_task_entity_utilization(struct sched_entity *se)
{
	u32 contrib;

	/* avoid overflowing a 32-bit type w/ SCHED_LOAD_SCALE */
	contrib = se->avg.running_avg_sum * scale_load_down(SCHED_LOAD_SCALE);
	contrib /= (se->avg.avg_period + 1);
	se->avg.utilization_avg_contrib = scale_load(contrib);
}
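
To make the arithmetic of the two functions concrete, here is a small standalone sketch with made-up sample numbers. It assumes that scale_load() and scale_load_down() are the identity and that the load scale and the nice-0 weight are both 1024; those are assumptions for this example, not something the quoted code states. The sample values also assume that running time is a subset of runnable time and that avg_period covers all elapsed time, which is exactly the point I would like confirmed.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed constants for illustration only. */
#define EXAMPLE_LOAD_SCALE    1024u
#define EXAMPLE_NICE_0_WEIGHT 1024u

/* Same arithmetic as __update_task_entity_contrib(), with
 * scale_load()/scale_load_down() treated as the identity. */
uint32_t example_load_contrib(uint32_t runnable_avg_sum, uint32_t avg_period,
			      uint32_t weight)
{
	uint32_t contrib = runnable_avg_sum * weight;

	contrib /= (avg_period + 1);
	return contrib;
}

/* Same arithmetic as __update_task_entity_utilization(). */
uint32_t example_utilization_contrib(uint32_t running_avg_sum, uint32_t avg_period)
{
	uint32_t contrib = running_avg_sum * EXAMPLE_LOAD_SCALE;

	contrib /= (avg_period + 1);
	return contrib;
}

/* Usage example with invented sums: runnable for about half of the
 * period, actually running for about a quarter of it. */
void example_contrib_usage(void)
{
	assert(example_load_contrib(23000, 46000, EXAMPLE_NICE_0_WEIGHT) == 511);
	assert(example_utilization_contrib(11500, 46000) == 255);
}
```

With these numbers the load contribution is roughly half of the weight and the utilization roughly a quarter of the scale, so both values are ratios of a sum to avg_period.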

Re: [PATCH] kvm: pass the virtual SEI syndrome to guest OS

2017-03-22 Thread gengdongjiu

Hi James,
  Thank you very much for your detailed comment and answer.

On 2017/3/21 21:10, James Morse wrote:
> Hi,
> 
> On 21/03/17 06:32, gengdongjiu wrote:
>> On 2017/3/20 23:08, James Morse wrote:
>>> On 20/03/17 13:58, Marc Zyngier wrote:
>>>> On 20/03/17 12:28, gengdongjiu wrote:
>>>>> On 2017/3/20 19:24, Marc Zyngier wrote:
>>>>>> On 20/03/17 07:55, Dongjiu Geng wrote:
>>>>>>> In the RAS implementation, hardware pass the virtual SEI
>>>>>>> syndrome information through the VSESR_EL2, so set the virtual
>>>>>>> SEI syndrome using physical SEI syndrome el2_elr to pass to
>>>>>>> the guest OS
> 
> (I've juggled the order of your replies:)
> 
>> so for both SEA and SEI, do you prefer to below steps?
>> EL0/EL1 SEI/SEA ---> EL3 firmware first handle --> EL2 hypervisor notify
>> the Qemu to inject SEI/SEA --> Qemu call KVM API to inject SEA/SEI --> KVM
>> inject SEA/SEI to guest OS
> 
> Yes, to expand your EL2 hypervisor notify Qemu step:
> 1 The host should call its APEI code to parse the CPER records.
> 2 User space processes are then notified via SIGBUS (or for rasdaemon, trace
>   points).
> 3 Qemu can take the address delivered via SIGBUS and generate CPER records for
>   the guest. It knows how to convert host addresses to guest IPAs, and it
>   knows where in guest memory to write the CPER records.
> 4 Qemu can then notify the guest via whatever mechanism it advertised via the
>   HEST/GHES table. It might not be the same mechanism that the host received
>   the notification through.
> 
> Steps 1 and 2 are the same even if no guest is running, so we don't have to
> add any special case for KVM. This is existing code that x86 uses.
> We can test the Qemu parts without any firmware support and the APEI path in
> the host and guest is the same.

Do you mean mapping the host APEI table into the guest in order to test steps 1
and 2, so that the APEI path in the host and the guest is the same?
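
For step 3 above, the address conversion could look roughly like the sketch below. It is only an illustration under my own assumptions: struct example_mem_slot and example_hva_to_ipa are hypothetical names, and the three fields simply mirror the guest physical address, size and userspace address that a VMM registers for each memory slot. Real Qemu code keeps this information in its own data structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical slot record; not a Qemu or KVM type. */
struct example_mem_slot {
	uint64_t guest_phys_addr;	/* start of the slot in guest IPA space */
	uint64_t memory_size;		/* length of the slot in bytes */
	uint64_t userspace_addr;	/* start of the backing mapping in the VMM */
};

/* Translate a host virtual address (as delivered in si_addr of the SIGBUS)
 * into a guest IPA. Returns false if no slot backs the address. */
bool example_hva_to_ipa(const struct example_mem_slot *slots, size_t nslots,
			uint64_t hva, uint64_t *ipa)
{
	for (size_t i = 0; i < nslots; i++) {
		const struct example_mem_slot *s = &slots[i];

		if (hva >= s->userspace_addr &&
		    hva - s->userspace_addr < s->memory_size) {
			*ipa = s->guest_phys_addr + (hva - s->userspace_addr);
			return true;
		}
	}
	return false;
}

/* Usage example: one 256 MiB slot of guest RAM at IPA 0x40000000. */
void example_hva_to_ipa_usage(void)
{
	const struct example_mem_slot slots[] = {
		{ 0x40000000ULL, 0x10000000ULL, 0x7f0000000000ULL },
	};
	uint64_t ipa = 0;

	assert(example_hva_to_ipa(slots, 1, 0x7f0000001000ULL, &ipa));
	assert(ipa == 0x40001000ULL);
}
```

The IPA found this way is what would go into the physical address field of the CPER record that Qemu writes for the guest.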

> 
> 
>>> Is anyone from Huawei looking at adding RAS support for Qemu?
>>  yes, I am looking at Qemu and want to add RAS support.
> 
> Great, support in Qemu is one of the missing pieces. On x86 it looks like it
> emulates machine-check-exceptions, which is how x86 did this before
> firmware-first and APEI became the standard.
> 
> 
>>  do you mean let Qemu inject both the SEA and SEI?
> 
> To do the notification, yes. It needs to happen after the CPER records have
> been written, and the mechanism and CPER memory location need to match what
> the guest was told via the HEST/GHES table.
> 
> If Qemu didn't tell the guest about firmware-first, it can still deliver the
> guest an SError Interrupt.
> 
> 
> SEA should be possible to do with the KVM_SET_REG API, GPIO/GSIV and the other
> kind of interrupts can use irqfd. For SEI we may need to add an API call to
> KVM to let it pend SError with a specific ESR.
> 
> 
> 
>>> How does this work with firmware first?
> 
>> when the Guest OS triggers an SEI, it will firstly trap to EL3 firmware, El3 
>> firmware records the error
>> info to the APEI table, 
> 
> These are CPER records in a memory area pointed to by one of HEST's GHES
> entries?
> 
> 
>> then copy the ESR_EL3 ELR_EL3 to ESR_EL2 ELR_EL2 and transfers control to the
>> hypervisor, hypervisor delegates the error exception to EL1 guest
> 
> This is a problem, just because the error occurred while the guest was running
> doesn't mean we should deliver it directly to the guest. Some of these errors
> will be fatal for the CPU and the host should try and power it off to contain
Yes, some errors do not need to be delivered to the guest OS directly. For
example, if the error is a fault in the guest kernel, the hypervisor can
power off the whole guest OS directly.

> the fault. For example: CPER's 'micro-architectural error', should the guest
> power-off the vCPU? All that really does is return to the hypervisor, the error
For this example, I think it is better for the hypervisor to shut down the
whole guest OS directly, instead of the guest powering off the vCPU.

> hasn't been contained.


> 
> Firmware should handle the error first, then the host, finally the guest via
> Qemu.
> 
> 
>> OS by setting HCR_EL2.VSE to 1 and pass the virtual SEI syndrome through 
>> vsesr_el2. 
>> The EL1 guest OS check the DISR_EL1 syndrome information to decide to
>> terminate the application, or do some other recovery action. because the 
>> HCR_EL2.AMO is set, so in fact, read
>> DISR_EL1, it returns the VDISR_EL2. and VDISR_EL2 is loaded from VSESR_EL2, 
>>

Re: [PATCH] arm/arm64: KVM: send SIGBUS error to qemu

2017-03-24 Thread gengdongjiu

Hi James,
   thanks for your review.

On 2017/3/23 23:06, James Morse wrote:
> Hi Dongjiu Geng,
> 
> On 23/03/17 13:01, Dongjiu Geng wrote:
>> when the pfn is KVM_PFN_ERR_HWPOISON, it indicates to send
>> SIGBUS signal from KVM's fault-handling code to qemu, qemu
>> can handle this signal according to the fault address.
> 
> I'm afraid I beat you to it on this one:
> https://www.spinics.net/lists/arm-kernel/msg568919.html
> 
> (Are you the same gengdj who ask me to post that patch?:
>  https://lkml.org/lkml/2017/3/5/187 )

Oh, yes, that was me. I have not checked my Gmail recently and thought you had
not replied. It is great that you posted this patch upstream.

> 
> We don't need upstream KVM to do this until either arm or arm64 has
> ARCH_SUPPORTS_MEMORY_FAILURE. Punit and Tyler have discovered problems with
> the way arm64's hugepage and hwpoison interact:
> https://www.spinics.net/lists/arm-kernel/msg568995.html

OK, thanks James. Do you know when arm or arm64 will have
ARCH_SUPPORTS_MEMORY_FAILURE?
> 
> 
> Some comments on the differences:
> 
>> diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
>> index 962616fd4ddd..1307ec400de3 100644
>> --- a/arch/arm/kvm/mmu.c
>> +++ b/arch/arm/kvm/mmu.c
>> @@ -1237,6 +1237,20 @@ static void coherent_cache_guest_page(struct kvm_vcpu *vcpu, kvm_pfn_t pfn,
>>  __coherent_cache_guest_page(vcpu, pfn, size);
>>  }
>>  
>> +static void kvm_send_hwpoison_signal(unsigned long address,
>> +struct task_struct *tsk)
>> +{
>> +siginfo_t info;
>> +
>> +info.si_signo   = SIGBUS;
>> +info.si_errno   = 0;
>> +info.si_code= BUS_MCEERR_AR;
>> +info.si_addr= (void __user *)address;
>> +info.si_addr_lsb = PAGE_SHIFT;
> 
> Any version of this patch should handle hugepage for the sizes KVM uses in its
> stage2 mappings. By just passing PAGE_SHIFT you let the guest fault for each
> page that makes up the hugepage.
> 
> 
>> +
>> +send_sig_info(SIGBUS, &info, tsk);
>> +}
>> +
>>  static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>>struct kvm_memory_slot *memslot, unsigned long hva,
>>unsigned long fault_status)
>> @@ -1309,6 +1323,12 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>>  if (is_error_noslot_pfn(pfn))
>>  return -EFAULT;
>>  
>> +if (is_error_hwpoison_pfn(pfn)) {
>> +kvm_send_hwpoison_signal(kvm_vcpu_gfn_to_hva(vcpu, gfn),
>> +current);
>> +return -EFAULT;
> 
> This will return -EFAULT from the KVM_RUN ioctl(). Is Qemu expected to know it
> should try again? This is indistinguishable from the is_error_noslot_pfn()
> error above.
>
> x86 returns 0 from this path, kvm_handle_bad_page() in arch/x86/kvm/mmu.c as
> the SIGBUS should arrive first. If the SIGBUS is handled the error has been
> resolved and Qemu can call KVM_RUN again. Returning an error and sending
> SIGBUS suggests there are two problems.

Thanks for that. I think your point is reasonable.
> 
> 
>> +}
>> +
>>  if (kvm_is_device_pfn(pfn)) {
>>  mem_type = PAGE_S2_DEVICE;
>>  flags |= KVM_S2PTE_FLAG_IS_IOMAP;
> 
> 
> 
> Thanks,
> 
> James
> 
> 
> .
>
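
On the hugepage comment above: si_addr_lsb tells the receiver how much memory around si_addr is affected, so the receiver can work out the range as in the sketch below. This is only an illustration with hypothetical names (struct example_poison_range, example_poison_range_from_lsb). The shift values 12 and 21 assume 4 KiB pages and a 2 MiB stage-2 block mapping; other configurations use different sizes.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical type for illustration. */
struct example_poison_range {
	uint64_t start;	/* first affected byte, aligned to the reported size */
	uint64_t len;	/* number of affected bytes */
};

/* Derive the affected range from the faulting address and the least
 * significant valid address bit (what si_addr_lsb carries). */
struct example_poison_range example_poison_range_from_lsb(uint64_t addr,
							  unsigned int lsb)
{
	assert(lsb < 64);

	uint64_t len = UINT64_C(1) << lsb;
	struct example_poison_range r = { addr & ~(len - 1), len };

	return r;
}

/* Usage example: the same address reported with a page-sized and a
 * hugepage-sized granule. */
void example_poison_range_usage(void)
{
	struct example_poison_range page =
		example_poison_range_from_lsb(0x7f0000212345ULL, 12);
	struct example_poison_range huge =
		example_poison_range_from_lsb(0x7f0000212345ULL, 21);

	assert(page.start == 0x7f0000212000ULL && page.len == 0x1000ULL);
	assert(huge.start == 0x7f0000200000ULL && huge.len == 0x200000ULL);
}
```

Reporting PAGE_SHIFT for a hugepage mapping means the receiver only learns about one small page at a time, which matches James's remark that the guest would fault once for each page of the hugepage.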

Re: [PATCH] kvm: pass the virtual SEI syndrome to guest OS

2017-03-28 Thread gengdongjiu

Hi all,

On 2017/3/28 19:54, Achin Gupta wrote:
> On Tue, Mar 28, 2017 at 01:23:28PM +0200, Christoffer Dall wrote:
>> On Tue, Mar 28, 2017 at 11:48:08AM +0100, James Morse wrote:
>>> Hi Christoffer,
>>>
>>> (CC: Leif and Achin who know more about how UEFI fits into this picture)
>>>
>>> On 21/03/17 19:39, Christoffer Dall wrote:
>>>> On Tue, Mar 21, 2017 at 07:11:44PM +, James Morse wrote:
>>>>> On 21/03/17 11:34, Christoffer Dall wrote:
>>>>>> On Tue, Mar 21, 2017 at 02:32:29PM +0800, gengdongjiu wrote:
>>>>>>> On 2017/3/20 23:08, James Morse wrote:
>>>>>>>>>>> On 20/03/17 07:55, Dongjiu Geng wrote:
>>>>>>>>>>>> In the RAS implementation, hardware pass the virtual SEI
>>>>>>>>>>>> syndrome information through the VSESR_EL2, so set the virtual
>>>>>>>>>>>> SEI syndrome using physical SEI syndrome el2_elr to pass to
>>>>>>>>>>>> the guest OS
>>>>>>>>
>>>>>>>> How does this work with firmware first?
>>>>>>>
>>>>>>> I explained it in previous mail about the work flow.
>>>>>>
>>>>>> When delivering and reporting SEIs to the VM, should this happen
>>>>>> directly to the OS running in the VM, or to the guest firmware (e.g.
>>>>>> UEFI) running in the VM as well?
>>>>>
>>>>> 'firmware first' is the ACPI specs name for x86's BIOS or management-mode
>>>>> handling the error. On arm64 we have multiple things called firmware, so
>>>>> the name might be more confusing than helpful.
>>>>>
>>>>> As far as I understand it, firmware here refers to the secure-world and
>>>>> EL3. Something like ATF can use SCR_EL3.EA to claim SErrors and external
>>>>> aborts, routing them to EL3 where secure platform specific firmware
>>>>> generates CPER records.
>>>>> For a guest, Qemu takes the role of this EL3-firmware.
> 
> +1
> 
>>>>>
>>>> Thanks for the clarification.  So UEFI in the VM would not be involved
>>>> in this at all?
>>>
>>> On the host, part of UEFI is involved to generate the CPER records.
>>> In a guest?, I don't know.
>>> Qemu could generate the records, or drive some other component to do it.
>>
>> I think I am beginning to understand this a bit.  Since the guest UEFI
>> instance is specifically built for the machine it runs on, QEMU's virt
>> machine in this case, they could simply agree (by some contract) to
>> place the records at some specific location in memory, and if the guest
>> kernel asks its guest UEFI for that location, things should just work by
>> having logic in QEMU to process error reports and populate guest memory.
>>
>> Is this how others see the world too?
> 
> I think so!
> 
> AFAIU, the memory where CPERs will reside should be specified in a GHES entry
> in the HEST. Is this not the case with a guest kernel i.e. the guest UEFI
> creates a HEST for the guest Kernel?
>
> If so, then the question is how the guest UEFI finds out where QEMU (acting as
> EL3 firmware) will populate the CPERs. This could either be a contract between
> the two or a guest DXE driver uses the MM_COMMUNICATE call (see [1]) to ask
> QEMU where the memory is.

Would invoking the guest UEFI be complex? I do not see the advantage. It seems
that on x86 Qemu generates the ACPI tables directly, but I am not sure; we are
checking the Qemu logic.
Letting Qemu generate the CPER records may be clearer.

If Qemu takes the address delivered via SIGBUS and generates CPER records for
the guest, the CPER table seems simple, because only one address is passed by
SIGBUS.



> 
> This is the way I expect it to work at the EL3/EL2 boundary. So I am
> extrapolating it to the guest/hypervisor boundary. Do shout if I am missing
> anything.
> 
> hth,
> Achin
> 
>>
>>>
>>> Leif and Achin are the people with the UEFI/bigger picture.
>>>
>>>
>>
>> Thanks!
>> -Christoffer
> 
> [1] 
> http://infocenter.arm.com/help/topic/com.arm.doc.den0060a/DEN0060A_ARM_MM_Interface_Specification.pdf
> 
> .
>

Re: [PATCH V13 10/10] arm/arm64: KVM: add guest SEA support

2017-03-28 Thread gengdongjiu

Hi,

On 2017/3/22 6:47, Tyler Baicar wrote:
> + fault_ipa = kvm_vcpu_get_fault_ipa(vcpu);
> +
> + /* The host kernel will handle the synchronous external abort. There
> +  * is no need to pass the error into the guest.
> +  */
> + if (is_abort_synchronous(fault_status))
> + sea_status = handle_guest_sea((unsigned long)fault_ipa,
> + kvm_vcpu_get_hsr(vcpu));
>  
>   is_iabt = kvm_vcpu_trap_is_iabt(vcpu);
> - if (unlikely(!is_iabt && kvm_vcpu_dabt_isextabt(vcpu))) {
> + if (unlikely(!is_iabt && kvm_vcpu_dabt_isextabt(vcpu)) && sea_status) {
>   kvm_inject_vabt(vcpu);
>   return 1;
>   }
After the host kernel handles the synchronous external abort correctly,
sea_status will be 0, so the code will continue on. Would it be better to
return directly once the SEA has been handled correctly, such as below?

	if (unlikely(!is_iabt && kvm_vcpu_dabt_isextabt(vcpu)) && sea_status) {
		kvm_inject_vabt(vcpu);
		return 1;
	} else
		return 1;
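
To spell out the decision I have in mind, here is a small standalone model of the control flow. It is not kernel code: the enum, the struct and the function name are hypothetical, sea_status is assumed to default to a non-zero "not handled" value, and handle_guest_sea() is assumed to return 0 on success. Note that the model returns early only when an SEA was actually handled, which is narrower than the literal "else return 1" written above, since that else branch would also return for ordinary stage-2 faults.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical outcome of the abort handler, for illustration only. */
enum example_abort_action {
	EXAMPLE_RESUME_GUEST,	/* return 1: host handled the SEA */
	EXAMPLE_INJECT_VABT,	/* kvm_inject_vabt() then return 1 */
	EXAMPLE_STAGE2_FAULT,	/* fall through to the stage-2 fault handling */
};

struct example_abort_info {
	bool is_sync_ext_abort;	/* stands in for is_abort_synchronous(fault_status) */
	bool is_iabt;		/* instruction abort? */
	bool dabt_is_extabt;	/* stands in for kvm_vcpu_dabt_isextabt(vcpu) */
	int sea_result;		/* what handle_guest_sea() would return; 0 = handled */
};

enum example_abort_action
example_decide_abort_action(const struct example_abort_info *a)
{
	int sea_status = 1;	/* assumed default: not handled */

	if (a->is_sync_ext_abort)
		sea_status = a->sea_result;

	/* Proposed change: stop here once the host has handled the SEA. */
	if (a->is_sync_ext_abort && sea_status == 0)
		return EXAMPLE_RESUME_GUEST;

	/* Condition from the quoted hunk. */
	if (!a->is_iabt && a->dabt_is_extabt && sea_status)
		return EXAMPLE_INJECT_VABT;

	return EXAMPLE_STAGE2_FAULT;
}

/* Usage example: a handled SEA resumes the guest, an unhandled one on a
 * data abort is injected, and a plain stage-2 fault falls through. */
void example_abort_usage(void)
{
	struct example_abort_info handled = { true, false, true, 0 };
	struct example_abort_info unhandled = { true, false, true, -1 };
	struct example_abort_info plain = { false, false, false, 0 };

	assert(example_decide_abort_action(&handled) == EXAMPLE_RESUME_GUEST);
	assert(example_decide_abort_action(&unhandled) == EXAMPLE_INJECT_VABT);
	assert(example_decide_abort_action(&plain) == EXAMPLE_STAGE2_FAULT);
}
```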

>  
> - fault_ipa = kvm_vcpu_get_fault_ipa(vcpu);
> -
>   trace_kvm_guest_fault(*vcpu_pc(vcpu), kvm_vcpu_get_hsr(vcpu),
> kvm_vcpu_get_hfar(vcpu), fault_ipa);
>  
> - /* Check the stage-2 fault is trans. fault or write fault */
> - fault_status = kvm_vcpu_trap_get_fault_type(vcpu);
>   if (fault_status != FSC_FAULT && fault_status != FSC_PERM &&

Re: [PATCH] kvm: pass the virtual SEI syndrome to guest OS

2017-03-29 Thread gengdongjiu

Hi Achin,
  Thanks for your mail and answer.

2017-03-29 18:36 GMT+08:00, Achin Gupta <achin.gu...@arm.com>:
> Hi gengdongjiu,
>
> On Wed, Mar 29, 2017 at 05:36:37PM +0800, gengdongjiu wrote:
>>
>> Hi Laszlo/Biesheuvel/Qemu developer,
>>
>>Now I encounter a issue and want to consult with you in ARM64 platform，
>> as described below:
>>
>>when guest OS happen synchronous or asynchronous abort, kvm needs to
>> send the error address to Qemu or UEFI through sigbus to dynamically
>> generate APEI table. from my investigation, there are two ways:
>>
>>(1) Qemu get the error address, and generate the APEI table, then
>> notify UEFI to know this generation, then inject abort error to guest OS,
>> guest OS read the APEI table.
>>(2) Qemu get the error address, and let UEFI to generate the APEI
>> table, then inject abort error to guest OS, guest OS read the APEI table.

The description may not have been precise, so let me update it:

   (1) Qemu gets the error address and generates the CPER table, then
notifies the guest UEFI to place this CPER table in guest OS memory. Qemu
then lets KVM inject the abort into the guest OS, and the guest OS reads the
CPER table.

   (2) Qemu gets the error address and lets the guest UEFI generate the CPER
table directly and place it in guest OS memory, instead of Qemu generating
it. KVM then injects the abort into the guest OS, and the guest OS reads the
CPER table.

>
> Just being pedantic! I don't think we are talking about creating the APEI
> table dynamically here. The issue is: Once KVM has received an error that is
> destined for a guest it will raise a SIGBUS to Qemu. Now before Qemu can
> inject the error into the guest OS, a CPER (Common Platform Error Record) has
> to be generated corresponding to the error source (GHES corresponding to
> memory subsystem, processor etc) to allow the guest OS to do anything
> meaningful with the error. So who should create the CPER is the question.
>
> At the EL3/EL2 interface (Secure Firmware and OS/Hypervisor), an error
> arrives at EL3 and secure firmware (at EL3 or a lower secure exception level)
> is responsible for creating the CPER. ARM is experimenting with using a
> Standalone MM EDK2 image in the secure world to do the CPER creation. This
> will avoid adding the same code in ARM TF in EL3 (better for security). The
> error will then be injected into the OS/Hypervisor (through SEA/SEI/SDEI)
> through ARM Trusted Firmware.
>
> Qemu is essentially fulfilling the role of secure firmware at the EL2/EL1
> interface (as discussed with Christoffer below). So it should generate the
> CPER before injecting the error.
>
> This corresponds to (1) above apart from notifying UEFI (I am assuming you
> mean guest UEFI). At this time, the guest OS already knows where to pick up
> the CPER from through the HEST. Qemu has to create the CPER and populate its
> address at the address exported in the HEST. Guest UEFI should not be
> involved in this flow. Its job was to create the HEST at boot and that has
> been done by this stage.

Sorry, as I understand it, after Qemu generates the CPER table, it should pass
the CPER table to the guest UEFI, and the guest UEFI then places this CPER
table in guest OS memory. In this flow the guest UEFI has to be involved,
otherwise the guest OS cannot see the CPER table.

>
> Qemu folk will be able to add but it looks like support for CPER generation
> will need to be added to Qemu. We need to resolve this.
>
> Do shout if I am missing anything above.
>
> cheers,
> Achin
>
>
>>
>>
>>Do you think which modules generates the APEI table is better? UEFI or
>> Qemu?
>>
>>
>>
>>
>> On 2017/3/28 21:40, James Morse wrote:
>> > Hi gengdongjiu,
>> >
>> > On 28/03/17 13:16, gengdongjiu wrote:
>> >> On 2017/3/28 19:54, Achin Gupta wrote:
>> >>> On Tue, Mar 28, 2017 at 01:23:28PM +0200, Christoffer Dall wrote:
>> >>>> On Tue, Mar 28, 2017 at 11:48:08AM +0100, James Morse wrote:
>> >>>>> On the host, part of UEFI is involved to generate the CPER records.
>> >>>>> In a guest?, I don't know.
>> >>>>> Qemu could generate the records, or drive some other component to do
>> >>>>> it.
>> >>>>
>> >>>> I think I am beginning to understand this a bit.  Since the guest UEFI
>> >>>> instance is specifically built for the machine it runs on, QEMU's virt
>> >>>> machine in this case, they could simply agree (by some contract) to
>> >>>

Re: [PATCH] kvm: pass the virtual SEI syndrome to guest OS

2017-03-29 Thread gengdongjiu

Hi Christoffer/Laszlo,

On 2017/3/30 1:44, Christoffer Dall wrote:
> On Wed, Mar 29, 2017 at 05:37:49PM +0200, Laszlo Ersek wrote:
>> On 03/29/17 16:48, Christoffer Dall wrote:
>>> On Wed, Mar 29, 2017 at 10:36:51PM +0800, gengdongjiu wrote:
>>>> 2017-03-29 18:36 GMT+08:00, Achin Gupta <achin.gu...@arm.com>:
>>
>>>>> Qemu is essentially fulfilling the role of secure firmware at the
>>>>> EL2/EL1 interface (as discussed with Christoffer below). So it
>>>>> should generate the CPER before injecting the error.
>>>>>
>>>>> This is corresponds to (1) above apart from notifying UEFI (I am
>>>>> assuming you mean guest UEFI). At this time, the guest OS already
>>>>> knows where to pick up the CPER from through the HEST. Qemu has
>>>>> to create the CPER and populate its address at the address
>>>>> exported in the HEST. Guest UEFI should not be involved in this 
>>>>> flow. Its job was to create the HEST at boot and that has been
>>>>> done by this stage.
>>>>
>>>> Sorry,  As I understand it, after Qemu generate the CPER table, it
>>>> should pass the CPER table to the guest UEFI, then Guest UEFI  place
>>>> this CPER table to the guest OS memory. In this flow, the Guest UEFI
>>>> should be involved, else the Guest OS can not see the CPER table.
>>>>
>>>
>>> I think you need to explain the "pass the CPER table to the guest UEFI"
>>> concept in terms of what really happens, step by step, and when you say
>>> "then Guest UEFI place the CPER table to the guest OS memory", I'm
>>> curious who is running what code on the hardware when doing that.
>>
>> I strongly suggest to keep the guest firmware's runtime involvement to
>> zero. Two reasons:
>>
>> (1) As you explained above (... which I conveniently snipped), when you
>> inject an interrupt to the guest, the handler registered for that
>> interrupt will come from the guest kernel.
>>
>> The only exception to this is when the platform provides a type of
>> interrupt whose handler can be registered and then locked down by the
>> firmware. On x86, this is the SMI.
>>
>> In practice though,
>> - in OVMF (x86), we only do synchronous (software-initiated) SMIs (for
>> privileged UEFI varstore access),
>> - and in ArmVirtQemu (ARM / aarch64), none of the management mode stuff
>> exists at all.
>>
>> I understand that the Platform Init 1.5 (or 1.6?) spec abstracted away
>> the MM (management mode) protocols from Intel SMM, but at this point
>> there is zero code in ArmVirtQemu for that. (And I'm unsure how much of
>> any eligible underlying hw emulation exists in QEMU.)
>>
>> So you can't get the guest firmware to react to the injected interrupt
>> without the guest OS coming between first.
>>
>> (2) Achin's description matches really-really closely what is possible,
>> and what should be done with QEMU, ArmVirtQemu, and the guest kernel.
>>
>> In any solution for this feature, the firmware has to reserve some
>> memory from the OS at boot. The current facilities we have enable this.
>> As I described previously, the ACPI linker/loader actions can be mapped
>> more or less 1:1 to Achin's design. From a practical perspective, you
>> really want to keep the guest firmware as dumb as possible (meaning: as
>> generic as possible), and keep the ACPI specifics to the QEMU and the
>> guest kernel sides.
>>
>> The error serialization actions -- the co-operation between guest kernel
>> and QEMU on the special memory areas -- that were mentioned earlier by
>> Michael and Punit look like a complication. But, IMO, they don't differ
>> from any other device emulation -- DMA actions in particular -- that
>> QEMU already does. Device models are what QEMU *does*. Read the command
>> block that the guest driver placed in guest memory, parse it, sanity
>> check it, verify it, execute it, write back the status code, inject an
>> interrupt (and/or let any polling guest driver notice it "soon after" --
>> use barriers as necessary).
>>
>> Thus, I suggest to rely on the generic ACPI linker/loader interface
>> (between QEMU and guest firmware) *only* to make the firmware lay out
>> stuff (= reserve buffers, set up pointers, install QEMU's ACPI tables)
>> *at boot*. Then, at runtime, let the guest kernel and QEMU (the "device
>> model") talk to each other directly. Keep runtime firmware involvement
>> to zero.
>>
>> You *really* don't want to debug three components at runtime, when you
>> can solve the thing with two. (Two components whose build systems won't
>> drive you mad, I should add.)
>>
>> IMO, Achin's design nailed it. We can do that.
>>
> I completely agree.
> 
> My questions were intended for gengdongjiu to clarify his/her 
> and clear up any misunderstandings between what Achin suggested and what
> he/she wrote.

  Achin's and Laszlo's understanding is right. Thanks for your suggestions.

> 
> Thanks,
> -Christoffer
> 
> .
>

Re: [PATCH] kvm: pass the virtual SEI syndrome to guest OS

2017-03-29 Thread gengdongjiu


Hi Laszlo/Biesheuvel/Qemu developer,

   I have run into an issue on the ARM64 platform and would like to consult
you about it, as described below:

   When a synchronous or asynchronous abort happens in the guest OS, KVM needs
to send the error address to Qemu or UEFI through SIGBUS so that the APEI table
can be generated dynamically. From my investigation, there are two ways:

   (1) Qemu gets the error address and generates the APEI table, then notifies
UEFI about it, then injects the abort into the guest OS, and the guest OS reads
the APEI table.
   (2) Qemu gets the error address and lets UEFI generate the APEI table, then
injects the abort into the guest OS, and the guest OS reads the APEI table.


   Which module do you think is the better one to generate the APEI table, UEFI
or Qemu?




On 2017/3/28 21:40, James Morse wrote:
> Hi gengdongjiu,
> 
> On 28/03/17 13:16, gengdongjiu wrote:
>> On 2017/3/28 19:54, Achin Gupta wrote:
>>> On Tue, Mar 28, 2017 at 01:23:28PM +0200, Christoffer Dall wrote:
>>>> On Tue, Mar 28, 2017 at 11:48:08AM +0100, James Morse wrote:
>>>>> On the host, part of UEFI is involved to generate the CPER records.
>>>>> In a guest?, I don't know.
>>>>> Qemu could generate the records, or drive some other component to do it.
>>>>
>>>> I think I am beginning to understand this a bit.  Since the guest UEFI
>>>> instance is specifically built for the machine it runs on, QEMU's virt
>>>> machine in this case, they could simply agree (by some contract) to
>>>> place the records at some specific location in memory, and if the guest
>>>> kernel asks its guest UEFI for that location, things should just work by
>>>> having logic in QEMU to process error reports and populate guest memory.
>>>>
>>>> Is this how others see the world too?
>>>
>>> I think so!
>>>
>>> AFAIU, the memory where CPERs will reside should be specified in a GHES
>>> entry in the HEST. Is this not the case with a guest kernel i.e. the guest
>>> UEFI creates a HEST for the guest Kernel?
>>>
>>> If so, then the question is how the guest UEFI finds out where QEMU (acting
>>> as EL3 firmware) will populate the CPERs. This could either be a contract
>>> between the two or a guest DXE driver uses the MM_COMMUNICATE call (see [1])
>>> to ask QEMU where the memory is.
>>
>> whether invoke the guest UEFI will be complex? not see the advantage. it
>> seems x86 Qemu directly generate the ACPI table, but I am not sure, we are
>> checking the qemu logical.
>> let Qemu generate CPER record may be clear.
> 
> At boot UEFI in the guest will need to make sure the areas of memory that may
> be used for CPER records are reserved. Whether UEFI or Qemu decides where
> these are needs deciding, (but probably not here)...
> 
> At runtime, when an error has occurred, I agree it would be simpler (fewer
> components involved) if Qemu generates the CPER records. But if UEFI made the
> memory choice above they need to interact and it gets complicated again. The
> CPER records are defined in the UEFI spec, so I would expect UEFI to contain
> code to generate/parse them.
> 
> 
> Thanks,
> 
> James
> 
> 
> .
>

Re: [PATCH V13 10/10] arm/arm64: KVM: add guest SEA support

2017-03-27 Thread gengdongjiu

Hi Tyler,

I have a question about the code below.

On 2017/3/25 0:01, Christoffer Dall wrote:
>   is_iabt = kvm_vcpu_trap_is_iabt(vcpu);
> - if (unlikely(!is_iabt && kvm_vcpu_dabt_isextabt(vcpu))) {
> + if (unlikely(!is_iabt && kvm_vcpu_dabt_isextabt(vcpu)) && sea_status) {
>   kvm_inject_vabt(vcpu);
When the fault is an SEA, which is a synchronous abort, why do we inject an
asynchronous abort here through kvm_inject_vabt(vcpu) instead of a synchronous
abort? The original KVM source code seems to do the same, so I am confused
about it.
Thanks.

>   return 1;
>   }

Re: [PATCH] kvm: pass the virtual SEI syndrome to guest OS

2017-03-21 Thread gengdongjiu

Hi Marc,
  Thank you very much for your review.


On 2017/3/20 21:58, Marc Zyngier wrote:
> On 20/03/17 12:28, gengdongjiu wrote:
>>
>>
>> On 2017/3/20 19:24, Marc Zyngier wrote:
>>> Please include James Morse on anything RAS related, as he's already
>>> looking at related patches.
>>>
>>> On 20/03/17 07:55, Dongjiu Geng wrote:
>>>> In the RAS implementation, hardware pass the virtual SEI
>>>> syndrome information through the VSESR_EL2, so set the virtual
>>>> SEI syndrome using physical SEI syndrome el2_elr to pass to
>>>> the guest OS
>>>>
>>>> Signed-off-by: Dongjiu Geng <gengdong...@huawei.com>
>>>> Signed-off-by: Quanming wu <wuquanm...@huawei.com>
>>>> ---
>>>>  arch/arm64/Kconfig   |  8 
>>>>  arch/arm64/include/asm/esr.h |  1 +
>>>>  arch/arm64/include/asm/kvm_emulate.h | 12 
>>>>  arch/arm64/include/asm/kvm_host.h|  4 
>>>>  arch/arm64/kvm/hyp/switch.c  | 15 ++-
>>>>  arch/arm64/kvm/inject_fault.c| 10 ++
>>>>  6 files changed, 49 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
>>>> index 8c7c244247b6..ea62170a3b75 100644
>>>> --- a/arch/arm64/Kconfig
>>>> +++ b/arch/arm64/Kconfig
>>>> @@ -908,6 +908,14 @@ endmenu
>>>>  
>>>>  menu "ARMv8.2 architectural features"
>>>>  
>>>> +config HAS_RAS_EXTENSION
>>>> +  bool "Support arm64 RAS extension"
>>>> +  default n
>>>> +  help
>>>> +Reliability, Availability, Serviceability(RAS; part of the ARMv8.2 
>>>> Extensions).
>>>> +
>>>> +Selecting this option OS will try to recover the error that RAS 
>>>> hardware node detected.
>>>> +
>>>
>>> As this is an architectural extension, this should be controlled by the
>>> CPU feature mechanism, and not be chosen at compile time. What you have
>>> here will break horribly when booted on a CPU that doesn't implement RAS.
>>
>> thanks very much for your review, yes, it is, you are right.
>>
>>>
>>>>  config ARM64_UAO
>>>>bool "Enable support for User Access Override (UAO)"
>>>>default y
>>>> diff --git a/arch/arm64/include/asm/esr.h b/arch/arm64/include/asm/esr.h
>>>> index d14c478976d0..e38d32b2bdad 100644
>>>> --- a/arch/arm64/include/asm/esr.h
>>>> +++ b/arch/arm64/include/asm/esr.h
>>>> @@ -111,6 +111,7 @@
>>>>  #define ESR_ELx_COND_MASK (UL(0xF) << ESR_ELx_COND_SHIFT)
>>>>  #define ESR_ELx_WFx_ISS_WFE   (UL(1) << 0)
>>>>  #define ESR_ELx_xVC_IMM_MASK  ((1UL << 16) - 1)
>>>> +#define VSESR_ELx_IDS_ISS_MASK((1UL << 25) - 1)
>>>>  
>>>>  /* ESR value templates for specific events */
>>>>  
>>>> diff --git a/arch/arm64/include/asm/kvm_emulate.h 
>>>> b/arch/arm64/include/asm/kvm_emulate.h
>>>> index f5ea0ba70f07..20d4da7f5dce 100644
>>>> --- a/arch/arm64/include/asm/kvm_emulate.h
>>>> +++ b/arch/arm64/include/asm/kvm_emulate.h
>>>> @@ -148,6 +148,18 @@ static inline u32 kvm_vcpu_get_hsr(const struct 
>>>> kvm_vcpu *vcpu)
>>>>return vcpu->arch.fault.esr_el2;
>>>>  }
>>>>  
>>>> +#ifdef CONFIG_HAS_RAS_EXTENSION
>>>> +static inline u32 kvm_vcpu_get_vsesr(const struct kvm_vcpu *vcpu)
>>>> +{
>>>> +  return vcpu->arch.fault.vsesr_el2;
>>>> +}
>>>> +
>>>> +static inline void kvm_vcpu_set_vsesr(struct kvm_vcpu *vcpu, unsigned 
>>>> long val)
>>>> +{
>>>> +  vcpu->arch.fault.vsesr_el2 = val;
>>>> +}
>>>> +#endif
>>>> +
>>>>  static inline int kvm_vcpu_get_condition(const struct kvm_vcpu *vcpu)
>>>>  {
>>>>u32 esr = kvm_vcpu_get_hsr(vcpu);
>>>> diff --git a/arch/arm64/include/asm/kvm_host.h 
>>>> b/arch/arm64/include/asm/kvm_host.h
>>>> index e7705e7bb07b..f9e3bb57c461 100644
>>>> --- a/arch/arm64/include/asm/kvm_host.h
>>>> +++ b/arch/arm64/include/asm/kvm_host.h
>>>> @@ -83,6 +83,10 @@ struct kvm_mmu_memory_cache {
>>>>  };
>>>>  
>>>>  struct kvm_vcpu_faul

Re: [PATCH] kvm: pass the virtual SEI syndrome to guest OS

2017-03-21 Thread gengdongjiu



On 2017/3/20 23:08, James Morse wrote:
> Hi Dongjiu Geng,
> 
> On 20/03/17 13:58, Marc Zyngier wrote:
>> On 20/03/17 12:28, gengdongjiu wrote:
>>> On 2017/3/20 19:24, Marc Zyngier wrote:
>>>> Please include James Morse on anything RAS related, as he's already
>>>> looking at related patches.
> 
> (Thanks Marc,)
> 
>>>> On 20/03/17 07:55, Dongjiu Geng wrote:
>>>>> In the RAS implementation, hardware pass the virtual SEI
>>>>> syndrome information through the VSESR_EL2, so set the virtual
>>>>> SEI syndrome using physical SEI syndrome el2_elr to pass to
>>>>> the guest OS
> 
> How does this work with firmware first?

I explained the work flow in a previous mail.

When the guest OS triggers an SEI, it first traps to the EL3 firmware. The EL3
firmware records the error info in the APEI table, copies ESR_EL3 and ELR_EL3
to ESR_EL2 and ELR_EL2, and transfers control to the hypervisor. The hypervisor
delegates the error exception to the EL1 guest OS by setting HCR_EL2.VSE to 1
and passes the virtual SEI syndrome through VSESR_EL2. The EL1 guest OS checks
the DISR_EL1 syndrome information to decide whether to terminate the
application or take some other recovery action. Because HCR_EL2.AMO is set, a
read of DISR_EL1 actually returns VDISR_EL2, and VDISR_EL2 is loaded from
VSESR_EL2, so here I pass the virtual SEI syndrome in vsesr_el2.
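
To make the last step concrete, here is a minimal sketch of how the syndrome
bits could be merged before the value is written to VSESR_EL2. It assumes,
going by the VSESR_ELx_IDS_ISS_MASK definition in the patch, that only bits
[24:0] are taken from the physical syndrome. compose_vsesr() is a hypothetical
name used for illustration only; it is not a kernel function.

```c
#include <stdint.h>

/* Mask from the patch: syndrome bits [24:0]. */
#define VSESR_ELx_IDS_ISS_MASK ((1UL << 25) - 1)

/*
 * Hypothetical helper (an assumption, not code from the patch): keep the
 * upper bits of the previously saved vsesr value and replace bits [24:0]
 * with the corresponding bits of the physical syndrome.
 */
static inline uint32_t compose_vsesr(uint32_t old_vsesr, uint32_t phys_esr)
{
	uint32_t mask = (uint32_t)VSESR_ELx_IDS_ISS_MASK;

	return (old_vsesr & ~mask) | (phys_esr & mask);
}
```

With this shape, the exception class bits of ESR_EL2 never leak into the value
the guest later reads back through DISR_EL1.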

> If we took a Physical SError Interrupt the CPER records are in the host's
> memory.
> To deliver a RAS event to the guest something needs to generate CPER records
> and put them in the guest memory. Only Qemu knows where these memory regions are.
> 
> Put another way, what is the guest expected to do with this SError interrupt?
No, panic is not the only option. If it is an EL0 application SEI, the OS error
recovery agent will terminate the EL0 application to isolate the error. If it is
an EL1 guest OS SError, the guest OS can check whether it can recover: if the
error was in a read-only file cache buffer, the guest OS can invalidate the page
and reload the data from disk.

If all of the above fail, the OS will panic.
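
The policy described above could be sketched roughly as follows. This is only
an illustration under simplifying assumptions: the enum names and
sei_choose_action() are hypothetical, and a real handler would also fall back
to panic if killing the task or reloading the page itself failed.

```c
#include <stdbool.h>

/* Hypothetical names, for illustration only. */
enum sei_origin { SEI_FROM_EL0, SEI_FROM_EL1 };
enum sei_action { SEI_KILL_TASK, SEI_RELOAD_PAGE, SEI_PANIC };

/*
 * Pick a recovery action for an SError seen by the guest OS:
 *  - error in an EL0 application: terminate that application;
 *  - error in EL1 on a read-only (clean) file cache page: drop the page
 *    and reload it from disk;
 *  - anything else: no recovery path is left, so panic.
 */
static inline enum sei_action sei_choose_action(enum sei_origin origin,
						bool clean_file_cache_page)
{
	if (origin == SEI_FROM_EL0)
		return SEI_KILL_TASK;
	if (clean_file_cache_page)
		return SEI_RELOAD_PAGE;
	return SEI_PANIC;
}
```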


> The only choice is panic(). We should send this via Qemu so that we can add
> proper guest RAS support later. Once Qemu has written the CPER records into
> guest memory, it can notify the guest.
> 
> Is anyone from Huawei looking at adding RAS support for Qemu?
Yes, I am looking at Qemu and want to add RAS support.
Do you mean that Qemu should inject both the SEA and the SEI?

> 
> 
> It looks like we should save/restore VSESR_EL2 as part of the guest CPU state,
> but this needs doing with the cpufeature framework so that the single-image
> kernel works on platforms with and without these features.
Yes, you are right; we will use the cpufeature framework.


> 
> Xie XiuQi's series for SEI also touches the cpufeature framework.
> 
> 
>>>>> diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
>>>>> index aede1658aeda..770a153fb6ba 100644
>>>>> --- a/arch/arm64/kvm/hyp/switch.c
>>>>> +++ b/arch/arm64/kvm/hyp/switch.c
>>>>> @@ -86,6 +86,13 @@ static void __hyp_text __activate_traps(struct 
>>>>> kvm_vcpu *vcpu)
>>>>>   isb();
>>>>>   }
>>>>>   write_sysreg(val, hcr_el2);
>>>>> +#ifdef CONFIG_HAS_RAS_EXTENSION
>>>>> + /* If virtual System Error or Asynchronous Abort is pending. set
>>>>> +  * the virtual exception syndrome information
>>>>> +  */
>>>>> + if (vcpu->arch.hcr_el2 & HCR_VSE)
> 
>>>>> + write_sysreg(vcpu->arch.fault.vsesr_el2, vsesr_el2);
> 
> This won't build with versions of binutils that don't recognise vsesr_el2.
> Is there another patch out there that adds a sysreg definition for vsesr_el2?
> 
> 
>>>>> diff --git a/arch/arm64/kvm/inject_fault.c b/arch/arm64/kvm/inject_fault.c
>>>>> index da6a8cfa54a0..08a13dfe28a8 100644
>>>>> --- a/arch/arm64/kvm/inject_fault.c
>>>>> +++ b/arch/arm64/kvm/inject_fault.c
>>>>> @@ -242,4 +242,14 @@ void kvm_inject_undefined(struct kvm_vcpu *vcpu)
>>>>>  void kvm_inject_vabt(struct kvm_vcpu *vcpu)
>>>>>  {
>>>>>   vcpu_set_hcr(vcpu, vcpu_get_hcr(vcpu) | HCR_VSE);
>>>>> +#ifdef CONFIG_HAS_RAS_EXTENSION
>>>>> + /* If virtual System Error or Asynchronous Abort is set. set
>>>>> +  * the virtual exception syndrome information
>>>>> +  */
>>>>> + kvm_vcpu_set_vsesr(vcpu, ((kvm_vcpu_get_vsesr(vcpu)
>>>>> + & (~VSESR_ELx_IDS_ISS_MASK))
>>>

Re: [PATCH] kvm: pass the virtual SEI syndrome to guest OS

2017-03-20 Thread gengdongjiu



On 2017/3/20 19:24, Marc Zyngier wrote:
> Please include James Morse on anything RAS related, as he's already
> looking at related patches.
> 
> On 20/03/17 07:55, Dongjiu Geng wrote:
>> In the RAS implementation, hardware pass the virtual SEI
>> syndrome information through the VSESR_EL2, so set the virtual
>> SEI syndrome using physical SEI syndrome el2_elr to pass to
>> the guest OS
>>
>> Signed-off-by: Dongjiu Geng 
>> Signed-off-by: Quanming wu 
>> ---
>>  arch/arm64/Kconfig   |  8 
>>  arch/arm64/include/asm/esr.h |  1 +
>>  arch/arm64/include/asm/kvm_emulate.h | 12 
>>  arch/arm64/include/asm/kvm_host.h|  4 
>>  arch/arm64/kvm/hyp/switch.c  | 15 ++-
>>  arch/arm64/kvm/inject_fault.c| 10 ++
>>  6 files changed, 49 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
>> index 8c7c244247b6..ea62170a3b75 100644
>> --- a/arch/arm64/Kconfig
>> +++ b/arch/arm64/Kconfig
>> @@ -908,6 +908,14 @@ endmenu
>>  
>>  menu "ARMv8.2 architectural features"
>>  
>> +config HAS_RAS_EXTENSION
>> +bool "Support arm64 RAS extension"
>> +default n
>> +help
>> +  Reliability, Availability, Serviceability(RAS; part of the ARMv8.2 
>> Extensions).
>> +
>> +  Selecting this option OS will try to recover the error that RAS 
>> hardware node detected.
>> +
> 
> As this is an architectural extension, this should be controlled by the
> CPU feature mechanism, and not be chosen at compile time. What you have
> here will break horribly when booted on a CPU that doesn't implement RAS.

Thanks very much for your review. Yes, you are right.

> 
>>  config ARM64_UAO
>>  bool "Enable support for User Access Override (UAO)"
>>  default y
>> diff --git a/arch/arm64/include/asm/esr.h b/arch/arm64/include/asm/esr.h
>> index d14c478976d0..e38d32b2bdad 100644
>> --- a/arch/arm64/include/asm/esr.h
>> +++ b/arch/arm64/include/asm/esr.h
>> @@ -111,6 +111,7 @@
>>  #define ESR_ELx_COND_MASK   (UL(0xF) << ESR_ELx_COND_SHIFT)
>>  #define ESR_ELx_WFx_ISS_WFE (UL(1) << 0)
>>  #define ESR_ELx_xVC_IMM_MASK((1UL << 16) - 1)
>> +#define VSESR_ELx_IDS_ISS_MASK((1UL << 25) - 1)
>>  
>>  /* ESR value templates for specific events */
>>  
>> diff --git a/arch/arm64/include/asm/kvm_emulate.h 
>> b/arch/arm64/include/asm/kvm_emulate.h
>> index f5ea0ba70f07..20d4da7f5dce 100644
>> --- a/arch/arm64/include/asm/kvm_emulate.h
>> +++ b/arch/arm64/include/asm/kvm_emulate.h
>> @@ -148,6 +148,18 @@ static inline u32 kvm_vcpu_get_hsr(const struct 
>> kvm_vcpu *vcpu)
>>  return vcpu->arch.fault.esr_el2;
>>  }
>>  
>> +#ifdef CONFIG_HAS_RAS_EXTENSION
>> +static inline u32 kvm_vcpu_get_vsesr(const struct kvm_vcpu *vcpu)
>> +{
>> +return vcpu->arch.fault.vsesr_el2;
>> +}
>> +
>> +static inline void kvm_vcpu_set_vsesr(struct kvm_vcpu *vcpu, unsigned long 
>> val)
>> +{
>> +vcpu->arch.fault.vsesr_el2 = val;
>> +}
>> +#endif
>> +
>>  static inline int kvm_vcpu_get_condition(const struct kvm_vcpu *vcpu)
>>  {
>>  u32 esr = kvm_vcpu_get_hsr(vcpu);
>> diff --git a/arch/arm64/include/asm/kvm_host.h 
>> b/arch/arm64/include/asm/kvm_host.h
>> index e7705e7bb07b..f9e3bb57c461 100644
>> --- a/arch/arm64/include/asm/kvm_host.h
>> +++ b/arch/arm64/include/asm/kvm_host.h
>> @@ -83,6 +83,10 @@ struct kvm_mmu_memory_cache {
>>  };
>>  
>>  struct kvm_vcpu_fault_info {
>> +#ifdef CONFIG_HAS_RAS_EXTENSION
>> +/* Virtual SError Exception Syndrome Register */
>> +u32 vsesr_el2;
>> +#endif
>>  u32 esr_el2;/* Hyp Syndrom Register */
>>  u64 far_el2;/* Hyp Fault Address Register */
>>  u64 hpfar_el2;  /* Hyp IPA Fault Address Register */
>> diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
>> index aede1658aeda..770a153fb6ba 100644
>> --- a/arch/arm64/kvm/hyp/switch.c
>> +++ b/arch/arm64/kvm/hyp/switch.c
>> @@ -86,6 +86,13 @@ static void __hyp_text __activate_traps(struct kvm_vcpu 
>> *vcpu)
>>  isb();
>>  }
>>  write_sysreg(val, hcr_el2);
>> +#ifdef CONFIG_HAS_RAS_EXTENSION
>> +/* If virtual System Error or Asynchronous Abort is pending. set
>> + * the virtual exception syndrome information
>> + */
>> +if (vcpu->arch.hcr_el2 & HCR_VSE)
>> +write_sysreg(vcpu->arch.fault.vsesr_el2, vsesr_el2);
>> +#endif
>>  /* Trap on AArch32 cp15 c15 accesses (EL1 or EL0) */
>>  write_sysreg(1 << 15, hstr_el2);
>>  /*
>> @@ -139,8 +146,14 @@ static void __hyp_text __deactivate_traps(struct 
>> kvm_vcpu *vcpu)
>>   * the crucial bit is "On taking a vSError interrupt,
>>   * HCR_EL2.VSE is cleared to 0."
>>   */
>> -if (vcpu->arch.hcr_el2 & HCR_VSE)
>> +if (vcpu->arch.hcr_el2 & HCR_VSE) {
>>  vcpu->arch.hcr_el2 = read_sysreg(hcr_el2);
>> +#ifdef CONFIG_HAS_RAS_EXTENSION
>> +/* set vsesr_el2[24:0] with

Re: [PATCH] kvm: pass the virtual SEI syndrome to guest OS

2017-03-21 Thread gengdongjiu

Hi kbuild test robot,

  Thank you.
  The build error occurs because "vsesr_el2" is an ARMv8.2 register that this
assembler does not recognise by name. I will change the "vsesr_el2" accesses to
use the sysreg encoding instead.
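
As a rough sketch of what "sysreg usage" means here: a system register can be
addressed by its numeric (op0, op1, CRn, CRm, op2) encoding instead of its
name, so the assembler does not need to know the register. The field shifts
and the VSESR_EL2 encoding (3, 4, 5, 2, 3) below are what I understand from
the architecture documentation and should be checked against the ARM ARM;
sys_reg_encode() is a hypothetical stand-in, not the kernel's own macro.

```c
#include <stdint.h>

/* Field positions inside the MRS/MSR instruction word (assumed). */
#define OP0_SHIFT 19
#define OP1_SHIFT 16
#define CRN_SHIFT 12
#define CRM_SHIFT 8
#define OP2_SHIFT 5

/* Pack a system register encoding into its instruction-field layout. */
static inline uint32_t sys_reg_encode(uint32_t op0, uint32_t op1,
				      uint32_t crn, uint32_t crm,
				      uint32_t op2)
{
	return (op0 << OP0_SHIFT) | (op1 << OP1_SHIFT) |
	       (crn << CRN_SHIFT) | (crm << CRM_SHIFT) |
	       (op2 << OP2_SHIFT);
}

/* Assumed encoding of VSESR_EL2; generic name S3_4_C5_C2_3. */
#define EXAMPLE_SYS_VSESR_EL2 sys_reg_encode(3, 4, 5, 2, 3)
```

Accessing the register through such an encoding should build with binutils
versions that do not yet know the name vsesr_el2.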


On 2017/3/21 21:51, kbuild test robot wrote:
> Hi Dongjiu,
> 
> [auto build test ERROR on arm64/for-next/core]
> [also build test ERROR on v4.11-rc3 next-20170321]
> [if your patch is applied to the wrong git tree, please drop us a note to 
> help improve the system]
> 
> url:
> https://github.com/0day-ci/linux/commits/Dongjiu-Geng/kvm-pass-the-virtual-SEI-syndrome-to-guest-OS/20170321-152433
> base:   https://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git 
> for-next/core
> config: arm64-allmodconfig (attached as .config)
> compiler: aarch64-linux-gnu-gcc (Debian 6.1.1-9) 6.1.1 20160705
> reproduce:
> wget 
> https://raw.githubusercontent.com/01org/lkp-tests/master/sbin/make.cross -O 
> ~/bin/make.cross
> chmod +x ~/bin/make.cross
> # save the attached .config to linux build tree
> make.cross ARCH=arm64 
> 
> All errors (new ones prefixed by >>):
> 
>/tmp/ccWmLqCE.s: Assembler messages:
>>> /tmp/ccWmLqCE.s:677: Error: selected processor does not support system 
>>> register name 'vsesr_el2'
> 
> ---
> 0-DAY kernel test infrastructureOpen Source Technology Center
> https://lists.01.org/pipermail/kbuild-all   Intel Corporation
>

Re: [PATCH] kvm: pass the virtual SEI syndrome to guest OS

2017-04-06 Thread gengdongjiu

Dear Laszlo,
   Thanks for your detailed explanation.

On 2017/3/29 19:58, Laszlo Ersek wrote:
> (This ought to be one of the longest address lists I've ever seen :)
> Thanks for the CC. I'm glad Shannon is already on the CC list. For good
> measure, I'm adding MST and Igor.)
> 
> On 03/29/17 12:36, Achin Gupta wrote:
>> Hi gengdongjiu,
>>
>> On Wed, Mar 29, 2017 at 05:36:37PM +0800, gengdongjiu wrote:
>>>
>>> Hi Laszlo/Biesheuvel/Qemu developer,
>>>
>>>Now I encounter a issue and want to consult with you in ARM64 platform， 
>>> as described below:
>>>
>>> when guest OS happen synchronous or asynchronous abort, kvm needs
>>> to send the error address to Qemu or UEFI through sigbus to
>>> dynamically generate APEI table. from my investigation, there are
>>> two ways:
>>>
>>> (1) Qemu get the error address, and generate the APEI table, then
>>> notify UEFI to know this generation, then inject abort error to
>>> guest OS, guest OS read the APEI table.
>>> (2) Qemu get the error address, and let UEFI to generate the APEI
>>> table, then inject abort error to guest OS, guest OS read the APEI
>>> table.
>>
>> Just being pedantic! I don't think we are talking about creating the APEI 
>> table
>> dynamically here. The issue is: Once KVM has received an error that is 
>> destined
>> for a guest it will raise a SIGBUS to Qemu. Now before Qemu can inject the 
>> error
>> into the guest OS, a CPER (Common Platform Error Record) has to be generated
>> corresponding to the error source (GHES corresponding to memory subsystem,
>> processor etc) to allow the guest OS to do anything meaningful with the
>> error. So who should create the CPER is the question.
>>
>> At the EL3/EL2 interface (Secure Firmware and OS/Hypervisor), an error 
>> arrives
>> at EL3 and secure firmware (at EL3 or a lower secure exception level) is
>> responsible for creating the CPER. ARM is experimenting with using a 
>> Standalone
>> MM EDK2 image in the secure world to do the CPER creation. This will avoid
>> adding the same code in ARM TF in EL3 (better for security). The error will 
>> then
>> be injected into the OS/Hypervisor (through SEA/SEI/SDEI) through ARM Trusted
>> Firmware.
>>
>> Qemu is essentially fulfilling the role of secure firmware at the EL2/EL1
>> interface (as discussed with Christoffer below). So it should generate the 
>> CPER
>> before injecting the error.
>>
>> This is corresponds to (1) above apart from notifying UEFI (I am assuming you
>> mean guest UEFI). At this time, the guest OS already knows where to pick up 
>> the
>> CPER from through the HEST. Qemu has to create the CPER and populate its 
>> address
>> at the address exported in the HEST. Guest UEFI should not be involved in 
>> this
>> flow. Its job was to create the HEST at boot and that has been done by this
>> stage.
>>
>> Qemu folk will be able to add but it looks like support for CPER generation 
>> will
>> need to be added to Qemu. We need to resolve this.
>>
>> Do shout if I am missing anything above.
> 
> After reading this email, the use case looks *very* similar to what
> we've just done with VMGENID for QEMU 2.9.
> 
> We have a facility between QEMU and the guest firmware, called "ACPI
> linker/loader", with which QEMU instructs the firmware to
> 
> - allocate and download blobs into guest RAM (AcpiNVS type memory) --
> ALLOCATE command,
> 
> - relocate pointers in those blobs, to fields in other (or the same)
> blobs -- ADD_POINTER command,
> 
> - set ACPI table checksums -- ADD_CHECKSUM command,
> 
> - and send GPAs of fields within such blobs back to QEMU --
> WRITE_POINTER command.
> 
> This is how I imagine we can map the facility to the current use case
> (note that this is the first time I read about HEST / GHES / CPER):
> 
> [Diagram, truncated in the archive: the HEST in etc/acpi/tables lists GHES
> entries; each GHES entry points at an address register in
> etc/hardware_errors, and each address register points at an Error Status
> Data Block holding CPER records.]

Re: [PATCH] kvm: pass the virtual SEI syndrome to guest OS

2017-04-21 Thread gengdongjiu

Hi all/Laszlo,

  Sorry, I have a question for you.


On 2017/4/7 2:55, Laszlo Ersek wrote:
> On 04/06/17 14:35, gengdongjiu wrote:
>> Dear, Laszlo
>>Thanks for your detailed explanation.
>>
>> On 2017/3/29 19:58, Laszlo Ersek wrote:
>>> (This ought to be one of the longest address lists I've ever seen :)
>>> Thanks for the CC. I'm glad Shannon is already on the CC list. For good
>>> measure, I'm adding MST and Igor.)
>>>
>>> On 03/29/17 12:36, Achin Gupta wrote:
>>>> Hi gengdongjiu,
>>>>
>>>> On Wed, Mar 29, 2017 at 05:36:37PM +0800, gengdongjiu wrote:
>>>>>
>>>>> Hi Laszlo/Biesheuvel/Qemu developer,
>>>>>
>>>>>Now I encounter a issue and want to consult with you in ARM64 
>>>>> platform， as described below:
>>>>>
>>>>> when guest OS happen synchronous or asynchronous abort, kvm needs
>>>>> to send the error address to Qemu or UEFI through sigbus to
>>>>> dynamically generate APEI table. from my investigation, there are
>>>>> two ways:
>>>>>
>>>>> (1) Qemu get the error address, and generate the APEI table, then
>>>>> notify UEFI to know this generation, then inject abort error to
>>>>> guest OS, guest OS read the APEI table.
>>>>> (2) Qemu get the error address, and let UEFI to generate the APEI
>>>>> table, then inject abort error to guest OS, guest OS read the APEI
>>>>> table.
>>>>
>>>> Just being pedantic! I don't think we are talking about creating the APEI 
>>>> table
>>>> dynamically here. The issue is: Once KVM has received an error that is 
>>>> destined
>>>> for a guest it will raise a SIGBUS to Qemu. Now before Qemu can inject the 
>>>> error
>>>> into the guest OS, a CPER (Common Platform Error Record) has to be 
>>>> generated
>>>> corresponding to the error source (GHES corresponding to memory subsystem,
>>>> processor etc) to allow the guest OS to do anything meaningful with the
>>>> error. So who should create the CPER is the question.
>>>>
>>>> At the EL3/EL2 interface (Secure Firmware and OS/Hypervisor), an error 
>>>> arrives
>>>> at EL3 and secure firmware (at EL3 or a lower secure exception level) is
>>>> responsible for creating the CPER. ARM is experimenting with using a 
>>>> Standalone
>>>> MM EDK2 image in the secure world to do the CPER creation. This will avoid
>>>> adding the same code in ARM TF in EL3 (better for security). The error 
>>>> will then
>>>> be injected into the OS/Hypervisor (through SEA/SEI/SDEI) through ARM 
>>>> Trusted
>>>> Firmware.
>>>>
>>>> Qemu is essentially fulfilling the role of secure firmware at the EL2/EL1
>>>> interface (as discussed with Christoffer below). So it should generate the 
>>>> CPER
>>>> before injecting the error.
>>>>
>>>> This is corresponds to (1) above apart from notifying UEFI (I am assuming 
>>>> you
>>>> mean guest UEFI). At this time, the guest OS already knows where to pick 
>>>> up the
>>>> CPER from through the HEST. Qemu has to create the CPER and populate its 
>>>> address
>>>> at the address exported in the HEST. Guest UEFI should not be involved in 
>>>> this
>>>> flow. Its job was to create the HEST at boot and that has been done by this
>>>> stage.
>>>>
>>>> Qemu folk will be able to add but it looks like support for CPER 
>>>> generation will
>>>> need to be added to Qemu. We need to resolve this.
>>>>
>>>> Do shout if I am missing anything above.
>>>
>>> After reading this email, the use case looks *very* similar to what
>>> we've just done with VMGENID for QEMU 2.9.
>>>
>>> We have a facility between QEMU and the guest firmware, called "ACPI
>>> linker/loader", with which QEMU instructs the firmware to
>>>
>>> - allocate and download blobs into guest RAM (AcpiNVS type memory) --
>>> ALLOCATE command,
>>>
>>> - relocate pointers in those blobs, to fields in other (or the same)
>>> blobs -- ADD_POINTER command,
>>>
>>> - set ACPI table checksums -- ADD_CHECKSUM command,
>>>
>>> - and send GPAs of fields within such blobs back to QEMU --
>>> WRITE_POINTER command.
>>>
>>> This is how I imagine we can map the facility to the current use case
>>> (note that this is the first time I read about HEST / GHES / CPER):

Laszlo describes a solution for generating the GHES table in Qemu. It mainly
uses four commands, ALLOCATE, ADD_POINTER, ADD_CHECKSUM and WRITE_POINTER, to
communicate with the BIOS. Do all four commands need to be supported by the
guest firmware/UEFI? I found that WRITE_POINTER always fails, so I suspect that
the guest UEFI/firmware does not support the WRITE_POINTER command. Please help
me confirm this. Thanks very much.
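
To illustrate the suspicion, here is a small sketch of how firmware built
before WRITE_POINTER existed would treat that command. The numeric command
codes are what I believe Qemu's linker/loader uses and should be checked
against the Qemu source; linker_cmd_supported() and its max_known parameter
are hypothetical and do not come from any firmware.

```c
#include <stdint.h>

/* Command codes (assumed; verify against Qemu's bios-linker-loader). */
enum linker_cmd {
	LINKER_CMD_ALLOCATE      = 1,
	LINKER_CMD_ADD_POINTER   = 2,
	LINKER_CMD_ADD_CHECKSUM  = 3,
	LINKER_CMD_WRITE_POINTER = 4,
};

/*
 * Hypothetical capability check: returns 1 when firmware that only knows
 * the commands up to max_known can process cmd, and 0 otherwise.
 */
static inline int linker_cmd_supported(uint32_t cmd, uint32_t max_known)
{
	return cmd >= LINKER_CMD_ALLOCATE && cmd <= max_known;
}
```

Under these assumptions, a firmware whose newest known command is ADD_CHECKSUM
would reject every WRITE_POINTER entry, which would match the failure I see.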

[PATCH] irqdomain: handle the per-CPU irq trigger type settings

2017-03-09 Thread gengdongjiu

When a device parses and maps a per-CPU interrupt into the Linux virq space
using the irq_of_parse_and_map() API, setting the specified IRQ trigger type
always fails, because irq_set_irq_type() only handles 1-N mode interrupt
sources, not per-CPU interrupt sources. Handle per-CPU IRQs when this failure
occurs.

Signed-off-by: Dongjiu Geng 
   Zhanghai bin 

diff --git a/kernel/irq/irqdomain.c b/kernel/irq/irqdomain.c
index 9fd618d..8116cf2 100755
--- a/kernel/irq/irqdomain.c
+++ b/kernel/irq/irqdomain.c
@@ -542,8 +542,16 @@ unsigned int irq_create_of_mapping(struct of_phandle_args 
*irq_data)

/* Set type if specified and different than the current one */
if (type != IRQ_TYPE_NONE &&
-   type != irq_get_trigger_type(virq))
-   irq_set_irq_type(virq, type);
+   type != irq_get_trigger_type(virq)) {
+   int ret = 0;
+   struct irq_data *irq_data = irq_get_irq_data(virq);
+
+   ret = irq_set_irq_type(virq, type);
+
+/* Handle per-cpu IRQ: just save type in irq_data */
+   if (-EINVAL == ret && irq_data)
+   irqd_set_trigger_type(irq_data, type);
+   }
return virq;
 }
 EXPORT_SYMBOL_GPL(irq_create_of_mapping);

Re: [PATCH] irqdomain: handle the per-CPU irq trigger type settings

2017-03-10 Thread gengdongjiu

Hi Gleixner,
  Thank you very much for your comments and review. I will update the patch later.

> 
> 
> 
> On Fri, 10 Mar 2017, gengdongjiu wrote:
> 
>> when devices parse and map an per-cpu interrupt into linux virq space
>> using irq_of_parse_and_map API, it will always be failed if needs to set
>> the specified irq trigger type, because irq_set_irq_type is only for 1-N
>> mode interrupt source, not for per-cpu interrupt source. so handle 
>> per-cpu
>> IRQs for this failure.
> 
> Please format your changelogs proper into sections:
> 
> 1) Context
> 
> 2) Problem
> 
> 3) Solution
> 
> Writing one big lump of a sentence is just unreadable.
> 
> Aside of that:
> 
> 1) No indentation of the changelog
> 2) Sentences start with upper case letters
> 3) Function references want () after the function name
> 
> Aside
> 
>>
>> Signed-off-by: Dongjiu Geng <gengdong...@huawei.com>
>>Zhanghai bin <zhanghaib...@huawei.com>
> 
> That SOB is bogus.
> 
>> diff --git a/kernel/irq/irqdomain.c b/kernel/irq/irqdomain.c
>> index 9fd618d..8116cf2 100755
>> --- a/kernel/irq/irqdomain.c
>> +++ b/kernel/irq/irqdomain.c
>> @@ -542,8 +542,16 @@ unsigned int irq_create_of_mapping(struct 
>> of_phandle_args *irq_data)
>>
>> /* Set type if specified and different than the current one */
>> if (type != IRQ_TYPE_NONE &&
>> -   type != irq_get_trigger_type(virq))
>> -   irq_set_irq_type(virq, type);
>> +   type != irq_get_trigger_type(virq)) {
>> +   int ret = 0;
>> +   struct irq_data *irq_data = irq_get_irq_data(virq);
>> +
>> +   ret = irq_set_irq_type(virq, type);
>> +
>> +/* Handle per-cpu IRQ: just save type in irq_data */
>> +   if (-EINVAL == ret && irq_data)
>> +   irqd_set_trigger_type(irq_data, type);
> 
> This is completely broken. That stores a trigger type for any interrupt if
> the set type function fails.
> 
> You fail to explain
> 
> - WHY irq_set_irq_type() fails for these per cpu interrupts
> 
> - WHY storing the type in irqdata solves anything
> 
> - WHAT sets the type in the actual interrupt hardware
> 
> That information should be in the changelog 
> 
> Thanks,
> 
> tglx
> 
> .
>

Re: [PATCH V11 10/10] arm/arm64: KVM: add guest SEA support

2017-03-05 Thread gengdongjiu

Hi James,


> Hi Wang Xiongfeng,
>
> On 25/02/17 07:15, Xiongfeng Wang wrote:
>> On 2017/2/22 5:22, Tyler Baicar wrote:
>>> Currently external aborts are unsupported by the guest abort
>>> handling. Add handling for SEAs so that the host kernel reports
>>> SEAs which occur in the guest kernel.
>
>>> diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
>>> index a5265ed..04f1dd50 100644
>>> --- a/arch/arm/kvm/mmu.c
>>> +++ b/arch/arm/kvm/mmu.c
>>> @@ -1444,8 +1445,21 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu, 
>>> struct kvm_run *run)
>>>
>>>  /* Check the stage-2 fault is trans. fault or write fault */
>>>  fault_status = kvm_vcpu_trap_get_fault_type(vcpu);
>>> -if (fault_status != FSC_FAULT && fault_status != FSC_PERM &&
>>> -fault_status != FSC_ACCESS) {
>>> +
>>> +/* The host kernel will handle the synchronous external abort. There
>>> + * is no need to pass the error into the guest.
>>> + */
>
>> Can we inject an sea into the guest, so that the guest can kill the
>> application which causes the error if the guest won't be terminated
>> later. I'm not sure whether ghes_handle_memory_failure() called in
>> ghes_do_proc() will kill the qemu process. I think it only kill user
>> processes marked with PF_MCE_PROCESS & PF_MCE_EARLY.
>
> My understanding is the pages will get unmapped and recovered where possible
> (e.g. re-read from disk), the user space process will get SIGBUS/SIGSEGV when it
> next tries to access that page, which could be some time later.
> These flags in find_early_kill_thread() are a way to make the memory-failure
> code signal the process early, before it does any recovery. The 'MCE' makes me
> think its x86 specific.
> (early and late are described more in [0])
>
>
> Guests are a special case as QEMU may never access the faulty memory itself, 
> so
> it won't receive the 'late' signal. It looks like ARM/arm64 KVM lacks support
> for KVM_PFN_ERR_HWPOISON which sends SIGBUS from KVM's fault-handling code. I
> have patches to add support for this which I intend to send at rc1.

Could you share these patches publicly?


>
> [0] suggests 'KVM qemu' sets these MCE flags to take the 'early' path, but 
> given
> x86s KVM_PFN_ERR_HWPOISON, this may be out of date.
>
>
> Either way, once QEMU gets a signal indicating the virtual address, it can
> generate its own APEI CPER records and use the KVM APIs to mock up an
> Synchronous External Abort, (or inject an IRQ or run the vcpu waiting for the
> guest's polling thread to come round, whichever was described to the guest via
> the HEST/GHES tables).
>
> We can't hand the APEI CPER records we have in the kernel to the guest, as 
> they
> hold a host physical address, and maybe a host virtual address. We don't know
> where in guest memory we could write new APEI CPER records as these locations
> have to be reserved in the guests-UEFI memory map, and only QEMU knows where
> they are.
>
> To deliver RAS events to a guest we have to get QEMU involved.
>
>
> Thanks,
>
> James
>
>
> [0] https://www.kernel.org/doc/Documentation/vm/hwpoison.txt
>

Re: [PATCH] kvm: pass the virtual SEI syndrome to guest OS

2017-04-06 Thread gengdongjiu

Hi Laszlo,
  thanks.

On 2017/4/7 2:55, Laszlo Ersek wrote:
> On 04/06/17 14:35, gengdongjiu wrote:
>> Dear, Laszlo
>>Thanks for your detailed explanation.
>>
>> On 2017/3/29 19:58, Laszlo Ersek wrote:
>>> (This ought to be one of the longest address lists I've ever seen :)
>>> Thanks for the CC. I'm glad Shannon is already on the CC list. For good
>>> measure, I'm adding MST and Igor.)
>>>
>>> On 03/29/17 12:36, Achin Gupta wrote:
>>>> Hi gengdongjiu,
>>>>
>>>> On Wed, Mar 29, 2017 at 05:36:37PM +0800, gengdongjiu wrote:
>>>>>
>>>>> Hi Laszlo/Biesheuvel/Qemu developer,
>>>>>
>>>>>Now I encounter a issue and want to consult with you in ARM64 
>>>>> platform， as described below:
>>>>>
>>>>> when guest OS happen synchronous or asynchronous abort, kvm needs
>>>>> to send the error address to Qemu or UEFI through sigbus to
>>>>> dynamically generate APEI table. from my investigation, there are
>>>>> two ways:
>>>>>
>>>>> (1) Qemu get the error address, and generate the APEI table, then
>>>>> notify UEFI to know this generation, then inject abort error to
>>>>> guest OS, guest OS read the APEI table.
>>>>> (2) Qemu get the error address, and let UEFI to generate the APEI
>>>>> table, then inject abort error to guest OS, guest OS read the APEI
>>>>> table.
>>>>
>>>> Just being pedantic! I don't think we are talking about creating the APEI 
>>>> table
>>>> dynamically here. The issue is: Once KVM has received an error that is 
>>>> destined
>>>> for a guest it will raise a SIGBUS to Qemu. Now before Qemu can inject the 
>>>> error
>>>> into the guest OS, a CPER (Common Platform Error Record) has to be 
>>>> generated
>>>> corresponding to the error source (GHES corresponding to memory subsystem,
>>>> processor etc) to allow the guest OS to do anything meaningful with the
>>>> error. So who should create the CPER is the question.
>>>>
>>>> At the EL3/EL2 interface (Secure Firmware and OS/Hypervisor), an error 
>>>> arrives
>>>> at EL3 and secure firmware (at EL3 or a lower secure exception level) is
>>>> responsible for creating the CPER. ARM is experimenting with using a 
>>>> Standalone
>>>> MM EDK2 image in the secure world to do the CPER creation. This will avoid
>>>> adding the same code in ARM TF in EL3 (better for security). The error 
>>>> will then
>>>> be injected into the OS/Hypervisor (through SEA/SEI/SDEI) through ARM 
>>>> Trusted
>>>> Firmware.
>>>>
>>>> Qemu is essentially fulfilling the role of secure firmware at the EL2/EL1
>>>> interface (as discussed with Christoffer below). So it should generate the 
>>>> CPER
>>>> before injecting the error.
>>>>
>>>> This is corresponds to (1) above apart from notifying UEFI (I am assuming 
>>>> you
>>>> mean guest UEFI). At this time, the guest OS already knows where to pick 
>>>> up the
>>>> CPER from through the HEST. Qemu has to create the CPER and populate its 
>>>> address
>>>> at the address exported in the HEST. Guest UEFI should not be involved in 
>>>> this
>>>> flow. Its job was to create the HEST at boot and that has been done by this
>>>> stage.
>>>>
>>>> Qemu folk will be able to add but it looks like support for CPER 
>>>> generation will
>>>> need to be added to Qemu. We need to resolve this.
>>>>
>>>> Do shout if I am missing anything above.
>>>
>>> After reading this email, the use case looks *very* similar to what
>>> we've just done with VMGENID for QEMU 2.9.
>>>
>>> We have a facility between QEMU and the guest firmware, called "ACPI
>>> linker/loader", with which QEMU instructs the firmware to
>>>
>>> - allocate and download blobs into guest RAM (AcpiNVS type memory) --
>>> ALLOCATE command,
>>>
>>> - relocate pointers in those blobs, to fields in other (or the same)
>>> blobs -- ADD_POINTER command,
>>>
>>> - set ACPI table checksums -- ADD_CHECKSUM command,
>>>
>>> - and send GPAs of fields within such blobs back to QEMU --
>>> WRITE_POINTER command.
>>>

Re: [PATCH] acpi: ghes: fix the OSPM acknowledges error flow

2017-08-04 Thread gengdongjiu

Hi,
   Please ignore this fix; the original logic is right. The Read Ack Register
directly contains the ack value, not the address of the ack value.


On 2017/8/3 23:42, Dongjiu Geng wrote:
> In GHESv2, The read_ack_register is used to specify the
> location of the read ack register, it is only the physical
> address, but not the value. so needs to continue reading
> the address to get the right value. Also It needs to write
> the ack value to the right physical address.
> 
> Signed-off-by: Dongjiu Geng 
> ---
>  drivers/acpi/apei/ghes.c | 19 ++-
>  1 file changed, 14 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
> index bb83044..44bb65f 100644
> --- a/drivers/acpi/apei/ghes.c
> +++ b/drivers/acpi/apei/ghes.c
> @@ -703,16 +703,25 @@ static void ghes_estatus_cache_add(
>  static int ghes_ack_error(struct acpi_hest_generic_v2 *gv2)
>  {
>   int rc;
> - u64 val = 0;
> + u64 ack_paddr;
> + u64 ack_val = 0;
>  
> - rc = apei_read(&val, &gv2->read_ack_register);
> + rc = apei_read(&ack_paddr, &gv2->read_ack_register);
>   if (rc)
>   return rc;
>  
> - val &= gv2->read_ack_preserve << gv2->read_ack_register.bit_offset;
> - val |= gv2->read_ack_write<< gv2->read_ack_register.bit_offset;
> + if (!ack_paddr)
> + return -ENOENT;
> +
> + ghes_copy_tofrom_phys(&ack_val, ack_paddr,
> +   sizeof(u64), 1);
>  
> - return apei_write(val, &gv2->read_ack_register);
> + ack_val &= gv2->read_ack_preserve << gv2->read_ack_register.bit_offset;
> + ack_val |= gv2->read_ack_write<< gv2->read_ack_register.bit_offset;
> +
> + ghes_copy_tofrom_phys(&ack_val, ack_paddr,
> + sizeof(u64), 0);
> + return 0;
>  }
>  
>  static void __ghes_panic(struct ghes *ghes)
>
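
For reference, the original logic that is kept here is a read-modify-write of
the register value: the value returned by apei_read() is masked with
read_ack_preserve and then OR-ed with read_ack_write, both shifted by
bit_offset. A minimal user-space sketch of just that arithmetic, assuming a
hypothetical struct ack_desc and helper ack_value() that stand in for the
kernel structures and are not part of the kernel:

```c
#include <stdint.h>

/* Hypothetical stand-in for the GHESv2 read-ack fields; not the kernel struct. */
struct ack_desc {
	uint64_t preserve;   /* models read_ack_preserve */
	uint64_t write;      /* models read_ack_write */
	uint8_t bit_offset;  /* models read_ack_register.bit_offset */
};

/*
 * Compute the value to write back to the Read Ack Register from the value
 * that was just read from it: keep the preserved bits, then set the ack bits.
 */
static uint64_t ack_value(uint64_t reg, const struct ack_desc *d)
{
	reg &= d->preserve << d->bit_offset;
	reg |= d->write << d->bit_offset;
	return reg;
}
```

With preserve = ~1, write = 1 and bit_offset = 0, a register value of 0xF0
becomes 0xF1: every bit except bit 0 is preserved and bit 0 is set.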

Re: [PATCH v2] acpi: apei: fix the wrongly iterate generic error status block

2017-08-15 Thread gengdongjiu

Borislav,

2017-08-16 0:32 GMT+08:00, Borislav Petkov <b...@suse.de>:
> On Wed, Aug 16, 2017 at 12:30:55AM +0800, gengdongjiu wrote:
>> I think this patch has merged them to one.
>
> Look at both patches again.

I have already discussed this with Tyler; see the link below. Thanks.

https://lkml.org/lkml/2017/8/14/355


>
> --
> Regards/Gruss,
> Boris.
>
> SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB
> 21284 (AG Nürnberg)
> --
>

Re: [PATCH v2] acpi: apei: fix the wrongly iterate generic error status block

2017-08-15 Thread gengdongjiu

Hi Borislav,

>
> ... and uses that accessor.
>
> Tyler?
>
> I'd prefer if you guys merge your two patches, Tyler's from
> https://marc.info/?l=linux-acpi&m=150179595323038&w=2 and this one into
> a single one.

I think this patch already merges the two into one.

>
> How does that sound?
>
> --
> Regards/Gruss,
> Boris.
>
> SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB
> 21284 (AG Nürnberg)
> --
>

Re: [PATCH v2] acpi: apei: fix the wrongly iterate generic error status block

2017-08-15 Thread gengdongjiu

Hi Tyler ,

> Hello Boris,
>
> His patch fixes the define for apei_estatus_for_each_section which in turn
> should fix ghes_do_proc(). So my patch should no longer be needed. I'm going
> to test this out just to verify if fixes the issue I found.

I have verified that the iteration over revision 0x300 generic error data
works well with this patch. It is good that you will also verify it on your
platform.

>
> Dongjiu,
>
> This patch changes cper_estatus_print() to use
> apei_estatus_for_each_section. Can you also make that same change to
> cper_estatus_check() since that function is doing the same iteration?

I will do it, Tyler.


>
> Thanks,
> Tyler
>
> --
> Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm
> Technologies, Inc.
> Qualcomm Technologies, Inc. is a member of the Code Aurora Forum,
> a Linux Foundation Collaborative Project.
>

Re: [PATCH v2] acpi: apei: fix the wrongly iterate generic error status block

2017-08-16 Thread gengdongjiu

Hello Tyler,
  I have already posted a new version of the patch that adds this usage to
cper_estatus_check(); please re-test.

Hello Boris,
  The original apei_estatus_for_each_section macro has two issues: one is the
loop termination condition, the other is the iteration step. Please review the
patch.
thanks.
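
The two issues can be seen in a small user-space model. This is only a sketch:
the struct layouts, the 8-byte size difference between header versions, the
revision value 0x201 and all helper names are simplified assumptions, not the
real ACPI definitions. The walk measures its offset from the first section,
because data_length does not include the status header, and it steps by a
record size that depends on the section revision:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified, hypothetical layouts; the real ACPI structures have more fields. */
struct status_hdr { uint32_t data_length; };  /* counts only bytes after this header */
struct sec_hdr { uint16_t revision; uint16_t pad; uint32_t error_data_length; };

/* In this model a revision 0x300 section header is 8 bytes longer. */
enum { V2_HDR = sizeof(struct sec_hdr), V3_HDR = sizeof(struct sec_hdr) + 8 };

/* Header size plus payload size of the section starting at sec. */
static size_t record_size(const uint8_t *sec)
{
	struct sec_hdr h;

	memcpy(&h, sec, sizeof(h));
	return ((h.revision >> 8) >= 3 ? V3_HDR : V2_HDR) + h.error_data_length;
}

/* Walk the sections with the corrected condition and step; return the count. */
static int count_sections(const uint8_t *block)
{
	struct status_hdr st;
	const uint8_t *first, *sec;
	int n = 0;

	memcpy(&st, block, sizeof(st));
	first = block + sizeof(st);
	/* Offset is measured from the first section, not from the status header. */
	for (sec = first; (size_t)(sec - first) < st.data_length;
	     sec += record_size(sec))
		n++;
	return n;
}

static void put_section(uint8_t *dst, uint16_t revision, uint32_t len)
{
	struct sec_hdr h = { revision, 0, len };

	memcpy(dst, &h, sizeof(h));
}

/* One revision 0x300 section (4-byte payload), then one older section (8 bytes). */
static void build_example(uint8_t *buf, size_t buf_len)
{
	struct status_hdr st;
	uint8_t *p = buf + sizeof(st);

	memset(buf, 0, buf_len);
	put_section(p, 0x300, 4);
	p += V3_HDR + 4;
	put_section(p, 0x201, 8);
	p += V2_HDR + 8;
	st.data_length = (uint32_t)(p - (buf + sizeof(st)));
	memcpy(buf, &st, sizeof(st));
}
```

Calling build_example() on a 64-byte buffer and then count_sections() on it
visits exactly two sections in this model.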

On 2017/8/16 7:26, Baicar, Tyler wrote:
> On 8/15/2017 3:34 PM, gengdongjiu wrote:
>> Hi Tyler ,
>>
>>> Hello Boris,
>>>
>>> His patch fixes the define for apei_estatus_for_each_section which in turn
>>> should fix ghes_do_proc(). So my patch should no longer be needed. I'm going
>>> to test this out just to verify if fixes the issue I found.
>> I have verified the issue about the iteration for the revision 0x300
>> generic error data,
>> it works well. it is good that you will verify that in your  platform.
> I've verified that this resolves the issue as well! I'll re-test with the 
> next version that adds this usage to cper_estatus_check() and add my 
> tested-by after that.
> 
> Thanks,
> Tyler
>

Re: [PATCH v2] acpi: apei: fix the wrongly iterate generic error status block

2017-08-15 Thread gengdongjiu

Looping in more people to review the patch.


2017-08-15 19:15 GMT+08:00, Dongjiu Geng :
> The revision 0x300 generic error data entry is different
> from the old version, but currently iterating through the
> GHES estatus blocks does not take into account this difference.
> This will lead to failure to get the right data entry if GHES
> has revision 0x300 error data entry.
>
> Update the GHES estatus iteration to properly increment using
> acpi_hest_get_next, and correct the iteration termination condition
> because the status block data length only includes error data length.
> Clear the CPER estatus printing iteration logic to use same macro.
>
> Signed-off-by: Dongjiu Geng 
> CC: Tyler Baicar 
> ---
>  drivers/acpi/apei/apei-internal.h | 5 -
>  drivers/firmware/efi/cper.c   | 7 +--
>  include/acpi/ghes.h   | 5 +
>  3 files changed, 6 insertions(+), 11 deletions(-)
>
> diff --git a/drivers/acpi/apei/apei-internal.h
> b/drivers/acpi/apei/apei-internal.h
> index 6e9f14c0a71b..cb4126051f62 100644
> --- a/drivers/acpi/apei/apei-internal.h
> +++ b/drivers/acpi/apei/apei-internal.h
> @@ -120,11 +120,6 @@ int apei_exec_collect_resources(struct
> apei_exec_context *ctx,
>  struct dentry;
>  struct dentry *apei_get_debugfs_dir(void);
>
> -#define apei_estatus_for_each_section(estatus, section)  \
> - for (section = (struct acpi_hest_generic_data *)(estatus + 1);  \
> -  (void *)section - (void *)estatus < estatus->data_length;  \
> -  section = (void *)(section+1) + section->error_data_length)
> -
>  static inline u32 cper_estatus_len(struct acpi_hest_generic_status
> *estatus)
>  {
>   if (estatus->raw_data_length)
> diff --git a/drivers/firmware/efi/cper.c b/drivers/firmware/efi/cper.c
> index 48a8f69da42a..dff454321160 100644
> --- a/drivers/firmware/efi/cper.c
> +++ b/drivers/firmware/efi/cper.c
> @@ -606,7 +606,6 @@ void cper_estatus_print(const char *pfx,
>   const struct acpi_hest_generic_status *estatus)
>  {
>   struct acpi_hest_generic_data *gdata;
> - unsigned int data_len;
>   int sec_no = 0;
>   char newpfx[64];
>   __u16 severity;
> @@ -617,14 +616,10 @@ void cper_estatus_print(const char *pfx,
>  "It has been corrected by h/w "
>  "and requires no further action");
>   printk("%s""event severity: %s\n", pfx, cper_severity_str(severity));
> - data_len = estatus->data_length;
> - gdata = (struct acpi_hest_generic_data *)(estatus + 1);
>   snprintf(newpfx, sizeof(newpfx), "%s%s", pfx, INDENT_SP);
>
> - while (data_len >= acpi_hest_get_size(gdata)) {
> + apei_estatus_for_each_section(estatus, gdata) {
>   cper_estatus_print_section(newpfx, gdata, sec_no);
> - data_len -= acpi_hest_get_record_size(gdata);
> - gdata = acpi_hest_get_next(gdata);
>   sec_no++;
>   }
>  }
> diff --git a/include/acpi/ghes.h b/include/acpi/ghes.h
> index 9f26e01186ae..9061c5c743b3 100644
> --- a/include/acpi/ghes.h
> +++ b/include/acpi/ghes.h
> @@ -113,6 +113,11 @@ static inline void *acpi_hest_get_next(struct
> acpi_hest_generic_data *gdata)
>   return (void *)(gdata) + acpi_hest_get_record_size(gdata);
>  }
>
> +#define apei_estatus_for_each_section(estatus, section)  \
> + for (section = (struct acpi_hest_generic_data *)(estatus + 1);  \
> +  (void *)section - (void *)(estatus + 1) < estatus->data_length; \
> +  section = acpi_hest_get_next(section))
> +
>  int ghes_notify_sea(void);
>
>  #endif /* GHES_H */
> --
> 2.14.0
>
>

Re: [PATCH] acpi: apei: fix the wrongly parse generic error status block

2017-08-14 Thread gengdongjiu

Hi Tyler,
   Yes, I will add a patch based on it. Thanks a lot for offering to test it
as well.


On 2017/8/14 22:04, Baicar, Tyler wrote:
> This change works too, I think it just makes sense to have the iterations in 
> the CPER and GHES code match. Do you want to add a patch to your patch here 
> to change the CPER code as well? If so, I'll wait for that and test it out.
> 
> Thanks,
> Tyler

Re: [PATCH] acpi: apei: fix the wrongly parse generic error status block

2017-08-10 Thread gengdongjiu

Hello,

   Sorry, I had not seen that. I have just now reviewed your modification. My
change may be simpler, and it keeps the apei_estatus_for_each_section macro so
that other code, such as the code that prints the estatus blocks, can use it
and avoid duplication.

On 2017/8/11 1:48, Baicar, Tyler wrote:
> Hello,
> 
> I have already posted a patch fixing this. Please see:
> 
> https://lkml.org/lkml/2017/8/3/824
> 
> This makes the loop identical to the CPER code which prints the estatus 
> blocks to the kernel logs.
> 
> Thanks,
> 
> Tyler
> 
> 
> On 8/10/2017 12:06 PM, Dongjiu Geng wrote:
>> The revision 0x300 generic error data entry is different with the old
>> version. when ghes_do_proc traverses to get the data entry, it does not
>> consider this difference. so when error status block has revision 0x300
>> data entry, it will have issue.
>>
>> Signed-off-by: Dongjiu Geng 
>> ---
>>   drivers/acpi/apei/apei-internal.h | 4 ++--
>>   1 file changed, 2 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/acpi/apei/apei-internal.h 
>> b/drivers/acpi/apei/apei-internal.h
>> index 6e9f14c0a71b..6491f1c4a96e 100644
>> --- a/drivers/acpi/apei/apei-internal.h
>> +++ b/drivers/acpi/apei/apei-internal.h
>> @@ -122,8 +122,8 @@ struct dentry *apei_get_debugfs_dir(void);
>> #define apei_estatus_for_each_section(estatus, section)\
>>   for (section = (struct acpi_hest_generic_data *)(estatus + 1);\
>> - (void *)section - (void *)estatus < estatus->data_length;\
>> - section = (void *)(section+1) + section->error_data_length)
>> + (void *)section - (void *)(estatus + 1) < estatus->data_length; \
>> + section = acpi_hest_get_next(section))
>> static inline u32 cper_estatus_len(struct acpi_hest_generic_status 
>> *estatus)
>>   {
>

Re: [PATCH] acpi: apei: fix the wrongly parse generic error status block

2017-08-10 Thread gengdongjiu

Removing the apei_estatus_for_each_section macro outright may not be the better
option: if other code also needs to iterate through the GHES estatus blocks,
the loop will have to be written out again.


On 2017/8/11 5:31, gengdongjiu wrote:
> Hello,
> 
>sorry, I do not see that. Just know I have reviewed your modification, may 
> be my change can be simpleness and reserve the macro of 
> apei_estatus_for_each_section
> can be used by other place to avoid duplicated code, such as prints the 
> estatus blocks.
> 
> On 2017/8/11 1:48, Baicar, Tyler wrote:
>> Hello,
>>
>> I have already posted a patch fixing this. Please see:
>>
>> https://lkml.org/lkml/2017/8/3/824
>>
>> This makes the loop identical to the CPER code which prints the estatus 
>> blocks to the kernel logs.
>>
>> Thanks,
>>
>> Tyler
>>
>>
>> On 8/10/2017 12:06 PM, Dongjiu Geng wrote:
>>> The revision 0x300 generic error data entry is different with the old
>>> version. when ghes_do_proc traverses to get the data entry, it does not
>>> consider this difference. so when error status block has revision 0x300
>>> data entry, it will have issue.
>>>
>>> Signed-off-by: Dongjiu Geng <gengdong...@huawei.com>
>>> ---
>>>   drivers/acpi/apei/apei-internal.h | 4 ++--
>>>   1 file changed, 2 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/drivers/acpi/apei/apei-internal.h 
>>> b/drivers/acpi/apei/apei-internal.h
>>> index 6e9f14c0a71b..6491f1c4a96e 100644
>>> --- a/drivers/acpi/apei/apei-internal.h
>>> +++ b/drivers/acpi/apei/apei-internal.h
>>> @@ -122,8 +122,8 @@ struct dentry *apei_get_debugfs_dir(void);
>>> #define apei_estatus_for_each_section(estatus, section)\
>>>   for (section = (struct acpi_hest_generic_data *)(estatus + 1);\
>>> - (void *)section - (void *)estatus < estatus->data_length;\
>>> - section = (void *)(section+1) + section->error_data_length)
>>> + (void *)section - (void *)(estatus + 1) < estatus->data_length; \
>>> + section = acpi_hest_get_next(section))
>>> static inline u32 cper_estatus_len(struct acpi_hest_generic_status 
>>> *estatus)
>>>   {
>>

Re: [PATCH] acpi: apei: fix GHES estatus iteration

2017-08-10 Thread gengdongjiu



On 2017/8/9 8:52, Rafael J. Wysocki wrote:
> On Tuesday, August 8, 2017 6:32:20 PM CEST Will Deacon wrote:
>> On Thu, Aug 03, 2017 at 03:32:25PM -0600, Tyler Baicar wrote:
>>> Currently iterating through the GHES estatus blocks does not
>>> take into account the new generic data v3 structure size. This
>>> can result in garbage non-standard trace events to be triggered
>>> since the loop will not properly iterate through the estatus
>>> blocks and not properly terminate.
>>>
>>> Update the GHES estatus iteration to properly increment through
>>> the estatus blocks similar to how the CPER estatus printing
>>> iterates through them.
>>>
>>> Fixes: bbcc2e7b642e ("ras: acpi/apei: cper: add support for generic data v3 
>>> structure")
I do not think this is related to commit "bbcc2e7b642e". The root cause is that
the macro does not take acpi_hest_generic_data_v300 into account, and this code
has existed for a long time.

So please also review this patch:
https://lkml.org/lkml/2017/8/10/747

I had not seen Tyler's modification before; I have only just now seen it.


>>> Signed-off-by: Tyler Baicar 
>>> Tested-by: Austin Christ 
>>> ---
>>>  drivers/acpi/apei/apei-internal.h | 5 -
>>>  drivers/acpi/apei/ghes.c  | 8 +++-
>>>  2 files changed, 7 insertions(+), 6 deletions(-)
>>
>> Whilst much of the initial code here went through the arm64 tree in the
>> previous merge window, I'm assuming that Boris will take this fix via his
>> tree (likewise for "[PATCH V2] acpi: apei: clear error status before
>> acknowledging the error").
> 
> Actually I will if Boris ACKs these.
> 
> Thanks,
> Rafael
> 
> 
> .
>

Re: [PATCH] acpi: apei: fix the wrongly parse generic error status block

2017-08-11 Thread gengdongjiu

2017-08-11 21:19 GMT+08:00 Baicar, Tyler <tbai...@codeaurora.org>:
> I removed the apei_estatus_for_each_section because it was only being used
> in this one spot even though several other parts of the code do the same
> iteration (it is done several times in the CPER code). I made this iteration
> match the CPER iterations because the CPER iterations are verifying that the
> structures are all valid and lengths are correct. If those checks are being
> done this way, it makes the most sense to mimic that iteration here when
> calling into EDAC and triggering trace events.

I think the macro already covers the check of the structures and
lengths: if a length is not correct, it breaks out of the loop.
I do not see any special validation in your modification; it does almost
the same thing as the macro.
In the whole code base there are three functions that do this iteration:
ghes_do_proc/cper_estatus_print/cper_estatus_check
The cper_estatus_check function is special because its purpose is
to validate the CPER, so it checks every length. But not every
function's purpose is to check, for example cper_estatus_print, as
shown below, so we can use this macro to clean up the code. For now
two functions can use it, and in the future any code that needs to
iterate over the CPER without doing anything special can use the macro too.

#define apei_estatus_for_each_section(estatus, section) \
for (section = (struct acpi_hest_generic_data *)(estatus + 1); \
(void *)section - (void *)(estatus + 1) < estatus->data_length; \
section = acpi_hest_get_next(section))

--> The loop condition (third line above) checks whether the length is
still valid; if not, it breaks the loop.


Original code:
void cper_estatus_print(const char *pfx,
   const struct acpi_hest_generic_status *estatus)
{
 struct acpi_hest_generic_data *gdata;
 unsigned int data_len;
 int sec_no = 0;
 char newpfx[64];
 __u16 severity;
 severity = estatus->error_severity;
 if (severity == CPER_SEV_CORRECTED)
  printk("%s%s\n", pfx,
 "It has been corrected by h/w "
 "and requires no further action");
 printk("%s""event severity: %s\n", pfx, cper_severity_str(severity));
 data_len = estatus->data_length;
 gdata = (struct acpi_hest_generic_data *)(estatus + 1);
 snprintf(newpfx, sizeof(newpfx), "%s%s", pfx, INDENT_SP);
 while (data_len >= acpi_hest_get_size(gdata)) {
  cper_estatus_print_section(newpfx, gdata, sec_no);
  data_len -= acpi_hest_get_record_size(gdata);
  gdata = acpi_hest_get_next(gdata);
  sec_no++;
 }
}

Can change to:
void cper_estatus_print(const char *pfx,
   const struct acpi_hest_generic_status *estatus)
{
 struct acpi_hest_generic_data *gdata;
 int sec_no = 0;
 char newpfx[64];
 __u16 severity;
 severity = estatus->error_severity;
 if (severity == CPER_SEV_CORRECTED)
  printk("%s%s\n", pfx,
 "It has been corrected by h/w "
 "and requires no further action");
 printk("%s""event severity: %s\n", pfx, cper_severity_str(severity));
 snprintf(newpfx, sizeof(newpfx), "%s%s", pfx, INDENT_SP);
 apei_estatus_for_each_section(estatus, gdata) {
  cper_estatus_print_section(newpfx, gdata, sec_no);
  sec_no++;
 }
}
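
For comparison, a checking loop in the style of cper_estatus_check validates
every length instead of only walking the sections. A minimal user-space sketch,
again with simplified, hypothetical layouts (struct sec_hdr, hdr_size() and
check_sections() are illustration names, and the 8 extra header bytes for
revision 0x300 are an assumption of this model):

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified, hypothetical section header; not the real ACPI layout. */
struct sec_hdr { uint16_t revision; uint16_t pad; uint32_t error_data_length; };

/* In this model a revision 0x300 header carries 8 extra bytes. */
static size_t hdr_size(const struct sec_hdr *h)
{
	return sizeof(*h) + ((h->revision >> 8) >= 3 ? 8 : 0);
}

/*
 * Return 0 if the sections exactly fill data_len bytes, -1 if a section
 * claims more payload than remains or if trailing bytes are left over.
 */
static int check_sections(const uint8_t *data, size_t data_len)
{
	while (data_len >= sizeof(struct sec_hdr)) {
		struct sec_hdr h;
		size_t hs;

		memcpy(&h, data, sizeof(h));
		hs = hdr_size(&h);
		if (data_len < hs || h.error_data_length > data_len - hs)
			return -1;
		data += hs + h.error_data_length;
		data_len -= hs + h.error_data_length;
	}
	return data_len ? -1 : 0;
}
```

A printing function does not need the per-section length check, which is why
it can use the plain iteration macro, while a checking function still has to
track the remaining length itself.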



>
> Thanks,
>
> Tyler
>
>
> On 8/10/2017 3:37 PM, gengdongjiu wrote:
>>
>> may be directly remove the macro apei_estatus_for_each_section is not
>> better, if other place code also
>> needs to iterate through the GHES estatus blocks, it will be repeated
>> written again.
>>
>>
>> On 2017/8/11 5:31, gengdongjiu wrote:
>>>
>>> Hello,
>>>
>>> sorry, I do not see that. Just know I have reviewed your
>>> modification, may be my change can be simpleness and reserve the macro of
>>> apei_estatus_for_each_section
>>> can be used by other place to avoid duplicated code, such as prints the
>>> estatus blocks.
>>>
>>> On 2017/8/11 1:48, Baicar, Tyler wrote:
>>>>
>>>> Hello,
>>>>
>>>> I have already posted a patch fixing this. Please see:
>>>>
>>>> https://lkml.org/lkml/2017/8/3/824
>>>>
>>>> This makes the loop identical to the CPER code which prints the estatus
>>>> blocks to the kernel logs.
>>>>
>>>> Thanks,
>>>>
>>>> Tyler
>>>>
>>>>
>>>> On 8/10/2017 12:06 PM, Dongjiu Geng wrote:
>>>>>
>>>>> The revision 0x300 generic error data entry is different with the old
>>>>> version. when ghes_do_proc traverses to get the data entry, it does not
>>>>> consider this difference. so when error status block has revision 0x300
>>>>> data entry, it will have issue.
>>>>>

Re: [PATCH v3] acpi: apei: fix the wrongly iterate generic error status block

2017-08-16 Thread gengdongjiu

CC Will and Jonathan


On 2017/8/16 21:55, Baicar, Tyler wrote:
> On 8/16/2017 2:14 AM, Dongjiu Geng wrote:
>> The revision 0x300 generic error data entry is different
>> from the old version, but currently iterating through the
>> GHES estatus blocks does not take into account this difference.
>> This will lead to failure to get the right data entry if GHES
>> has revision 0x300 error data entry.
>>
>> Update the GHES estatus iteration to properly increment using
>> acpi_hest_get_next, and correct the iteration termination condition
>> because the status block data length only includes error data length.
>> Clear the CPER estatus printing iteration logic to use same macro.
>>
>> Signed-off-by: Dongjiu Geng 
>> CC: Tyler Baicar 
> Tested-by: Tyler Baicar 
> 
> Works good for me!
> 
> Thanks,
> Tyler
>> ---
>>   drivers/acpi/apei/apei-internal.h |  5 -
>>   drivers/firmware/efi/cper.c   | 12 ++--
>>   include/acpi/ghes.h   |  5 +
>>   3 files changed, 7 insertions(+), 15 deletions(-)
>>
>> diff --git a/drivers/acpi/apei/apei-internal.h 
>> b/drivers/acpi/apei/apei-internal.h
>> index 6e9f14c0a71b..cb4126051f62 100644
>> --- a/drivers/acpi/apei/apei-internal.h
>> +++ b/drivers/acpi/apei/apei-internal.h
>> @@ -120,11 +120,6 @@ int apei_exec_collect_resources(struct 
>> apei_exec_context *ctx,
>>   struct dentry;
>>   struct dentry *apei_get_debugfs_dir(void);
>>   -#define apei_estatus_for_each_section(estatus, section)\
>> -for (section = (struct acpi_hest_generic_data *)(estatus + 1);\
>> - (void *)section - (void *)estatus < estatus->data_length;\
>> - section = (void *)(section+1) + section->error_data_length)
>> -
>>   static inline u32 cper_estatus_len(struct acpi_hest_generic_status 
>> *estatus)
>>   {
>>   if (estatus->raw_data_length)
>> diff --git a/drivers/firmware/efi/cper.c b/drivers/firmware/efi/cper.c
>> index 48a8f69da42a..bf3672a81e49 100644
>> --- a/drivers/firmware/efi/cper.c
>> +++ b/drivers/firmware/efi/cper.c
>> @@ -606,7 +606,6 @@ void cper_estatus_print(const char *pfx,
>>   const struct acpi_hest_generic_status *estatus)
>>   {
>>   struct acpi_hest_generic_data *gdata;
>> -unsigned int data_len;
>>   int sec_no = 0;
>>   char newpfx[64];
>>   __u16 severity;
>> @@ -617,14 +616,10 @@ void cper_estatus_print(const char *pfx,
>>  "It has been corrected by h/w "
>>  "and requires no further action");
>>   printk("%s""event severity: %s\n", pfx, cper_severity_str(severity));
>> -data_len = estatus->data_length;
>> -gdata = (struct acpi_hest_generic_data *)(estatus + 1);
>>   snprintf(newpfx, sizeof(newpfx), "%s%s", pfx, INDENT_SP);
>>   -while (data_len >= acpi_hest_get_size(gdata)) {
>> +apei_estatus_for_each_section(estatus, gdata) {
>>   cper_estatus_print_section(newpfx, gdata, sec_no);
>> -data_len -= acpi_hest_get_record_size(gdata);
>> -gdata = acpi_hest_get_next(gdata);
>>   sec_no++;
>>   }
>>   }
>> @@ -653,15 +648,12 @@ int cper_estatus_check(const struct 
>> acpi_hest_generic_status *estatus)
>>   if (rc)
>>   return rc;
>>   data_len = estatus->data_length;
>> -gdata = (struct acpi_hest_generic_data *)(estatus + 1);
>>   -while (data_len >= acpi_hest_get_size(gdata)) {
>> +apei_estatus_for_each_section(estatus, gdata) {
>>   gedata_len = acpi_hest_get_error_length(gdata);
>>   if (gedata_len > data_len - acpi_hest_get_size(gdata))
>>   return -EINVAL;
>> -
>>   data_len -= acpi_hest_get_record_size(gdata);
>> -gdata = acpi_hest_get_next(gdata);
>>   }
>>   if (data_len)
>>   return -EINVAL;
>> diff --git a/include/acpi/ghes.h b/include/acpi/ghes.h
>> index 9f26e01186ae..9061c5c743b3 100644
>> --- a/include/acpi/ghes.h
>> +++ b/include/acpi/ghes.h
>> @@ -113,6 +113,11 @@ static inline void *acpi_hest_get_next(struct 
>> acpi_hest_generic_data *gdata)
>>   return (void *)(gdata) + acpi_hest_get_record_size(gdata);
>>   }
>>   +#define apei_estatus_for_each_section(estatus, section)\
>> +for (section = (struct acpi_hest_generic_data *)(estatus + 1);\
>> + (void *)section - (void *)(estatus + 1) < estatus->data_length; \
>> + section = acpi_hest_get_next(section))
>> +
>>   int ghes_notify_sea(void);
>> #endif /* GHES_H */
>

Re: [PATCH v3] acpi: apei: fix the wrongly iterate generic error status block

2017-08-17 Thread gengdongjiu

Borislav,
  thanks for the review,

On 2017/8/17 17:25, Borislav Petkov wrote:
> On Wed, Aug 16, 2017 at 04:14:50PM +0800, Dongjiu Geng wrote:
>> The revision 0x300 generic error data entry is different
>> from the old version, but currently iterating through the
>> GHES estatus blocks does not take into account this difference.
>> This will lead to failure to get the right data entry if GHES
>> has revision 0x300 error data entry.
>>
>> Update the GHES estatus iteration to properly increment using
> iteration macro
> 
>> acpi_hest_get_next, and correct the iteration termination condition
> Please end function names with parentheses.
> 
>> because the status block data length only includes error data length.
> < newline here.
> 
>> Clear the CPER estatus printing iteration logic to use same macro.
> s/Clear ... /Convert ... to the same macro./
> 
>> Signed-off-by: Dongjiu Geng 
>> CC: Tyler Baicar 
>> ---
>>  drivers/acpi/apei/apei-internal.h |  5 -
>>  drivers/firmware/efi/cper.c   | 12 ++--
>>  include/acpi/ghes.h   |  5 +
>>  3 files changed, 7 insertions(+), 15 deletions(-)
> With those addressed you can add:
  Ok, I will add.

> 
> Reviewed-by: Borislav Petkov 
> 
> --

Re: [PATCH v4 3/3] arm64: kvm: inject SError with user space specified syndrome

2017-07-03 Thread gengdongjiu

Hi Christoffer,
  thanks for the review.

On 2017/7/3 16:39, Christoffer Dall wrote:
> Hi Dongjiu,
> 
> On Mon, Jun 26, 2017 at 08:46:39PM +0800, Dongjiu Geng wrote:
>> when SError happen, kvm notifies user space to record the CPER,
>> user space specifies and passes the contents of ESR_EL1 on taking
>> a virtual SError interrupt to KVM, KVM enables virtual system
>> error or asynchronous abort with this specifies syndrome. This
>> patch modify the world-switch to restore VSESR_EL2, VSESR_EL2
>> saves the virtual SError syndrome, it becomes the ESR_EL1 value when
>> HCR_EL2.VSE injects an SError. This register is added by the
>> RAS Extensions.
> 
> This commit message is confusing and doesn't help me understand the
> patch.
(1) What is the rationale for handling a guest OS SError interrupt (SEI) in
the RAS solution?
  You can refer to the document "RAS_Extension_PRD03-PRDC-010953-32-0, 6.5.3
Example software sequences":
  a). In the firmware-first RAS solution, when an SError interrupt (SEI)
happens in the guest OS, it first traps to EL3 (SCR_EL3.EA = 1).
  b). The firmware logs, triages, and delegates the error exception to the
hypervisor. As the error came from the guest OS at EL1, the firmware does
this by faking an SError interrupt exception entry to EL2.
  c). Control transfers to the hypervisor's delegated error recovery agent.
Because HCR_EL2.AMO is set to 1, the hypervisor can use a Virtual SError
interrupt to delegate an asynchronous abort to EL1, by setting HCR_EL2.VSE
to 1 and using VSESR_EL2 to pass the syndrome.

(2) What does this patch mainly do?
  As mentioned above, the hypervisor needs to enable the virtual SError and
pass the virtual syndrome to the guest OS.

  a). When control transfers from the firmware to the hypervisor through the
faked SError interrupt, the hypervisor delivers to user space the
syndrome_info (esr_el2) and the host VA (Qemu translates this VA to the virtual
machine physical address (IPA)) using the newly added "serror_intr" struct below.
/* KVM_EXIT_SERROR_INTR */
struct {
__u32 syndrome_info;
__u64 address;
} serror_intr;

  b). Qemu gets the address (host VA) delivered by KVM, translates this host VA
to the virtual machine physical address (IPA), and records this IPA at runtime
in the guest OS's APEI table.

  c). Qemu gets the syndrome_info delivered by KVM and uses it as a reference
(the value it chooses may differ) to specify the virtual SError interrupt's
syndrome, which is set through VSESR_EL2.

VSESR_EL2 is an ARMv8.2 register; its description can be found in
"RAS_Extension_PRD03-PRDC-010953-33-0, 5.6.18 VSESR_EL2, Virtual SError
Exception Syndrome Register":

>>The VSESR_EL2 characteristics are:
>>Purpose:
>>Provides the syndrome value reported to software on taking a virtual
>>SError interrupt exception:
>>  - If the virtual SError interrupt is taken to EL1 using AArch64
>>    then VSESR_EL2 provides the syndrome value reported in ESR_EL1.
>>  - If the virtual SError interrupt is taken to EL1 using AArch32
>>    then VSESR_EL2 provides the syndrome values reported in
>>    DFSR.{AET, ExT} and the remainder of the DFSR is set as
>>    defined by VMSAv8-32.

 So in KVM I added a new ioctl (#define KVM_ARM_SEI  _IO(KVMIO,  0xb8)) that
passes the virtual SError syndrome value specified by Qemu and enables a
virtual System Error.

 d). On the world switch to the guest OS, the guest OS takes the virtual SError
(this virtual SError cannot be routed to the EL3 firmware). The guest OS uses
the specified syndrome value to do the recovery and parses the guest CPER that
Qemu recorded dynamically in the APEI table.
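
Step b) depends on user space being able to turn the host VA reported by KVM
into a guest IPA. A minimal sketch of such a lookup, assuming a hypothetical
table of memory slots (struct mem_slot and hva_to_ipa() are illustration names
only, not Qemu or KVM API):

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one guest RAM region as seen by user space. */
struct mem_slot {
	uint64_t hva_base;  /* host virtual address where the region is mapped */
	uint64_t ipa_base;  /* guest physical address (IPA) of the region */
	uint64_t size;      /* region size in bytes */
};

/*
 * Translate a host VA into a guest IPA.
 * Returns 0 and stores the result in *ipa on success, -1 if the address
 * does not fall inside any slot.
 */
static int hva_to_ipa(const struct mem_slot *slots, size_t n,
		      uint64_t hva, uint64_t *ipa)
{
	for (size_t i = 0; i < n; i++) {
		if (hva >= slots[i].hva_base &&
		    hva - slots[i].hva_base < slots[i].size) {
			*ipa = slots[i].ipa_base + (hva - slots[i].hva_base);
			return 0;
		}
	}
	return -1;
}
```

The IPA found this way is what would be recorded in the guest CPER; an address
outside every slot does not belong to guest RAM and should not be reported to
the guest.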

> 
> I think this patch is trying to do too many things.  I suggest you split
> the patch into (at least) one patch that captures exception information
> from the world-switch path, one patch that deals with the new exit
> reason, and finally a patch with the new ioctl.  That way you can write
> a commit message for each patch describing first what the patch does,
> and then why this is a good idea.
  Ok, thanks for the good suggestion.

> 
> Neverthess, I added some random comments below.
> 
>>
>> Changes since v3:
>> (1) Move restore VSESR_EL2 value logic to a helper method
>> (2) In the world-switch, not save VSESR_EL2, because no one cares the
>> old VSESR_EL2 value
>> (3) Add a new KVM_ARM_SEI ioctl to set the VSESR_EL2 value and pend
>> a virtual system error
>>
>> Signed-off-by: Dongjiu Geng 
>> Signed-off-by: Quanming Wu 
>> ---
>>  Documentation/virtual/kvm/api.txt| 10 ++
>>  arch/arm/include/asm/kvm_host.h  |  1 +
>>  arch/arm/kvm/arm.c   |  7 +++
>>  arch/arm/kvm/guest.c |  5 +
>>  arch/arm64/include/asm/esr.h |  2 ++
>>

Re: [PATCH v4 2/3] arm64: kvm: route synchronous external abort exceptions to el2

2017-07-04 Thread gengdongjiu

Hi Christoffer,

On 2017/7/3 16:23, Christoffer Dall wrote:
> On Tue, Jun 27, 2017 at 08:15:49PM +0800, gengdongjiu wrote:
>> correct the commit message:
>>
>>  In the firmware-first RAS solution, OS receives an synchronous
>>  external abort, then trapped to EL3 by SCR_EL3.EA. Firmware inspects
>>  the HCR_EL2.TEA and chooses the target to send APEI's SEA notification.
>>  If the SCR_EL3.EA is set, delegates the error exception to the hypervisor,
>>  otherwise it delegates to the host OS kernel
> 
> This commit text has nothing (directly) to do with the content of the
> patch.  Whether or not seting these bits are used by firmware to emulate
> injecting an exception or by the CPU raising a an exception is not the
> core of the issue.
> 
> Please describe your change, then provide rationale.

(1) The two HCR_EL2 fields below, TEA and TERR, are introduced by the ARMv8.2 RAS extension.

TEA, bit [37]
    Route synchronous External Abort exceptions to EL2. The possible values
    of this bit are:
    0  Do not route synchronous External Abort exceptions from Non-secure
       EL0 and EL1 to EL2.
    1  Route synchronous External Abort exceptions from Non-secure EL0 and
       EL1 to EL2, if not routed to EL3.
    This bit is RES0 if the RAS extension is not implemented.
TERR, bit [36]
    Trap Error record accesses. The possible values of this bit are:
    0  Do not trap accesses to error record registers from Non-secure EL1 to
       EL2.
    1  Accesses to the ER* registers from Non-secure EL1 generate a Trap
       exception to EL2.
    This bit is RES0 if the RAS extension is not implemented.

(2) When the OS takes a synchronous External Abort (SEA), it traps to the EL3
firmware. The firmware then delegates the error by faking an exception entry
either to the hypervisor at EL2 or to EL1. So if HCR_EL2.TEA is set, the
firmware will eret to EL2; otherwise it will eret to EL1.
HCR_EL2.TEA is set only for the guest OS, not for the host OS.

(3) HCR_EL2.TERR is set so that EL1 accesses to the error record registers
trap to EL2.
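
The routing decision described in (2) can be sketched roughly as follows. This
is only an illustration of the logic, not firmware code: the function name
sea_delegate_target() and the enum target_el are made up for the example, and
only the bit positions (TEA = bit 37, TERR = bit 36) come from the register
description above.

```c
#include <stdint.h>

/* Bit positions taken from the HCR_EL2 description above. */
#define HCR_TEA   (UINT64_C(1) << 37)  /* route SEAs from EL0/EL1 to EL2 */
#define HCR_TERR  (UINT64_C(1) << 36)  /* trap error record accesses to EL2 */

/* Hypothetical names, used only for this illustration. */
enum target_el { TARGET_EL1 = 1, TARGET_EL2 = 2 };

/*
 * Pick the exception level to which the firmware would fake the
 * exception entry for an SEA taken from a lower exception level.
 */
static enum target_el sea_delegate_target(uint64_t hcr_el2)
{
	return (hcr_el2 & HCR_TEA) ? TARGET_EL2 : TARGET_EL1;
}
```

With HCR_TEA set (the guest case) this returns TARGET_EL2; with it clear (the
host case) it returns TARGET_EL1. HCR_TERR does not influence the result.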

> 
> Thanks,
> -Christoffer
> 
> 
>>
>>
>> On 2017/6/26 20:45, Dongjiu Geng wrote:
>>> In the firmware-first RAS solution, guest OS receives an synchronous
>>> external abort, then trapped to EL3 by SCR_EL3.EA. Firmware inspects
>>> the HCR_EL2.TEA and chooses the target to send APEI's SEA notification.
>>> If the SCR_EL3.EA is set, delegates the error exception to the hypervisor,
>>> otherwise it delegates to the guest OS kernel
>>>
>>> Signed-off-by: Dongjiu Geng <gengdong...@huawei.com>
>>> ---
>>>  arch/arm64/include/asm/kvm_arm.h | 2 ++
>>>  arch/arm64/include/asm/kvm_emulate.h | 7 +++
>>>  2 files changed, 9 insertions(+)
>>>
>>> diff --git a/arch/arm64/include/asm/kvm_arm.h 
>>> b/arch/arm64/include/asm/kvm_arm.h
>>> index 61d694c..1188272 100644
>>> --- a/arch/arm64/include/asm/kvm_arm.h
>>> +++ b/arch/arm64/include/asm/kvm_arm.h
>>> @@ -23,6 +23,8 @@
>>>  #include 
>>>  
>>>  /* Hyp Configuration Register (HCR) bits */
>>> +#define HCR_TEA    (UL(1) << 37)
>>> +#define HCR_TERR   (UL(1) << 36)
>>>  #define HCR_E2H    (UL(1) << 34)
>>>  #define HCR_ID     (UL(1) << 33)
>>>  #define HCR_CD     (UL(1) << 32)
>>> diff --git a/arch/arm64/include/asm/kvm_emulate.h 
>>> b/arch/arm64/include/asm/kvm_emulate.h
>>> index f5ea0ba..5f64ab2 100644
>>> --- a/arch/arm64/include/asm/kvm_emulate.h
>>> +++ b/arch/arm64/include/asm/kvm_emulate.h
>>> @@ -47,6 +47,13 @@ static inline void vcpu_reset_hcr(struct kvm_vcpu *vcpu)
>>> vcpu->arch.hcr_el2 = HCR_GUEST_FLAGS;
>>> if (is_kernel_in_hyp_mode())
>>> vcpu->arch.hcr_el2 |= HCR_E2H;
>>> +   if (cpus_have_const_cap(ARM64_HAS_RAS_EXTN)) {
>>> +   /* route synchronous external abort exceptions to EL2 */
>>> +   vcpu->arch.hcr_el2 |= HCR_TEA;
>>> +   /* trap error record accesses */
>>> +   vcpu->arch.hcr_el2 |= HCR_TERR;
>>> +   }
>>> +
>>> if (test_bit(KVM_ARM_VCPU_EL1_32BIT, vcpu->arch.features))
>>> vcpu->arch.hcr_el2 &= ~HCR_RW;
>>>  }
>>>
>>
> 
> .
>

Re: [PATCH v4 3/3] arm64: kvm: inject SError with user space specified syndrome

2017-07-04 Thread gengdongjiu

Hi James,
  Thanks for the review. I will read your comments carefully and then reply to 
you.


On 2017/7/4 18:14, James Morse wrote:
> Hi gengdongjiu,
> 
> Can you give us a specific example of an error you are trying to handle?
> How would a non-KVM user space process handle the error?
> 
> KVM-users should be regular user space processes, we should not have a KVM-way
> and everyone-else-way of handling errors.
> 
> 
> On 04/07/17 05:46, gengdongjiu wrote:
>> On 2017/7/3 16:39, Christoffer Dall wrote:
>>> On Mon, Jun 26, 2017 at 08:46:39PM +0800, Dongjiu Geng wrote:
>>>> when SError happen, kvm notifies user space to record the CPER,
>>>> user space specifies and passes the contents of ESR_EL1 on taking
>>>> a virtual SError interrupt to KVM, KVM enables virtual system
>>>> error or asynchronous abort with this specifies syndrome. This
>>>> patch modify the world-switch to restore VSESR_EL2, VSESR_EL2
>>>> saves the virtual SError syndrome, it becomes the ESR_EL1 value when
>>>> HCR_EL2.VSE injects an SError. This register is added by the
>>>> RAS Extensions.
>>>
>>> This commit message is confusing and doesn't help me understand the
>>> patch.
>> (1) what is the rationale for the guest OS SError interrupt(SEI) handling in 
>> the RAS solution?
> 
>>   a). In the firmware-first RAS solution, when guest OS happen a SError 
>> interrupt (SEI), it will firstly trap to EL3(SCR_EL3.EA = 1);
>>   b). The firmware logs, triages, and delegates the error exception to the 
>> hypervisor. As the error came from guest OS  EL1, firmware
>>   does by faking an SError interrupt exception entry to EL2.
>>   c). Control transfers to the hypervisor's delegated error recovery 
>> agent.Because HCR_EL2.AMO is set to 1, the hypervisor can use a
>>   Virtual SError interrupt to delegate an asynchronous abort to EL1, by 
>> setting HCR_EL2.VSE to 1 and using VESR_EL2 to pass syndrome.
> 
> So (a): a physical-CPU hardware error occurs, and then (c) we tell 
> Qemu/kvmtool
> via a KVM-specific API.
> 
> Don't do this, it doesn't work for non-KVM users. You are exposing 
> host-specific
> implementation details to user space. What if I discover the same error via a
> Polling GHES, or one of the IRQ flavours?
> 
> User space should not have to know, or care, how linux is notified about APEI
> RAS errors.
> 
> 
>> (2) what is this patch mainly do?
>>   As mentioned above, the hypervisor needs to enable virtual SError and pass 
>> the virtual syndrome to the guest OS.
>>
>>   a). when Control transfers to the hypervisor from firmware by faking an 
>> SError interrupt, the hypervisor delivered the syndrome_info(esr_el2) and
>>   host VA address( Qemu translate this VA address to the virtual machine 
>> physical address(IPA)) using below new added "serror_intr" struct.
>>  /* KVM_EXIT_SERROR_INTR */
>>  struct {
>>  __u32 syndrome_info;
>>  __u64 address;
>>  } serror_intr;
> 
> This is for a guest exit to host user-space. Here you are telling Qemu that a
> physical CPU hardware error occurred. Qemu/kvmtool should not be expected to
> parse the ESR, this is the job of the operating system.
> 
> When you're using ACPI firmware-first, SError/SEI is just a notification, the
> important data is in the CPER records, which Qemu can't access, (and should be
> processed by Linux APEI code).
> 
> 
> It looks like you've calculated an address from FAR_EL2/HPFAR_EL2. For an
> SError, these are meaningless.
> 
> (These registers hold real values for Synchronous External Abort, but for
>  firmware-first we should prefer the CPER records.)
> 
> 
>>   b). Qemu gets the address(host VA) delivered by KVM, translate this host 
>> VA address to virtual machine physical address(IPA), and runtime record this 
>> virtual
>>  machine physical address(IPA) to the guest OS's APEI table.
> 
> I agree with this step, but you're acting on the wrong data. (You're 
> converting
> fault_ipa -> virtual address -> fault_ipa, something isn't right ...)
> 
> Qemu should react to a signal like BUS_MCEERR_A{R,O} from memory_failure(). 
> This
> mechanism serves all user space processes, not just kvm users. This is where 
> the
> user-space virtual address should come from. Qemu/kvmtool have to generate the
> guest IPA once they discover the affected memory was presented to the guest
> through KVM.
> 
> 
> Your KVM-specific mechanism exposes too much raw information (raw ESR values 
> to

Re: [PATCH v4 1/3] arm64: kvm: support user space to detect RAS extension feature

2017-07-04 Thread gengdongjiu

Hi Christoffer,

On 2017/7/3 16:21, Christoffer Dall wrote:
> On Mon, Jun 26, 2017 at 08:45:43PM +0800, Dongjiu Geng wrote:
>> Handle userspace's detection for RAS extension, because sometimes
>> the userspace needs to know the CPU's capacity
> 
> Why?  Can you please provide some more rationale.

Userspace mainly wants to know whether the CPU has the RAS extension so that
it can decide whether it needs to specify the syndrome value. If the CPU has
it, userspace specifies the syndrome value; otherwise it does not.

James once suggested that userspace should not need to know about this
capability, and that KVM should check for the RAS extension itself.

But having thought about it again, it may be better for userspace to know
about the RAS extension capability, because that avoids KVM returning an error
when the CPU does not support the RAS extension.

Could you give me a suggestion on whether userspace should be told about the
RAS extension capability?
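
For reference, a minimal sketch of what such a capability check boils down to,
assuming the RAS field of ID_AA64PFR0_EL1 is the 4-bit field at bits [31:28]
as the Arm ARM describes it. Note that this is not the same bit as the 0x1000
mask used in the patch quoted below. The helper name and the macro names are
hypothetical; the helper takes an already-read register value so that it can
be tried out anywhere.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: ID_AA64PFR0_EL1.RAS is a 4-bit field at bits [31:28]. */
#define ID_AA64PFR0_RAS_SHIFT  28
#define ID_AA64PFR0_RAS_MASK   UINT64_C(0xf)

/* Hypothetical helper: non-zero RAS field means the extension is present. */
static bool pfr0_has_ras_extension(uint64_t pfr0)
{
	return ((pfr0 >> ID_AA64PFR0_RAS_SHIFT) & ID_AA64PFR0_RAS_MASK) != 0;
}
```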

> 
>>
>> Signed-off-by: Dongjiu Geng 
>> ---
>>  arch/arm64/kvm/reset.c   | 11 +++
>>  include/uapi/linux/kvm.h |  1 +
>>  2 files changed, 12 insertions(+)
>>
>> diff --git a/arch/arm64/kvm/reset.c b/arch/arm64/kvm/reset.c
>> index d9e9697..1004039 100644
>> --- a/arch/arm64/kvm/reset.c
>> +++ b/arch/arm64/kvm/reset.c
>> @@ -64,6 +64,14 @@ static bool cpu_has_32bit_el1(void)
>>  return !!(pfr0 & 0x20);
>>  }
>>  
>> +static bool kvm_arm_support_ras_extension(void)
>> +{
>> +u64 pfr0;
>> +
>> +pfr0 = read_system_reg(SYS_ID_AA64PFR0_EL1);
>> +return !!(pfr0 & 0x1000);
>> +}
> 
> Why is this specific to KVM?  This seems to reveal information about the
> underlying physical CPU, not specific to KVM at all, surely if userspace
> is really supposed to be able to figure this out, it should not be KVM
> specific.
  You are right, it should not be KVM specific. Thanks for pointing it out.

> 
> Thanks,
> -Christoffer
> 
>> +
>>  /**
>>   * kvm_arch_dev_ioctl_check_extension
>>   *
>> @@ -87,6 +95,9 @@ int kvm_arch_dev_ioctl_check_extension(struct kvm *kvm, 
>> long ext)
>>  case KVM_CAP_ARM_PMU_V3:
>>  r = kvm_arm_support_pmu_v3();
>>  break;
>> +case KVM_CAP_ARM_RAS_EXTENSION:
>> +r = kvm_arm_support_ras_extension();
>> +break;
>>  case KVM_CAP_SET_GUEST_DEBUG:
>>  case KVM_CAP_VCPU_ATTRIBUTES:
>>  r = 1;
>> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
>> index f51d508..27fe556 100644
>> --- a/include/uapi/linux/kvm.h
>> +++ b/include/uapi/linux/kvm.h
>> @@ -883,6 +883,7 @@ struct kvm_ppc_resize_hpt {
>>  #define KVM_CAP_PPC_MMU_RADIX 134
>>  #define KVM_CAP_PPC_MMU_HASH_V3 135
>>  #define KVM_CAP_IMMEDIATE_EXIT 136
>> +#define KVM_CAP_ARM_RAS_EXTENSION 137
>>  
>>  #ifdef KVM_CAP_IRQ_ROUTING
>>  
>> -- 
>> 2.10.1
>>
> 
> .
>

Re: [PATCH v4] arm64: kvm: inject SError with user space specified syndrome

2017-07-03 Thread gengdongjiu

Hi Christoffer,
  thank you very much for your review.


2017-07-03 15:50 GMT+08:00, Christoffer Dall :
> Hi Dongjiu,
>
> It seems you sent this patch twice, once on its own and then part of a
> series?
Christoffer, yes, I did: once on its own and then as part of a series.

>
> Also, please use a cover letter when sending patch series.
OK, got it. Thank you for the suggestion.

>
> Thanks,
> -Christoffer
>
> On Mon, Jun 26, 2017 at 07:39:15PM +0800, Dongjiu Geng wrote:
>> when SError happen, kvm notifies user space to record the CPER,
>> user space specifies and passes the contents of ESR_EL1 on taking
>> a virtual SError interrupt to KVM, KVM enables virtual system
>> error or asynchronous abort with this specifies syndrome. This
>> patch modify the world-switch to restore VSESR_EL2, VSESR_EL2
>> saves the virtual SError syndrome, it becomes the ESR_EL1 value when
>> HCR_EL2.VSE injects an SError. This register is added by the
>> RAS Extensions.
>>
>> Changes since v3:
>> (1) Move restore VSESR_EL2 value logic to a helper method
>> (2) In the world-switch, not save VSESR_EL2, because no one cares the
>> old VSESR_EL2 value
>> (3) Add a new KVM_ARM_SEI ioctl to set the VSESR_EL2 value and pend
>> a virtual system error
>>
>> Signed-off-by: Dongjiu Geng 
>> Signed-off-by: Quanming Wu 
>> ---
>>  Documentation/virtual/kvm/api.txt| 10 ++
>>  arch/arm/include/asm/kvm_host.h  |  1 +
>>  arch/arm/kvm/arm.c   |  6 ++
>>  arch/arm/kvm/guest.c |  5 +
>>  arch/arm64/include/asm/esr.h |  2 ++
>>  arch/arm64/include/asm/kvm_emulate.h | 10 ++
>>  arch/arm64/include/asm/kvm_host.h|  2 ++
>>  arch/arm64/include/asm/sysreg.h  |  3 +++
>>  arch/arm64/kvm/guest.c   | 14 ++
>>  arch/arm64/kvm/handle_exit.c | 25 +++--
>>  arch/arm64/kvm/hyp/switch.c  | 14 ++
>>  include/uapi/linux/kvm.h |  8 
>>  12 files changed, 94 insertions(+), 6 deletions(-)
>>
>> diff --git a/Documentation/virtual/kvm/api.txt
>> b/Documentation/virtual/kvm/api.txt
>> index 3c248f7..852ac55 100644
>> --- a/Documentation/virtual/kvm/api.txt
>> +++ b/Documentation/virtual/kvm/api.txt
>> @@ -3377,6 +3377,16 @@ struct kvm_ppc_resize_hpt {
>>  __u32 pad;
>>  };
>>
>> +4.104 KVM_ARM_SEI
>> +
>> +Capability: KVM_EXIT_SERROR_INTR
>> +Architectures: arm/arm64
>> +Type: vcpu ioctl
>> +Parameters: u64 (syndrome)
>> +Returns: 0 in case of success
>> +
>> +Pend an virtual system error or asynchronous abort with user space
>> specified.
>> +
>>  5. The kvm_run structure
>>  
>>
>> diff --git a/arch/arm/include/asm/kvm_host.h
>> b/arch/arm/include/asm/kvm_host.h
>> index 31ee468..566292a 100644
>> --- a/arch/arm/include/asm/kvm_host.h
>> +++ b/arch/arm/include/asm/kvm_host.h
>> @@ -244,6 +244,7 @@ int kvm_arm_coproc_set_reg(struct kvm_vcpu *vcpu,
>> const struct kvm_one_reg *);
>>
>>  int handle_exit(struct kvm_vcpu *vcpu, struct kvm_run *run,
>>  int exception_index);
>> +int kvm_vcpu_ioctl_sei(struct kvm_vcpu *vcpu, u64 *syndrome);
>>
>>  static inline void __cpu_init_hyp_mode(phys_addr_t pgd_ptr,
>> unsigned long hyp_stack_ptr,
>> diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
>> index 96dba7c..2622501 100644
>> --- a/arch/arm/kvm/arm.c
>> +++ b/arch/arm/kvm/arm.c
>> @@ -987,6 +987,12 @@ long kvm_arch_vcpu_ioctl(struct file *filp,
>>  return -EFAULT;
>>  return kvm_arm_vcpu_has_attr(vcpu, &attr);
>>  }
>> +case KVM_ARM_SEI: {
>> +u64 syndrome;
>> +if (copy_from_user(&syndrome, argp, sizeof(syndrome)))
>> +return -EFAULT;
>> +return kvm_vcpu_ioctl_sei(vcpu, &syndrome);
>> +}
>>  default:
>>  return -EINVAL;
>>  }
>> diff --git a/arch/arm/kvm/guest.c b/arch/arm/kvm/guest.c
>> index fa6182a..a610f8f 100644
>> --- a/arch/arm/kvm/guest.c
>> +++ b/arch/arm/kvm/guest.c
>> @@ -248,6 +248,11 @@ int kvm_arch_vcpu_ioctl_set_sregs(struct kvm_vcpu
>> *vcpu,
>>  return -EINVAL;
>>  }
>>
>> +int kvm_vcpu_ioctl_sei(struct kvm_vcpu *vcpu, u64 *syndrome)
>> +{
>> +return 0;
>> +}
>> +
>>  int __attribute_const__ kvm_target_cpu(void)
>>  {
>>  switch (read_cpuid_part()) {
>> diff --git a/arch/arm64/include/asm/esr.h b/arch/arm64/include/asm/esr.h
>> index 22f9c90..d009c99 100644
>> --- a/arch/arm64/include/asm/esr.h
>> +++ b/arch/arm64/include/asm/esr.h
>> @@ -127,6 +127,8 @@
>>  #define ESR_ELx_WFx_ISS_WFE (UL(1) << 0)
>>  #define ESR_ELx_xVC_IMM_MASK ((1UL << 16) - 1)
>>
>> +#define VSESR_ELx_IDS_ISS_MASK ((1UL << 25) - 1)
>> +
>>  /* ESR value templates for specific events */
>>
>>  /* BRK instruction trap from AArch64 state */
>> diff --git a/arch/arm64/include/asm/kvm_emulate.h
>> b/arch/arm64/include/asm/kvm_emulate.h
>> index

Re: [PATCH v4 3/3] arm64: kvm: inject SError with user space specified syndrome

2017-07-05 Thread gengdongjiu

Hi James,


On 2017/7/4 18:14, James Morse wrote:
> Hi gengdongjiu,
> 
> Can you give us a specific example of an error you are trying to handle?
For example:
Guest OS user space accesses device-type memory and an SError occurs. Because
the SError is an asynchronous fault, it is not taken immediately. When the
guest OS executes "SVC" to enter guest kernel space, the ESB (Error
Synchronization Barrier) instruction synchronizes this SError, so the SError is
taken at that point.


> How would a non-KVM user space process handle the error?
Indeed, a non-KVM user space process cannot get this notification from the
hypervisor or host kernel. Thanks for pointing that out.
Do you mean we should still deliver SIGBUS from memory_failure()?


> 
> KVM-users should be regular user space processes, we should not have a KVM-way
> and everyone-else-way of handling errors.
> 
> 
> On 04/07/17 05:46, gengdongjiu wrote:
>> On 2017/7/3 16:39, Christoffer Dall wrote:
>>> On Mon, Jun 26, 2017 at 08:46:39PM +0800, Dongjiu Geng wrote:
>>>> when SError happen, kvm notifies user space to record the CPER,
>>>> user space specifies and passes the contents of ESR_EL1 on taking
>>>> a virtual SError interrupt to KVM, KVM enables virtual system
>>>> error or asynchronous abort with this specifies syndrome. This
>>>> patch modify the world-switch to restore VSESR_EL2, VSESR_EL2
>>>> saves the virtual SError syndrome, it becomes the ESR_EL1 value when
>>>> HCR_EL2.VSE injects an SError. This register is added by the
>>>> RAS Extensions.
>>>
>>> This commit message is confusing and doesn't help me understand the
>>> patch.
>> (1) what is the rationale for the guest OS SError interrupt(SEI) handling in 
>> the RAS solution?
> 
>>   a). In the firmware-first RAS solution, when guest OS happen a SError 
>> interrupt (SEI), it will firstly trap to EL3(SCR_EL3.EA = 1);
>>   b). The firmware logs, triages, and delegates the error exception to the 
>> hypervisor. As the error came from guest OS  EL1, firmware
>>   does by faking an SError interrupt exception entry to EL2.
>>   c). Control transfers to the hypervisor's delegated error recovery 
>> agent.Because HCR_EL2.AMO is set to 1, the hypervisor can use a
>>   Virtual SError interrupt to delegate an asynchronous abort to EL1, by 
>> setting HCR_EL2.VSE to 1 and using VESR_EL2 to pass syndrome.
> 
> So (a): a physical-CPU hardware error occurs, and then (c) we tell 
> Qemu/kvmtool
> via a KVM-specific API.
> 
> Don't do this, it doesn't work for non-KVM users. You are exposing 
> host-specific
> implementation details to user space. What if I discover the same error via a
> Polling GHES, or one of the IRQ flavours?
James, your main concern is the "tell Qemu/kvmtool via a KVM-specific API"
part, right?
So how about still delivering SIGBUS, the same as for the SEA (Synchronous
External Abort)?

By the way, what do you mean by the words below?
 >"What if I discover the same error via a Polling GHES, or one of the IRQ 
 >flavours?"


> 
> User space should not have to know, or care, how linux is notified about APEI
> RAS errors.
> 
> 
>> (2) what is this patch mainly do?
>>   As mentioned above, the hypervisor needs to enable virtual SError and pass 
>> the virtual syndrome to the guest OS.
>>
>>   a). when Control transfers to the hypervisor from firmware by faking an 
>> SError interrupt, the hypervisor delivered the syndrome_info(esr_el2) and
>>   host VA address( Qemu translate this VA address to the virtual machine 
>> physical address(IPA)) using below new added "serror_intr" struct.
>>  /* KVM_EXIT_SERROR_INTR */
>>  struct {
>>  __u32 syndrome_info;
>>  __u64 address;
>>  } serror_intr;
> 
> This is for a guest exit to host user-space. Here you are telling Qemu that a
> physical CPU hardware error occurred. Qemu/kvmtool should not be expected to
> parse the ESR, this is the job of the operating system.
  The intent is not for Qemu/kvmtool to parse the ESR.
  Qemu/kvmtool can refer to the ESR when specifying the VSESR value, for
reference only.

  As mentioned above, the firmware fakes an SError interrupt exception entry
to EL2, so ESR_EL2 may contain some useful information, and Qemu can refer to
this value when setting VSESR_EL2 (which becomes ESR_EL1).

  When Qemu specifies the VSESR value, do you mean it should not refer to the
ESR_EL2 value?
  If so, what value do you suggest for VSESR_EL2?
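
  To illustrate what "refer to the ESR" means here, a minimal sketch, assuming
the VSESR_ELx_IDS_ISS_MASK definition from the v4 patch (bits [24:0], i.e. the
IDS bit plus the ISS field). The helper name vsesr_from_syndrome() is
hypothetical and is not part of the patch.

```c
#include <stdint.h>

/* Same value as VSESR_ELx_IDS_ISS_MASK in the v4 patch: bits [24:0]. */
#define VSESR_ELx_IDS_ISS_MASK  ((UINT64_C(1) << 25) - 1)

/*
 * Hypothetical helper: keep only the IDS and ISS bits of a syndrome
 * value, dropping the EC and IL fields that VSESR_EL2 does not carry.
 */
static uint64_t vsesr_from_syndrome(uint64_t syndrome)
{
	return syndrome & VSESR_ELx_IDS_ISS_MASK;
}
```

  Whatever value user space supplies, only these low 25 bits would end up in
the guest-visible syndrome.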

> 
> When you're using ACPI firmware-first, SError/SEI is just a notification, the
> important data is in the CPER records, which Qemu can't acc

Re: [PATCH] KVM: arm64: add esr_el2 and far_el2 to sysreg

2017-08-08 Thread gengdongjiu

Hi James,

On 2017/8/9 0:27, James Morse wrote:
> Hi gengdongjiu,
> 
> On 07/08/17 18:43, gengdongjiu wrote:
>> Another question, For the SEI, I want to also use SIGBUS both for the KVM 
>> user and non-kvm user,
>> if SEA and SEI Error all use the SIGBUS to notify user space(Qemu),
> 
> User-space shouldn't necessarily be notified about Synchronous External Aborts
> or SError Interrupts. You're really asking about RAS firmware-first
> notifications that use these as the notification mechanism.
First, we are talking about the RAS firmware-first solution. I mainly want to
let user space know the error type of this hardware error (synchronous or
asynchronous). We do not care about the notification mechanism. As we agreed
before, Qemu will record the CPER for the guest OS. If Qemu does not know the
error type, it cannot record the CPER, because the GHES has a field for the
error type.

I paste the APEI table layout here:
https://wiki.linaro.org/LEG/Engineering/Kernel/RAS/APEITables

Usually the notification type is classified by the hardware type.




> 
> We should not notify user-space that the guest happened to be interrupted by a
> RAS firmware-first notification. It may not be relevant, and we can't know 
> until
> we parse the CPER records. The notification mechanism is between firmware and
> the host kernel, we should never expose anything about it to user space or a 
> guest.
I agree with this: for a hardware error, the host kernel deals with it first
and then decides whether to notify Qemu/KVM tools. In this process we do not
care what the notification mechanism between firmware and host kernel is. We
are only concerned with the hardware error type; Qemu/KVM tools will behave
differently for different types.

> 
> Linux should act on the CPER records first to determine if the host kernel can
> keep running. Once it has done this it can deliver signals to affected
> processes, but which signal and its properties depends on the CPER records.

 If we want Qemu to handle this error, I think Qemu/kvmtool should know the
hardware error type; otherwise it will be confused and will not know how to
deal with it.

> 
> The example here is BUS_MCEERR_AO and BUS_MCEERR_AR. These notify userspace 
> that
> si_addr_lsb bits of memory are corrupt at si_addr, this is either
> Action-Optional or Action-Required.
> 
> For arm64 we just needed to turn this code on, it already presents the minimum
> necessary information to user-space in an architecture-agnostic way. We didn't
> need to do anything to this code to support NOTIFY_SEA, the notification
> mechanism is irrelevant, this is all driven by the CPER records.
 We do not care about the notification type; we only care about the hardware
error type.
 If user space does not know the error type, it cannot record the CPER and
cannot inject the proper error into the guest OS, because both recording the
CPER and injecting the error into the guest OS need this hardware error type.
Different error types lead to different behavior.

 For different hardware error types, the x86 Qemu/kvm tools code also has
different behaviour in

> 
> If you have a class of error that isn't covered by the memory-failure code, 
> then
> we need to add something similar. This should be based on the CPER records, 
> and
> should work in exactly the same way for all processes on all ACPI platforms.
> 
> 
>> do you agree my solution for the SEI? thanks.
> 
> No, you are trying to notify userspace that firmware notified the host. This
> creates an ABI between EL3 firmware and EL0 user space that we can't possibly
> support.

You may have misunderstood the solution I mentioned here. I mean using the
memory-failure code to signal user space for the SError (SEI); this way does
not create any ABI. SEA and SEI both use the same method.
Qemu can inspect the ESR to know the hardware error type.


> 
> I think you've come to this because you are merging two steps together:
> 1. The OS uses the v8.2 RAS extensions to isolate errors and notify firmware.
> 2. If firmware has to tell the OS about the error, firmware generates CPER 
> records.
> 3. Firmware triggers the GHES notification mechanism for this error source.
> 4. Linux receives the notification and calls ghes_proc(), (if KVM gets the
> notification because a guest happened to be running, it should switch back to
> the host and arrange for ghes_proc() to be called).
> 5. ghes_proc() parses the CPER records and calls other kernel helpers to 
> handle
> the specific type of error, e.g. memory_failure().
> 6. If the helper knows the kernel can keep running, the error is visible to
> user-space and user space could do further processing to correct the error, an
> error-specific signal is sent.
> 7. User-space reloads the webpage, notifies the guest or whatever is 
> appropriate.
> 
> You are merging ste

Re: [PATCH] KVM: arm64: add esr_el2 and far_el2 to sysreg

2017-08-07 Thread gengdongjiu

Marc,

On 2017/8/8 0:56, Marc Zyngier wrote:
> On 07/08/17 17:23, gengdongjiu wrote:
>> Hi Marc,
>>   As James's suggestion, I move injection SEA Error logic to the user 
>> space(Qemu), Qemu sets the related guest OS esr/elr/pstate/spsr
>> through IOCTL KVM_SET_ONE_REG. For the SEA, when Qemu sets the esr_el1.IL 
>> bit, it needs to refer to esr_el2.IL, else Qemu does not know the trapped
>> instruction was a 16-bit or a 32-bit instruction, also it needs to set 
>> far_el1 using far_el2, because this is synchronization abort.
> 
> Userspace may need some fault information, but certainly not the full set
> of FAR_EL2/ESR_EL2. What it needs is a very small set of well defined
> information, properly abstracted, and not data that is completely
> private to the hypervisor.

Marc, I have just updated the patch. It may be better to use
vcpu->arch.fault.esr_el2 and vcpu->arch.fault.far_el2 to set
vcpu_sys_reg(vcpu,ESR_EL2)/vcpu_sys_reg(vcpu,FAR_EL2).
User space currently cannot read the vcpu->arch.fault values directly, so
vcpu_sys_reg is needed to pass them.



> 
> Thanks,
> 
>   M.
>>
>>
>>
>>
>> On 2017/8/7 23:57, Marc Zyngier wrote:
>>> +James, since he deals with all things RAS. Please keep him on CC at all
>>> times.
>>>
>>> On 07/08/17 17:08, Dongjiu Geng wrote:
>>>> For the firmware-first RAS solution, SEA and SEI is injected
>>>> by the user space, user space needs to know the esr_el2 and
>>>> far_el2's value, so add them to sysreg. user space uses
>>>> the IOCTL KVM_GET_ONE_REG can get their value.
>>>
>>> No.
>>>
>>> This has zero purpose being exposed to userspace. Userspace sees a VM
>>> that runs at EL1, and nothing else, so exposing EL2 registers doesn't
>>> make *any* sense.
>>>
>>> If you want something to be exposed to userspace, it has to be properly
>>> abstracted and describe something that is relevant to the VM. An EL2
>>> register satisfies none of these conditions.
>>>
>>>>
>>>> Signed-off-by: Dongjiu Geng <gengdong...@huawei.com>
>>>> Signed-off-by: Quanming Wu <wuquanm...@huawei.com>
>>>> ---
>>>>  arch/arm64/include/asm/kvm_host.h | 6 --
>>>>  arch/arm64/kvm/sys_regs.c | 6 ++
>>>>  2 files changed, 10 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/arch/arm64/include/asm/kvm_host.h 
>>>> b/arch/arm64/include/asm/kvm_host.h
>>>> index b6242fb..6063eec 100644
>>>> --- a/arch/arm64/include/asm/kvm_host.h
>>>> +++ b/arch/arm64/include/asm/kvm_host.h
>>>> @@ -103,10 +103,12 @@ enum vcpu_sysreg {
>>>>TTBR0_EL1,  /* Translation Table Base Register 0 */
>>>>TTBR1_EL1,  /* Translation Table Base Register 1 */
>>>>TCR_EL1,    /* Translation Control Register */
>>>> -  ESR_EL1,    /* Exception Syndrome Register */
>>>> +  ESR_EL1,    /* Exception Syndrome Register for EL1 */
>>>> +  ESR_EL2,    /* Exception Syndrome Register for EL2 */
>>>>AFSR0_EL1,  /* Auxiliary Fault Status Register 0 */
>>>>AFSR1_EL1,  /* Auxiliary Fault Status Register 1 */
>>>> -  FAR_EL1,    /* Fault Address Register */
>>>> +  FAR_EL1,    /* Fault Address Register for EL1 */
>>>> +  FAR_EL2,    /* Fault Address Register for EL2 */
>>>>MAIR_EL1,   /* Memory Attribute Indirection Register */
>>>>VBAR_EL1,   /* Vector Base Address Register */
>>>>CONTEXTIDR_EL1, /* Context ID Register */
>>>> diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
>>>> index 0e26f8c..0c286bf 100644
>>>> --- a/arch/arm64/kvm/sys_regs.c
>>>> +++ b/arch/arm64/kvm/sys_regs.c
>>>> @@ -987,9 +987,15 @@ static const struct sys_reg_desc sys_reg_descs[] = {
>>>>/* ESR_EL1 */
>>>>{ Op0(0b11), Op1(0b000), CRn(0b0101), CRm(0b0010), Op2(0b000),
>>>>  access_vm_reg, reset_unknown, ESR_EL1 },
>>>> +  /* ESR_EL2 */
>>>> +  { Op0(0b11), Op1(0b100), CRn(0b0101), CRm(0b0010), Op2(0b000),
>>>> +access_vm_reg, reset_unknown, ESR_EL2 },
>>>>/* FAR_EL1 */
>>>>    { Op0(0b11), Op1(0b000), CRn(0b0110), CRm(0b0000), Op2(0b000),
>>>>  access_vm_reg, reset_unknown, FAR_EL1 },
>>>> +  /* FAR_EL2 */
>>>> +  { Op0(0b11), Op1(0b100), CRn(0b0110), CRm(0b0000), Op2(0b000),
>>>> +access_vm_reg, reset_unknown, FAR_EL2 },
>>>>/* PAR_EL1 */
>>>>{ Op0(0b11), Op1(0b000), CRn(0b0111), CRm(0b0100), Op2(0b000),
>>>>  NULL, reset_unknown, PAR_EL1 },
>>>>
>>>
>>> Also, what do you return here? All you're doing to return to userspace
>>> is 0x1de7ec7edbadc0deULL (which perfectly matches this patch).
>>>
>>> So for all intents and purposes, this patch is pretty useless.
>>>
>>> Thanks,
>>>
>>> M.
>>>
>>
> 
>

Re: [PATCH v2] KVM: arm64: pass vcpu esr_el2 and far_el2 sysre to user space

2017-08-07 Thread gengdongjiu

Marc,

On 2017/8/8 3:07, Marc Zyngier wrote:
> So if you want that information, extract it, expose what is required,
> strictly what is required, and only when it is required.
> 
> In the meantime, I'm NAKing this patch, and any patch that will expose
> _EL2 registers outside of nested virtualization.
Thanks for your comments, I will follow your suggestion.

Re: [PATCH] KVM: arm64: add esr_el2 and far_el2 to sysreg

2017-08-07 Thread gengdongjiu

Hi Marc,
  Following James's suggestion, I moved the SEA injection logic to user space
(Qemu); Qemu sets the related guest OS esr/elr/pstate/spsr through the
KVM_SET_ONE_REG ioctl. For the SEA, when Qemu sets the esr_el1.IL bit, it needs
to refer to esr_el2.IL, otherwise Qemu does not know whether the trapped
instruction was a 16-bit or a 32-bit instruction. It also needs to set far_el1
from far_el2, because this is a synchronous abort.
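
To make the dependency on esr_el2.IL concrete, here is a minimal sketch of how
the ESR_EL1 value for an injected synchronous external data abort could be
assembled. The constant names mirror those in the kernel's asm/esr.h, but the
values are restated locally so the snippet stands alone; build_guest_sea_esr()
is a hypothetical helper and not existing Qemu or KVM code.

```c
#include <stdbool.h>
#include <stdint.h>

#define ESR_ELx_EC_SHIFT     26
#define ESR_ELx_EC_DABT_LOW  UINT64_C(0x24)      /* data abort from a lower EL */
#define ESR_ELx_EC_DABT_CUR  UINT64_C(0x25)      /* data abort from the current EL */
#define ESR_ELx_IL           (UINT64_C(1) << 25) /* 32-bit instruction length */
#define ESR_ELx_FSC_EXTABT   UINT64_C(0x10)      /* synchronous external abort */

/*
 * Hypothetical helper: syndrome for a data abort injected into guest EL1.
 * guest_was_el0 says whether the guest was running at EL0 when it trapped.
 */
static uint64_t build_guest_sea_esr(uint64_t esr_el2, bool guest_was_el0)
{
	uint64_t ec = guest_was_el0 ? ESR_ELx_EC_DABT_LOW : ESR_ELx_EC_DABT_CUR;
	uint64_t esr = ec << ESR_ELx_EC_SHIFT;

	esr |= esr_el2 & ESR_ELx_IL;   /* carry over the instruction length bit */
	esr |= ESR_ELx_FSC_EXTABT;     /* fault status: external abort */
	return esr;
}
```

The only piece of ESR_EL2 this sketch consumes is the single IL bit.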




On 2017/8/7 23:57, Marc Zyngier wrote:
> +James, since he deals with all things RAS. Please keep him on CC at all
> times.
> 
> On 07/08/17 17:08, Dongjiu Geng wrote:
>> For the firmware-first RAS solution, SEA and SEI is injected
>> by the user space, user space needs to know the esr_el2 and
>> far_el2's value, so add them to sysreg. user space uses
>> the IOCTL KVM_GET_ONE_REG can get their value.
> 
> No.
> 
> This has zero purpose being exposed to userspace. Userspace sees a VM
> that runs at EL1, and nothing else, so exposing EL2 registers doesn't
> make *any* sense.
> 
> If you want something to be exposed to userspace, it has to be properly
> abstracted and describe something that is relevant to the VM. An EL2
> register satisfies none of these conditions.
> 
>>
>> Signed-off-by: Dongjiu Geng 
>> Signed-off-by: Quanming Wu 
>> ---
>>  arch/arm64/include/asm/kvm_host.h | 6 --
>>  arch/arm64/kvm/sys_regs.c | 6 ++
>>  2 files changed, 10 insertions(+), 2 deletions(-)
>>
>> diff --git a/arch/arm64/include/asm/kvm_host.h 
>> b/arch/arm64/include/asm/kvm_host.h
>> index b6242fb..6063eec 100644
>> --- a/arch/arm64/include/asm/kvm_host.h
>> +++ b/arch/arm64/include/asm/kvm_host.h
>> @@ -103,10 +103,12 @@ enum vcpu_sysreg {
>>  TTBR0_EL1,  /* Translation Table Base Register 0 */
>>  TTBR1_EL1,  /* Translation Table Base Register 1 */
>>  TCR_EL1,    /* Translation Control Register */
>> -ESR_EL1,    /* Exception Syndrome Register */
>> +ESR_EL1,    /* Exception Syndrome Register for EL1 */
>> +ESR_EL2,    /* Exception Syndrome Register for EL2 */
>>  AFSR0_EL1,  /* Auxiliary Fault Status Register 0 */
>>  AFSR1_EL1,  /* Auxiliary Fault Status Register 1 */
>> -FAR_EL1,    /* Fault Address Register */
>> +FAR_EL1,    /* Fault Address Register for EL1 */
>> +FAR_EL2,    /* Fault Address Register for EL2 */
>>  MAIR_EL1,   /* Memory Attribute Indirection Register */
>>  VBAR_EL1,   /* Vector Base Address Register */
>>  CONTEXTIDR_EL1, /* Context ID Register */
>> diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
>> index 0e26f8c..0c286bf 100644
>> --- a/arch/arm64/kvm/sys_regs.c
>> +++ b/arch/arm64/kvm/sys_regs.c
>> @@ -987,9 +987,15 @@ static const struct sys_reg_desc sys_reg_descs[] = {
>>  /* ESR_EL1 */
>>  { Op0(0b11), Op1(0b000), CRn(0b0101), CRm(0b0010), Op2(0b000),
>>access_vm_reg, reset_unknown, ESR_EL1 },
>> +/* ESR_EL2 */
>> +{ Op0(0b11), Op1(0b100), CRn(0b0101), CRm(0b0010), Op2(0b000),
>> +  access_vm_reg, reset_unknown, ESR_EL2 },
>>  /* FAR_EL1 */
>>  { Op0(0b11), Op1(0b000), CRn(0b0110), CRm(0b0000), Op2(0b000),
>>access_vm_reg, reset_unknown, FAR_EL1 },
>> +/* FAR_EL2 */
>> +{ Op0(0b11), Op1(0b100), CRn(0b0110), CRm(0b0000), Op2(0b000),
>> +  access_vm_reg, reset_unknown, FAR_EL2 },
>>  /* PAR_EL1 */
>>  { Op0(0b11), Op1(0b000), CRn(0b0111), CRm(0b0100), Op2(0b000),
>>NULL, reset_unknown, PAR_EL1 },
>>
> 
> Also, what do you return here? All you're doing to return to userspace
> is 0x1de7ec7edbadc0deULL (which perfectly matches this patch).
> 
> So for all intents and purposes, this patch is pretty useless.
> 
> Thanks,
> 
>   M.
>

Re: [PATCH] KVM: arm64: add esr_el2 and far_el2 to sysreg

2017-08-07 Thread gengdongjiu

OK, thanks for the confirmation, James.

Another question: for SEI, I also want to use SIGBUS for both KVM and
non-KVM users. If SEA and SEI both use SIGBUS to notify user space (QEMU),
user space cannot tell whether the error is an SEA or an SEI. So here I
pass the sysreg ESR_EL2 (vcpu->arch.fault.esr_el2) to user space and let
user space check its value to tell an SEA from an SEI. Do you agree with
this solution for SEI? Thanks.

Because vcpu->arch.fault.esr_el2 and far_el2 cannot be passed to user space
directly, I expose them as the sysregs ESR_EL2/FAR_EL2, since sysregs can
be passed to user space.

+   vcpu_sys_reg(vcpu,ESR_EL2) = kvm_vcpu_get_hsr(vcpu);
+   vcpu_sys_reg(vcpu,FAR_EL2) = kvm_vcpu_get_hfar(vcpu);
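
To make the idea concrete, here is a minimal sketch of how user space might
tell an SEA from an SEI once it has the ESR_EL2 value. The field positions
(EC in bits [31:26], fault status code in bits [5:0]) and the EC/FSC values
come from the ARMv8-A ESR_ELx encoding; the enum and the classify_esr()
helper are hypothetical names used only for illustration, not part of the
patch.

```c
#include <assert.h>
#include <stdint.h>

/* ESR_ELx layout (ARMv8-A): EC is bits [31:26], DFSC/IFSC is bits [5:0]. */
#define ESR_EC_SHIFT      26
#define ESR_EC_MASK       0x3fu
#define ESR_FSC_MASK      0x3fu

#define ESR_EC_IABT_LOW   0x20u /* instruction abort from a lower EL */
#define ESR_EC_DABT_LOW   0x24u /* data abort from a lower EL */
#define ESR_EC_SERROR     0x2fu /* SError interrupt */

#define FSC_EXTABT        0x10u /* 0b010000: SEA, not on a table walk */
#define FSC_SEA_TTW       0x14u /* 0b0101xx: SEA on table walk, level in [1:0] */

/* Hypothetical result type, for illustration only. */
enum ras_event { RAS_EVENT_OTHER, RAS_EVENT_SEA, RAS_EVENT_SEI };

/* Hypothetical helper: classify a saved ESR_EL2 value. */
enum ras_event classify_esr(uint64_t esr)
{
    uint32_t ec  = (uint32_t)(esr >> ESR_EC_SHIFT) & ESR_EC_MASK;
    uint32_t fsc = (uint32_t)esr & ESR_FSC_MASK;

    if (ec == ESR_EC_SERROR)
        return RAS_EVENT_SEI;

    if (ec == ESR_EC_IABT_LOW || ec == ESR_EC_DABT_LOW) {
        /* Mask off the two level bits to match any 0b0101xx code. */
        if (fsc == FSC_EXTABT || (fsc & 0x3cu) == FSC_SEA_TTW)
            return RAS_EVENT_SEA;
    }
    return RAS_EVENT_OTHER;
}
```

With this, a data abort syndrome carrying FSC 0b010000 is reported as an SEA
and an SError syndrome as an SEI; ordinary translation faults fall through
to RAS_EVENT_OTHER.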

On 2017/8/8 0:59, James Morse wrote:
> Hi gengdongjiu,
> 
> On 07/08/17 17:23, gengdongjiu wrote:
>>   As James's suggestion, I move injection SEA Error logic to the user 
>> space(Qemu), Qemu sets the related guest OS esr/elr/pstate/spsr
> 
> (because for firmware-first its the CPER records that matter, and only QEMU
> knows where it reserved the memory for these, and what it told the guest it
> would use as the notification method).
> 
>> through IOCTL KVM_SET_ONE_REG. For the SEA, when Qemu sets the esr_el1.IL 
>> bit, it needs to refer to esr_el2.IL, else Qemu does not know the trapped
>> instruction was a 16-bit or a 32-bit instruction, also it needs to set 
>> far_el1 using far_el2, because this is synchronization abort.
> 
> The 32bit kernel doesn't support ACPI firmware first, and aarch64 doesn't
> support 16-bit instructions.
 Thanks. So what about the FAR_EL1 value for an SEA? Maybe FAR_EL1 should be
taken from FAR_EL2.

> 
> 
> James
> 
> 
> 
> .
>

Re: [PATCH v5 0/7] Add RAS virtualization support to SEA/SEI notification type

2017-08-22 Thread gengdongjiu

Jonathan,
   Thanks for the review. I will correct the typos in the next patch version.


On 2017/8/22 15:54, Jonathan Cameron wrote:
> On Fri, 18 Aug 2017 22:11:50 +0800
> Dongjiu Geng  wrote:
> 
>> In the firmware-first RAS solution, corrupt data is detected in a
>> memory location when guest OS application software executing at EL0
>> or guest OS kernel El1 software are reading from the memory. The
>> memory node records errors in an error record accessible using
>> system registers.
>>
>> Because SCR_EL3.EA is 1, then CPU will trap to El3 firmware, EL3
>> firmware records the error to APEI table through reading system
>> register.
>>
>> Because the error was taken from a lower Exception leve, if the
> 
> leve -> level
> 
>> exception is SEA/SEI and HCR_EL2.TEA/HCR_EL2.AMO is 1, firmware
>> sets ESR_EL2/FAR_El to fake a exception trap to EL2, then
>> transfers to hypervisor.
>>
>> Hypervisor calls the momory failure to deal with this error, momory
> 
> momory -> memory
> 
> memory failure -> memory failure function? Or callback perhaps?
> 
>> failure read the APEI table and decide whether it needs to deliver
>> SIGBUS signal to user space, the advantage of using SIGBUS signal
>> to notify user space is that it can be compatible Non-Kvm users.
> 
> Seems like a good description to me. Thanks.
> 
> Jonathan
> 
>>
>> Dongjiu Geng (5):
>>   acpi: apei: Add SEI notification type support for ARMv8
>>   support user space to query RAS extension feature
>>   arm64: kvm: route synchronous external abort exceptions to el2
>>   KVM: arm/arm64: Allow get exception syndrome and
>>   arm64: kvm: handle SEI notification and inject virtual SError
>>
>> James Morse (1):
>>   KVM: arm64: Save ESR_EL2 on guest SError
>>
>> Xie XiuQi (1):
>>   arm64: cpufeature: Detect CPU RAS Extentions
>>
>>  arch/arm/include/asm/kvm_host.h  |  2 ++
>>  arch/arm/kvm/guest.c |  5 +++
>>  arch/arm64/Kconfig   | 16 ++
>>  arch/arm64/include/asm/barrier.h |  1 +
>>  arch/arm64/include/asm/cpucaps.h |  3 +-
>>  arch/arm64/include/asm/kvm_arm.h |  2 ++
>>  arch/arm64/include/asm/kvm_emulate.h | 17 ++
>>  arch/arm64/include/asm/kvm_host.h|  2 ++
>>  arch/arm64/include/asm/sysreg.h  |  5 +++
>>  arch/arm64/include/asm/system_misc.h |  1 +
>>  arch/arm64/include/uapi/asm/kvm.h|  5 +++
>>  arch/arm64/kernel/cpufeature.c   | 13 
>>  arch/arm64/kernel/process.c  |  3 ++
>>  arch/arm64/kvm/guest.c   | 48 +
>>  arch/arm64/kvm/handle_exit.c | 21 +++--
>>  arch/arm64/kvm/hyp/switch.c  | 29 +++--
>>  arch/arm64/kvm/reset.c   |  3 ++
>>  arch/arm64/mm/fault.c| 21 +++--
>>  drivers/acpi/apei/Kconfig| 15 +
>>  drivers/acpi/apei/ghes.c | 60 
>> +++-
>>  include/acpi/ghes.h  |  2 +-
>>  include/uapi/linux/kvm.h |  3 ++
>>  virt/kvm/arm/arm.c   |  7 +
>>  23 files changed, 254 insertions(+), 30 deletions(-)
>>
> 
> 
> .
>

Re: [PATCH v3 1/3] arm64: kvm: support kvmtool to detect RAS extension feature

2017-05-10 Thread gengdongjiu

Dear James,

On 2017/5/9 1:31, James Morse wrote:
> Hi gengdongjiu,
> 
> On 04/05/17 18:20, gengdongjiu wrote:
>>> On 30/04/17 06:37, Dongjiu Geng wrote:
>>>> Handle kvmtool's detection for RAS extension, because sometimes
>>>> the APP needs to know the CPU's capacity
>>>
>>>> diff --git a/arch/arm64/kvm/reset.c b/arch/arm64/kvm/reset.c
>>>> index d9e9697..1004039 100644
>>>> --- a/arch/arm64/kvm/reset.c
>>>> +++ b/arch/arm64/kvm/reset.c
>>>> @@ -64,6 +64,14 @@ static bool cpu_has_32bit_el1(void)
>>>>   return !!(pfr0 & 0x20);
>>>>  }
>>>>
>>>> +static bool kvm_arm_support_ras_extension(void)
>>>> +{
>>>> + u64 pfr0;
>>>> +
>>>> + pfr0 = read_system_reg(SYS_ID_AA64PFR0_EL1);
>>>> + return !!(pfr0 & 0x10000000);
>>>> +}
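
For reference, a minimal sketch of the same check written as a field
extraction. In ID_AA64PFR0_EL1 the RAS field occupies bits [31:28], and any
non-zero value means some version of the RAS extension is implemented. The
helper names below are hypothetical, and the register value is passed in as
a parameter so the sketch stays independent of the kernel's accessors.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* ID_AA64PFR0_EL1.RAS is a 4-bit unsigned field at bits [31:28]. */
#define ID_AA64PFR0_RAS_SHIFT 28
#define ID_AA64PFR0_RAS_MASK  0xfu

/* Hypothetical helper: extract the RAS field from a PFR0 value. */
unsigned int id_aa64pfr0_ras(uint64_t pfr0)
{
    return (unsigned int)(pfr0 >> ID_AA64PFR0_RAS_SHIFT) & ID_AA64PFR0_RAS_MASK;
}

/* Hypothetical helper: true if any RAS extension version is present. */
bool pfr0_has_ras_extension(uint64_t pfr0)
{
    return id_aa64pfr0_ras(pfr0) >= 1;
}
```

Comparing the whole field rather than testing one bit also matches later
revisions of the extension, which report larger field values.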
>>>
>>> Why are we telling user-space that the CPU has RAS extensions? EL0 can't do
>>> anything with this and the guest EL1 can detect it from the id registers.
>>>
>>>
>>> Are you using this to decide whether or not to generate a HEST for the 
>>> guest?
>>
>> James, yes, it is.  my current user-space qemu EL0 patches indeed will
>> check the RAS  extensions.
>> if has the RAS extensions. for SEA, userspace qemu will generate the
>> CPER and inject the SEA to guest;
>> for SEI,  userspace qemu sets the virtual SEI with the specified
>> Syndrome(set the HCR_EL2.VSE and vsesr_el2 );
>> if not have RAS extensions, Qemu does nothing
> 
> But you can use APEI in a guest on CPUs without the RAS extensions: the host 
> may
> signal memory errors to Qemu for any number of reasons, user-space shouldn't
> care how it knows. Examples are PCI-AER, any APEI event notified by polling or
> one of the flavours of irq.
> 
> I would expect Qemu to generate a HEST based on its abilities, i.e. if it
> supports any mechanism of notifying the guest about errors. Choosing the
> mechanism then depends on the type of error.
> 
> Ideally the Qemu code for HEST/GHES/CPER generation code using some of the 
> irqs
> and polling could be shared with x86, as these should be possible using common
> KVM APIs.
Ok, got it.

> 
> 
>>> If Qemu/kvmtool supports handling memory-failure notifications from signals 
>>> you
>>> should always generate a HEST. The GHES notification method could be 
>>> anything
>>> Qemu can deliver to the guest using the KVM APIs. Notifications from Qemu 
>>> to the
>>> guest don't depend on the RAS extensions. KVM has APIs for IRQ and SEA (you 
>>> can
>>> use KVM_SET_ONE_REG).
>>
>> I will consider your suggestion to  always generate a CPER instead of
> 
> (generate a HEST, CPER are the runtime records. There are too many acronyms in
> this space!)
  Thanks for the correction, James.

> 
>> relying on the RAS extensions, thanks
> 
> 
> Thanks,
> 
> James
> 
> 
> .
>

Re: [PATCH v3 3/3] arm/arm64: signal SIBGUS and inject SEA Error

2017-05-10 Thread gengdongjiu

Hi James,
  thanks a lot for your answer.

On 2017/5/9 1:28, James Morse wrote:
> Hi gengdongjiu,
> 
> On 04/05/17 17:52, gengdongjiu wrote:
>> 2017-05-04 23:42 GMT+08:00 gengdongjiu <gengdj.1...@gmail.com>:
>>> On 30/04/17 06:37, Dongjiu Geng wrote:
>>>> diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
>>>> index 105b6ab..a96594f 100644
>>>> --- a/arch/arm/kvm/mmu.c
>>>> +++ b/arch/arm/kvm/mmu.c
>>>> +static void kvm_handle_bad_page(unsigned long address,
>>>> + bool hugetlb, bool hwpoison)
>>>> +{
>>>> + /* handle both hwpoison and other synchronous external Abort */
>>>> + if (hwpoison)
>>>> + kvm_send_signal(address, hugetlb, true);
>>>> + else
>>>> + kvm_send_signal(address, hugetlb, false);
>>>> +}
>>>
>>> Why the extra level of indirection? We only want to signal userspace like 
>>> this
>>> from KVM for hwpoison. Signals for RAS related reasons should come from the 
>>> bits
>>> of the kernel that decoded the error.
>>
>> For SEA, there are mainly two types:
>> 0b010000 Synchronous External Abort on memory access.
>> 0b0101xx Synchronous External Abort on page table walk. DFSC[1:0]
>> encode the level.
> 
> (KVM shouldn't have to make decisions about this)
> 
> 
>> hwpoison should belong to the  "Synchronous External Abort on memory access"
>> if the SEA type is not hwpoison, such as page table walk, do you mean
>> KVM do not deliver the SIGBUS?
> 
> 
> The flow of events should be SEI/SEA from firmware to the hosts's APEI code. 
> KVM
> should only be involved to get us back to the host if we were running a guest.
> The APEI/hwpoison code may cause a set of processes to be sent signals. The 
> code
> in mm/memory-failure.c does this by walking the process rmaps using the 
> physical
> addresses in the CPER records.
> 
> We want user space to be sent signals as this can (and should) work in exactly
> the same way on arm64 as it does on x86 or any other architecture. If a
> web-browser can handle SIGBUS notifications for memory-corruption, it 
> shouldn't
> have to care what architecture it is running on.
 OK James, understood.

> 
> So what is that KVM+SIGBUS patch about?...
> 
>>> (hwpoison for KVM is a corner case as Qemu's memory effectively has two 
>>> users,
>>> Qemu and KVM. This isn't the example of how user-space gets signalled.)
> 
> KVM creates guests as if they were additional users of Qemu's memory. The code
> in mm/memory-failure.c may find that Qemu didn't have the affected page mapped
> to user-space - but it may have been in use by stage2.
> 
> The KVM+SIGBUS patch hides this difference, meaning Qemu gets a signal when 
> the
> guest touches the hwpoison page as if Qemu had touched the page itself.
> 
> Signals from KVM is a corner case, for firmware-first decisions should happen 
> in
> the APEI code based on CPER records.
> 
> 
>> If so, how the KVM handle the SEA type other than hwpoison?
> 
> To deliver to a guest? It shouldn't have to know, user space should use a KVM
> API to drive this.
> 
> When received from hardware? It shouldn't have to care, these things should be
> passed into the APEI code for handling. KVM just needs to put the host 
> registers
> back.
I recently checked with the hardware team. They said almost all SEA errors
carry the Poison flag, so maybe there is no need to consider SEA errors
other than hwpoison. Handling only the SEA hwpoison errors may be enough.

> 
> 
>>>> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
>>>> index bb02909..1d2e2e7 100644
>>>> --- a/include/uapi/linux/kvm.h
>>>> +++ b/include/uapi/linux/kvm.h
>>>> @@ -1306,6 +1306,7 @@ struct kvm_s390_ucas_mapping {
>>>>  #define KVM_S390_GET_IRQ_STATE  _IOW(KVMIO, 0xb6, struct 
>>>> kvm_s390_irq_state)
>>>>  /* Available with KVM_CAP_X86_SMM */
>>>>  #define KVM_SMI   _IO(KVMIO,   0xb7)
>>>> +#define KVM_ARM_SEA   _IO(KVMIO,   0xb8)
>>>>
>>>>  #define KVM_DEV_ASSIGN_ENABLE_IOMMU (1 << 0)
>>>>  #define KVM_DEV_ASSIGN_PCI_2_3 (1 << 1)
>>>>
>>>
>>> Why do we need a userspace API for SEA? It can also be done by using
>>> KVM_{G,S}ET_ONE_REG to change the vcpu registers. The advantage of doing it 
>>> this
>>> way is you can choose which ESR value to use.
>>>
>>> Adding a new API call to do something you could do with an ol

Re: [PATCH v3 3/3] arm/arm64: signal SIBGUS and inject SEA Error

2017-05-10 Thread gengdongjiu

Thanks for James's explanation.

Hi Christoffer,

On 2017/5/9 22:28, James Morse wrote:
> Hi Christoffer,
> 
> On 08/05/17 18:54, Christoffer Dall wrote:
>> On Mon, May 08, 2017 at 06:28:02PM +0100, James Morse wrote:
>> I must admit I am losing track of exactly what this proposed API was
>> supposed to do.
> 
> There are two, and we keep jumping between them!
> This is about two notification methods APEI has for arm64, 'SEA' and 'SEI'.
> 
> SEA is synchronous and looks like a data abort. Qemu/kvmtool can inject these
> today using the KVM_GET/SET_ONE_REG API whenever it wants to.
> 
> SEI uses SError, is asynchronous and can be masked. In addition these need to 
> be
> consumed/synchronised by the ESB instruction, even when executed by a guest.
> Hardware has the necessary bits to drive all this, we need to expose an API to
> drive it.
> 
> (I try to spell them out each time so I don't confuse SEI with something
> synchronous!)
> 
> 
> This patch was about SEA. I think you've answered our question:

We are talking about two methods of injecting an SEA (synchronous data abort):

(1) Change the vcpu registers in Qemu/kvmtool and set them using the
KVM_GET/SET_ONE_REG API.
(2) Use the existing in-kernel API "kvm_inject_dabt" to inject it, driven by
an IOCTL command from Qemu.
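
For method (1), a minimal sketch of the register values user space would
have to compute before writing them back with KVM_SET_ONE_REG. It models a
data abort taken to EL1 with the external-abort fault status code. The
constants follow the ARMv8-A PSTATE, ESR_ELx and vector table layout; the
struct and function names are hypothetical, the guest is assumed to be
AArch64 (so IL is always set), and the ioctl calls themselves are left out
so that the sketch stays self-contained.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* PSTATE/SPSR bits (AArch64). */
#define PSR_MODE_EL0t   0x0u
#define PSR_MODE_EL1t   0x4u
#define PSR_MODE_EL1h   0x5u
#define PSR_MODE_MASK   0xfu
#define PSR_MODE32_BIT  0x10u
#define PSR_F_BIT       0x40u
#define PSR_I_BIT       0x80u
#define PSR_A_BIT       0x100u
#define PSR_D_BIT       0x200u

/* ESR_ELx encoding. */
#define ESR_EC_SHIFT    26
#define ESR_EC_DABT_LOW 0x24u      /* data abort from a lower EL */
#define ESR_EC_DABT_CUR 0x25u      /* data abort from the current EL */
#define ESR_IL          (1u << 25) /* 32-bit instruction */
#define FSC_EXTABT      0x10u      /* synchronous external abort */

/* Hypothetical user-space copy of the registers involved. */
struct sea_regs {
    uint64_t pc, pstate, vbar_el1;
    uint64_t elr_el1, spsr_el1, far_el1, esr_el1;
};

/* Offset of the synchronous vector for the mode the guest was in. */
uint64_t sync_vector_offset(uint64_t pstate)
{
    switch (pstate & (PSR_MODE_MASK | PSR_MODE32_BIT)) {
    case PSR_MODE_EL1t: return 0x000; /* current EL, SP_EL0 */
    case PSR_MODE_EL1h: return 0x200; /* current EL, SP_ELx */
    case PSR_MODE_EL0t: return 0x400; /* lower EL, AArch64 */
    default:            return 0x600; /* lower EL, AArch32 */
    }
}

/* Hypothetical helper: rewrite regs as if EL1 had taken an SEA data abort. */
void prepare_sea_dabt(struct sea_regs *r, uint64_t fault_addr)
{
    uint64_t mode;
    bool from_el1;
    uint32_t ec;

    assert(r != NULL);

    mode = r->pstate & (PSR_MODE_MASK | PSR_MODE32_BIT);
    from_el1 = (mode == PSR_MODE_EL1t || mode == PSR_MODE_EL1h);
    ec = from_el1 ? ESR_EC_DABT_CUR : ESR_EC_DABT_LOW;

    /* Save the interrupted context, then describe the fault. */
    r->elr_el1  = r->pc;
    r->spsr_el1 = r->pstate;
    r->far_el1  = fault_addr;
    r->esr_el1  = ((uint64_t)ec << ESR_EC_SHIFT) | ESR_IL | FSC_EXTABT;

    /* Enter the EL1 handler with all exceptions masked. The vector
     * offset must be derived from the old PSTATE, so set PC first. */
    r->pc     = r->vbar_el1 + sync_vector_offset(r->pstate);
    r->pstate = PSR_MODE_EL1h | PSR_A_BIT | PSR_F_BIT | PSR_I_BIT | PSR_D_BIT;
}
```

Because user space builds the ESR itself, it can pick a different fault
status code later without any kernel change.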


> 
>> However, if it's a question about setting up VCPU registers to a certain
>> state and potentially modifying memory, then I think experience has
>> shown us (psci) that emulating something in the kernel that userspace
>> can have fine-grained control over is a bad idea, and should be left to
>> userspace using as generic APIs as possible.
>>
>> Furthermore, if I understand what injecting a SEA requires, it is very
>> similar to resetting the CPU and loading data into guest memory, which
>> QEMU already does today, and there is no reason to introduce additional
>> APIs if it can be done using KVM_GET/SET_ONE_REG ioctls.
> 
> 
> Thanks,
> 
> James
> 
> .
>

Re: [PATCH v3 3/3] arm/arm64: signal SIBGUS and inject SEA Error

2017-05-10 Thread gengdongjiu

Hi Christoffer,

On 2017/5/10 20:20, Christoffer Dall wrote:
> On Wed, May 10, 2017 at 05:15:04PM +0800, gengdongjiu wrote:
>> Thanks James's explanation.
>>
>> Hi Christoffer,
>>
>> On 2017/5/9 22:28, James Morse wrote:
>>> Hi Christoffer,
>>>
>>> On 08/05/17 18:54, Christoffer Dall wrote:
>>>> On Mon, May 08, 2017 at 06:28:02PM +0100, James Morse wrote:
>>>> I must admit I am losing track of exactly what this proposed API was
>>>> supposed to do.
>>>
>>> There are two, and we keep jumping between them!
>>> This is about two notification methods APEI has for arm64, 'SEA' and 'SEI'.
>>>
>>> SEA is synchronous and looks like a data abort. Qemu/kvmtool can inject 
>>> these
>>> today using the KVM_GET/SET_ONE_REG API whenever it wants to.
>>>
>>> SEI uses SError, is asynchronous and can be masked. In addition these need 
>>> to be
>>> consumed/synchronised by the ESB instruction, even when executed by a guest.
>>> Hardware has the necessary bits to drive all this, we need to expose an API 
>>> to
>>> drive it.
>>>
>>> (I try to spell them out each time so I don't confuse SEI with something
>>> synchronous!)
>>>
>>>
>>> This patch was about SEA. I think you've answered our question:
>>
>> we are talking about the SEA(synchronous data abort) injection two methods:
>>
>> (1)change vcpu registers in the Qemu/kvmtools and using the 
>> KVM_GET/SET_ONE_REG API to set.
> 
> Yes, if this is possible, why would you want something more?
  We will use this method.

> 
>> (2)using existed in-kernel API "kvm_inject_dabt" to inject through IOCTL 
>> command from Qemu.
>>
> 
> I'm not really going to consider this, because "use internal API from
> userspace" doesn't work.
> 
> So this should be:
> 
>   (2) Introduce a new API to do X.
  You can ignore the second method; we will not use it.

> 
> I still think you know what my preference is; use the existing API if at
> all possible.
> 
> Thanks,
> -Christoffer
> 
> .
>

Re: [PATCH v3 3/3] arm/arm64: signal SIBGUS and inject SEA Error

2017-05-21 Thread gengdongjiu

Hi James,
sorry for the late response. I have recently been verifying and debugging
the RAS solution.

2017-05-13 1:24 GMT+08:00, James Morse <james.mo...@arm.com>:
> Hi gengdongjiu,
>
> On 05/05/17 13:31, gengdongjiu wrote:
>> when guest OS happen an SEA, My current solution is shown below:
>>
>> (1) host EL3 firmware firstly handle the SEA error and generate the CPER
>> record.
>> (2) EL3 firmware separately copy the esr_el3, elr_el3, SPSR_el3,
>> far_el3 to the esr_el2, elr_el2, SPSR_el2, far_el2.
>
> Copying {ELR,SPSR,FAR}_EL3 to the EL2 registers rings some alarm bells: I'm
> sure
> you exclude values from EL3 or the secure-world, we should never hand those
> to
> the normal world.

 Yes, errors from EL3 and the secure world certainly need to be excluded.

>
>
>> (3) then jump the EL2 hypervisor
>
>> so the EL2 hypervisor uses the ESR that come from esr_el3,  here the
>> ESR(esr_el3) value may be different with the exist KVM API's ESR.
>
> The ESR may be different between EL3 and EL2. The ESR contains the severity
> of
> the event, the CPU will choose this when it takes the SError to EL3. If it
> had
> taken the SError to EL2, the CPU may have classified the error differently.
>
> Firmware may need to generate a more severe ESR if it receives an error
> that
> would be propagated by delivering SEI to a lower exception level, for
> example if
> an EL2 system register is 'infected'.
>
> This is the same for Qemu/kvmtool. A contained error at EL2 may be an
> uncontained error if we hand it to guest EL1. Linux's RAS code will decide
> this
> with its choice of signal to send, (and possibly which code to set).
> Qemu/kvmtool need to choose an appropriate APEI notification, which may
> involve
> generating a relevant ESR.
>
> Also relevant is the problem we discussed earlier with trying to deliver
> fake
> Physical-SError from software at EL3: If the SError is routed to EL2, and
> EL2
> has PSTATE.A masked, EL3 has to wait and try again later. This is another
> case
> where firmware may have to upgrade the classification of an error to
> uncontainable.
   That makes sense. Thanks, James.

>
>
> Thanks,
>
> James
>

Re: [PATCH v3 3/3] arm/arm64: signal SIBGUS and inject SEA Error

2017-05-21 Thread gengdongjiu

2017-05-13 1:25 GMT+08:00, James Morse <james.mo...@arm.com>:
> Hi gengdongjiu,
>
> On 10/05/17 09:44, gengdongjiu wrote:
>> On 2017/5/9 1:28, James Morse wrote:
>>>>> (hwpoison for KVM is a corner case as Qemu's memory effectively has two
>>>>> users,
>>>>> Qemu and KVM. This isn't the example of how user-space gets
>>>>> signalled.)
>>>
>>> KVM creates guests as if they were additional users of Qemu's memory. The
>>> code
>>> in mm/memory-failure.c may find that Qemu didn't have the affected page
>>> mapped
>>> to user-space - but it may have been in use by stage2.
>>>
>>> The KVM+SIGBUS patch hides this difference, meaning Qemu gets a signal
>>> when the
>>> guest touches the hwpoison page as if Qemu had touched the page itself.
>>>
>>> Signals from KVM is a corner case, for firmware-first decisions should
>>> happen in
>>> the APEI code based on CPER records.
>
>>>> If so, how the KVM handle the SEA type other than hwpoison?
>
>>> To deliver to a guest? It shouldn't have to know, user space should use a
>>> KVM
>>> API to drive this.
>>>
>>> When received from hardware? It shouldn't have to care, these things
>>> should be
>>> passed into the APEI code for handling. KVM just needs to put the host
>>> registers
>>> back.
>
>> Recently I confirmed with the hardware team. they said almost all the SEA
>> errors have the
>> Poison flag, so may be there is no need to consider other SEA errors other
>> than  hwPoison.
>> only consider SEA hwpoison errors can be enough.
>
> We should be careful here, by hwpoison I meant the Linux feature.
> From Documentation/vm/hwpoison.txt:
>> Upcoming Intel CPUs have support for recovering from some memory errors
>> (``MCA recovery''). This requires the OS to declare a page "poisoned",
>> kill the processes associated with it and avoid using it in the future.
>
> We were talking about KVM's reaction to 'the OS declaring a page poisoned'.
> Lets try to call this one memory-failure, as that is its Kconfig name. (now
> I
> understand why we've been confusing each other!)
>
> Your hwpoison looks like something the CPU reports in the ERRSTATUS
> registers
> (4.6.10 of DDI0587). This is something firmware should read, then describe
> to
> the OS via CPER records. Depending on these CPER records linux may invoke
> its
> memory-failure code.
  yes

>
>
>>>> injection a SEA is no more than setting some registers: elr_el1, PC,
>>>> PSTATE, SPSR_el1, far_el1, esr_el1
>>>> I seen this KVM API do the same thing as Qemu.  do you found call this
>>>> API will have issue and necessary to choose another ESR value?
>>>
>>> Should we let user-space pick the ESR to deliver to the guest? Yes,
>>> letting
>>> user-space specify the ESR gives the most flexibility to do something
>>> clever in
>>> the future. An obvious choice for SEA is between the external-abort and
>>> 'parity
>>> or ECC error' codes. If we tell user-space which of these happened (I
>>> don't
>>> think Linux does today) then Qemu can relay that information to the
>>> guest.
>
>> may be the ESR is delivered by the KVM.
>> (1) guest OS EL0 happen SEA due to hwpoison
>> (2) CPU traps to EL3 firmware, and update the ESR_EL3
>> (3) the EL3 firmware copies the ESR_EL3 to ESR_EL2
>> (4) then jump to EL2 hypervisor, hypervisor uses the ESR_EL2 to inject the
>> SEA.
>>
>> May be the esr_el2 can provide the accurate error information.
>> or do you think user-space specify the ESR instead of esr_el2 is better?
>
> I think the severity needs to be considered as the notification is handled
> by
> each exception level. There are cases where it will need to be upgraded
> from
> 'contained' to 'uncontained'. (more discussion on another part of the
> thread).
  Understood.

>
>
> Thanks,
>
> James
>

Re: [Qemu-devel] [PATCH v3 1/4] ACPI: Add APEI GHES Table Generation support

2017-05-29 Thread gengdongjiu

Dear Laszlo,
  Thank you very much for your review and detailed comments, and sorry for
the late response. I have recently been debugging the whole RAS solution.

On 2017/5/22 22:23, Laszlo Ersek wrote:
> Keeping some context:
> 
> On 05/12/17 23:00, Laszlo Ersek wrote:
>> On 04/30/17 07:35, Dongjiu Geng wrote:
>>> This implements APEI GHES Table by passing the error cper info to 
>>> the guest via a fw_cfg_blob. After a CPER info is added, an SEA/SEI 
>>> exception will be injected into the guest OS.
>>>
>>> Below is the table layout. The maximum number of error sources is 11,
>>> classified by notification type.
>>>
>>> etc/acpi/tables                 etc/hardware_errors
>>> ================     ==========================================
>>>                      +-----------+
>>> +--------------+     | address   |         +-> +--------------+
>>> |    HEST      +     | registers |         |   | Error Status |
>>> + +------------+     | +---------+         |   | Data Block 1 |
>>> | | GHES1      | --> | |address1 | --------+   | +------------+
>>> | | GHES2      | --> | |address2 | ------+     | |  CPER      |
>>> | | GHES3      | --> | |address3 | ----+ |     | |  CPER      |
>>> | |  ....      | --> | | ....... |     | |     | |  CPER      |
>>> | | GHES10     | --> | |address10| -+  | |     | |  CPER      |
>>> +-+------------+     +-+---------+  |  | |     +-+------------+
>>>                                     |  | |
>>>                                     |  | +---> +--------------+
>>>                                     |  |       | Error Status |
>>>                                     |  |       | Data Block 2 |
>>>                                     |  |       | +------------+
>>>                                     |  |       | |  CPER      |
>>>                                     |  |       | |  CPER      |
>>>                                     |  |       +-+------------+
>>>                                     |  |
>>>                                     |  +-----> +--------------+
>>>                                     |          | Error Status |
>>>                                     |          | Data Block 3 |
>>>                                     |          | +------------+
>>>                                     |          | |  CPER      |
>>>                                     |          +-+------------+
>>>                                     |            ...........
>>>                                     +--------> +--------------+
>>>                                                | Error Status |
>>>                                                | Data Block 10|
>>>                                                | +------------+
>>>                                                | |  CPER      |
>>>                                                | |  CPER      |
>>>                                                | |  CPER      |
>>>                                                +-+------------+
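
The layout in the diagram can be read as simple offset arithmetic. The
sketch below is only an illustration under assumptions that are not stated
in the patch: one 8-byte address register per error source at the start of
etc/hardware_errors, followed by one fixed-size error status data block per
source. The function names, the register size and the block size are
hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: each address register is a 64-bit guest physical address. */
#define GHES_ADDR_REG_SIZE 8u

/* Offset of the address register for a given error source. */
size_t ghes_addr_reg_offset(size_t source, size_t count)
{
    assert(source < count);
    return source * GHES_ADDR_REG_SIZE;
}

/* Offset of the error status data block for a given error source.
 * The blocks start right after the array of address registers. */
size_t ghes_block_offset(size_t source, size_t count, size_t block_size)
{
    assert(source < count);
    return count * GHES_ADDR_REG_SIZE + source * block_size;
}
```

Each address register would then hold the guest address that corresponds
to the matching block offset.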
>>>
>>> Signed-off-by: Dongjiu Geng 
>>> ---
>>>  default-configs/arm-softmmu.mak |   1 +
>>>  hw/acpi/Makefile.objs   |   1 +
>>>  hw/acpi/aml-build.c |   2 +
>>>  hw/acpi/hest_ghes.c | 203 +++
>>>  hw/arm/virt-acpi-build.c|   6 ++
>>>  include/hw/acpi/acpi-defs.h | 227 
>>> 
>>>  include/hw/acpi/aml-build.h |   1 +
>>>  include/hw/acpi/hest_ghes.h |  43 
>>>  8 files changed, 484 insertions(+)
>>>  create mode 100644 hw/acpi/hest_ghes.c
>>>  create mode 100644 include/hw/acpi/hest_ghes.h
> 
>> Next file:
>>
>>> diff --git a/include/hw/acpi/hest_ghes.h b/include/hw/acpi/hest_ghes.h
>>> new file mode 100644
>>> index 0000000..0cadc2b
>>> --- /dev/null
>>> +++ b/include/hw/acpi/hest_ghes.h
>>> @@ -0,0 +1,43 @@
>>> +#ifndef ACPI_GHES_H
>>> +#define ACPI_GHES_H
>>> +
>>> +#include "hw/acpi/bios-linker-loader.h"
>>> +
>>> +#define GHES_ERRORS_FW_CFG_FILE  "etc/hardware_errors"
>>> +#define GHES_DATA_ADDR_FW_CFG_FILE  "etc/hardware_errors_addr"
>>> +
>>> +#define GAS_ADDRESS_OFFSET  4
>>> +#define ERROR_STATUS_ADDRESS_OFFSET 20
>>> +#define NOTIFICATION_STRUCTURE  32
>>> +
>>> +#define BFAPEI_OK   0
>>> +#define BFAPEI_FAIL 1
>>> +
>>> +/* The max number of error source, the error sources
>>> + * are classified by notification type, below is the definition
>>> + * 0 - Polled
>>> + * 1 - External Interrupt
>>> + * 2 - Local Interrupt
>>> + * 3 - SCI
>>> + * 4 - NMI
>>> + * 5 - CMCI
>>> + * 6 - MCE
>>> + * 7 - GPIO-Signal
>>> + * 8 - ARMv8 SEA
>>> + * 9 - ARMv8 SEI
>>> + * 10 - External Interrupt - GSIV
>>> + */
>>> +#define MAX_ERROR_SOURCE_COUNT_V6   11
>>
>> I'll have to review this header file more thoroughly, once I see the 
>> code that references these macros. For now, I have one comment:
>>
>> (42) I think the notification type list should be removed from this 
>> location. Also,

Re: [Qemu-devel] [PATCH v3 1/4] ACPI: Add APEI GHES Table Generation support

2017-05-30 Thread gengdongjiu

Laszlo,
   Very sorry about that. It was my mistake to leave out your name.
When I replied, I copied the "CC" list to the reply but forgot to copy the
"To" list.
I will go through your comments in detail later and reply. Thanks again.



On 2017/5/30 0:03, Laszlo Ersek wrote:
> Hi,
> 
> did you remove me from the To: / Cc: list intentionally, or was that an
> oversight? I caught your message in my list folders only by luck.
> 
> Some followup below:
> 
> On 05/29/17 17:27, gengdongjiu wrote:
> 
>>> (46) What is "physical_addr" good for? Below I can only see an 
>>> assignment to it, in ghes_update_guest(). Where is the field read?
> 
>> this "physical_addr" address is the physical error address in the
>> CPER. such as the physical address that happen hwpoison, this address
>> is delivered by the KVM and QEMU transfer this address to physical.
> I understand that in the ghes_update_guest() function, you accept a
> parameter called "physical_address", and you pass it on to
> ghes_generate_cper_record(). That makes sense, yes.
> 
> However, you also assign the same value to "ges.physical_addr". And that
> structure field is never read. So my point is that the
> "GhesErrorState.physical_addr" field is superfluous and should be removed.
> 
> I checked the other three patches in the series and they don't seem to
> read that structure member either. Correct me if I'm wrong.
> 
>>> (55) What happens if you run out of the preallocated memory?
> 
>> if it run out of the preallocated memory. it will overwrite other 
>> error source. every block's size is fixed. so it does not easy
>> dynamically extend the size if it is overflow. Anyway I will add a
>> error report if it happens overwrite.
> I understand (and agree) that dynamic allocation is not possible here.
> 
> But that doesn't justify overwriting the error status data block that
> belongs to a different data source. (Worse, if this happens with the
> last error status data block, for error source 10, you could overwrite
> memory that belongs to the OS.)
> 
> If an error status data block becomes full, then we should either wrap
> back to the start of the same data block, or else stop forwarding errors
> for that error source.
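
A minimal sketch of the second option (stop recording once a block is
full), so that a record can never spill into the next error source's block
or past the reserved memory. The struct and function names are hypothetical
and do not come from the patch; the block is modelled as a plain byte
buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of one fixed-size error status data block. */
struct status_block {
    uint8_t *base;  /* start of the preallocated block */
    size_t   size;  /* total bytes reserved for this error source */
    size_t   used;  /* bytes already filled with records */
};

/* Append one record. Returns the offset at which it was stored, or -1 if
 * it does not fit, in which case the block is left untouched. */
long status_block_append(struct status_block *blk, const void *rec, size_t len)
{
    size_t offset;

    assert(blk != NULL && rec != NULL);
    assert(blk->used <= blk->size);

    if (len > blk->size - blk->used)
        return -1;  /* full: report instead of overwriting a neighbour */

    offset = blk->used;
    memcpy(blk->base + offset, rec, len);
    blk->used += len;
    return (long)offset;
}
```

The caller can print an error report when -1 is returned and skip the
injection for that error source.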
> 
> Does the ACPI spec say anything about this? I.e., about the case when
> the system runs out of the memory that was reserved for recording
> hardware errors?
> 
>>>>> +
>>>>> +mem_err = (struct cper_sec_mem_err *) (gdata + 1);
>>>>> +
>>>>> +/* In order to simplify simulation, hardcode the CPER section to 
>>>>> memory
>>>>> + * section.
>>>>> + */
>>>>> +mem_err->validation_bits |= CPER_MEM_VALID_ERROR_TYPE;
>>>>> +mem_err->error_type = 3;
>>>
>>> (58) Is this supposed to stand for "Multi-bit ECC" (from "N.2.5 Memory 
>>> Error Section" in UEFI 2.6)? Should we have a macro for that?
> 
>> Yes, it is. What do you mean a macro?
> 
> A #define for the integer value 3.
> 
>> For all the errors that happen in the guest OS, in order to simulate
>> easy, I abstract all the error section to memory section, even though
>> the error section is processor or other section.
> Why is that a valid thing to do? (I'm not doubting it is valid, I'm just
> asking.) Will that not confuse the ACPI subsystem of the guest OS?
> 
>> I do not know whether do you have some suggestion for that.
> Well I would have thought (without any expertise on the subject) that
> hardware errors from the host side should be mapped to the guest more or
> less "type correct". IOW, it looks strange that, say, a CPU error is
> reported as a memory error. But this is just an uneducated guess.
> 
>>>>> +mem_err->validation_bits |= CPER_MEM_VALID_CARD | 
>>>>> CPER_MEM_VALID_MODULE |
>>>>> +CPER_MEM_VALID_BANK | CPER_MEM_VALID_ROW |
>>>>> +CPER_MEM_VALID_COLUMN | CPER_MEM_VALID_BIT_POSITION;
>>>>> +mem_err->card = 1;
>>>>> +mem_err->module = 2;
>>>>> +mem_err->bank = 3;
>>>>> +mem_err->row = 1;
>>>>> +mem_err->column = 2;
>>>>> +mem_err->bit_pos = 5;
>>>
>>> (60) I have no idea where these values come from.
> 
>> For all the errors that happen in the guest OS, in order to simulate
>> easy, I abstract all the error section to memory section, and hard
>> code the memory section error value as above.
> Sure, but why is that safe? Will the guest OS not want to do something
> about these error details? If we are feeding the guest OS invalid error
> details, will that not lead to confusion?
> 
>>> (64) What does "reqr" stand for?
>> It stand for the request size.
> Can you please call it "req_size" or something similar? The English
> expression
> 
>   request size
> 
> contains only one "r" letter, so it's hard to understand where the
> second "r" in "reqr" comes from.
> 
> Thanks,
> Laszlo
> 
> .
>

Re: [PATCH v3 1/3] arm64: kvm: support kvmtool to detect RAS extension feature

2017-05-02 Thread gengdongjiu

Hi Christoffer,
   Thanks for your review and comments.

On 2017/5/2 15:56, Christoffer Dall wrote:
> Hi Dongjiu,
> 
> Please send a cover letter for patch series with more than a single
> patch.
 OK, got it.

> 
> The subject and description of these patches are also misleading.
> Hopefully this is in no way tied to kvmtool, but to userspace
> generically, for example also to be used by QEMU?
> 
> On Sun, Apr 30, 2017 at 01:37:55PM +0800, Dongjiu Geng wrote:
>> Handle kvmtool's detection for RAS extension, because sometimes
>> the APP needs to know the CPU's capacity
> 
> the APP ?
> 
> the CPU's capacity?
I will fix it.

> 
>>
>> Signed-off-by: Dongjiu Geng 
>> ---
>>  arch/arm64/kvm/reset.c   | 11 +++
>>  include/uapi/linux/kvm.h |  1 +
>>  2 files changed, 12 insertions(+)
>>
>> diff --git a/arch/arm64/kvm/reset.c b/arch/arm64/kvm/reset.c
>> index d9e9697..1004039 100644
>> --- a/arch/arm64/kvm/reset.c
>> +++ b/arch/arm64/kvm/reset.c
>> @@ -64,6 +64,14 @@ static bool cpu_has_32bit_el1(void)
>>  return !!(pfr0 & 0x20);
>>  }
>>  
>> +static bool kvm_arm_support_ras_extension(void)
>> +{
>> +u64 pfr0;
>> +
>> +pfr0 = read_system_reg(SYS_ID_AA64PFR0_EL1);
>> +return !!(pfr0 & 0x1000);
>> +}
>> +
>>  /**
>>   * kvm_arch_dev_ioctl_check_extension
>>   *
>> @@ -87,6 +95,9 @@ int kvm_arch_dev_ioctl_check_extension(struct kvm *kvm, 
>> long ext)
>>  case KVM_CAP_ARM_PMU_V3:
>>  r = kvm_arm_support_pmu_v3();
>>  break;
>> +case KVM_CAP_ARM_RAS_EXTENSION:
>> +r = kvm_arm_support_ras_extension();
>> +break;
> 
> You need to document this capability and API in
> Documentation/virtual/kvm/api.txt and explain how this works.
 Ok, thanks for your suggestion.

> 
> 
> 
>>  case KVM_CAP_SET_GUEST_DEBUG:
>>  case KVM_CAP_VCPU_ATTRIBUTES:
>>  r = 1;
>> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
>> index f51d508..27fe556 100644
>> --- a/include/uapi/linux/kvm.h
>> +++ b/include/uapi/linux/kvm.h
>> @@ -883,6 +883,7 @@ struct kvm_ppc_resize_hpt {
>>  #define KVM_CAP_PPC_MMU_RADIX 134
>>  #define KVM_CAP_PPC_MMU_HASH_V3 135
>>  #define KVM_CAP_IMMEDIATE_EXIT 136
>> +#define KVM_CAP_ARM_RAS_EXTENSION 137
>>  
>>  #ifdef KVM_CAP_IRQ_ROUTING
>>  
>> -- 
>> 2.10.1
>>
> 
> Thanks,
> -Christoffer
> 
> .
>

Re: [PATCH v3 2/3] arm64: kvm: inject SError with virtual syndrome

2017-05-02 Thread gengdongjiu

Hello Christoffer.


On 2017/5/2 16:03, Christoffer Dall wrote:
> On Sun, Apr 30, 2017 at 01:37:56PM +0800, Dongjiu Geng wrote:
>> when SError happen, kvm notifies kvmtool to generate GHES table
>> to record the error, then kvmtools inject the SError with specified
> 
> again, is this really specific to kvmtool?  Please try to explain this
> mechanism in generic terms.
  It applies to QEMU and to other userspace applications as well. I will correct it.

> 
>> virtual syndrome. when switch to guest, a virtual SError will happen with
>> this specified syndrome.
>>
>> Signed-off-by: Dongjiu Geng 
>> ---
>>  arch/arm64/include/asm/esr.h |  2 ++
>>  arch/arm64/include/asm/kvm_emulate.h | 10 ++
>>  arch/arm64/include/asm/kvm_host.h|  1 +
>>  arch/arm64/include/asm/sysreg.h  |  3 +++
>>  arch/arm64/kvm/handle_exit.c | 25 +++--
>>  arch/arm64/kvm/hyp/switch.c  | 15 ++-
>>  include/uapi/linux/kvm.h |  5 +
>>  7 files changed, 54 insertions(+), 7 deletions(-)
>>
>> diff --git a/arch/arm64/include/asm/esr.h b/arch/arm64/include/asm/esr.h
>> index 22f9c90..d009c99 100644
>> --- a/arch/arm64/include/asm/esr.h
>> +++ b/arch/arm64/include/asm/esr.h
>> @@ -127,6 +127,8 @@
>>  #define ESR_ELx_WFx_ISS_WFE (UL(1) << 0)
>>  #define ESR_ELx_xVC_IMM_MASK((1UL << 16) - 1)
>>  
>> +#define VSESR_ELx_IDS_ISS_MASK((1UL << 25) - 1)
>> +
>>  /* ESR value templates for specific events */
>>  
>>  /* BRK instruction trap from AArch64 state */
>> diff --git a/arch/arm64/include/asm/kvm_emulate.h 
>> b/arch/arm64/include/asm/kvm_emulate.h
>> index f5ea0ba..a3259a9 100644
>> --- a/arch/arm64/include/asm/kvm_emulate.h
>> +++ b/arch/arm64/include/asm/kvm_emulate.h
>> @@ -148,6 +148,16 @@ static inline u32 kvm_vcpu_get_hsr(const struct 
>> kvm_vcpu *vcpu)
>>  return vcpu->arch.fault.esr_el2;
>>  }
>>  
>> +static inline u32 kvm_vcpu_get_vsesr(const struct kvm_vcpu *vcpu)
>> +{
>> +return vcpu->arch.fault.vsesr_el2;
>> +}
>> +
>> +static inline void kvm_vcpu_set_vsesr(struct kvm_vcpu *vcpu, unsigned long 
>> val)
>> +{
>> +vcpu->arch.fault.vsesr_el2 = val;
>> +}
>> +
>>  static inline int kvm_vcpu_get_condition(const struct kvm_vcpu *vcpu)
>>  {
>>  u32 esr = kvm_vcpu_get_hsr(vcpu);
>> diff --git a/arch/arm64/include/asm/kvm_host.h 
>> b/arch/arm64/include/asm/kvm_host.h
>> index e7705e7..84ed239 100644
>> --- a/arch/arm64/include/asm/kvm_host.h
>> +++ b/arch/arm64/include/asm/kvm_host.h
>> @@ -86,6 +86,7 @@ struct kvm_vcpu_fault_info {
>>  u32 esr_el2;/* Hyp Syndrom Register */
>>  u64 far_el2;/* Hyp Fault Address Register */
>>  u64 hpfar_el2;  /* Hyp IPA Fault Address Register */
>> +u32 vsesr_el2;  /* Virtual SError Exception Syndrome Register */
>>  };
>>  
>>  /*
>> diff --git a/arch/arm64/include/asm/sysreg.h 
>> b/arch/arm64/include/asm/sysreg.h
>> index 32964c7..b6afb7a 100644
>> --- a/arch/arm64/include/asm/sysreg.h
>> +++ b/arch/arm64/include/asm/sysreg.h
>> @@ -125,6 +125,9 @@
>>  #define REG_PSTATE_PAN_IMM  sys_reg(0, 0, 4, 0, 4)
>>  #define REG_PSTATE_UAO_IMM  sys_reg(0, 0, 4, 0, 3)
>>  
>> +#define VSESR_EL2   sys_reg(3, 4, 5, 2, 3)
>> +
>> +
>>  #define SET_PSTATE_PAN(x) __emit_inst(0xd500 | REG_PSTATE_PAN_IMM | 
>> \
>>(!!x)<<8 | 0x1f)
>>  #define SET_PSTATE_UAO(x) __emit_inst(0xd500 | REG_PSTATE_UAO_IMM | 
>> \
>> diff --git a/arch/arm64/kvm/handle_exit.c b/arch/arm64/kvm/handle_exit.c
>> index c89d83a..3d024a9 100644
>> --- a/arch/arm64/kvm/handle_exit.c
>> +++ b/arch/arm64/kvm/handle_exit.c
>> @@ -180,7 +180,11 @@ static exit_handle_fn kvm_get_exit_handler(struct 
>> kvm_vcpu *vcpu)
>>  
>>  static int kvm_handle_guest_sei(struct kvm_vcpu *vcpu, struct kvm_run *run)
>>  {
>> -unsigned long fault_ipa = kvm_vcpu_get_fault_ipa(vcpu);
>> +unsigned long hva, fault_ipa = kvm_vcpu_get_fault_ipa(vcpu);
>> +struct kvm_memory_slot *memslot;
>> +int hsr, ret = 1;
>> +bool writable;
>> +gfn_t gfn;
>>  
>>  if (handle_guest_sei((unsigned long)fault_ipa,
>>  kvm_vcpu_get_hsr(vcpu))) {
>> @@ -190,9 +194,20 @@ static int kvm_handle_guest_sei(struct kvm_vcpu *vcpu, 
>> struct kvm_run *run)
>>  (unsigned long)kvm_vcpu_get_hsr(vcpu));
>>  
>>  kvm_inject_vabt(vcpu);
>> +} else {
>> +hsr = kvm_vcpu_get_hsr(vcpu);
>> +
>> +gfn = fault_ipa >> PAGE_SHIFT;
>> +memslot = gfn_to_memslot(vcpu->kvm, gfn);
>> +hva = gfn_to_hva_memslot_prot(memslot, gfn, &writable);
>> +
>> +run->exit_reason = KVM_EXIT_INTR;
>> +run->intr.syndrome_info = hsr;
>> +run->intr.address = hva;
>> +ret = 0;
>>  }
>>  
>> -return 0;
>> +return ret;
>>  }
>>  
>>  /*
>> @@ -218,8 +233,7 @@ int

Re: [PATCH v3 3/3] arm/arm64: signal SIBGUS and inject SEA Error

2017-05-05 Thread gengdongjiu

Hi James,

2017-05-05 0:52 GMT+08:00 gengdongjiu <gengdj.1...@gmail.com>:
> Dear James,
>Thanks a lot for your review and comments. I am very sorry for the
> late response.
>
>
> 2017-05-04 23:42 GMT+08:00 gengdongjiu <gengdj.1...@gmail.com>:
>>  Hi Dongjiu Geng,
>>
>> On 30/04/17 06:37, Dongjiu Geng wrote:
>>> when happen SEA, deliver signal bus and handle the ioctl that
>>> inject SEA abort to guest, so that guest can handle the SEA error.
>>
>>> diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
>>> index 105b6ab..a96594f 100644
>>> --- a/arch/arm/kvm/mmu.c
>>> +++ b/arch/arm/kvm/mmu.c
>>> @@ -20,8 +20,10 @@
>>> @@ -1238,6 +1240,36 @@ static void coherent_cache_guest_page(struct 
>>> kvm_vcpu *vcpu, kvm_pfn_t pfn,
>>>   __coherent_cache_guest_page(vcpu, pfn, size);
>>>  }
>>>
>>> +static void kvm_send_signal(unsigned long address, bool hugetlb, bool 
>>> hwpoison)
>>> +{
>>> + siginfo_t info;
>>> +
>>> + info.si_signo   = SIGBUS;
>>> + info.si_errno   = 0;
>>> + if (hwpoison)
>>> + info.si_code= BUS_MCEERR_AR;
>>> + else
>>> + info.si_code= 0;
>>> +
>>> + info.si_addr= (void __user *)address;
>>> + if (hugetlb)
>>> + info.si_addr_lsb = PMD_SHIFT;
>>> + else
>>> + info.si_addr_lsb = PAGE_SHIFT;
>>> +
>>> + send_sig_info(SIGBUS, &info, current);
>>> +}
>>> +
>>
>> Punit reviewed the other version of this patch, this PMD_SHIFT is not the 
>> right
>> thing to do, it needs a more accurate set of calls and shifts as there may be
>> hugetlbfs pages other than PMD_SIZE.
>>
>> https://www.spinics.net/lists/arm-kernel/msg568919.html
>>
>> I haven't posted a new version of that patch because I was still hunting a 
>> bug
>> in the hugepage/hwpoison code, even with Punit's fixes series I see -EFAULT
>> returned to userspace instead of this hwpoison code being invoked.
>
>   Ok, got it, thanks for your information.
>>
>> Please avoid duplicating functionality between patches, it wastes reviewers
>> time, especially when we know there are problems with this approach.
>>
>>
>>> +static void kvm_handle_bad_page(unsigned long address,
>>> + bool hugetlb, bool hwpoison)
>>> +{
>>> + /* handle both hwpoison and other synchronous external Abort */
>>> + if (hwpoison)
>>> + kvm_send_signal(address, hugetlb, true);
>>> + else
>>> + kvm_send_signal(address, hugetlb, false);
>>> +}
>>
>> Why the extra level of indirection? We only want to signal userspace like 
>> this
>> from KVM for hwpoison. Signals for RAS related reasons should come from the 
>> bits
>> of the kernel that decoded the error.
>
> For SEA, there are mainly two types:
> 0b010000 Synchronous External Abort on memory access.
> 0b0101xx Synchronous External Abort on page table walk. DFSC[1:0]
> encodes the level.
>
> hwpoison should belong to "Synchronous External Abort on memory access".
> If the SEA type is not hwpoison, such as a page table walk, do you mean
> that KVM should not deliver the SIGBUS?
> If so, how should KVM handle SEA types other than hwpoison?
>
>>
>> (hwpoison for KVM is a corner case as Qemu's memory effectively has two 
>> users,
>> Qemu and KVM. This isn't the example of how user-space gets signalled.)
>>
>>
>>> diff --git a/arch/arm64/kvm/guest.c b/arch/arm64/kvm/guest.c
>>> index b37446a..780e3c4 100644
>>> --- a/arch/arm64/kvm/guest.c
>>> +++ b/arch/arm64/kvm/guest.c
>>> @@ -277,6 +277,13 @@ int kvm_arch_vcpu_ioctl_set_sregs(struct kvm_vcpu 
>>> *vcpu,
>>>   return -EINVAL;
>>>  }
>>>
>>> +int kvm_vcpu_ioctl_sea(struct kvm_vcpu *vcpu)
>>> +{
>>> + kvm_inject_dabt(vcpu, kvm_vcpu_get_hfar(vcpu));
>>> +
>>> + return 0;
>>> +}
>>
>>> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
>>> index bb02909..1d2e2e7 100644
>>> --- a/include/uapi/linux/kvm.h
>>> +++ b/include/uapi/linux/kvm.h
>>> @@ -1306,6 +1306,7 @@ struct kvm_s390_ucas_mapping {
>>>  #define KVM_S390_GET_IRQ_STATE  _IOW(KVMIO, 0xb6, struct 
>>> kvm_s390_irq_state)
>>>  /* Available with KVM_CAP_X86_SMM */
>>>  #define KVM_SMI   _IO(KVMIO,   0xb7)
>>> +#define KVM_ARM_SEA   _IO(KVMIO,   0xb8)
>>>
>>>  #define

Re: [PATCH v3 3/3] arm/arm64: signal SIBGUS and inject SEA Error

2017-05-04 Thread gengdongjiu

Dear James,
   Thanks a lot for your review and comments. I am very sorry for the
late response.


2017-05-04 23:42 GMT+08:00 gengdongjiu <gengdj.1...@gmail.com>:
>  Hi Dongjiu Geng,
>
> On 30/04/17 06:37, Dongjiu Geng wrote:
>> when happen SEA, deliver signal bus and handle the ioctl that
>> inject SEA abort to guest, so that guest can handle the SEA error.
>
>> diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
>> index 105b6ab..a96594f 100644
>> --- a/arch/arm/kvm/mmu.c
>> +++ b/arch/arm/kvm/mmu.c
>> @@ -20,8 +20,10 @@
>> @@ -1238,6 +1240,36 @@ static void coherent_cache_guest_page(struct kvm_vcpu 
>> *vcpu, kvm_pfn_t pfn,
>>   __coherent_cache_guest_page(vcpu, pfn, size);
>>  }
>>
>> +static void kvm_send_signal(unsigned long address, bool hugetlb, bool 
>> hwpoison)
>> +{
>> + siginfo_t info;
>> +
>> + info.si_signo   = SIGBUS;
>> + info.si_errno   = 0;
>> + if (hwpoison)
>> + info.si_code= BUS_MCEERR_AR;
>> + else
>> + info.si_code= 0;
>> +
>> + info.si_addr= (void __user *)address;
>> + if (hugetlb)
>> + info.si_addr_lsb = PMD_SHIFT;
>> + else
>> + info.si_addr_lsb = PAGE_SHIFT;
>> +
>> + send_sig_info(SIGBUS, &info, current);
>> +}
>> +
>
> Punit reviewed the other version of this patch, this PMD_SHIFT is not the 
> right
> thing to do, it needs a more accurate set of calls and shifts as there may be
> hugetlbfs pages other than PMD_SIZE.
>
> https://www.spinics.net/lists/arm-kernel/msg568919.html
>
> I haven't posted a new version of that patch because I was still hunting a bug
> in the hugepage/hwpoison code, even with Punit's fixes series I see -EFAULT
> returned to userspace instead of this hwpoison code being invoked.

  Ok, got it, thanks for your information.
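
To illustrate the review comment about huge pages other than PMD_SIZE: si_addr_lsb is meant to describe the least significant bit of the affected address range, so it can be derived from the size of the page that actually backs the address instead of being hard-coded to PMD_SHIFT. The following is only a user-space sketch of that arithmetic, not the kernel change itself. The helper name addr_lsb_for_page_size is hypothetical, and how the kernel looks up the backing page size is deliberately left out.

```c
#include <assert.h>

/*
 * Hypothetical helper, not a kernel API: given the size in bytes of the
 * page backing the faulting address, return the value for si_addr_lsb,
 * i.e. log2(page_size). Returns -1 if the size is not a power of two.
 */
static int addr_lsb_for_page_size(unsigned long long page_size)
{
	int lsb = 0;

	if (page_size == 0 || (page_size & (page_size - 1)) != 0)
		return -1;

	/* page_size is a power of two, so this stops at its single set bit. */
	while ((page_size >> lsb) != 1ULL)
		lsb++;

	return lsb;
}

/* Sanity checks with page sizes that can occur with a 4K granule. */
static void addr_lsb_selftest(void)
{
	assert(addr_lsb_for_page_size(4096ULL) == 12);    /* base page */
	assert(addr_lsb_for_page_size(2ULL << 20) == 21); /* 2 MiB block */
	assert(addr_lsb_for_page_size(1ULL << 30) == 30); /* 1 GiB block */
	assert(addr_lsb_for_page_size(3ULL) == -1);       /* not a power of two */
}
```

With a helper of this shape, a 1 GiB hugetlbfs page would report 30 rather than the PMD-sized 21.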
>
> Please avoid duplicating functionality between patches, it wastes reviewers
> time, especially when we know there are problems with this approach.
>
>
>> +static void kvm_handle_bad_page(unsigned long address,
>> + bool hugetlb, bool hwpoison)
>> +{
>> + /* handle both hwpoison and other synchronous external Abort */
>> + if (hwpoison)
>> + kvm_send_signal(address, hugetlb, true);
>> + else
>> + kvm_send_signal(address, hugetlb, false);
>> +}
>
> Why the extra level of indirection? We only want to signal userspace like this
> from KVM for hwpoison. Signals for RAS related reasons should come from the 
> bits
> of the kernel that decoded the error.

For SEA, there are mainly two types:
0b010000 Synchronous External Abort on memory access.
0b0101xx Synchronous External Abort on page table walk. DFSC[1:0]
encodes the level.

hwpoison should belong to "Synchronous External Abort on memory access".
If the SEA type is not hwpoison, such as a page table walk, do you mean
that KVM should not deliver the SIGBUS?
If so, how should KVM handle SEA types other than hwpoison?
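
The two SEA encodings listed above can be told apart with a small decoder. This is a sketch under assumptions: it takes the six-bit DFSC field from the low bits of a data-abort syndrome, treats 0b010000 as an abort on the access itself and 0b0101xx as an abort on a page table walk with the level in DFSC[1:0]. The names classify_sea and enum sea_kind are invented for illustration and are not KVM functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* DFSC is the low six bits of the ISS for a data abort (assumed layout). */
#define SKETCH_DFSC_MASK    0x3fu
#define SKETCH_DFSC_SEA     0x10u /* 0b010000: SEA on memory access */
#define SKETCH_DFSC_SEA_TTW 0x14u /* 0b0101xx: SEA on page table walk */

enum sea_kind { SEA_NONE, SEA_ACCESS, SEA_TABLE_WALK };

/*
 * Hypothetical helper: classify a syndrome value. For a table-walk abort
 * the level from DFSC[1:0] is stored through 'level' when it is non-NULL.
 */
static enum sea_kind classify_sea(uint32_t esr, unsigned int *level)
{
	uint32_t dfsc = esr & SKETCH_DFSC_MASK;

	if (dfsc == SKETCH_DFSC_SEA)
		return SEA_ACCESS;

	if ((dfsc & ~0x3u) == SKETCH_DFSC_SEA_TTW) {
		if (level != NULL)
			*level = dfsc & 0x3u;
		return SEA_TABLE_WALK;
	}

	return SEA_NONE;
}

/* Sanity checks for the encodings quoted in the message. */
static void classify_sea_selftest(void)
{
	unsigned int level = 0;

	assert(classify_sea(0x10u, NULL) == SEA_ACCESS);
	assert(classify_sea(0x16u, &level) == SEA_TABLE_WALK && level == 2);
	assert(classify_sea(0x04u, NULL) == SEA_NONE); /* translation fault */
}
```

Only the first kind corresponds to the hwpoison case discussed here; what KVM should do for the table-walk kind is exactly the open question in this mail.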

>
> (hwpoison for KVM is a corner case as Qemu's memory effectively has two users,
> Qemu and KVM. This isn't the example of how user-space gets signalled.)
>
>
>> diff --git a/arch/arm64/kvm/guest.c b/arch/arm64/kvm/guest.c
>> index b37446a..780e3c4 100644
>> --- a/arch/arm64/kvm/guest.c
>> +++ b/arch/arm64/kvm/guest.c
>> @@ -277,6 +277,13 @@ int kvm_arch_vcpu_ioctl_set_sregs(struct kvm_vcpu *vcpu,
>>   return -EINVAL;
>>  }
>>
>> +int kvm_vcpu_ioctl_sea(struct kvm_vcpu *vcpu)
>> +{
>> + kvm_inject_dabt(vcpu, kvm_vcpu_get_hfar(vcpu));
>> +
>> + return 0;
>> +}
>
>> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
>> index bb02909..1d2e2e7 100644
>> --- a/include/uapi/linux/kvm.h
>> +++ b/include/uapi/linux/kvm.h
>> @@ -1306,6 +1306,7 @@ struct kvm_s390_ucas_mapping {
>>  #define KVM_S390_GET_IRQ_STATE  _IOW(KVMIO, 0xb6, struct kvm_s390_irq_state)
>>  /* Available with KVM_CAP_X86_SMM */
>>  #define KVM_SMI   _IO(KVMIO,   0xb7)
>> +#define KVM_ARM_SEA   _IO(KVMIO,   0xb8)
>>
>>  #define KVM_DEV_ASSIGN_ENABLE_IOMMU (1 << 0)
>>  #define KVM_DEV_ASSIGN_PCI_2_3 (1 << 1)
>>
>
> Why do we need a userspace API for SEA? It can also be done by using
> KVM_{G,S}ET_ONE_REG to change the vcpu registers. The advantage of doing it 
> this
> way is you can choose which ESR value to use.
>
> Adding a new API call to do something you could do with an old one doesn't 
> look
> right.

James, I had already considered your suggestion to use
KVM_{G,S}ET_ONE_REG to change the vcpu registers, but I found it is no
different from using the already existing KVM API.

Re: [PATCH v3 1/3] arm64: kvm: support kvmtool to detect RAS extension feature

2017-05-04 Thread gengdongjiu

Dear James,

>
>  Hi Dongjiu Geng,
>
> On 30/04/17 06:37, Dongjiu Geng wrote:
>> Handle kvmtool's detection for RAS extension, because sometimes
>> the APP needs to know the CPU's capacity
>
>> diff --git a/arch/arm64/kvm/reset.c b/arch/arm64/kvm/reset.c
>> index d9e9697..1004039 100644
>> --- a/arch/arm64/kvm/reset.c
>> +++ b/arch/arm64/kvm/reset.c
>> @@ -64,6 +64,14 @@ static bool cpu_has_32bit_el1(void)
>>   return !!(pfr0 & 0x20);
>>  }
>>
>> +static bool kvm_arm_support_ras_extension(void)
>> +{
>> + u64 pfr0;
>> +
>> + pfr0 = read_system_reg(SYS_ID_AA64PFR0_EL1);
>> + return !!(pfr0 & 0x1000);
>> +}
>
> Why are we telling user-space that the CPU has RAS extensions? EL0 can't do
> anything with this and the guest EL1 can detect it from the id registers.
>
>
> Are you using this to decide whether or not to generate a HEST for the guest?

James, yes, it is. My current userspace QEMU (EL0) patches do check for
the RAS extensions.
If the RAS extensions are present: for SEA, userspace QEMU generates the
CPER and injects the SEA into the guest;
for SEI, userspace QEMU sets the virtual SEI with the specified
syndrome (by setting HCR_EL2.VSE and vsesr_el2).
If the RAS extensions are not present, QEMU does nothing.
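
The decision flow described above can be written down as a small sketch. It assumes the architectural layout in which the RAS field of ID_AA64PFR0_EL1 is bits [31:28] and a non-zero value means the extension is implemented. All names here (pfr0_has_ras, userspace_action, the enums) are hypothetical and exist only for illustration; this is not the QEMU code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed layout: ID_AA64PFR0_EL1.RAS is bits [31:28]. */
#define SKETCH_PFR0_RAS_SHIFT 28
#define SKETCH_PFR0_RAS_MASK  0xfULL

enum sketch_err { ERR_SEA, ERR_SEI };

enum sketch_action {
	ACT_NOTHING,      /* no RAS extensions: do nothing */
	ACT_CPER_AND_SEA, /* generate the CPER and inject an SEA */
	ACT_VIRTUAL_SEI   /* set HCR_EL2.VSE and the virtual syndrome */
};

/* Extract the whole 4-bit RAS field rather than testing a single bit. */
static bool pfr0_has_ras(uint64_t pfr0)
{
	return ((pfr0 >> SKETCH_PFR0_RAS_SHIFT) & SKETCH_PFR0_RAS_MASK) != 0;
}

/* Hypothetical summary of what userspace does for each error type. */
static enum sketch_action userspace_action(uint64_t pfr0, enum sketch_err err)
{
	if (!pfr0_has_ras(pfr0))
		return ACT_NOTHING;

	return err == ERR_SEA ? ACT_CPER_AND_SEA : ACT_VIRTUAL_SEI;
}

/* Sanity checks for the three outcomes described in the message. */
static void userspace_action_selftest(void)
{
	assert(userspace_action(0x10000000ULL, ERR_SEA) == ACT_CPER_AND_SEA);
	assert(userspace_action(0x10000000ULL, ERR_SEI) == ACT_VIRTUAL_SEI);
	assert(userspace_action(0x0ULL, ERR_SEA) == ACT_NOTHING);
}
```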


>
> If Qemu/kvmtool supports handling memory-failure notifications from signals 
> you
> should always generate a HEST. The GHES notification method could be anything
> Qemu can deliver to the guest using the KVM APIs. Notifications from Qemu to 
> the
> guest don't depend on the RAS extensions. KVM has APIs for IRQ and SEA (you 
> can
> use KVM_SET_ONE_REG).

I will consider your suggestion to always generate a CPER instead of
relying on the RAS extensions, thanks.
As for the comment "SEA (you can use KVM_SET_ONE_REG)", it may duplicate
the existing KVM API, as I mentioned in another mail thread. I am
considering whether it is necessary to change the vcpu registers from
QEMU/kvmtool.

>
>
> I think we need a new API for injecting SError for SEI from Qemu/kvmtool, but 
> it
> shouldn't be related to the RAS extensions. All v8.0 CPUs have HCR_EL2.VSE, so
> we need to know KVM supports this API.
>
> Your later patch adds code to set VSESR to make virtual RAS SErrors work, I
> think we need to expose that to user-space.
>
>
> Thanks,
>
> James

Re: [PATCH v4 2/3] arm64: kvm: route synchronous external abort exceptions to el2

2017-06-27 Thread gengdongjiu

Corrected commit message:

 In the firmware-first RAS solution, the OS receives a synchronous
 external abort, which is then trapped to EL3 by SCR_EL3.EA. Firmware inspects
 HCR_EL2.TEA and chooses the target to send APEI's SEA notification to.
 If SCR_EL3.EA is set, it delegates the error exception to the hypervisor;
 otherwise it delegates it to the host OS kernel.


On 2017/6/26 20:45, Dongjiu Geng wrote:
> In the firmware-first RAS solution, guest OS receives an synchronous
> external abort, then trapped to EL3 by SCR_EL3.EA. Firmware inspects
> the HCR_EL2.TEA and chooses the target to send APEI's SEA notification.
> If the SCR_EL3.EA is set, delegates the error exception to the hypervisor,
> otherwise it delegates to the guest OS kernel
> 
> Signed-off-by: Dongjiu Geng 
> ---
>  arch/arm64/include/asm/kvm_arm.h | 2 ++
>  arch/arm64/include/asm/kvm_emulate.h | 7 +++
>  2 files changed, 9 insertions(+)
> 
> diff --git a/arch/arm64/include/asm/kvm_arm.h 
> b/arch/arm64/include/asm/kvm_arm.h
> index 61d694c..1188272 100644
> --- a/arch/arm64/include/asm/kvm_arm.h
> +++ b/arch/arm64/include/asm/kvm_arm.h
> @@ -23,6 +23,8 @@
>  #include 
>  
>  /* Hyp Configuration Register (HCR) bits */
> +#define HCR_TEA  (UL(1) << 37)
> +#define HCR_TERR (UL(1) << 36)
>  #define HCR_E2H  (UL(1) << 34)
>  #define HCR_ID   (UL(1) << 33)
>  #define HCR_CD   (UL(1) << 32)
> diff --git a/arch/arm64/include/asm/kvm_emulate.h 
> b/arch/arm64/include/asm/kvm_emulate.h
> index f5ea0ba..5f64ab2 100644
> --- a/arch/arm64/include/asm/kvm_emulate.h
> +++ b/arch/arm64/include/asm/kvm_emulate.h
> @@ -47,6 +47,13 @@ static inline void vcpu_reset_hcr(struct kvm_vcpu *vcpu)
>   vcpu->arch.hcr_el2 = HCR_GUEST_FLAGS;
>   if (is_kernel_in_hyp_mode())
>   vcpu->arch.hcr_el2 |= HCR_E2H;
> + if (cpus_have_const_cap(ARM64_HAS_RAS_EXTN)) {
> + /* route synchronous external abort exceptions to EL2 */
> + vcpu->arch.hcr_el2 |= HCR_TEA;
> + /* trap error record accesses */
> + vcpu->arch.hcr_el2 |= HCR_TERR;
> + }
> +
>   if (test_bit(KVM_ARM_VCPU_EL1_32BIT, vcpu->arch.features))
>   vcpu->arch.hcr_el2 &= ~HCR_RW;
>  }
>

Re: [PATCH v6 5/7] arm64: kvm: route synchronous external abort exceptions to el2

2017-09-14 Thread gengdongjiu

James,

On 2017/9/8 0:31, James Morse wrote:
> KVM already handles external aborts from lower exception levels, no more work
> needs doing for TEA.
If it is the firmware-first solution, that is SCR_EL3.EA=1, all SError interrupts
and synchronous External Abort exceptions are taken to EL3, so EL3 firmware
handles them and KVM does not need to.

HCR_EL2.TEA is only for EL3 firmware to check when deciding whether to jump to
the hypervisor or the kernel.
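
As a sketch of the check described above, EL3 firmware would look at the TEA bit of the saved HCR_EL2 value and pick the delegation target from it. The bit position (37) is taken from the HCR_TEA definition in the patch; the function and enum names are hypothetical, and real firmware would of course consider more state than this single bit.

```c
#include <assert.h>
#include <stdint.h>

/* Bit position taken from the HCR_TEA definition in the quoted patch. */
#define SKETCH_HCR_TEA (1ULL << 37)

enum sea_target { TARGET_HYPERVISOR, TARGET_KERNEL };

/*
 * Hypothetical firmware-side helper: decide where to delegate a
 * synchronous external abort, based only on HCR_EL2.TEA.
 */
static enum sea_target firmware_sea_target(uint64_t hcr_el2)
{
	return (hcr_el2 & SKETCH_HCR_TEA) ? TARGET_HYPERVISOR : TARGET_KERNEL;
}

/* Sanity checks for both outcomes. */
static void firmware_sea_target_selftest(void)
{
	assert(firmware_sea_target(SKETCH_HCR_TEA) == TARGET_HYPERVISOR);
	assert(firmware_sea_target(0) == TARGET_KERNEL);
}
```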

> 
> What happens when a guest access the RAS-Error-Record registers?
> 
> Before we can set HCR_EL2.TERR I think we need to add some minimal emulation 
> for
> the registers it traps. Most of them should be RAZ/WI, so it should be
> straightforward. (I think KVMs default is to emulate an undef for unknown 
> traps).
Today I added support for minimal emulation of the RAS-Error-Record
registers; thanks for the good suggestion.

> 
> Eventually we will want to back this with a page of memory that lets
> Qemu/kvmtool configure what the guest can see. (i.e. the emulated machine's
> errors for kernel-first handling.)

Re: [PATCH v6 3/7] acpi: apei: remove the unused code

2017-09-14 Thread gengdongjiu



On 2017/9/14 20:35, James Morse wrote:
>> James, whether it is possible you can review the previous v5 patch which 
>> adds the support for
> Spreading 'current discussion' over two versions is a problem for anyone 
> trying
> to follow this series.
> 
> If you post a newer version its normal for people to delete the older 
> versions.
> When you post a new version you should be happy that its the latest and 
> greatest.
> 
>
James, today I posted the new version here, thanks.

https://patchwork.kernel.org/patch/9952495/
https://patchwork.kernel.org/patch/9950955/

Re: [PATCH v6 6/7] KVM: arm64: allow get exception information from userspace

2017-09-18 Thread gengdongjiu

James,
   Thanks for your comments; I hope we can make the solution better.

On 2017/9/14 21:00, James Morse wrote:
> Hi gengdongjiu,
> 
> (re-ordered hunks)
> 
> On 13/09/17 08:32, gengdongjiu wrote:
>> On 2017/9/8 0:30, James Morse wrote:
>>> On 28/08/17 11:38, Dongjiu Geng wrote:
>>> For BUS_MCEERR_A* from memory_failure() we can't know if they are caused by
>>> an access or not.
> 
> Actually it looks like we can: I thought 'BUS_MCEERR_AR' could be triggered 
> via
> some CPER flags, but its not. The only code that flags MF_ACTION_REQUIRED is
> x86's kernel-first handling, which nicely matches this 'direct access' 
> problem.
> BUS_MCEERR_AR also come from KVM stage2 faults (and the x86 equivalent). 
> Powerpc
> also triggers these directly, both from what look to be synchronous paths, so 
> I
> think its fair to equate BUS_MCEERR_AR to a synchronous access and 
> BUS_MCEERR_AO
> to something_else.

James, thanks for your explanation.
Do I understand correctly that "BUS_MCEERR_AR" stands for a synchronous
access and "BUS_MCEERR_AO" stands for an asynchronous access?
Then for "BUS_MCEERR_AO", how do we distinguish an asynchronous data
access (SError) from a PCIe AER error?
In user space we can check si_code: if it is "BUS_MCEERR_AR", we use the
SEA notification type for the guest;
if it is "BUS_MCEERR_AO", we use the SEI notification type for the guest.
Because there are only two values for si_code ("BUS_MCEERR_AR" and
"BUS_MCEERR_AO"), in which case can we use the GSIV (IRQ) notification type?
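
The mapping proposed in the questions above, written as a sketch. It only restates the proposal (AR to SEA, AO to SEI) and is not an agreed design; the reply quoted later suggests an IRQ or polled notification for the AO case instead. The constants mirror the Linux values of BUS_MCEERR_AR (4) and BUS_MCEERR_AO (5) but are defined locally under hypothetical names so that the fragment does not depend on a particular libc.

```c
#include <assert.h>

/* Local copies of the Linux si_code values for SIGBUS machine checks. */
#define SKETCH_BUS_MCEERR_AR 4 /* action required */
#define SKETCH_BUS_MCEERR_AO 5 /* action optional */

enum guest_notify { NOTIFY_NONE, NOTIFY_SEA, NOTIFY_SEI };

/* Hypothetical userspace helper: choose a guest notification type. */
static enum guest_notify notify_for_si_code(int si_code)
{
	switch (si_code) {
	case SKETCH_BUS_MCEERR_AR:
		return NOTIFY_SEA;
	case SKETCH_BUS_MCEERR_AO:
		return NOTIFY_SEI;
	default:
		return NOTIFY_NONE;
	}
}

/* Sanity checks for the proposed mapping. */
static void notify_for_si_code_selftest(void)
{
	assert(notify_for_si_code(SKETCH_BUS_MCEERR_AR) == NOTIFY_SEA);
	assert(notify_for_si_code(SKETCH_BUS_MCEERR_AO) == NOTIFY_SEI);
	assert(notify_for_si_code(0) == NOTIFY_NONE);
}
```

The sketch makes the gap in the proposal visible: with only two si_code values there is no input that would select a GSIV (IRQ) notification.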


> 
> I don't think we need anything else.
> 
> 
>>> When the mm code gets -EHWPOISON when trying to resolve a
>>
>> Because of that, so I allow  userspace getting exception information
> 
> ... and there are cases where you can't get the exception information, and 
> other
> cases where it wasn't an exception at all.
> 
> [...]
> 
> 
>>> What happens if the dram-scrub hardware spots an error in guest memory, but
>>> the guest wasn't running? KVM won't have a relevant ESR value to give you.
> 
>> if the dram-scrub hardware spots an error in guest memory, it will generate
>> IRQ in DDR controller, not SEA or SEI exception. I still do not consider the
>> GSIV. For GSIV, may be we can only handle it in the host OS.
> 
> Great example: this IRQ pulls us out of a guest, we tromp through APEI and 
> then
> memory_failure(), the memory happened to belong to the same guest
> (coincidence!), we send it some signal and now its user-space's problem.
> 
> Your KVM_REG_ARM64_FAULT mechanism is going to return stale data, even though
> the notification interrupted the guest, and it was guest memory that was
> affected. KVM doesn't have a relevant ESR.
> 
> 
> I'm strongly against exposing 'which notification type' this error originally
> came from because:
> * it doesn't matter once we've got the CPER records,
> * there isn't always an answer (there are/will-be other ways of tripping
>   memory_failure())
> * it creates ABI between firmware, host userspace and guest userspace.
>   Firmware's choice of notification type shouldn't affect anything other than
>   the host kernel.
> 
> 
> On 13/09/17 08:32, gengdongjiu wrote:
>> On 2017/9/8 0:30, James Morse wrote:
>>> On 28/08/17 11:38, Dongjiu Geng wrote:
>>>> when userspace gets SIGBUS signal, it does not know whether
>>>> this is a synchronous external abort or SError,
>>>
>>> Why would Qemu/kvmtool need to know if the original notification (if there 
>>> was
>>> one) was synchronous or asynchronous? This is between firmware and the 
>>> kernel.
> 
>> there are two reasons:
>>
>> 1. Let us firstly discuss the SEA and SEI, there are different workflow for 
>> the two different Errors.
>> 2. when record the CPER in the user space, it needs to know the error type, 
>> because SEA and SEI are different Error source,
>>so they have different offset in the APEI table, that is to say they will 
>> be recorded to different place of the APEI table.
> 
> user-space can choose whether to use SEA or SEI, it doesn't have to choose the
> same notification type that firmware used, which in turn doesn't have to be 
> the
> same as that used by the CPU to notify firmware.
> 
> The choice only matters because these notifications hang on an existing pieces
> of the Arm-architecture, so the notification can only add to the 
> architecturally
> defined meaning. (i.e. You can only send an SEA for something that can already
> be described as a synchronous external abort).
> 
> Once we get to user-space, for memory_failure() notif

Re: [PATCH v6 6/7] KVM: arm64: allow get exception information from userspace

2017-09-21 Thread gengdongjiu

Hi James

On 2017/9/14 21:00, James Morse wrote:
> Hi gengdongjiu,

> user-space can choose whether to use SEA or SEI, it doesn't have to choose the
> same notification type that firmware used, which in turn doesn't have to be 
> the
> same as that used by the CPU to notify firmware.
> 
> The choice only matters because these notifications hang on an existing pieces
> of the Arm-architecture, so the notification can only add to the 
> architecturally
> defined meaning. (i.e. You can only send an SEA for something that can already
> be described as a synchronous external abort).
> 
> Once we get to user-space, for memory_failure() notifications, (which so far 
> is
> all we are talking about here), the only thing that could matter is whether 
> the
> guest hit a PG_hwpoison page as a stage2 fault. These can be described as
> Synchronous-External-Abort.
> 
> The Synchronous-External-Abort/SError-Interrupt distinction matters for the 
> CPU
> because it can't always make an error synchronous. For memory_failure()
> notifications to a KVM guest we really can do this, and we already have this
> behaviour for free. An example:
> 
> A guest touches some hardware:poisoned memory, for whatever reason the CPU 
> can't
> put the world back together to make this a synchronous exception, so it 
> reports
> it to firmware as an SError-interrupt.
> Linux gets an APEI notification and memory_failure() causes the affected page 
> to
> be unmapped from the guest's stage2, and SIGBUS_MCEERR_AO sent to user-space.
> 
> Qemu/kvmtool can now notify the guest with an IRQ or POLLed notification. AO->
> action optional, probably asynchronous.
> 
> But in our example it wasn't really asynchronous, that was just a property of
> the original CPU->firmware notification. What happens? The guest vcpu is 
> re-run,
> it re-runs the same instructions (this was a contained error so KVM's ELR 
> points
> at/before the instruction that steps in the problem). This time KVM takes a
> stage2 fault, which the mm code will refuse to fixup because the relevant page
> was marked as PG_hwpoison by memory_failure(). KVM signals Qemu/kvmtool with
> SIGBUS_MCEERR_AR. Now Qemu/kvmtool can notify the guest using SEA.

CC Achin

I have some personal opinions; if you think they are not right, I hope you
can point that out.

Synchronous External Abort and SError Interrupt are hardware exceptions
(hardware concepts), which are independent of software notification;
in ARMv8 without RAS, the two concepts already exist. The APEI spec uses the
SEA and SEI notification types to better describe these two exceptions.

The SEA notification stands for Synchronous External Abort, so maybe it is not
only a notification but also stands for a hardware error type.
The SEI notification stands for SError Interrupt, so maybe it is not only a
notification but also stands for a hardware error type.

The OS has a different handling flow for the two exceptions (two
notifications). Suppose that, while the guest OS is running, the hardware
generates a Synchronous External Abort and we tell the guest OS that this
error is an SError Interrupt instead.
The guest OS then uses the SEI notification handling flow to deal with it. I
am not sure whether that will cause problems, because the true hardware
exception is a Synchronous External Abort but software handles it as an
SError interrupt.

The mainline code does not have SEI notification support; I think the reason
is that the error address recorded by firmware is not accurate (SError
Interrupt is an asynchronous exception).
The default OS behavior for SEI is to panic. That is to say, when hardware
triggers a Synchronous External Abort (SEA) and the guest treats it as an
SError interrupt (SEI), the OS will panic, although the error could in fact
be recovered from.

I once posted a patch to support the SEI notification, but I am not sure
whether it can be accepted upstream; so far I have not received a response.

Re: [PATCH v3 1/2] acpi: apei: remove the unused dead-code for SEA notification type

2017-10-15 Thread gengdongjiu

Borislav,
  Thank you for your time to review it.

On 2017/10/13 21:21, Borislav Petkov wrote:
>>  .notifier_call = ghes_notify_hed,
>>  };
>>  
>> -#ifdef CONFIG_ACPI_APEI_SEA
>>  static LIST_HEAD(ghes_sea);
> But now those get compiled in on x86 where there's no SEA and where we
> don't need them. So no, I don't think this patch is correct.

I have updated the patch for x86; you can review the version 4 patches.
Thanks.

Re: [PATCH] arm64: KVM: set right LR register value for 32 bit guest when inject abort

2017-10-14 Thread gengdongjiu

Hi Marc,

On 2017/10/13 23:12, Marc Zyngier wrote:
> On 13/10/17 15:29, gengdongjiu wrote:
>> Hi Marc,
>> Thank you very much for your time to review it.
>>
>>> On 12/10/17 17:44, Dongjiu Geng wrote:
>>>> When an exception is trapped to EL2, hardware uses ELR_ELx to hold the
>>>> current fault instruction address. If KVM wants to inject an abort into a
>>>> 32 bit guest, it needs to set the LR register for the guest to emulate
>>>> this abort happening in the guest. Because the ARM32 architecture is
>>>> Multi-pipeline, the LR value has an offset to
>>>
>>> What does "Multi-pipeline" mean?
>>
>> I mean the ARM's single-cycle instruction 3-stage pipeline operation, as 
>> shown below:
>>
>> fetch   decode   execute
>>         fetch    decode   execute
>>                  fetch    decode   execute
>>
>> when the CPU finishes fetching an instruction, PC = PC + 4
>> when the CPU finishes decoding an instruction, PC = PC + 8
>> when the CPU finishes executing an instruction, PC = PC + 12
> 
> Yeah, and that's called pipelined execution. "Multi-pipeline" doesn't 
> mean anything. Also, that's an artefact of the original ARM1 
> implementation, and not how modern CPUs work anymore.
Ok, thanks for the clarification.

> 
>>
>> That is to say, when a data abort happens,
>> PC = fault instruction address + 12 and LR_abt = fault instruction address
>> + 8.
>>
>> To emulate this abort in KVM, LR_abt needs an offset of 8 added when
>> injecting a data abort.
>>>
>>>> the fault instruction address.
>>>>
>>>> The offsets applied to Link value for exceptions as shown below, which
>>>> should be added for the ARM32 link register(LR).
>>>>
>>>> Exception              Offset, for PE state of:
>>>>                        A32   T32
>>>> Undefined Instruction  +4    +2
>>>> Prefetch Abort         +4    +4
>>>> Data Abort             +8    +8
>>>> IRQ or FIQ             +4    +4
>>>
>>> Please document where this table is coming from.
>>
>>
>> Thanks for pointing out. Will add it.
>> It come from:  DDI0487A_k_armv8_arm_iss10775, "G1.12.3 Overview of exception 
>> entry", Table G1-10 Offsets applied to Link value for exceptions taken to 
>> non-EL2 modes
>>
>>>
>>>>
>>>> Signed-off-by: Dongjiu Geng <gengdong...@huawei.com>
>>>> Signed-off-by: Haibin Zhang <zhanghaib...@huawei.com>
>>>>
>>>> ---
>>>> For example, to the undefined instruction injection:
>>>>
>>>> 1. Guest OS call SMC(Secure Monitor Call) instruction in the address
>>>> 0xc025405c, then Guest traps to hypervisor
>>>>
>>>> c0254050:   e59d5028    ldr     r5, [sp, #40]   ; 0x28
>>>> c0254054:   e3a03001    mov     r3, #1
>>>> c0254058:   e1a01003    mov     r1, r3
>>>> c025405c:   e1600070    smc     0
>>>> c0254060:   e30a0270    movw    r0, #41584      ; 0xa270
>>>> c0254064:   e34c00bf    movt    r0, #49343      ; 0xc0bf
>>>>
>>>> 2. KVM injects undefined abort to guest
>>>> 3. We will find the fault PC is 0xc0254058, not 0xc025405c.
>>>>
>>>> [   12.348072] Internal error: Oops - undefined instruction: 0 [#1] SMP ARM
>>>> [   12.349786] Modules linked in:
>>>> [   12.350563] CPU: 1 PID: 71 Comm: cat Not tainted 4.1.0-dirty #25
>>>> [   12.352061] Hardware name: Generic DT based system
>>>> [   12.353275] task: d9d08000 ti: d9cfe000 task.ti: d9cfe000
>>>> [   12.354637] PC is at proc_dointvec+0x20/0x60
>>>> [   12.355717] LR is at proc_sys_call_handler+0xb0/0xc4
>>>> [   12.356972] pc : []lr : []psr: a0060013
>>>> [   12.356972] sp : d9cffe90  ip : c0254038  fp : 0001
>>>> [   12.359824] r10: d9cfff80  r9 : 0004  r8 : 
>>>> [   12.361132] r7 : bec21cb0  r6 : d9cffec4  r5 : d9cfff80  r4 : c0e82de0
>>>> [   12.362766] r3 : 0001  r2 : bec21cb0  r1 : 0001  r0 : c0e82de0
>>>> [   12.364400] Flags: NzCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  
>>>> Segment user
>>>> [   12.366183] Control: 10c5383d  Table: 59d3406a  DAC: 0015
>>>> [   12.367623] Process cat (pid: 71, stack limit = 0xd9cfe220)
>>>>
>>>> 4. After correct the LR register, it will have right value
>>>>

Re: [PATCH v4 2/2] acpi: apei: Add SEI notification type support for ARMv8

2017-10-17 Thread gengdongjiu

Have fixed it in the patch v5.

On 2017/10/17 18:20, kbuild test robot wrote:
> Hi Dongjiu,
> 
> [auto build test ERROR on pm/linux-next]
> [also build test ERROR on v4.14-rc5 next-20171016]
> [if your patch is applied to the wrong git tree, please drop us a note to 
> help improve the system]
> 
> url:
> https://github.com/0day-ci/linux/commits/Dongjiu-Geng/acpi-apei-remove-the-unused-dead-code-for-SEA-NMI-notification-type/20171017-141237
> base:   https://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git 
> linux-next
> config: x86_64-kexec (attached as .config)
> compiler: gcc-6 (Debian 6.2.0-3) 6.2.0 20160901
> reproduce:
> # save the attached .config to linux build tree
> make ARCH=x86_64 
> 
> All errors (new ones prefixed by >>):
> 
>drivers/acpi/apei/ghes.c: In function 'ghes_probe':
>>> drivers/acpi/apei/ghes.c:1191:3: error: implicit declaration of function 
>>> 'ghes_abort_add' [-Werror=implicit-function-declaration]
>   ghes_abort_add(ghes);
>   ^~
>drivers/acpi/apei/ghes.c: In function 'ghes_remove':
>>> drivers/acpi/apei/ghes.c:1245:3: error: implicit declaration of function 
>>> 'ghes_abort_remove' [-Werror=implicit-function-declaration]
>   ghes_abort_remove(ghes);
>   ^
>cc1: some warnings being treated as errors
> 
> vim +/ghes_abort_add +1191 drivers/acpi/apei/ghes.c
> 
>   1085
>   1086  static int ghes_probe(struct platform_device *ghes_dev)
>   1087  {
>   1088          struct acpi_hest_generic *generic;
>   1089          struct ghes *ghes = NULL;
>   1090
>   1091          int rc = -EINVAL;
>   1092
>   1093          generic = *(struct acpi_hest_generic **)ghes_dev->dev.platform_data;
>   1094          if (!generic->enabled)
>   1095                  return -ENODEV;
>   1096
>   1097          switch (generic->notify.type) {
>   1098          case ACPI_HEST_NOTIFY_POLLED:
>   1099          case ACPI_HEST_NOTIFY_EXTERNAL:
>   1100          case ACPI_HEST_NOTIFY_SCI:
>   1101          case ACPI_HEST_NOTIFY_GSIV:
>   1102          case ACPI_HEST_NOTIFY_GPIO:
>   1103                  break;
>   1104
>   1105          case ACPI_HEST_NOTIFY_SEA:
>   1106                  if (!IS_ENABLED(CONFIG_ACPI_APEI_SEA)) {
>   1107                          pr_warn(GHES_PFX "Generic hardware error source: %d notified via SEA is not supported\n",
>   1108                                  generic->header.source_id);
>   1109                          rc = -ENOTSUPP;
>   1110                          goto err;
>   1111                  }
>   1112                  break;
>   1113          case ACPI_HEST_NOTIFY_SEI:
>   1114                  if (!IS_ENABLED(CONFIG_ACPI_APEI_SEI)) {
>   1115                          pr_warn(GHES_PFX "Generic hardware error source: %d notified via SEI is not supported!\n",
>   1116                                  generic->header.source_id);
>   1117                          goto err;
>   1118                  }
>   1119                  break;
>   1120          case ACPI_HEST_NOTIFY_NMI:
>   1121                  if (!IS_ENABLED(CONFIG_HAVE_ACPI_APEI_NMI)) {
>   1122                          pr_warn(GHES_PFX "Generic hardware error source: %d notified via NMI interrupt is not supported!\n",
>   1123                                  generic->header.source_id);
>   1124                          goto err;
>   1125                  }
>   1126                  break;
>   1127          case ACPI_HEST_NOTIFY_LOCAL:
>   1128                  pr_warning(GHES_PFX "Generic hardware error source: %d notified via local interrupt is not supported!\n",
>   1129                             generic->header.source_id);
>   1130                  goto err;
>   1131          default:
>   1132                  pr_warning(FW_WARN GHES_PFX "Unknown notification type: %u for generic hardware error source: %d\n",
>   1133                             generic->notify.type, generic->header.source_id);
>   1134                  goto err;
>   1135          }
>   1136
>   1137          rc = -EIO;
>   1138          if (generic->error_block_length <
>   1139              sizeof(struct acpi_hest_generic_status)) {
>   1140                  pr_warning(FW_BUG GHES_PFX "Invalid error block length: %u for generic hardware error source: %d\n",
>   1141                             generic->error_block_length,
>   1142                             generic->header.source_id);
>   1143                  goto err;
>   1144          }
>   1145          ghes = ghes_new(generic);
>   1146          if (IS_ERR(ghes)) {
>   1147                  rc =

Re: [PATCH] arm64: KVM: set right LR register value for 32 bit guest when inject abort

2017-10-16 Thread gengdongjiu

Hi Marc,

> 
> Please also update the 32bit code accordingly, as it looks broken too.

I have updated the 32-bit code accordingly. I do not have an arm32 host
environment at hand,
so I could not verify it on an arm32 host; the patch has only been verified on
an arm64 host.

Anyway, I am sending the patch out for review first. Thanks.

> 
> Thanks,
> 
>   M.
> --
> Jazz is not dead. It just smells funny...

Re: [PATCH] arm64: KVM: set right LR register value for 32 bit guest when inject abort

2017-10-17 Thread gengdongjiu

Hi Christoffer

On 2017/10/17 3:59, Christoffer Dall wrote:
> On Mon, Oct 16, 2017 at 04:10:01PM +0000, gengdongjiu wrote:
>> Hi Marc,
>>
>>>
>>> Please also update the 32bit code accordingly, as it looks broken too.
>>
>> I have updated the 32 bit code according, in my hand, there is no arm32 host 
>> environment,
>> So there is no method to verify it in the arm32 host, only verify the patch 
>> in the arm64 host.
>>
>> Anyway I firstly send the patch out for review. Thanks.
>>
> In this case, if you just clearly specify in the patches you send out
> that the 32-bit one is untested, you can ask someone to test it for you.
> 

Thanks for the reminder. Today I found an arm32 board and tested the patch on it.
So it has now been tested on both arm32 and arm64. Please review it, thanks.


> Thanks,
> -Christoffer
> 
> .
>

Re: [PATCH] arm64: KVM: set right LR register value for 32 bit guest when inject abort

2017-10-13 Thread gengdongjiu

Hi Marc,
Thank you very much for your time to review it.

> On 12/10/17 17:44, Dongjiu Geng wrote:
> > When a exception is trapped to EL2, hardware uses  ELR_ELx to hold the
> > current fault instruction address. If KVM wants to inject a abort to
> > 32 bit guest, it needs to set the LR register for the guest to emulate
> > this abort  happened in the guest. Because ARM32 architecture is
> > Multi-pipeline, so the LR value has an offset to
> 
> What does "Multi-pipeline" mean?

I mean ARM's 3-stage pipeline for single-cycle instructions, as shown
below:

fetch   decode   execute
        fetch    decode   execute
                 fetch    decode   execute

When the CPU finishes the instruction fetch, PC = PC + 4.
When the CPU finishes the instruction decode, PC = PC + 8.
When the CPU finishes the instruction execution, PC = PC + 12.

That is to say, when a data abort happens,
PC = fault instruction address + 12 and LR_abt = fault instruction address + 8.

To emulate this abort in KVM, an offset of 8 must be added to LR_abt when
injecting a data abort.

> 
> > the fault instruction address.
> >
> > The offsets applied to Link value for exceptions as shown below, which
> > should be added for the ARM32 link register(LR).
> >
> > Exception               Offset, for PE state of:
> >                         A32   T32
> > Undefined Instruction   +4    +2
> > Prefetch Abort          +4    +4
> > Data Abort              +8    +8
> > IRQ or FIQ              +4    +4
> 
> Please document where this table is coming from.


Thanks for pointing that out. I will add it.
It comes from DDI0487A_k_armv8_arm_iss10775, "G1.12.3 Overview of exception
entry", Table G1-10, "Offsets applied to Link value for exceptions taken to
non-EL2 modes".
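
As a rough illustration of the offset table quoted above, the following C sketch
computes the link register value that an injected exception should present to a
32-bit guest. It is not the kernel code: the names aarch32_exc, lr_offset and
banked_lr are made up for this example, and it assumes the address captured at
trap time is the address of the faulting instruction.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical exception classes, one per row of Table G1-10. */
enum aarch32_exc {
	EXC_UNDEF,
	EXC_PREFETCH_ABORT,
	EXC_DATA_ABORT,
	EXC_IRQ_FIQ,
};

/* Offset added to the link value, by exception type and PE state. */
static uint32_t lr_offset(enum aarch32_exc exc, bool thumb)
{
	switch (exc) {
	case EXC_UNDEF:
		return thumb ? 2 : 4;
	case EXC_PREFETCH_ABORT:
		return 4;
	case EXC_DATA_ABORT:
		return 8;
	case EXC_IRQ_FIQ:
		return 4;
	}
	return 0;
}

/* LR value the guest should see once the exception has been injected. */
static uint32_t banked_lr(uint32_t fault_addr, enum aarch32_exc exc, bool thumb)
{
	return fault_addr + lr_offset(exc, thumb);
}
```

For the SMC example at 0xc025405c in A32 state, banked_lr(0xc025405c, EXC_UNDEF,
false) gives 0xc0254060, so a guest handler that subtracts 4 from LR reports
0xc025405c rather than 0xc0254058.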

> 
> >
> > Signed-off-by: Dongjiu Geng 
> > Signed-off-by: Haibin Zhang 
> >
> > ---
> > For example, to the undefined instruction injection:
> >
> > 1. Guest OS call SMC(Secure Monitor Call) instruction in the address
> > 0xc025405c, then Guest traps to hypervisor
> >
> > c0254050:   e59d5028    ldr     r5, [sp, #40]   ; 0x28
> > c0254054:   e3a03001    mov     r3, #1
> > c0254058:   e1a01003    mov     r1, r3
> > c025405c:   e1600070    smc     0
> > c0254060:   e30a0270    movw    r0, #41584      ; 0xa270
> > c0254064:   e34c00bf    movt    r0, #49343      ; 0xc0bf
> >
> > 2. KVM injects undefined abort to guest
> > 3. We will find the fault PC is 0xc0254058, not 0xc025405c.
> >
> > [   12.348072] Internal error: Oops - undefined instruction: 0 [#1] SMP ARM
> > [   12.349786] Modules linked in:
> > [   12.350563] CPU: 1 PID: 71 Comm: cat Not tainted 4.1.0-dirty #25
> > [   12.352061] Hardware name: Generic DT based system
> > [   12.353275] task: d9d08000 ti: d9cfe000 task.ti: d9cfe000
> > [   12.354637] PC is at proc_dointvec+0x20/0x60
> > [   12.355717] LR is at proc_sys_call_handler+0xb0/0xc4
> > [   12.356972] pc : []lr : []psr: a0060013
> > [   12.356972] sp : d9cffe90  ip : c0254038  fp : 0001
> > [   12.359824] r10: d9cfff80  r9 : 0004  r8 : 
> > [   12.361132] r7 : bec21cb0  r6 : d9cffec4  r5 : d9cfff80  r4 : c0e82de0
> > [   12.362766] r3 : 0001  r2 : bec21cb0  r1 : 0001  r0 : c0e82de0
> > [   12.364400] Flags: NzCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment 
> > user
> > [   12.366183] Control: 10c5383d  Table: 59d3406a  DAC: 0015
> > [   12.367623] Process cat (pid: 71, stack limit = 0xd9cfe220)
> >
> > 4. After correct the LR register, it will have right value
> >
> > [  125.763370] Internal error: Oops - undefined instruction: 0 [#2] SMP ARM
> > [  125.767010] Modules linked in:
> > [  125.768472] CPU: 1 PID: 74 Comm: cat Tainted: G  D 4.1.0-dirty #25
> > [  125.771854] Hardware name: Generic DT based system
> > [  125.774053] task: db0bb900 ti: d9d1 task.ti: d9d1
> > [  125.776821] PC is at proc_dointvec+0x24/0x60
> > [  125.778919] LR is at proc_sys_call_handler+0xb0/0xc4
> > [  125.781269] pc : []lr : []psr: a0060013
> > [  125.781269] sp : d9d11e90  ip : c0254038  fp : 0001
> > [  125.786581] r10: d9d11f80  r9 : 0004  r8 :
> > [  125.789673] r7 : be92ccb0  r6 : d9d11ec4  r5 : d9d11f80  r4 : c0e82de0
> > [  125.792828] r3 : 0001  r2 : be92ccb0  r1 : 0001  r0 : c0e82de0
> > [  125.795890] Flags: NzCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment user
> >
> > For other exception injection, such as Data/Prefetch abort, also needs
> > to correct
> > ---
> >  arch/arm64/kvm/inject_fault.c | 18 --
> >  1 file changed, 12 insertions(+), 6 deletions(-)
> >
> > diff --git a/arch/arm64/kvm/inject_fault.c
> > b/arch/arm64/kvm/inject_fault.c index da6a8cf..da93508 100644
> > --- a/arch/arm64/kvm/inject_fault.c
> > +++ b/arch/arm64/kvm/inject_fault.c
> > @@ -33,12 +33,11 @@
> >  #define LOWER_EL_AArch64_VECTOR0x400
> >  #define LOWER_EL_AArch32_VECTOR0x600
> >
> > -static void

Re: [PATCH v3 1/2] acpi: apei: remove the unused dead-code for SEA notification type

2017-10-12 Thread gengdongjiu

Hi James/Rafael/Borislav,

What are your comments on these two patches? They seem to have been pending
for a long time. I would appreciate it if you could give some review
comments. Thanks very much. Tyler has tested the second patch.

[PATCH v3 1/2] acpi: apei: remove the unused dead-code for SEA notification type
[PATCH v3 2/2] acpi: apei: Add SEI notification type support for ARMv8


2017-09-28 20:41 GMT+08:00 Dongjiu Geng :
> In current code logic, the two functions ghes_sea_add() and
> ghes_sea_remove() are only called when CONFIG_ACPI_APEI_SEA
> is defined. If not, it will return errors in the ghes_probe()
> and not continue. If the probe is failed, the ghes_sea_remove()
> also has no chance to be called. Hence, remove the unnecessary
> handling when CONFIG_ACPI_APEI_SEA is not defined.
>
> In the firmware-first RAS solution, the IPA fault address recorded
> by hpfar_el2 may be UNKNOWN, and also current code does not use it,
> so remove it.
>
> Cc: Stephen Boyd 
> Cc: James Morse 
> Cc: Tyler Baicar 
> Signed-off-by: Dongjiu Geng 
>
> ---
> v2->v3:
> 1. remove the fault_ipa address
> If ESR_ELx.DFSC is Synchronous External Abort on memory access(0b01),
> the hpfar_el2's value will be UNKNOWN, so this value is not accurate.
>
> It is ever discussed here:
> https://lkml.org/lkml/2017/9/8/623
> ---
>  arch/arm/include/asm/system_misc.h   |  2 +-
>  arch/arm64/include/asm/system_misc.h |  2 +-
>  arch/arm64/mm/fault.c|  2 +-
>  drivers/acpi/apei/ghes.c | 14 --
>  virt/kvm/arm/mmu.c   |  2 +-
>  5 files changed, 4 insertions(+), 18 deletions(-)
>
> diff --git a/arch/arm/include/asm/system_misc.h 
> b/arch/arm/include/asm/system_misc.h
> index 8c4a89f..5b53a1c 100644
> --- a/arch/arm/include/asm/system_misc.h
> +++ b/arch/arm/include/asm/system_misc.h
> @@ -22,7 +22,7 @@ extern void (*arm_pm_idle)(void);
>
>  extern unsigned int user_debug;
>
> -static inline int handle_guest_sea(phys_addr_t addr, unsigned int esr)
> +static inline int handle_guest_sea(unsigned int esr)
>  {
> return -1;
>  }
> diff --git a/arch/arm64/include/asm/system_misc.h 
> b/arch/arm64/include/asm/system_misc.h
> index 07aa8e3..3f0d0a8 100644
> --- a/arch/arm64/include/asm/system_misc.h
> +++ b/arch/arm64/include/asm/system_misc.h
> @@ -56,7 +56,7 @@ extern void (*arm_pm_restart)(enum reboot_mode reboot_mode, 
> const char *cmd);
> __show_ratelimited; \
>  })
>
> -int handle_guest_sea(phys_addr_t addr, unsigned int esr);
> +int handle_guest_sea(unsigned int esr);
>
>  #endif /* __ASSEMBLY__ */
>
> diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
> index 2509e4f..13391f4 100644
> --- a/arch/arm64/mm/fault.c
> +++ b/arch/arm64/mm/fault.c
> @@ -677,7 +677,7 @@ static const struct fault_info fault_info[] = {
>   * and non-zero if there was an error processing the error or there was
>   * no error to process.
>   */
> -int handle_guest_sea(phys_addr_t addr, unsigned int esr)
> +int handle_guest_sea(unsigned int esr)
>  {
> int ret = -ENOENT;
>
> diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
> index d661d45..c15a08d 100644
> --- a/drivers/acpi/apei/ghes.c
> +++ b/drivers/acpi/apei/ghes.c
> @@ -813,7 +813,6 @@ static struct notifier_block ghes_notifier_hed = {
> .notifier_call = ghes_notify_hed,
>  };
>
> -#ifdef CONFIG_ACPI_APEI_SEA
>  static LIST_HEAD(ghes_sea);
>
>  /*
> @@ -848,19 +847,6 @@ static void ghes_sea_remove(struct ghes *ghes)
> mutex_unlock(&ghes_list_mutex);
> synchronize_rcu();
>  }
> -#else /* CONFIG_ACPI_APEI_SEA */
> -static inline void ghes_sea_add(struct ghes *ghes)
> -{
> -   pr_err(GHES_PFX "ID: %d, trying to add SEA notification which is not 
> supported\n",
> -  ghes->generic->header.source_id);
> -}
> -
> -static inline void ghes_sea_remove(struct ghes *ghes)
> -{
> -   pr_err(GHES_PFX "ID: %d, trying to remove SEA notification which is 
> not supported\n",
> -  ghes->generic->header.source_id);
> -}
> -#endif /* CONFIG_ACPI_APEI_SEA */
>
>  #ifdef CONFIG_HAVE_ACPI_APEI_NMI
>  /*
> diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
> index 2ea21da..07636c2 100644
> --- a/virt/kvm/arm/mmu.c
> +++ b/virt/kvm/arm/mmu.c
> @@ -1504,7 +1504,7 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu, 
> struct kvm_run *run)
>  * is no need to pass the error into the guest.
>  */
> if (is_abort_sea(fault_status)) {
> -   if (!handle_guest_sea(fault_ipa, kvm_vcpu_get_hsr(vcpu)))
> +   if (!handle_guest_sea(kvm_vcpu_get_hsr(vcpu)))
> return 1;
> }
>
> --
> 2.10.1
>

Re: [PATCH v6 5/7] arm64: kvm: route synchronous external abort exceptions to el2

2017-10-16 Thread gengdongjiu

Hi James,

>
>> Today I added the support to do some minimal emulation for
>> RAS-Error-Record registers, thanks
>> for the good suggestion.
>
> Where can I find this patch?
> I'd like to repost it as part of the SError_rework/RAS/IESB series: this is
> one
> of the bits KVM needs but I didn't touch as it looks like your updated
> version
> of this patch should cover it.

I have updated this patch according to your suggestion to do some
emulation for the ERR* traps, and have verified it, but I have not sent
it out yet. I will send it out tomorrow for your review (it is midnight
in China now). If you think it is OK, you can add it as part of the
SError_rework/RAS/IESB series. Thanks for your reminder and good
review comments.


>
>
> Thanks,
>
> James
> ___
> kvmarm mailing list
> kvm...@lists.cs.columbia.edu
> https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
>

Re: [PATCH] arm64: KVM: VHE: reset PSTATE.UAO when switch to host

2017-09-08 Thread gengdongjiu

Hi Marc,

> 
> On 08/09/17 10:05, gengdongjiu wrote:
> > Marc,
> >Thanks for reply.
> >
> > On 2017/9/8 16:21, Marc Zyngier wrote:
> >>> Marc,
> >>>
> >>> sorry I have another question for the PAN.
[...]
> There cannot be any userspace mapping at EL2 when non-VHE, so there cannot be 
> any valid PAN setting. I repeat: there is not such thing as
> PAN at EL2 when HCR_EL2.E2H==0. This bit *has no effect*. Just read the 
> documentation (ARM DDI 0487B.a, D4.4.2).

Thanks for your comments, Marc. I completely agree with you, and I understand
the architecture: in non-VHE, the user-space mapping only exists at EL1 and is
pointed to by ttbr0_el1.
At EL1, ttbr0_el1 points to the user-space page table and ttbr1_el1 points to
the kernel mapping.
At EL2, ttbr0_el2 or vttbr_el2 is used for address translation; ttbr0_el1
cannot be used.

> 
> If you're going to change this kind of code, please start by understanding 
> the architecture.

OK, so I plan to remove the PAN setting at EL2 in non-VHE.

> 
>   M.
> --
> Jazz is not dead. It just smells funny...

Re: [PATCH] arm64: KVM: VHE: reset PSTATE.UAO when switch to host

2017-09-08 Thread gengdongjiu

On 2017/9/7 23:23, Marc Zyngier wrote:
> On 07/09/17 16:03, gengdongjiu wrote:
>>> On 07/09/17 12:49, gengdongjiu wrote:
>>>>
[...]

> 
> I really cannot think of a good reason why we'd want to do that. Playing
> with set_fs() is almost universally wrong, and I'm certainly going to
> oppose to any change in that area unless the code that calls set_fs()
> has been made public and properly reviewed. Until then, UAO/PAN will
> stay as they are unless you prove that our current code is wrong.

Marc,

Sorry, I have another question about PAN.

In non-VHE mode, the host kernel runs at EL1. Before the host kernel
enters the guest, the host OS executes an 'HVC' instruction to do the world-switch,
and PSTATE.PAN is saved into SPSR_EL2. When the world-switch returns to the
host kernel from EL2, an 'eret' instruction returns to the EL1 host,
and this 'eret' restores SPSR_EL2 into PSTATE, so
PSTATE.PAN is restored.

In non-VHE mode, EL2 mainly contains the world-switch code. Do you
think PSTATE.PAN needs to be reset there? The spec does not provide an
SCTLR_EL2.SPAN bit for non-VHE mode,
so I am not sure whether resetting PSTATE.PAN is needed or whether it affects
performance. If you think it is needed at EL2 in non-VHE mode, it may be better
to move the PSTATE.PAN reset to
the exception entry to EL2, such as "el1_sync", because the host can
also execute the 'hvc' instruction without a guest running.

> 
>   M.
>

Re: [PATCH] arm64: KVM: VHE: reset PSTATE.UAO when switch to host

2017-09-08 Thread gengdongjiu

Marc,
   Thanks for reply.

On 2017/9/8 16:21, Marc Zyngier wrote:
>> Marc,
>>
>> sorry I have another question for the PAN.
>>
>> In the non-VHE mode, The host kernel is running in the EL1. Before
>> host kernel enter guest, host OS will call 'HVC' instruction to do
>> the world-switch, and the pstate.PAN will be saved into the SPSR_EL2.
>> When world-switch back to host kernel from EL2, it will call 'eret'
>> instruction to EL1 host, this 'eret' instruction will restore the
>> SPSR_EL2 to the PSTATE. so the PSTATE.PAN will be restored.
>>
>> For the Non-VHE mode, in the EL2 where mainly have word-switch code,
>> do you think it needs to reset the PSTATE.PAN? From the spec, it does
>> not provide SCTLR_EL2.SPAN bit for non-VHE mode, so reset the
>> PSTATE.PAN does not sure whether it is needed or whether affects the
>> performance. If you think it is needed for El2 in Non-VHE mode,
>> moving the reset PSTATE.PAN to the exception entry to EL2 may be
>> better, such as "el1_sync", because host can also call 'hvc'
>> instruction without guest running.
> So let's see if I correctly understand your question:
> 
> You're worried that we don't set/reset PSTATE.PAN at EL2 in non-VHE?
> In non-VHE, there is no user-space mapping that is present at the
> same time as the hypervisor mappings. Actually, we hardly have any
> mapping other than the HYP text/data and the vcpu/vm structures.

That is not what I meant.
I mean two things:

In short, we should not set PAN at EL2 in non-VHE; if you think we should, the
current code does not cover all scenarios.

1. The current mainline code sets PSTATE.PAN at EL2 in non-VHE. As
you said,
in non-VHE there is no user-space mapping present at the same time as
the
hypervisor mappings, so I think PAN may not need to be set for both EL1 and EL2
in non-VHE,
but the current code sets it. As you can see at [1], the code does not check for VHE.

2. Conversely, if you think we should set PAN at EL2 in non-VHE,
the current code only sets it in the guest_exit path and does not cover all scenarios.
For example, when there is no guest and only the host, and the host executes an
'HVC' instruction to enter EL2 to do something,
guest_exit is not called, so PAN is not set.
To handle this case, we should move it to 'el1_sync'.

ENTRY(__guest_exit)
        // x0: return code
        // x1: vcpu
        // x2-x29,lr: vcpu regs
        // vcpu x0-x1 on the stack

        add     x1, x1, #VCPU_CONTEXT

        ALTERNATIVE(nop, SET_PSTATE_PAN(1), ARM64_HAS_PAN, CONFIG_ARM64_PAN)
[1]

        // Store the guest regs x2 and x3
        stp     x2, x3,   [x1, #CPU_XREG_OFFSET(2)]

        // Retrieve the guest regs x0-x1 from the stack
        ldp     x2, x3, [sp], #16       // x0, x1

> 
> So how is PAN relevant in this context? What does it even mean?
> If you have a ARMv8.0 behaviour, PAN doesn't even seem to *exist* at
> EL2.
> 
> Or am I completely missing the point here?

Re: [PATCH] arm64: KVM: VHE: reset PSTATE.UAO when switch to host

2017-09-07 Thread gengdongjiu

On 2017/9/7 18:13, Marc Zyngier wrote:
> On 07/09/17 11:05, gengdongjiu wrote:
>> Hi James,
>>
>> On 2017/9/7 17:20, James Morse wrote:
>>> Hi Dongjiu Geng,
>>>
>>> On 07/09/17 06:54, Dongjiu Geng wrote:
>>>> In VHE mode, host kernel runs in the EL2 and can enable
>>>> 'User Access Override' when fs==KERNEL_DS so that it can
>>>> access kernel memory. However, PSTATE.UAO is set to 0 on
>>>> an exception taken from EL1 to EL2. Thus when VHE is used
>>>> and exception taken from a guest UAO will be disabled and
>>>> host will use the incorrect PSTATE.UAO. So check and reset
>>>> the PSTATE.UAO when switching to host.
>>>
>>> This would only be a problem if KVM were calling into world-switch with
>>> fs==KERNEL_DS. I can't see where this happens.
>>  Not only KVM, may also kernel sets the fs == KERNEL_DS before calling into 
>> world-switch
> 
> How? Please describe the exact sequence of event that lead to this
> situation with the current code base.

Hi Marc,

   Different tasks have different fs values, such as USER_DS or KERNEL_DS. On a
context switch, the incoming
task's fs is restored, so the value depends on the task itself, as shown in the
code below. UAO differs from PAN: PAN is always enabled if
the CPU hardware supports the PAN feature, but UAO changes dynamically.

/*
 * Thread switching.
 */
__notrace_funcgraph struct task_struct *__switch_to(struct task_struct *prev,
                                struct task_struct *next)
{
        struct task_struct *last;

        fpsimd_thread_switch(next);
        tls_thread_switch(next);
        hw_breakpoint_thread_switch(next);
        contextidr_thread_switch(next);
        entry_task_switch(next);
        uao_thread_switch(next);
        ..
}

/* Restore the UAO state depending on next's addr_limit */
void uao_thread_switch(struct task_struct *next)
{
        if (IS_ENABLED(CONFIG_ARM64_UAO)) {
                if (task_thread_info(next)->addr_limit == KERNEL_DS)
                        asm(ALTERNATIVE("nop", SET_PSTATE_UAO(1),
                                        ARM64_HAS_UAO));
                else
                        asm(ALTERNATIVE("nop", SET_PSTATE_UAO(0),
                                        ARM64_HAS_UAO));
        }
}
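
To make the argument concrete, here is a small user-space model of the decision
that uao_thread_switch() makes. It is a sketch under stated assumptions, not
kernel code: MODEL_KERNEL_DS, MODEL_USER_DS and struct model_task are stand-ins
invented for the example, and only the comparison against KERNEL_DS mirrors the
code above. It also assumes, as the patch description says, that PSTATE.UAO is
0 after an exception is taken from the guest.

```c
#include <stdbool.h>
#include <stdint.h>

#define MODEL_KERNEL_DS UINT64_MAX            /* stand-in for KERNEL_DS */
#define MODEL_USER_DS   ((uint64_t)1 << 48)   /* stand-in for USER_DS */

/* Hypothetical task with only the field this example needs. */
struct model_task {
	uint64_t addr_limit;
};

/* UAO value the task expects: set only for KERNEL_DS on a CPU with UAO. */
static bool wanted_uao(const struct model_task *t, bool cpu_has_uao)
{
	return cpu_has_uao && t->addr_limit == MODEL_KERNEL_DS;
}

/*
 * After a guest exit PSTATE.UAO is 0, so the host state is stale only
 * when the current task wanted UAO to be set.
 */
static bool uao_stale_after_guest_exit(const struct model_task *t,
				       bool cpu_has_uao)
{
	bool uao_after_exit = false;

	return wanted_uao(t, cpu_has_uao) != uao_after_exit;
}
```

In this model a task with addr_limit == USER_DS never sees a stale UAO; the
state is stale only if a task enters the world-switch with KERNEL_DS.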

> 
>   M.
>

Re: [PATCH] arm64: KVM: VHE: reset PSTATE.UAO when switch to host

2017-09-07 Thread gengdongjiu

Hi James,

On 2017/9/7 17:20, James Morse wrote:
> Hi Dongjiu Geng,
> 
> On 07/09/17 06:54, Dongjiu Geng wrote:
>> In VHE mode, host kernel runs in the EL2 and can enable
>> 'User Access Override' when fs==KERNEL_DS so that it can
>> access kernel memory. However, PSTATE.UAO is set to 0 on
>> an exception taken from EL1 to EL2. Thus when VHE is used
>> and exception taken from a guest UAO will be disabled and
>> host will use the incorrect PSTATE.UAO. So check and reset
>> the PSTATE.UAO when switching to host.
> 
> This would only be a problem if KVM were calling into world-switch with
> fs==KERNEL_DS. I can't see where this happens.
 Not only KVM; the kernel may also set fs == KERNEL_DS before calling into
world-switch.

> 
> kvm_arch_vcpu_ioctl_run() is the only place KVM calls world-switch, there are 
> no
> set_fs() calls in it, or on the path to it. The addr_limit should be USER_DS,
> PSTATE.UAO will be clear, as it is when we come back from a guest.
What if the kernel, rather than kvm_arch_vcpu_ioctl_run(), sets KERNEL_DS?

> 
> This isn't broken today. I agree it will break if KVM decides to
> set_fs(KERNEL_DS) around world switch, but until then we don't need this 
> patch.
A set_fs(KERNEL_DS) call in either KVM or the host kernel can break this.
Normally, after the world-switch, I think it should check whether it needs
to restore its previous state.
We should not always assume that set_fs(KERNEL_DS) is not in effect for the host.

> 
> 
>> Move the reset PSTATE.PAN on entry to EL2 together with
>> PSTATE.UAO reset.
> 
> Moving this breaks PAN-at-HYP for systems with PAN but without VHE.
No, without VHE the host kernel runs at EL1.
Before the host kernel enters the guest, the host OS executes an 'HVC'
instruction to do the world-switch, and PSTATE.PAN is saved into SPSR_EL2.
When the world-switch returns to the host kernel from EL2, an 'eret'
instruction returns to the EL1 host, and this 'eret' restores SPSR_EL2 into PSTATE,
so PSTATE.PAN is restored.
So without VHE, we should not reset PAN. I paste the spec statement below:

--
PSTATE.PAN is copied to SPSR_ELx.PAN on an exception taken from AArch64 to 
AArch64
SPSR_ELx.PAN is copied to PSTATE.PAN on an exception return to AArch64 from 
AArch64
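
The save/restore behaviour in the spec statement above can be modelled in a few
lines of C. This is a toy model under stated assumptions, not kernel code:
struct model_cpu and the three helpers are invented for illustration, and only
the PAN bit (bit 22 in the AArch64 SPSR layout) is tracked.

```c
#include <stdint.h>

#define MODEL_PSR_PAN_BIT ((uint64_t)1 << 22)

/* Hypothetical CPU state holding only what the example needs. */
struct model_cpu {
	uint64_t pstate;    /* live PSTATE of the running exception level */
	uint64_t spsr_el2;  /* written on exception entry to EL2 */
};

/* Exception from EL1 to EL2 (for example HVC): PSTATE is copied to SPSR_EL2. */
static void model_take_exception_to_el2(struct model_cpu *cpu)
{
	cpu->spsr_el2 = cpu->pstate;
}

/* Code running at EL2 may change its own PSTATE.PAN. */
static void model_set_pan(struct model_cpu *cpu, int on)
{
	if (on)
		cpu->pstate |= MODEL_PSR_PAN_BIT;
	else
		cpu->pstate &= ~MODEL_PSR_PAN_BIT;
}

/* ERET from EL2: SPSR_EL2 is copied back into PSTATE. */
static void model_eret(struct model_cpu *cpu)
{
	cpu->pstate = cpu->spsr_el2;
}
```

In this model, whatever EL2 does to PAN between the entry and the eret, the EL1
host gets back the PAN value it had before the HVC.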


> 
> 
>> diff --git a/arch/arm64/kvm/hyp/entry.S b/arch/arm64/kvm/hyp/entry.S
>> index 12ee62d..7662ef5 100644
>> --- a/arch/arm64/kvm/hyp/entry.S
>> +++ b/arch/arm64/kvm/hyp/entry.S
>> @@ -96,8 +96,6 @@ ENTRY(__guest_exit)
>>  
>>  add x1, x1, #VCPU_CONTEXT
>>  
>> -ALTERNATIVE(nop, SET_PSTATE_PAN(1), ARM64_HAS_PAN, CONFIG_ARM64_PAN)
>> -
>>  // Store the guest regs x2 and x3
>>  stp x2, x3,   [x1, #CPU_XREG_OFFSET(2)]
>>  
>> diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
>> index a733461..715b3941 100644
>> --- a/arch/arm64/kvm/hyp/switch.c
>> +++ b/arch/arm64/kvm/hyp/switch.c
>> @@ -22,6 +22,7 @@
>>  #include 
>>  #include 
>>  #include 
>> +#include 
>>  
>>  static bool __hyp_text __fpsimd_enabled_nvhe(void)
>>  {
>> @@ -399,6 +400,17 @@ int __hyp_text __kvm_vcpu_run(struct kvm_vcpu *vcpu)
>>  
>>  __sysreg_restore_host_state(host_ctxt);
>>  
>> +if (has_vhe()) {
>> +/*
>> + * PSTATE was not saved over guest enter/exit, re-enable
>> + * any detecte features that might not have been set
>> + * correctly.
>> + */
>> +uao_thread_switch(current);
> 
> I don't see how addr_limit will ever be KERNEL_DS, so this is always clearing
> PSTATE.UAO, which was already clear from the guest-exit exception.
I think we should not always assume that the host kernel does not set
KERNEL_DS before entering the guest.

> 
> (Also, the uao_thread_switch() code isn't accessible from EL2, neither is 
> current)
No,
with VHE, both uao_thread_switch() and current are accessible from
EL2,
because the whole host kernel runs at EL2.
The API is accessible from EL2 with VHE.
Here, current is QEMU or another KVM tool.
I have tested this patch, and it works.

> 
> 
>> +asm(ALTERNATIVE("nop", SET_PSTATE_PAN(1),
>> +ARM64_HAS_PAN, CONFIG_ARM64_PAN));
> 
> ... and this is setting PSTATE.PAN on VHE, which was already set, and breaking
> PAN-at-HYP on non-VHE systems.
> 
> Vladimir's commit message for that patch that added this enabling explained it
> is needed for !VHE as SCTLR_EL2 when HCR_EL2.E2H is clear doesn't have a SPAN 
> bit.
> 
> When we have VHE clearing SCTLR_EL2.SPAN (clearing because it was RES1 on 
> v8.0)
> will cause the CPU to set PSTATE.PAN when we take an exception.
> 
> 
>> +}
>> +
>>  if (fp_enabled) {
>>  __fpsimd_save_state(&guest_ctxt->gp_regs.fp_regs);
>>  __fpsimd_restore_state(&host_ctxt->gp_regs.fp_regs);
>>
> 
> 
> James
> 
> 
> .
>

Re: [PATCH] arm64: KVM: VHE: save and restore some PSTATE bits

2017-09-05 Thread gengdongjiu

CC Catalin


On 2017/9/6 2:58, gengdongjiu wrote:
> when exit from guest, some host PSTATE bits may be lost, such as
> PSTATE.PAN or PSTATE.UAO. It is because host and hypervisor all run
> in the EL2, host PSTATE value cannot be saved and restored via
> SPSR_EL2. So if guest has changed the PSTATE, host continues with
> a wrong value guest has set.
> 
> Signed-off-by: Dongjiu Geng <gengdong...@huawei.com>
> Signed-off-by: Haibin Zhang <zhanghaib...@huawei.com>
> ---
>  arch/arm64/include/asm/kvm_host.h |  8 +++
>  arch/arm64/include/asm/kvm_hyp.h  |  2 ++
>  arch/arm64/include/asm/sysreg.h   | 23 +++
>  arch/arm64/kvm/hyp/entry.S|  2 --
>  arch/arm64/kvm/hyp/switch.c   | 24 ++--
>  arch/arm64/kvm/hyp/sysreg-sr.c| 48 
> ---
>  6 files changed, 100 insertions(+), 7 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/kvm_host.h 
> b/arch/arm64/include/asm/kvm_host.h
> index e923b58..cba7d3e 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -193,6 +193,12 @@ struct kvm_cpu_context {
>   };
>  };
>  
> +struct kvm_cpu_host_pstate {
> + u64 daif;
> + u64 uao;
> + u64 pan;
> +};
> +
>  typedef struct kvm_cpu_context kvm_cpu_context_t;
>  
>  struct kvm_vcpu_arch {
> @@ -227,6 +233,8 @@ struct kvm_vcpu_arch {
>  
>   /* Pointer to host CPU context */
>   kvm_cpu_context_t *host_cpu_context;
> + /* Host PSTATE value */
> + struct kvm_cpu_host_pstate host_pstate;
>   struct {
>   /* {Break,watch}point registers */
>   struct kvm_guest_debug_arch regs;
> diff --git a/arch/arm64/include/asm/kvm_hyp.h 
> b/arch/arm64/include/asm/kvm_hyp.h
> index 4572a9b..a75587a 100644
> --- a/arch/arm64/include/asm/kvm_hyp.h
> +++ b/arch/arm64/include/asm/kvm_hyp.h
> @@ -134,6 +134,8 @@
>  
>  void __sysreg_save_host_state(struct kvm_cpu_context *ctxt);
>  void __sysreg_restore_host_state(struct kvm_cpu_context *ctxt);
> +void __sysreg_save_host_pstate(struct kvm_vcpu *vcpu);
> +void __sysreg_restore_host_pstate(struct kvm_vcpu *vcpu);
>  void __sysreg_save_guest_state(struct kvm_cpu_context *ctxt);
>  void __sysreg_restore_guest_state(struct kvm_cpu_context *ctxt);
>  void __sysreg32_save_state(struct kvm_vcpu *vcpu);
> diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h
> index 248339e..efdcf40 100644
> --- a/arch/arm64/include/asm/sysreg.h
> +++ b/arch/arm64/include/asm/sysreg.h
> @@ -295,6 +295,29 @@
>  #define SYS_ICH_LR14_EL2 __SYS__LR8_EL2(6)
>  #define SYS_ICH_LR15_EL2 __SYS__LR8_EL2(7)
>  
> +#define REG_PSTATE_PAN   sys_reg(3, 0, 4, 2, 3)
> +#define REG_PSTATE_UAO   sys_reg(3, 0, 4, 2, 4)
> +
> +#define GET_PSTATE_PAN   \
> + ({  \
> + u64 reg;\
> + asm volatile(ALTERNATIVE("mov %0, #0",  \
> + "mrs_s %0, " __stringify(REG_PSTATE_PAN),\
> + ARM64_HAS_PAN)\
> + : "=r" (reg));\
> + reg;\
> + })
> +
> +#define GET_PSTATE_UAO   \
> + ({  \
> + u64 reg;\
> + asm volatile(ALTERNATIVE("mov %0, #0",\
> + "mrs_s %0, " __stringify(REG_PSTATE_UAO),\
> + ARM64_HAS_UAO)\
> + : "=r" (reg));\
> + reg;\
> + })
> +
>  /* Common SCTLR_ELx flags. */
>  #define SCTLR_ELx_EE(1 << 25)
>  #define SCTLR_ELx_I  (1 << 12)
> diff --git a/arch/arm64/kvm/hyp/entry.S b/arch/arm64/kvm/hyp/entry.S
> index 12ee62d..7662ef5 100644
> --- a/arch/arm64/kvm/hyp/entry.S
> +++ b/arch/arm64/kvm/hyp/entry.S
> @@ -96,8 +96,6 @@ ENTRY(__guest_exit)
>  
>   add x1, x1, #VCPU_CONTEXT
>  
> - ALTERNATIVE(nop, SET_PSTATE_PAN(1), ARM64_HAS_PAN, CONFIG_ARM64_PAN)
> -
>   // Store the guest regs x2 and x3
>   stp x2, x3,   [x1, #CPU_XREG_OFFSET(2)]
>  
> diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
> index 945e79c..9b380a1 100644
> --- a/arch/arm64/kvm/hyp/switch.c
> +++ b/arch/arm64/kvm/hyp/switch.c
> @@ -278,6 +278,26 @@ static void __hyp_text __skip_instr(struct kvm_vcpu 
> *vcpu)
>

Re: [PATCH v11 4/6] target-arm: kvm64: detect guest RAS EXTENSION feature

2017-09-08 Thread gengdongjiu

Hi Peter,
  Sorry for my late response.

> 
> On 18 August 2017 at 15:23, Dongjiu Geng  wrote:
> > check if kvm supports guest RAS EXTENSION. if so, set corresponding
> > feature bit for vcpu.
> >
> > Signed-off-by: Dongjiu Geng 
> > ---
> >  linux-headers/linux/kvm.h | 1 +
> >  target/arm/cpu.h  | 3 +++
> >  target/arm/kvm64.c| 8 
> >  3 files changed, 12 insertions(+)
> >
> > diff --git a/linux-headers/linux/kvm.h b/linux-headers/linux/kvm.h
> > index 7971a4f..2aa176e 100644
> > --- a/linux-headers/linux/kvm.h
> > +++ b/linux-headers/linux/kvm.h
> > @@ -929,6 +929,7 @@ struct kvm_ppc_resize_hpt {  #define
> > KVM_CAP_PPC_SMT_POSSIBLE 147  #define KVM_CAP_HYPERV_SYNIC2 148
> > #define KVM_CAP_HYPERV_VP_INDEX 149
> > +#define KVM_CAP_ARM_RAS_EXTENSION 150
> >
> >  #ifdef KVM_CAP_IRQ_ROUTING
> >
> 
> Hi. Changes to linux-headers need to be done as a patch of their own created 
> using scripts/update-linux-headers.sh run against a mainline
> kernel tree (and with a commit message that quotes the kernel commit hash 
> used). This ensures that we have a consistent set of headers
> that don't diverge from the kernel copy.
Sure, I will, thanks a lot for your reminder.

> 
> > diff --git a/target/arm/cpu.h b/target/arm/cpu.h index
> > b39d64a..6b0961b 100644
> > --- a/target/arm/cpu.h
> > +++ b/target/arm/cpu.h
> > @@ -611,6 +611,8 @@ struct ARMCPU {
> >
> >  /* CPU has memory protection unit */
> >  bool has_mpu;
> > +/* CPU has ras extension unit */
> > +bool has_ras_extension;
> >  /* PMSAv7 MPU number of supported regions */
> >  uint32_t pmsav7_dregion;
> >
> > @@ -1229,6 +1231,7 @@ enum arm_features {
> >  ARM_FEATURE_THUMB_DSP, /* DSP insns supported in the Thumb encodings */
> >  ARM_FEATURE_PMU, /* has PMU support */
> >  ARM_FEATURE_VBAR, /* has cp15 VBAR */
> > +ARM_FEATURE_RAS_EXTENSION, /*has RAS extension support */
> 
> Missing space after '/*' ?
Yes, thanks for pointing that out.

> 
> >  };
> >
> >  static inline int arm_feature(CPUARMState *env, int feature) diff
> > --git a/target/arm/kvm64.c b/target/arm/kvm64.c index a16abc8..0781367
> > 100644
> > --- a/target/arm/kvm64.c
> > +++ b/target/arm/kvm64.c
> > @@ -518,6 +518,14 @@ int kvm_arch_init_vcpu(CPUState *cs)
> >  unset_feature(>features, ARM_FEATURE_PMU);
> >  }
> >
> > +if (kvm_check_extension(cs->kvm_state, KVM_CAP_ARM_RAS_EXTENSION)) {
> > +cpu->has_ras_extension = true;
> > +set_feature(>features, ARM_FEATURE_RAS_EXTENSION);
> > +} else {
> > +cpu->has_ras_extension = false;
> > +unset_feature(>features, ARM_FEATURE_RAS_EXTENSION);
> > +}
> > +
> 
> Shouldn't we need to also tell the kernel that we actually want it to expose 
> RAS to the guest? Compare the PMU code in this function,
> where we set a kvm_init_features bit to do this.
> (This suggests that your ABI for the kernel part of this feature may not be 
> correct?)

The PMU code does indeed set a kvm_init_features bit. Here James Morse (ARM) has a
concern that we depend on the host CPU RAS extension.
He means that if userspace receives the SIGBUS delivered by host
memory_failure(), userspace should record the CPER for the guest
and handle the error regardless of whether the host CPU supports the RAS extension. But
I think that if userspace receives the SIGBUS signal, it means the
host CPU RAS module detected the error or the CPU consumed poisoned data, so we
should check whether the physical CPU supports the RAS extension.

> 
> You should also not be calling set_feature() here -- if the CPU features bit 
> doesn't say "this CPU should have the RAS extensions" we
> shouldn't create a CPU with them. Instead you should set it in 
> kvm_arm_get_host_cpu_features() (again, compare the PMU code).

Understood. I will add you to another mail thread to ask
whether userspace should detect the CPU RAS extension.
If everyone agrees that it should, I will fix the issue that you pointed
out.

> 
> thanks
> -- PMM

Re: [PATCH v11 5/6] target-arm: kvm64: handle SIGBUS signal for synchronous External Abort

2017-09-08 Thread gengdongjiu

[...]
> >
> > /*
> >  * xx
> >  */
> > void kvm_hwpoison_page_add(ram_addr_t ram_addr);
> 
> It should be in the doc-comment format, which begins "/**" and has some 
> stylization of how you list parameters and so on. Lots of
> examples in the existing headers.

Understood, thanks for the explanation.
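
For reference, a minimal sketch of what the doc-comment form might look like for this function. The comment wording, the `ram_addr_t` typedef, and the fixed-size array standing in for the QLIST from the patch are all assumptions made only so the example is self-contained; this is not the code from the patch.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for QEMU's ram_addr_t; an assumption for this sketch. */
typedef uint64_t ram_addr_t;

/**
 * kvm_hwpoison_page_add:
 * @ram_addr: the RAM address of the poisoned page
 *
 * Record @ram_addr in the list of hardware-poisoned pages. Adding an
 * address that is already recorded has no effect.
 */
void kvm_hwpoison_page_add(ram_addr_t ram_addr);

/* Hypothetical fixed-size stand-in for the QLIST used in the patch. */
#define HWPOISON_MAX 16
static ram_addr_t hwpoison_pages[HWPOISON_MAX];
static size_t hwpoison_count;

void kvm_hwpoison_page_add(ram_addr_t ram_addr)
{
    /* Skip addresses that are already on the list. */
    for (size_t i = 0; i < hwpoison_count; i++) {
        if (hwpoison_pages[i] == ram_addr) {
            return;
        }
    }
    /* The stand-in silently drops entries once the array is full. */
    if (hwpoison_count < HWPOISON_MAX) {
        hwpoison_pages[hwpoison_count++] = ram_addr;
    }
}
```

The point of the sketch is only the comment layout: the `/**` opener, the function name on its own line, and one `@param:` line per parameter.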

> 
> thanks
> -- PMM

Re: [PATCH v11 5/6] target-arm: kvm64: handle SIGBUS signal for synchronous External Abort

2017-09-08 Thread gengdongjiu

Hi Peter,
  Sorry for the late response.

> 
> On 18 August 2017 at 15:23, Dongjiu Geng  wrote:
> > Add SIGBUS signal handler. In this handler, it checks the exception
> > type, translates the host VA which is delivered by host or KVM to
> > guest PA, then fills this PA to CPER, finally injects a Error to guest
> > OS through KVM.
> >
> > Add synchronous external abort injection logic, setup spsr_elx,
> > esr_elx, PSTATE, far_elx, elr_elx etc, when switch to guest OS, it
> > will jump to the synchronous external abort vector table entry.
> >
> > Signed-off-by: Dongjiu Geng 
> > Signed-off-by: Quanming Wu 
> > ---
> >  include/sysemu/kvm.h  |   2 +-
> >  linux-headers/asm-arm64/kvm.h |   5 ++
> >  target/arm/internals.h|  13 
> >  target/arm/kvm.c  |  34 ++
> >  target/arm/kvm64.c| 150 
> > ++
> >  target/arm/kvm_arm.h  |   1 +
> >  6 files changed, 204 insertions(+), 1 deletion(-)
> 
> Have you tested whether this patchset builds OK on aarch32 ?


Sorry, I have not tested the build on aarch32, because our software only supports
the RAS extension on aarch64.
I will fix the build issue on aarch32.

> 
> > diff --git a/include/sysemu/kvm.h b/include/sysemu/kvm.h index
> > 3a458f5..90c1605 100644
> > --- a/include/sysemu/kvm.h
> > +++ b/include/sysemu/kvm.h
> > @@ -361,7 +361,7 @@ bool kvm_vcpu_id_is_valid(int vcpu_id);
> >  /* Returns VCPU ID to be used on KVM_CREATE_VCPU ioctl() */  unsigned
> > long kvm_arch_vcpu_id(CPUState *cpu);
> >
> > -#ifdef TARGET_I386
> > +#if defined(TARGET_I386) || defined(TARGET_AARCH64)
> >  #define KVM_HAVE_MCE_INJECTION 1
> >  void kvm_arch_on_sigbus_vcpu(CPUState *cpu, int code, void *addr);
> > #endif diff --git a/linux-headers/asm-arm64/kvm.h
> > b/linux-headers/asm-arm64/kvm.h index d254700..5909c30 100644
> > --- a/linux-headers/asm-arm64/kvm.h
> > +++ b/linux-headers/asm-arm64/kvm.h
> > @@ -181,6 +181,11 @@ struct kvm_arch_memory_slot {  #define
> > KVM_REG_ARM64_SYSREG_OP2_MASK  0x0007  #define
> > KVM_REG_ARM64_SYSREG_OP2_SHIFT 0
> >
> > +/* AArch64 fault registers */
> > +#define KVM_REG_ARM64_FAULT (0x0014 << 
> > KVM_REG_ARM_COPROC_SHIFT)
> > +#define KVM_REG_ARM64_FAULT_ESR_EC  (0)
> > +#define KVM_REG_ARM64_FAULT_FAR (1)
> > +
> >  #define ARM64_SYS_REG_SHIFT_MASK(x,n) \
> > (((x) << KVM_REG_ARM64_SYSREG_ ## n ## _SHIFT) & \
> > KVM_REG_ARM64_SYSREG_ ## n ## _MASK)
> 
> Again, linux-headers changes need to go in their own header sync patch.


Ok.

> 
> > diff --git a/target/arm/internals.h b/target/arm/internals.h index
> > 1f6efef..fc0ad6d 100644
> > --- a/target/arm/internals.h
> > +++ b/target/arm/internals.h
> > @@ -235,6 +235,19 @@ enum arm_exception_class {  #define
> > ARM_EL_ISV_SHIFT 24  #define ARM_EL_IL (1 << ARM_EL_IL_SHIFT)  #define
> > ARM_EL_ISV (1 << ARM_EL_ISV_SHIFT)
> > +#define ARM_EL_EC_MASK  ((0x3F) << ARM_EL_EC_SHIFT) #define
> > +ARM_EL_FSC_TYPE (0x3C)
> > +
> > +#define FSC_SEA (0x10)
> > +#define FSC_SEA_TTW0(0x14)
> > +#define FSC_SEA_TTW1(0x15)
> > +#define FSC_SEA_TTW2(0x16)
> > +#define FSC_SEA_TTW3(0x17)
> > +#define FSC_SECC(0x18)
> > +#define FSC_SECC_TTW0   (0x1c)
> > +#define FSC_SECC_TTW1   (0x1d)
> > +#define FSC_SECC_TTW2   (0x1e)
> > +#define FSC_SECC_TTW3   (0x1f)
> >
> >  /* Utility functions for constructing various kinds of syndrome value.
> >   * Note that in general we follow the AArch64 syndrome values; in a
> > diff --git a/target/arm/kvm.c b/target/arm/kvm.c index
> > 7c17f0d..2e1716a 100644
> > --- a/target/arm/kvm.c
> > +++ b/target/arm/kvm.c
> > @@ -129,6 +129,39 @@ void kvm_arm_destroy_scratch_host_vcpu(int *fdarray)
> >  }
> >  }
> >
> > +typedef struct HWPoisonPage {
> > +ram_addr_t ram_addr;
> > +QLIST_ENTRY(HWPoisonPage) list;
> > +} HWPoisonPage;
> > +
> > +static QLIST_HEAD(, HWPoisonPage) hwpoison_page_list =
> > +QLIST_HEAD_INITIALIZER(hwpoison_page_list);
> > +
> > +static void kvm_unpoison_all(void *param) {
> > +HWPoisonPage *page, *next_page;
> > +
> > +QLIST_FOREACH_SAFE(page, _page_list, list, next_page) {
> > +QLIST_REMOVE(page, list);
> > +qemu_ram_remap(page->ram_addr, TARGET_PAGE_SIZE);
> > +g_free(page);
> > +}
> > +}
> > +
> > +void kvm_hwpoison_page_add(ram_addr_t ram_addr) {
> > +HWPoisonPage *page;
> > +
> > +QLIST_FOREACH(page, _page_list, list) {
> > +if (page->ram_addr == ram_addr) {
> > +return;
> > +}
> > +}
> > +page = g_new(HWPoisonPage, 1);
> > +page->ram_addr = ram_addr;
> > +QLIST_INSERT_HEAD(_page_list, page, list); }
> 
> This code has all just been copied-and-pasted from target/i386/kvm.c.
> Please instead abstract it out properly into a cpu-independent source file.


Yes, it was copied from x86.
Do you mean abstracting this code

Re: [PATCH v6 3/7] acpi: apei: remove the unused code

2017-09-11 Thread gengdongjiu

James,
   Thanks for the review.

On 2017/9/9 2:17, James Morse wrote:
> Hi gengdongjiu,
> 
> On 04/09/17 12:43, gengdongjiu wrote:
>> On 2017/9/1 1:50, James Morse wrote:
>>> On 28/08/17 11:38, Dongjiu Geng wrote:
>>>> In current code logic, the two functions ghes_sea_add() and
>>>> ghes_sea_remove() are only called when CONFIG_ACPI_APEI_SEA
>>>> is defined. If not, it will return errors in the ghes_probe()
>>>> and not contiue. Hence, remove the unnecessary handling when
>>>> CONFIG_ACPI_APEI_SEI is not defined.
>>>
>>> This doesn't match what the patch does. I get this feeling this is needed 
>>> for
>>> some future patch you haven't included.
>>
>> James, let check the code, when the ghes_probe, if the CONFIG_ACPI_APEI_SEA 
>> is not defined.
>> it will return -ENOTSUPP and goto error, and the ghes_sea_add has no chance 
>> to execute.
>> similar, if the probe is failed, it should not have chance to execute 
>> ghes_sea_remove.
> 
> It's the 'unnecessary handling when CONFIG_ACPI_APEI_SEI' in the commit 
> message
> that confuses me: this patch doesn't reference that Kconfig symbol. I guess 
> that
> sentence needs removing for this v6?
Thanks for pointing that out. That sentence needs to be removed for v6.

> 
> Re-reading without that part of the commit-message:
> 
> You're relying on the compiler's dead-code elimination to spot unused static
> functions and silently drop them. Great!
> (there is the small risk that gcc 3.2[0] can't do this, x86 still has to 
> support
> this gcc version)
> 
> As this is just clean-up patch can you break it out of this series, it isn't
> needed to add support for SEI.
sure, I will.

> 
> (This series adds support for what should be an APEI notification, but the 
> only
> code that touches APEI removes some code from a different notification 
> method.)

understand.

> 
> 
>>> Setting NOTIFY_SEI as the GHES entry's notification type means the OS should
>>> check the GHES->ErrorStatusAddress for CPER records when it receives an
>>> SError-Interrupt, as it may be a notification of an error from this error 
>>> source.
> 
>> previously I added the NOTIFY_SEI support,
> 
> (Yes, I saw that in v5 and expected this series to add some APEI support code 
> )
> 
> 
>> but consider the error address in CPER is not accurate and calling 
>> memory_failure() may not make sense.
>> so I remove it.
> 
> 'not accurate'... this is going to be a problem, but lets keep that discussion
> on the cover-letter.

Ok.


> 
> 
>>> If you aren't handling the notification, why is this is in the HEST at all?
>>> (and if its not: its not firmware-first)
> 
>> For the SEI notification, may be we can parse and handle the CPER record 
>> other than the Error physical address
> 
> Sure, but I only see this cleanup patch in this series, where does APEI learn
> about NOTIFY_SEI? As this is nothing will ever touch those CPER records, if
> you're using GHESv2 firmware will be prevented from delivering subsequent
> notifications.
James, would it be possible for you to review the previous v5 patch, which adds
the support for NOTIFY_SEI? Thanks in advance. In that patch, I share the SEI
notification handling with the SEA notification handling to avoid duplicated
code.

https://www.spinics.net/lists/arm-kernel/msg601767.html

> 
> 
> Thanks,
> 
> James
> 
> [0]
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/admin-guide/README.rst#n251
> 
> .
>

Re: [PATCH] arm64: KVM: Skip PSTATE.PAN reest at EL2 in non-VHE

2017-09-11 Thread gengdongjiu

Hi Vladimir,

On 2017/9/11 19:20, Vladimir Murzin wrote:
> On 11/09/17 12:16, Dongjiu Geng wrote:
>> PSTATE.PAN disables reading and/or writing to a userspace virtual
>> address from EL1 in non-VHE or from EL2 in VHE. In non-VHE, there is
>> no any userspace mapping at EL2, so no need to reest the PSTATE.PAN.
>  ^
>  reset
>>
>> Signed-off-by: Dongjiu Geng 
>> Signed-off-by: Haibin Zhang 
>> ---
>>  arch/arm64/kvm/hyp/entry.S | 6 +-
>>  1 file changed, 5 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/arm64/kvm/hyp/entry.S b/arch/arm64/kvm/hyp/entry.S
>> index 12ee62d6d410..86a7549b1b4c 100644
>> --- a/arch/arm64/kvm/hyp/entry.S
>> +++ b/arch/arm64/kvm/hyp/entry.S
>> @@ -96,8 +96,12 @@ ENTRY(__guest_exit)
>>  
>>  add x1, x1, #VCPU_CONTEXT
>>  
>> -ALTERNATIVE(nop, SET_PSTATE_PAN(1), ARM64_HAS_PAN, CONFIG_ARM64_PAN)
>> +alternative_if_not ARM64_HAS_VIRT_HOST_EXTN
>> +b   2f  // skip PAN at EL2 in non-VHE
>> +alternative_else_nop_endif
>>  
>> +ALTERNATIVE(nop, SET_PSTATE_PAN(1), ARM64_HAS_PAN, CONFIG_ARM64_PAN)
>> +2:
>>  // Store the guest regs x2 and x3
>>  stp x2, x3,   [x1, #CPU_XREG_OFFSET(2)]
>>  
>>
> 
> Ok. Probably I need to say why original patch did not consider non-VHE case:
> - VHE and PAN features come within the same v8.1 extension bundle, so it is
>   unlucky to see IRL implementation with PAN but no VHE.
> - Given above the only case where extra PAN instruction could count is
>   VHE-enabled system with CONFIG_ARM64_VHE is not set; However, IMO, usecase 
> for
>   such setup is kind of debugging; it is quite obvious that those who care of
>   performance should not disable VHE in the first place...
thanks for the explanation.


> 
> Nit:
> In general it is not polite to keep posting patches in a middle of the merge
> window - people are busy with more important stuff...
I did not know that you were busy or that the merge window was open.


> 
> Cheers
> Vladimir
> 
> .
>

Re: [PATCH v11 6/6] target-arm: kvm64: Handle SError interrupt for the guest OS

2017-09-11 Thread gengdongjiu

Hi Peter,

> 
> On 18 August 2017 at 15:23, Dongjiu Geng  wrote:
> > When guest OS happens SError interrupt(SEI), it will trap to host.
> > Host firstly calls memory failure to deal with this error and decide
> > whether it needs to deliver SIGBUS signal to userspace. The advantage
> > that using signal to notify is that it can make the notification
> > method is general, non-KVM user can also use it. when userspace gets
> > this signal and knows this is SError interrupt, it will translate the
> > delivered host VA to PA and record this PA to GHES.
> >
> > Because ARMv8.2 adds an extension to RAS to allow system software
> > insert implicit Error Synchronization Barrier operations to isolate
> > the error and allow passes specified syndrome to guest OS, so after
> > record the CPER, user space calls IOCTL to pass a specified syndrome
> > to KVM, then switch to guest OS, guest OS can use the recorded CPER
> > record and syndrome information to do the recovery.
> >
> > The steps are shown below:
> > 1. translate the host VA to guest OS PA and record this error PA to HEST 
> > table.
> > 2. set specified virtual SError syndrome and pass the value to KVM.
> >
> > Signed-off-by: Dongjiu Geng 
> > Signed-off-by: Quanming Wu 
> > ---
> >  linux-headers/linux/kvm.h |  1 +
> >  target/arm/internals.h|  1 +
> >  target/arm/kvm64.c| 28 
> >  3 files changed, 30 insertions(+)
> >
> > diff --git a/linux-headers/linux/kvm.h b/linux-headers/linux/kvm.h
> > index 2aa176e..10dfcab 100644
> > --- a/linux-headers/linux/kvm.h
> > +++ b/linux-headers/linux/kvm.h
> > @@ -1356,6 +1356,7 @@ struct kvm_s390_ucas_mapping {
> >  /* Available with KVM_CAP_S390_CMMA_MIGRATION */
> >  #define KVM_S390_GET_CMMA_BITS  _IOWR(KVMIO, 0xb8, struct 
> > kvm_s390_cmma_log)
> >  #define KVM_S390_SET_CMMA_BITS  _IOW(KVMIO, 0xb9, struct 
> > kvm_s390_cmma_log)
> > +#define KVM_ARM_SEI _IO(KVMIO,   0xb10)
> >
> >  #define KVM_DEV_ASSIGN_ENABLE_IOMMU(1 << 0)
> >  #define KVM_DEV_ASSIGN_PCI_2_3 (1 << 1)
> > diff --git a/target/arm/internals.h b/target/arm/internals.h index
> > fc0ad6d..18b1cbc 100644
> > --- a/target/arm/internals.h
> > +++ b/target/arm/internals.h
> > @@ -237,6 +237,7 @@ enum arm_exception_class {  #define ARM_EL_ISV (1
> > << ARM_EL_ISV_SHIFT)  #define ARM_EL_EC_MASK  ((0x3F) <<
> > ARM_EL_EC_SHIFT)  #define ARM_EL_FSC_TYPE (0x3C)
> > +#define ARM_EL_ISS_MASK ((1 << ARM_EL_IL_SHIFT) - 1)
> >
> >  #define FSC_SEA (0x10)
> >  #define FSC_SEA_TTW0(0x14)
> > diff --git a/target/arm/kvm64.c b/target/arm/kvm64.c index
> > d3bdab2..b84cb49 100644
> > --- a/target/arm/kvm64.c
> > +++ b/target/arm/kvm64.c
> > @@ -616,6 +616,22 @@ static int kvm_arm_cpreg_value(ARMCPU *cpu, ptrdiff_t 
> > fieldoffset)
> >  return -EINVAL;
> >  }
> >
> > +static int kvm_inject_arm_sei(CPUState *cs) {
> > +ARMCPU *cpu = ARM_CPU(cs);
> > +CPUARMState *env = >env;
> > +
> > +unsigned long syndrome = env->exception.vaddress;
> > +/* set virtual SError syndrome */
> > +if (arm_feature(env, ARM_FEATURE_RAS_EXTENSION)) {
> > +syndrome = syndrome & ARM_EL_ISS_MASK;
> > +} else {
> > +syndrome = 0;
> > +}
> > +
> > +return  kvm_vcpu_ioctl(CPU(cpu), KVM_ARM_SEI, );
> 
> This looks odd. If we don't have the RAS extension why do we need to do 
> anything at all here ?

This is because QEMU may need to support the non-RAS-extension case, as discussed with
James Morse (ARM) before.
That is, the host hardware CPU does not support RAS, but the guest does.
That is still under discussion.
When the host hardware supports RAS, we set the syndrome to a valid value;
otherwise, we set it to 0.
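
A minimal standalone sketch of that selection, assuming a 32-bit syndrome and a boolean for RAS support. The helper name `sei_syndrome` is hypothetical; `ARM_EL_IL_SHIFT` and `ARM_EL_ISS_MASK` follow the definitions in the quoted patch and target/arm/internals.h.

```c
#include <stdbool.h>
#include <stdint.h>

/* IL is ESR bit 25, so the ISS field is bits [24:0]. */
#define ARM_EL_IL_SHIFT 25
#define ARM_EL_ISS_MASK ((1u << ARM_EL_IL_SHIFT) - 1)

/*
 * Hypothetical helper: choose the virtual SError syndrome handed to KVM.
 * With RAS support only the ISS bits are kept; without it the syndrome is 0.
 */
static uint32_t sei_syndrome(bool has_ras_extension, uint32_t raw_syndrome)
{
    if (has_ras_extension) {
        return raw_syndrome & ARM_EL_ISS_MASK;
    }
    return 0;
}
```

Whether the non-RAS branch should inject anything at all is exactly the open question above; the sketch only mirrors what the patch currently does.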

> 
> > +}
> > +
> >  /* Inject synchronous external abort */  static int
> > kvm_inject_arm_sea(CPUState *c)  { @@ -1007,6 +1023,15 @@ static bool
> > is_abort_sea(unsigned long syndrome)
> >  }
> >  }
> >
> > +static bool is_abort_sei(unsigned long syndrome) {
> > +uint8_t ec = ((syndrome & ARM_EL_EC_MASK) >> ARM_EL_EC_SHIFT);
> 
> You don't need to bother masking here -- in other places in QEMU we assume 
> that the EC field is at the top of the word, so just "syndrome >>
> ARM_EL_EC_SHIFT" is sufficient.

OK, thanks for the suggestion.

> 
> > +if ((ec != EC_SERROR))
> > +return false;
> > +else
> > +return true;
> 
> scripts/checkpatch.pl should tell you that this if needs braces (it's good to 
> get in the habit of running it on all patches; it is not always
> correct, so judgement is required, but it will flag up some common mistakes).
> 
> In this particular case, you should just
>return ec == EC_SERROR;
> though.

Good suggestion.
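
With both comments applied, the helper could look roughly like the sketch below. It assumes a 32-bit syndrome word (the patch uses unsigned long), which is what makes the shift alone sufficient; the constant values are the architectural ones (EC in ESR bits [31:26], 0x2f for SError).

```c
#include <stdbool.h>
#include <stdint.h>

#define ARM_EL_EC_SHIFT 26   /* EC occupies ESR bits [31:26] */
#define EC_SERROR       0x2f /* exception class for an SError interrupt */

/* EC is at the top of the 32-bit word, so no mask is needed before the shift. */
static bool is_abort_sei(uint32_t syndrome)
{
    return (syndrome >> ARM_EL_EC_SHIFT) == EC_SERROR;
}
```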

> 
> > +}
> > +
> >  void kvm_arch_on_sigbus_vcpu(CPUState *c, int code, void *addr)  {
> >  ram_addr_t ram_addr;
> > @@ -1024,6 +1049,9 @@ void kvm_arch_on_sigbus_vcpu(CPUState *c, int code, 
> > void *addr)
> >  if

Re: [PATCH v11 4/6] target-arm: kvm64: detect guest RAS EXTENSION feature

2017-09-06 Thread gengdongjiu

Hi Peter,
  Thanks very much for your review, I will check your comments in detail and 
reply.


On 2017/9/6 1:26, Peter Maydell wrote:
> On 18 August 2017 at 15:23, Dongjiu Geng  wrote:
>> check if kvm supports guest RAS EXTENSION. if so, set
>> corresponding feature bit for vcpu.
>>
>> Signed-off-by: Dongjiu Geng 
>> ---
>>  linux-headers/linux/kvm.h | 1 +
>>  target/arm/cpu.h  | 3 +++
>>  target/arm/kvm64.c| 8 
>>  3 files changed, 12 insertions(+)
>>
>> diff --git a/linux-headers/linux/kvm.h b/linux-headers/linux/kvm.h
>> index 7971a4f..2aa176e 100644
>> --- a/linux-headers/linux/kvm.h
>> +++ b/linux-headers/linux/kvm.h
>> @@ -929,6 +929,7 @@ struct kvm_ppc_resize_hpt {
>>  #define KVM_CAP_PPC_SMT_POSSIBLE 147
>>  #define KVM_CAP_HYPERV_SYNIC2 148
>>  #define KVM_CAP_HYPERV_VP_INDEX 149
>> +#define KVM_CAP_ARM_RAS_EXTENSION 150
>>
>>  #ifdef KVM_CAP_IRQ_ROUTING
>>
> 
> Hi. Changes to linux-headers need to be done as a patch of their
> own created using scripts/update-linux-headers.sh run against a
> mainline kernel tree (and with a commit message that quotes the
> kernel commit hash used). This ensures that we have a consistent
> set of headers that don't diverge from the kernel copy.
> 
>> diff --git a/target/arm/cpu.h b/target/arm/cpu.h
>> index b39d64a..6b0961b 100644
>> --- a/target/arm/cpu.h
>> +++ b/target/arm/cpu.h
>> @@ -611,6 +611,8 @@ struct ARMCPU {
>>
>>  /* CPU has memory protection unit */
>>  bool has_mpu;
>> +/* CPU has ras extension unit */
>> +bool has_ras_extension;
>>  /* PMSAv7 MPU number of supported regions */
>>  uint32_t pmsav7_dregion;
>>
>> @@ -1229,6 +1231,7 @@ enum arm_features {
>>  ARM_FEATURE_THUMB_DSP, /* DSP insns supported in the Thumb encodings */
>>  ARM_FEATURE_PMU, /* has PMU support */
>>  ARM_FEATURE_VBAR, /* has cp15 VBAR */
>> +ARM_FEATURE_RAS_EXTENSION, /*has RAS extension support */
> 
> Missing space after '/*' ?
> 
>>  };
>>
>>  static inline int arm_feature(CPUARMState *env, int feature)
>> diff --git a/target/arm/kvm64.c b/target/arm/kvm64.c
>> index a16abc8..0781367 100644
>> --- a/target/arm/kvm64.c
>> +++ b/target/arm/kvm64.c
>> @@ -518,6 +518,14 @@ int kvm_arch_init_vcpu(CPUState *cs)
>>  unset_feature(>features, ARM_FEATURE_PMU);
>>  }
>>
>> +if (kvm_check_extension(cs->kvm_state, KVM_CAP_ARM_RAS_EXTENSION)) {
>> +cpu->has_ras_extension = true;
>> +set_feature(>features, ARM_FEATURE_RAS_EXTENSION);
>> +} else {
>> +cpu->has_ras_extension = false;
>> +unset_feature(>features, ARM_FEATURE_RAS_EXTENSION);
>> +}
>> +
> 
> Shouldn't we need to also tell the kernel that we actually want
> it to expose RAS to the guest? Compare the PMU code in this
> function, where we set a kvm_init_features bit to do this.
> (This suggests that your ABI for the kernel part of this feature
> may not be correct?)
> 
> You should also not be calling set_feature() here -- if the
> CPU features bit doesn't say "this CPU should have the RAS
> extensions" we shouldn't create a CPU with them. Instead
> you should set it in kvm_arm_get_host_cpu_features() (again,
> compare the PMU code).
> 
> thanks
> -- PMM
> 
> .
>

Re: [PATCH] arm64: KVM: VHE: save and restore some PSTATE bits

2017-09-06 Thread gengdongjiu

Hi Marc,

On 2017/9/6 16:17, Marc Zyngier wrote:
> On 05/09/17 19:58, gengdongjiu wrote:
>> when exit from guest, some host PSTATE bits may be lost, such as
>> PSTATE.PAN or PSTATE.UAO. It is because host and hypervisor all run
>> in the EL2, host PSTATE value cannot be saved and restored via
>> SPSR_EL2. So if guest has changed the PSTATE, host continues with
>> a wrong value guest has set.
>>
>> Signed-off-by: Dongjiu Geng <gengdong...@huawei.com>
>> Signed-off-by: Haibin Zhang <zhanghaib...@huawei.com>
>> ---
>>  arch/arm64/include/asm/kvm_host.h |  8 +++
>>  arch/arm64/include/asm/kvm_hyp.h  |  2 ++
>>  arch/arm64/include/asm/sysreg.h   | 23 +++
>>  arch/arm64/kvm/hyp/entry.S|  2 --
>>  arch/arm64/kvm/hyp/switch.c   | 24 ++--
>>  arch/arm64/kvm/hyp/sysreg-sr.c| 48 
>> ---
>>  6 files changed, 100 insertions(+), 7 deletions(-)
>>
>> diff --git a/arch/arm64/include/asm/kvm_host.h 
>> b/arch/arm64/include/asm/kvm_host.h
>> index e923b58..cba7d3e 100644
>> --- a/arch/arm64/include/asm/kvm_host.h
>> +++ b/arch/arm64/include/asm/kvm_host.h
>> @@ -193,6 +193,12 @@ struct kvm_cpu_context {
>>  };
>>  };
>>  
>> +struct kvm_cpu_host_pstate {
>> +u64 daif;
>> +u64 uao;
>> +u64 pan;
>> +};
> 
> I love it. This is the most expensive way of saving/restoring a single
> 32bit value.
> 
> More seriously, please see the discussion between James and Christoffer
> there[1]. I expect James to address the PAN/UAO states together with the
> debug state in the next iteration of his patch.

I have roughly read the discussion between James and Christoffer. It seems Christoffer
does not suggest saving and restoring it, and suggests doing the
following instead; UAO/PAN may not be handled the same way.

  __kvm_vcpu_run(struct kvm_vcpu *vcpu)
  {
          if (has_vhe())
                  asm("msr daifset, #0xf");

          ...
          exit_code = __guest_enter(vcpu, host_ctxt);
          ...

          if (has_vhe())
                  asm("msr daifclr, #0xd");
  }


If we do not save/restore them, KVM will set them according to the CPU
capability. For example, the fix below checks the CPU capability: if the CPU
supports PAN,
KVM will always enable PAN for the host. But in fact, the host may
not have PAN enabled.
Of course, for UAO we could use a similar fix if Marc or Christoffer
agrees, but that does not seem to make sense.

commit cb96408da4e11698674abd04aeac941c1bed2038
Author: Vladimir Murzin <vladimir.mur...@arm.com>
Date:   Thu Sep 1 15:29:03 2016 +0100

arm64: KVM: VHE: reset PSTATE.PAN on entry to EL2

SCTLR_EL2.SPAN bit controls what happens with the PSTATE.PAN bit on an
exception. However, this bit has no effect on the PSTATE.PAN when
HCR_EL2.E2H or HCR_EL2.TGE is unset. Thus when VHE is used and
exception taken from a guest PSTATE.PAN bit left unchanged and we
continue with a value guest has set.

To address that always reset PSTATE.PAN on entry from EL1.

Fixes: 1f364c8c48a0 ("arm64: VHE: Add support for running Linux in EL2 
mode")

Signed-off-by: Vladimir Murzin <vladimir.mur...@arm.com>
Reviewed-by: James Morse <james.mo...@arm.com>
Acked-by: Marc Zyngier <marc.zyng...@arm.com>
Cc: <sta...@vger.kernel.org> # v4.6+
Signed-off-by: Christoffer Dall <christoffer.d...@linaro.org>

diff --git a/arch/arm64/kvm/hyp/entry.S b/arch/arm64/kvm/hyp/entry.S
index 3967c231..b5926ee 100644
--- a/arch/arm64/kvm/hyp/entry.S
+++ b/arch/arm64/kvm/hyp/entry.S
@@ -96,6 +96,8 @@ ENTRY(__guest_exit)

add x1, x1, #VCPU_CONTEXT

+   ALTERNATIVE(nop, SET_PSTATE_PAN(1), ARM64_HAS_PAN, CONFIG_ARM64_PAN)
+
// Store the guest regs x2 and x3
stp x2, x3,   [x1, #CPU_XREG_OFFSET(2)]
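
To make the difference between the two approaches concrete, here is a toy model in plain C. It is not kernel code: the struct and function names are hypothetical, the guest is reduced to "something that may leave PAN in any state", and the CPU capability check is ignored.

```c
#include <stdbool.h>

/* Toy model: a single PSTATE.PAN bit shared by host and guest context. */
struct cpu_model {
    bool pan;
};

/* Stand-in for running the guest, which may leave PAN in any state. */
static void run_guest(struct cpu_model *cpu, bool guest_pan)
{
    cpu->pan = guest_pan;
}

/* Approach of the commit above: always set PAN on exit from the guest. */
static void exit_force_pan(struct cpu_model *cpu, bool guest_pan)
{
    run_guest(cpu, guest_pan);
    cpu->pan = true;
}

/* Approach of my patch: save the host value and restore it on exit. */
static void exit_save_restore(struct cpu_model *cpu, bool guest_pan)
{
    bool host_pan = cpu->pan;

    run_guest(cpu, guest_pan);
    cpu->pan = host_pan;
}
```

In this model, a host that had cleared PAN before entering the guest comes back with PAN set under the first approach and with PAN still clear under the second.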


> 
> Thanks,
> 
>   M.
> 
> [1] https://www.spinics.net/lists/arm-kernel/msg599798.html
>

Re: [PATCH] arm64: KVM: VHE: save and restore some PSTATE bits

2017-09-06 Thread gengdongjiu

For UAO, if we do not save/restore PSTATE.UAO, we can use the fix below.

diff --git a/arch/arm64/kvm/hyp/sysreg-sr.c b/arch/arm64/kvm/hyp/sysreg-sr.c
index 9341376..c3dd761 100644
--- a/arch/arm64/kvm/hyp/sysreg-sr.c
+++ b/arch/arm64/kvm/hyp/sysreg-sr.c
@@ -21,6 +21,8 @@
 #include 
 #include 

+#include 
+
 /* Yes, this does nothing, on purpose */
 static void __hyp_text __sysreg_do_nothing(struct kvm_cpu_context *ctxt) { }

@@ -121,8 +123,13 @@ static void __hyp_text __sysreg_restore_state(struct 
kvm_cpu_context *ctxt)
write_sysreg_el1(ctxt->gp_regs.spsr[KVM_SPSR_EL1],spsr);
 }

+static void __hyp_text __sysreg_restore_state_vhe(struct kvm_cpu_context *ctxt)
+{
+uao_thread_switch(current);
+}
+
 static hyp_alternate_select(__sysreg_call_restore_host_state,
-   __sysreg_restore_state, __sysreg_do_nothing,
+   __sysreg_restore_state, __sysreg_restore_state_vhe,
ARM64_HAS_VIRT_HOST_EXTN);

 void __hyp_text __sysreg_restore_host_state(struct kvm_cpu_context *ctxt)



On 2017/9/6 17:32, gengdongjiu wrote:
> Hi Marc,
> 
> On 2017/9/6 16:17, Marc Zyngier wrote:
>> On 05/09/17 19:58, gengdongjiu wrote:
>>> when exit from guest, some host PSTATE bits may be lost, such as
>>> PSTATE.PAN or PSTATE.UAO. It is because host and hypervisor all run
>>> in the EL2, host PSTATE value cannot be saved and restored via
>>> SPSR_EL2. So if guest has changed the PSTATE, host continues with
>>> a wrong value guest has set.
>>>
>>> Signed-off-by: Dongjiu Geng <gengdong...@huawei.com>
>>> Signed-off-by: Haibin Zhang <zhanghaib...@huawei.com>
>>> ---
>>>  arch/arm64/include/asm/kvm_host.h |  8 +++
>>>  arch/arm64/include/asm/kvm_hyp.h  |  2 ++
>>>  arch/arm64/include/asm/sysreg.h   | 23 +++
>>>  arch/arm64/kvm/hyp/entry.S|  2 --
>>>  arch/arm64/kvm/hyp/switch.c   | 24 ++--
>>>  arch/arm64/kvm/hyp/sysreg-sr.c| 48 
>>> ---
>>>  6 files changed, 100 insertions(+), 7 deletions(-)
>>>
>>> diff --git a/arch/arm64/include/asm/kvm_host.h 
>>> b/arch/arm64/include/asm/kvm_host.h
>>> index e923b58..cba7d3e 100644
>>> --- a/arch/arm64/include/asm/kvm_host.h
>>> +++ b/arch/arm64/include/asm/kvm_host.h
>>> @@ -193,6 +193,12 @@ struct kvm_cpu_context {
>>> };
>>>  };
>>>  
>>> +struct kvm_cpu_host_pstate {
>>> +   u64 daif;
>>> +   u64 uao;
>>> +   u64 pan;
>>> +};
>>
>> I love it. This is the most expensive way of saving/restoring a single
>> 32bit value.
>>
>> More seriously, please see the discussion between James and Christoffer
>> there[1]. I expect James to address the PAN/UAO states together with the
>> debug state in the next iteration of his patch.
> 
> I roughly see the discussion between James and Christoffer, Seems Christoffer 
> does not suggest save and
> restore it, and suggest to do below, and UAO/PAN may not use the same ways.
> 
>   __kvm_vcpu_run(struct kvm_vcpu *vcpu)
>   {
>   if (has_vhe())
>   asm("msr daifset, #0xf");
> 
>   ...
>  exit_code = __guest_enter(vcpu, host_ctxt);
>   ...
> 
>   if (has_vhe())
>   asm("msr daifclr, #0xd");
>   }
> 
> 
> If not save/restore them, the KVM will set them according to the CPU 
> capability. For example below fixing, it will check CPU capability, If CPU 
> supports PAN,
> the kvm will always enable the PAN for the host. But in fact, the host may be 
> not enable the PAN.
> Of course for the UAO, we can use the similar fixing if Marc or Christoffer 
> is agreed. but seems not make sense.
> 
> commit cb96408da4e11698674abd04aeac941c1bed2038
> Author: Vladimir Murzin <vladimir.mur...@arm.com>
> Date:   Thu Sep 1 15:29:03 2016 +0100
> 
> arm64: KVM: VHE: reset PSTATE.PAN on entry to EL2
> 
> SCTLR_EL2.SPAN bit controls what happens with the PSTATE.PAN bit on an
> exception. However, this bit has no effect on the PSTATE.PAN when
> HCR_EL2.E2H or HCR_EL2.TGE is unset. Thus when VHE is used and
> exception taken from a guest PSTATE.PAN bit left unchanged and we
> continue with a value guest has set.
> 
> To address that always reset PSTATE.PAN on entry from EL1.
> 
> Fixes: 1f364c8c48a0 ("arm64: VHE: Add support for running Linux in EL2 
> mode")
> 
> Signed-off-by: Vladimir Murzin <vladimir.mur...@arm.com>
> Reviewed-by: James Morse <james.mo...@arm.com>
>

Re: [PATCH] arm64: KVM: VHE: save and restore some PSTATE bits

2017-09-06 Thread gengdongjiu



On 2017/9/6 20:00, Vladimir Murzin wrote:
> On 06/09/17 11:35, gengdongjiu wrote:
>> Vladimir,
>>
>> On 2017/9/6 17:41, Vladimir Murzin wrote:
>>> Can you please elaborate on cases where PAN is not enabled?
>>
>> I mean informal, private usage. For example, someone disables PAN 
>> dynamically to let kernel space access user space.
>> After dynamically disabling PAN, he switches to the guest OS. After returning 
>> to the host, he finds that the PAN state has been modified.
>> Of course this is not a formal usage; in our host kernel PAN is always 
>> enabled, with no dynamic change, but I mean such cases may exist.
>>
>>
> 
> So, in short, there is no real issue with PAN, right? What about UAO?
For PAN, if the host OS dynamically enables/disables PAN, there should be an issue.
Do you think the case described above is not an issue?

"The host OS dynamically disables PAN, but after returning from the guest OS, 
PAN is unexpectedly enabled"

> 
> Cheers
> Vladimir
> 
> .
>

Re: [PATCH v6 0/7] Add RAS virtualization support for SEA/SEI notification type in KVM

2017-09-06 Thread gengdongjiu

Hi Peter,

On 2017/9/6 19:19, Peter Maydell wrote:
> On 28 August 2017 at 11:38, Dongjiu Geng  wrote:
>> In the firmware-first RAS solution, corrupt data is detected in a
>> memory location when guest OS application software executing at EL0
>> or guest OS kernel El1 software are reading from the memory. The
>> memory node records errors in an error record accessible using
>> system registers.
> 
> Hi Dongjiu -- it looks like this patch set is extending
> the API KVM provides to userspace, but it doesn't update
> the documentation in Documentation/virtual/kvm/api.txt.
> I appreciate the API is still somewhat under discussion,
> but if you can include the docs updates it's helpful to
> me for reviewing whether the API makes sense from the
> userspace consumer end of it.

Sure, it should. Thanks a lot for the reminder. I will update the related docs
in the next version of my patch set.

> 
> thanks
> -- PMM
> 
> .
>

Re: [PATCH] arm64: KVM: VHE: save and restore some PSTATE bits

2017-09-06 Thread gengdongjiu



On 2017/9/6 20:00, Vladimir Murzin wrote:
> On 06/09/17 11:35, gengdongjiu wrote:
>> Vladimir,
>>
>> On 2017/9/6 17:41, Vladimir Murzin wrote:
>>> Can you please elaborate on cases where PAN is not enabled?
>>
>> I mean informal, private usage. For example, someone disables PAN 
>> dynamically to let kernel space access user space.
>> After dynamically disabling PAN, he switches to the guest OS. After returning 
>> to the host, he finds that the PAN state has been modified.
>> Of course this is not a formal usage; in our host kernel PAN is always 
>> enabled, with no dynamic change, but I mean such cases may exist.
>>
>>
> 
> So, in short, there is no real issue with PAN, right? What about UAO?
For pstate.UAO, the current code has an issue according to my test, because after 
switching back from the guest OS it does not set pstate.UAO again.
PAN is set again by your previous patch when switching to the host, but UAO is not.
If you have concerns about saving/restoring PSTATE bits, maybe we can use the 
modification below to fix the UAO issue.

diff --git a/arch/arm64/kvm/hyp/sysreg-sr.c b/arch/arm64/kvm/hyp/sysreg-sr.c
index 9341376..c3dd761 100644
--- a/arch/arm64/kvm/hyp/sysreg-sr.c
+++ b/arch/arm64/kvm/hyp/sysreg-sr.c
@@ -21,6 +21,8 @@
 #include 
 #include 

+#include 

 /* Yes, this does nothing, on purpose */
 static void __hyp_text __sysreg_do_nothing(struct kvm_cpu_context *ctxt) { }

@@ -121,8 +123,13 @@ static void __hyp_text __sysreg_restore_state(struct 
kvm_cpu_context *ctxt)
write_sysreg_el1(ctxt->gp_regs.spsr[KVM_SPSR_EL1],spsr);
 }

+static void __hyp_text __sysreg_restore_state_vhe(struct kvm_cpu_context *ctxt)
+{
+uao_thread_switch(current);
+}
+
 static hyp_alternate_select(__sysreg_call_restore_host_state,
-   __sysreg_restore_state, __sysreg_do_nothing,
+   __sysreg_restore_state, __sysreg_restore_state_vhe,
ARM64_HAS_VIRT_HOST_EXTN);

 void __hyp_text __sysreg_restore_host_state(struct kvm_cpu_context *ctxt)
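
As a rough illustration of the idea in the diff above (re-deriving PSTATE.UAO
from the current task on return to the host, instead of saving and restoring
it), here is a small user-space C model. It is only a sketch: model_task,
model_uao_cpu, model_guest_exit() and model_uao_restore() are hypothetical
names that do not exist in the kernel. The one behaviour taken from the kernel
is that uao_thread_switch() sets UAO when the task's address limit is
KERNEL_DS and clears it otherwise.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel's USER_DS / KERNEL_DS limits. */
enum model_addr_limit { MODEL_USER_DS, MODEL_KERNEL_DS };

struct model_task {
    enum model_addr_limit addr_limit;   /* what set_fs() would have stored */
};

struct model_uao_cpu {
    bool pstate_uao;                    /* models the PSTATE.UAO bit */
};

/* Taking an exception from the guest (EL1 to EL2) clears PSTATE.UAO. */
void model_guest_exit(struct model_uao_cpu *cpu)
{
    assert(cpu);
    cpu->pstate_uao = false;
}

/* Same idea as uao_thread_switch(): derive UAO from the task's fs. */
void model_uao_restore(struct model_uao_cpu *cpu, const struct model_task *t)
{
    assert(cpu && t);
    cpu->pstate_uao = (t->addr_limit == MODEL_KERNEL_DS);
}
```

In this model a task running with KERNEL_DS gets UAO back after a guest exit
only if model_uao_restore() is called, which is the gap the diff tries to close.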


> 
> Cheers
> Vladimir
> 
> .
>

Re: [PATCH] arm64: KVM: VHE: save and restore some PSTATE bits

2017-09-06 Thread gengdongjiu

Vladimir,

On 2017/9/6 17:41, Vladimir Murzin wrote:
> Can you please elaborate on cases where PAN is not enabled?

I mean informal, private usage. For example, someone disables PAN dynamically 
to let kernel space access user space.
After dynamically disabling PAN, he switches to the guest OS. After returning to 
the host, he finds that the PAN state has been modified.
Of course this is not a formal usage; in our host kernel PAN is always enabled, 
with no dynamic change, but I mean such cases may exist.

Re: [PATCH v6 4/7] arm64: kvm: support user space to query RAS extension feature

2017-09-05 Thread gengdongjiu

James,

On 2017/9/1 2:04, James Morse wrote:
> Hi Dongjiu Geng,
> 
> On 28/08/17 11:38, Dongjiu Geng wrote:
>> In ARMV8.2 RAS extension, a virtual SError exception syndrome
>> register(VSESR_EL2) is added.  This value may be specified from
>> userspace.
> 
> I agree that the CPU support for injecting SErrors with a specified ESR should
> be exposed to KVM's user space...
Ok, thanks.

> 
> 
>> Userspace will want to check if the CPU has the RAS
>> extension. 
> 
> ... but user-space wants to know if it can inject SErrors with a specified 
> ESR.
> 
> What if we gain another way of doing this that isn't via the RAS-extensions, 
> now
> user-space has to check for two capabilities.
> 
> 
>> If it has, it wil specify the virtual SError syndrome
>> value, otherwise it will not be set. This patch adds support for
>> querying the availability of this extension.
> 
> I'm against telling user-space what features the CPU has unless it can use 
> them
> directly. In this case we are talking about a KVM API, so we should describe 
> the
> API not the CPU.

shenglong (zhaoshengl...@huawei.com), who is a Qemu maintainer, suggested checking 
the CPU RAS extensions
to decide whether to generate the APEI table and record CPER for the guest OS in 
user space.
He means that if the host does not support RAS, user space may not support RAS either.

> 
> 
>> diff --git a/arch/arm64/kvm/reset.c b/arch/arm64/kvm/reset.c
>> index 3256b9228e75..b7313ee028e9 100644
>> --- a/arch/arm64/kvm/reset.c
>> +++ b/arch/arm64/kvm/reset.c
>> @@ -77,6 +77,9 @@ int kvm_arch_dev_ioctl_check_extension(struct kvm *kvm, 
>> long ext)
>>  case KVM_CAP_ARM_PMU_V3:
>>  r = kvm_arm_support_pmu_v3();
>>  break;
>> +case KVM_CAP_ARM_RAS_EXTENSION:
> 
> This should be called something more like 'KVM_CAP_ARM_INJECT_SERROR_ESR'
I understand your suggestion.

> 
> 
>> +r = cpus_have_const_cap(ARM64_HAS_RAS_EXTN);
>> +break;
>>  case KVM_CAP_SET_GUEST_DEBUG:
>>  case KVM_CAP_VCPU_ATTRIBUTES:
>>  r = 1;
> 
> 
> We can inject SError on systems without the RAS extensions using just the
> HCR_EL2.VSE bit. We may want to make the 'ESR' part of the API optional, or
> expose '1' for the without-ESR version and '2 for with-ESR, (however we choose
> to implement that).
> 
> The risk is if we want to add a without-ESR version later, and the name we 
> make
> ABI now turned out to be a mistake. Marc or Christoffer probably have the best
> view of this. (no-one has needed it so far...)
> 
> 
> Thanks,
> 
> James
> 
> 
> .
>
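
For readers following the naming discussion, the "'1' for the without-ESR
version and '2' for with-ESR" idea could look roughly like the stand-alone
sketch below. This is not KVM code: the enum, its values and both function
names are made up for illustration, and it assumes (as James notes) that an
SError can always be injected through HCR_EL2.VSE, with the RAS extension only
adding the ability to choose the ESR.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical capability levels, following the '1' / '2' suggestion. */
enum model_serror_cap {
    MODEL_SERROR_NONE     = 0,  /* capability not supported at all */
    MODEL_SERROR_NO_ESR   = 1,  /* can inject SError, ESR not selectable */
    MODEL_SERROR_WITH_ESR = 2,  /* can inject SError with a chosen ESR */
};

/* What a check-extension style query might report to user space. */
enum model_serror_cap model_check_inject_serror(bool has_ras_extn)
{
    return has_ras_extn ? MODEL_SERROR_WITH_ESR : MODEL_SERROR_NO_ESR;
}

/* User space only supplies a syndrome value when the API says it can. */
bool model_may_set_esr(enum model_serror_cap cap)
{
    assert(cap >= MODEL_SERROR_NONE && cap <= MODEL_SERROR_WITH_ESR);
    return cap == MODEL_SERROR_WITH_ESR;
}
```

The point of the sketch is that user space asks about the API ("can I inject an
SError, and with which options?") rather than about the CPU feature itself.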

Re: [PATCH] arm64: KVM: VHE: save and restore some PSTATE bits

2017-09-06 Thread gengdongjiu



On 2017/9/6 20:30, Vladimir Murzin wrote:
> On 06/09/17 13:14, gengdongjiu wrote:
>>
>>
>> On 2017/9/6 20:00, Vladimir Murzin wrote:
>>> On 06/09/17 11:35, gengdongjiu wrote:
>>>> Vladimir,
>>>>
>>>> On 2017/9/6 17:41, Vladimir Murzin wrote:
>>>>> Can you please elaborate on cases where PAN is not enabled?
>>>>
>>>> I mean informal, private usage. For example, someone disables PAN 
>>>> dynamically to let kernel space access user space.
>>>> After dynamically disabling PAN, he switches to the guest OS. After returning 
>>>> to the host, he finds that the PAN state has been modified.
>>>> Of course this is not a formal usage; in our host kernel PAN is always 
>>>> enabled, with no dynamic change, but I mean such cases may exist.
>>>>
>>>>
>>>
>>> So, in short, there is no real issue with PAN, right? What about UAO?
>> For PAN, if the host OS dynamically enables/disables PAN, there should be an issue.
>> Do you think the case described above is not an issue?
>>
>> "The host OS dynamically disables PAN, but after returning from the guest OS, 
>> PAN is unexpectedly enabled"
> 
> Do you see effect of "PAN is unexpectedly enabled"?
In fact I have not encountered this case, but I think it can exist.
I think if the host OS dynamically disables PAN, it wants the host kernel to access 
the user address space without going through the copy_to/from_user API.
Now if PAN is unexpectedly enabled, an MMU fault exception will happen when the 
host kernel still accesses a user space address.
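
The behaviour described here can be put into a toy C model. It is a simplified
sketch, not kernel code: model_pan_cpu, model_privileged_access_faults() and
model_uaccess_read() are hypothetical names, and the uaccess helper is assumed
to lift PAN only around the access itself and then put it back.

```c
#include <assert.h>
#include <stdbool.h>

struct model_pan_cpu {
    bool pstate_pan;    /* models the PSTATE.PAN bit */
};

/* A privileged access faults when PAN is set and the address is a user one. */
bool model_privileged_access_faults(const struct model_pan_cpu *cpu,
                                    bool addr_is_user)
{
    assert(cpu);
    return cpu->pstate_pan && addr_is_user;
}

/*
 * Toy uaccess-style helper: PAN is lifted for the duration of the access
 * and restored afterwards. Returns true if the access succeeded.
 */
bool model_uaccess_read(struct model_pan_cpu *cpu, bool addr_is_user)
{
    assert(cpu);
    bool saved = cpu->pstate_pan;

    cpu->pstate_pan = false;
    bool ok = !model_privileged_access_faults(cpu, addr_is_user);
    cpu->pstate_pan = saved;
    return ok;
}
```

With PAN set, a direct kernel access to a user address faults in this model,
while the same access through the uaccess-style helper succeeds and leaves PAN
as it found it.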


> 
> Cheers
> Vladimir
> 
>>
>>>
>>> Cheers
>>> Vladimir
>>>
>>> .
>>>
>>
>>
> 
> 
> .
>

Re: [PATCH] arm64: KVM: VHE: save and restore some PSTATE bits

2017-09-06 Thread gengdongjiu

Hi, Vladimir

> >> Do you see effect of "PAN is unexpectedly enabled"?
> > In fact I did not encounter this case, but I think it can exist.
> > I think if host OS dynamically disable PAN, it wants the host kernel access 
> > the user space address space not through copy_to/from_user
> API.
> > Now if it is unexpectedly enabled, when host kernel still accesses the user 
> > space address,  it will happen MMU fault exception.
> 
> And this is expected! The only allowed channel for kernel <-> user is uaccess 
> API.
> 
> I guess that you have test (and that great!) which violates that rule (for 
> testing purpose, why not?) and now you are trying to fit kernel into
> it...


If you think that makes sense, we need not consider pstate.PAN in the 
world-switch.
For the pstate.UAO issue, do you agree with my fix, or do you have another suggestion?
This question also goes to the other reviewers. Thanks.

diff --git a/arch/arm64/kvm/hyp/sysreg-sr.c b/arch/arm64/kvm/hyp/sysreg-sr.c
index 9341376..c3dd761 100644
--- a/arch/arm64/kvm/hyp/sysreg-sr.c
+++ b/arch/arm64/kvm/hyp/sysreg-sr.c
@@ -21,6 +21,8 @@
 #include 
 #include 

+#include 

 /* Yes, this does nothing, on purpose */
 static void __hyp_text __sysreg_do_nothing(struct kvm_cpu_context *ctxt) { }

@@ -121,8 +123,13 @@ static void __hyp_text __sysreg_restore_state(struct 
kvm_cpu_context *ctxt)
write_sysreg_el1(ctxt->gp_regs.spsr[KVM_SPSR_EL1],spsr);
 }

+static void __hyp_text __sysreg_restore_state_vhe(struct kvm_cpu_context *ctxt)
+{
+uao_thread_switch(current);
+}
+
 static hyp_alternate_select(__sysreg_call_restore_host_state,
-   __sysreg_restore_state, __sysreg_do_nothing,
+   __sysreg_restore_state, __sysreg_restore_state_vhe,
ARM64_HAS_VIRT_HOST_EXTN);

 void __hyp_text __sysreg_restore_host_state(struct kvm_cpu_context *ctxt)


> 
> Cheers
> Vladimir
> 
> >
> >
> >>
> >> Cheers
> >> Vladimir
> >>
> >>>
> 
>  Cheers
>  Vladimir
> 
>  .
> 
> >>>
> >>>
> >>
> >>
> >> .
> >>
> >
> >

Re: [PATCH] arm64: KVM: VHE: reset PSTATE.UAO when switch to host

2017-09-07 Thread gengdongjiu

> On 07/09/17 12:49, gengdongjiu wrote:
> >
> >
> > On 2017/9/7 18:13, Marc Zyngier wrote:
> >> On 07/09/17 11:05, gengdongjiu wrote:
> >>> Hi James,
> >>>
> >>> On 2017/9/7 17:20, James Morse wrote:
> >>>> Hi Dongjiu Geng,
> >>>>
> >>>> On 07/09/17 06:54, Dongjiu Geng wrote:
> >>>>> In VHE mode, host kernel runs in the EL2 and can enable 'User
> >>>>> Access Override' when fs==KERNEL_DS so that it can access kernel
> >>>>> memory. However, PSTATE.UAO is set to 0 on an exception taken from
> >>>>> EL1 to EL2. Thus when VHE is used and exception taken from a guest
> >>>>> UAO will be disabled and host will use the incorrect PSTATE.UAO.
> >>>>> So check and reset the PSTATE.UAO when switching to host.
> >>>>
> >>>> This would only be a problem if KVM were calling into world-switch
> >>>> with fs==KERNEL_DS. I can't see where this happens.
> >>>  Not only KVM, may also kernel sets the fs == KERNEL_DS before
> >>> calling into world-switch
> >>
> >> How? Please describe the exact sequence of event that lead to this
> >> situation with the current code base.
> >
> > Hi Marc,
> >
> >Different tasks have different fs, such as USER_DS or KERNEL_DS. In
> > the context switch, it will restore the task's fs. Thus, that depends
> > on task itself, as shown below code. UAO is different with PAN, PAN will be 
> > always enabled if hardware CPU supports PAN feature, but
> UAO is dynamical change.
> 
> You haven't answered my question: There is exactly one point where we enter 
> the world-switch. Show me that, at this point, PSTATE.UAO
> *before* the call is different from PSTATE.UAO after the call. Give me the 
> exact sequence of event that leads to this situation. Show me a
> stack trace.

Hi Marc,

  With the current mainline KVM code plus Qemu and no modifications, the issue 
may not exist, because the FS of the Qemu process should be USER_DS. But if I 
modify user space (Qemu or KVM tools) to change its FS to KERNEL_DS, or use a 
third-party application with KERNEL_DS FS to run the guest, there will be a 
problem (the KERNEL_DS setting is cleared). If you think my case is reasonable 
and should be supported, I can show you the call stack trace. If you think my 
case is not reasonable and KVM should not support running a guest from an 
application with KERNEL_DS fs, you can ignore this patch. Thanks.


> 
> Until you do this, I will ignore any further comment coming from you on this 
> subject.
> 
> Thanks,
> 
>   M.
> --
> Jazz is not dead. It just smells funny...

Re: [PATCH v11 6/6] target-arm: kvm64: Handle SError interrupt for the guest OS

2017-09-13 Thread gengdongjiu



On 2017/9/12 0:39, Peter Maydell wrote:
 +return  kvm_vcpu_ioctl(CPU(cpu), KVM_ARM_SEI, );
>>> This looks odd. If we don't have the RAS extension why do we need to do 
>>> anything at all here ?
>> This is because Qemu may need to support non-RAS extension as discussed with 
>> ARM James before.
>> That is to say host hardware CPU does not support RAS, but guest supports.
>> That is under discussion.
>> When host hardware supports RAS, specify the syndrome to a valid value, 
>> otherwise, set it to 0.
> If the guest CPU doesn't support the RAS extension then we have
> no mechanism for delivering it a notification about the
> memory problem at all, so setting the syndrome to anything
> doesn't make sense.
> 
> I'm not sure what you should do in the case of "host
> supports telling us about a memory problem and has
> done so, but guest does not support being told about it",
> but I'm pretty sure it shouldn't be this.
Hi Peter,
   thanks for the comments.

   In short, if the hardware CPU does not support the RAS extension, do you think 
Qemu or the guest OS
needs to support RAS (generate the APEI table / record CPER / error recovery)?

CC James,

Hi James,
  you previously made the comment below:

---
But you can use APEI in a guest on CPUs without the RAS extensions: the host may
signal memory errors to Qemu for any number of reasons.
--

In fact, I have a concern about it. If the CPU lacks the RAS extension, the host 
should not deliver SIGBUS.
In which case in your test does the host still deliver SIGBUS without RAS?

Re: [PATCH v6 5/7] arm64: kvm: route synchronous external abort exceptions to el2

2017-09-13 Thread gengdongjiu



On 2017/9/8 0:31, James Morse wrote:
> Hi Dongjiu Geng,
> 
> On 28/08/17 11:38, Dongjiu Geng wrote:
>> ARMv8.2 adds a new bit HCR_EL2.TEA which controls to
>> route synchronous external aborts to EL2, and adds a
>> trap control bit HCR_EL2.TERR which controls to
>> trap all Non-secure EL1&0 error record accesses to EL2.
>>
>> This patch enables the two bits for the guest OS.
>> when an synchronous abort is generated in the guest OS,
> 
>> it will trap to EL3 firmware, EL3 firmware will check the
>> HCR_EL2.TEA value to decide to jump to hypervisor or host
>> OS.
> 
> (This is what you are using this for, the patch has nothing to do with EL3.)

No, EL3 will check HCR_EL2.TEA to decide whether to jump to the hypervisor or 
the host kernel.


> 
> 
>> Enabling HCR_EL2.TERR makes error record access
>> from guest trap to EL2.
> 
> 
> KVM already handles external aborts from lower exception levels, no more work
> needs doing for TEA.
When SCR_EL3.EA is set, TEA does not take effect; EL3 only checks its value to 
decide whether to go to the hypervisor or the EL1 host kernel.

> 
> What happens when a guest access the RAS-Error-Record registers?
It will trap to KVM at EL2.

> 
> Before we can set HCR_EL2.TERR I think we need to add some minimal emulation 
> for
> the registers it traps. Most of them should be RAZ/WI, so it should be
> straightforward. (I think KVMs default is to emulate an undef for unknown 
> traps).

If KVM's default handling is to emulate an undef for unknown traps, how about we 
use that default? No one accesses
the ERR RAS registers in the guest.

> 
> Eventually we will want to back this with a page of memory that lets
> Qemu/kvmtool configure what the guest can see. (i.e. the emulated machine's
> errors for kernel-first handling.)
I think emulating an undef for unknown traps is enough; no one accesses 
the ERR registers in the guest.

> 
> 
> Thanks,
> 
> James
> 
> .
>

Re: [PATCH v6 6/7] KVM: arm64: allow get exception information from userspace

2017-09-13 Thread gengdongjiu

Hi James,


On 2017/9/8 0:30, James Morse wrote:
> Hi Dongjiu Geng,
> 
> On 28/08/17 11:38, Dongjiu Geng wrote:
>> when userspace gets SIGBUS signal, it does not know whether
>> this is a synchronous external abort or SError,
> 
> Why would Qemu/kvmtool need to know if the original notification (if there was
> one) was synchronous or asynchronous? This is between firmware and the kernel.
There are two reasons:

1. Let us first discuss SEA and SEI: the two kinds of error have different 
workflows.
2. When recording the CPER in user space, it needs to know the error type, 
because SEA and SEI are different error sources,
   so they have different offsets in the APEI table; that is to say, they will be 
recorded at different places in the APEI table.

 etc/acpi/tables (HEST):
   GHES0 .. GHES10, each with the fields
     error_status_address  -> status_addressN in etc/hardware_errors
     read_ack_register     -> ack_valueN in etc/hardware_errors
     read_ack_preserve
     read_ack_write

 etc/hardware_errors:
   address registers:
     status_address0 .. status_address10
       (status_addressN -> Error Status Data Block N)
     ack_value0 .. ack_value10
   Error Status Data Block 0 .. Error Status Data Block 10,
     each holding CPER records
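
The statement that SEA and SEI are recorded at different places can be
sketched as simple offset arithmetic over the etc/hardware_errors blob. Please
read this as an illustration only: the 8-byte slot size, the 0x1000 block
size, the assignment of SEA and SEI to sources 0 and 1, and all names below are
assumptions made for the example, not values taken from the patch set.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_NUM_SOURCES 11u      /* GHES0 .. GHES10 in the layout */
#define MODEL_ADDR_SIZE   8u       /* assumed: one 64-bit slot per address */
#define MODEL_BLOCK_SIZE  0x1000u  /* assumed size of one status data block */

/* Hypothetical assignment of error sources to GHES entries. */
enum model_error_source { MODEL_SRC_SEA = 0, MODEL_SRC_SEI = 1 };

/* Offset of status_addressN inside the address-register area. */
size_t model_status_address_offset(unsigned src)
{
    assert(src < MODEL_NUM_SOURCES);
    return (size_t)src * MODEL_ADDR_SIZE;
}

/* Offset of ack_valueN, which follows all status_address slots. */
size_t model_ack_value_offset(unsigned src)
{
    assert(src < MODEL_NUM_SOURCES);
    return (size_t)MODEL_NUM_SOURCES * MODEL_ADDR_SIZE
           + (size_t)src * MODEL_ADDR_SIZE;
}

/* Offset of Error Status Data Block N, which follows both slot arrays. */
size_t model_status_block_offset(unsigned src)
{
    assert(src < MODEL_NUM_SOURCES);
    return 2u * (size_t)MODEL_NUM_SOURCES * MODEL_ADDR_SIZE
           + (size_t)src * MODEL_BLOCK_SIZE;
}
```

Under these assumptions a CPER for an SEA and one for an SEI end up in
different data blocks, which is why user space would need to know the error
type before recording it.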

> 
> 
> I think I can see why you need this: to choose whether to emulate SEA or SEI,
Emulating SEA or SEI is one reason; another reason is that the CPER will be 
recorded at a different place in the APEI table.


> but what if the guest wasn't running? Or the guest was running, but it wasn't
> guest-memory that is affected.
If the guest was not running, host firmware will directly notify the EL1 host 
kernel to handle the error, not the hypervisor;
host firmware can notify the hypervisor of the error only if the guest was running.

If user space is Qemu, the error comes from Qemu itself, and guest memory is not 
involved,
I will not handle it; please see the code for arm64.

void kvm_arch_on_sigbus_vcpu(CPUState *c, int code, void *addr)
{
    ram_addr_t ram_addr;
    hwaddr paddr;

    ARMCPU *cpu = ARM_CPU(c);
    CPUARMState *env = &cpu->env;
    assert(code == BUS_MCEERR_AR || code == BUS_MCEERR_AO);
    if (addr) {
        ram_addr = qemu_ram_addr_from_host(addr);
        if (ram_addr != RAM_ADDR_INVALID &&
            kvm_physical_memory_addr_from_host(c->kvm_state, addr, &paddr)) {
            kvm_cpu_synchronize_state(c);
            kvm_hwpoison_page_add(ram_addr);
            if

Re: [PATCH v11 6/6] target-arm: kvm64: Handle SError interrupt for the guest OS

2017-09-13 Thread gengdongjiu



On 2017/9/13 18:52, Peter Maydell wrote:
> This question seems to be not really related to the review
> comment that it is responding to.
> 
> (1) If the host does not support notifying us about
> errors, then there is clearly nothing to do in this
> code, because we will never get a notification.
> 
> (2) If the host does support notifying us about errors,
> but we choose not to expose RAS to the guest, then
> there's not much to do either. We probably just want
> to take whatever the default behaviour is for any
> application when it touches memory that's bad.
> We definitely don't want to tell the guest anything.
> 
> (3) If the host supports notification, and we choose
> to expose RAS to the guest, then we need to do
> whatever we have to do to notify the guest.
> 
> If we're in this signal handler and also
> arm_feature(env, ARM_FEATURE_RAS) is false then that
> is case (2), and my point is that doing anything with
> the guest 'syndrome' value looks like the wrong thing.

Peter,
  your explanation is clear. OK, understood, thanks.
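
For reference, Peter's three cases can be written down as a small decision
function. This is only a sketch of the reasoning in the review, not QEMU code;
the enum and both function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

enum model_action {
    MODEL_NOTHING_TO_DO,      /* case (1): host never notifies us */
    MODEL_DEFAULT_BEHAVIOUR,  /* case (2): RAS not exposed to the guest */
    MODEL_NOTIFY_GUEST,       /* case (3): RAS exposed, tell the guest */
};

/* Pick the action for a reported memory error. */
enum model_action model_on_memory_error(bool host_notifies, bool guest_has_ras)
{
    if (!host_notifies)
        return MODEL_NOTHING_TO_DO;
    if (!guest_has_ras)
        return MODEL_DEFAULT_BEHAVIOUR;
    return MODEL_NOTIFY_GUEST;
}

/* Only case (3) should do anything with the guest 'syndrome' value. */
bool model_touches_guest_syndrome(enum model_action action)
{
    assert(action >= MODEL_NOTHING_TO_DO && action <= MODEL_NOTIFY_GUEST);
    return action == MODEL_NOTIFY_GUEST;
}
```

In particular, when the guest has no RAS feature the model never sets a
syndrome, which matches the review comment on the original patch.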

[PATCH] arm64: KVM: VHE: save and restore some PSTATE bits

2017-09-05 Thread gengdongjiu

When exiting from the guest, some host PSTATE bits may be lost, such as
PSTATE.PAN or PSTATE.UAO. This is because host and hypervisor both run
at EL2, so the host PSTATE value cannot be saved and restored via
SPSR_EL2. So if the guest has changed PSTATE, the host continues with
the wrong value that the guest has set.
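
A minimal user-space model of the save/restore idea, for illustration only: it
tracks just PAN and UAO (the patch also has a daif field), and every name in it
is hypothetical rather than taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>

struct model_host_pstate {      /* saved copy of the host's bits */
    bool pan;
    bool uao;
};

struct model_el2_cpu {          /* live PSTATE bits while running at EL2 */
    bool pan;
    bool uao;
};

/* Record the host's bits before entering the guest. */
void model_save_host_pstate(const struct model_el2_cpu *cpu,
                            struct model_host_pstate *save)
{
    assert(cpu && save);
    save->pan = cpu->pan;
    save->uao = cpu->uao;
}

/* Stand-in for a guest run that leaves different values behind. */
void model_run_guest(struct model_el2_cpu *cpu)
{
    assert(cpu);
    cpu->pan = false;
    cpu->uao = false;
}

/* Put the host's bits back after the guest exits. */
void model_restore_host_pstate(struct model_el2_cpu *cpu,
                               const struct model_host_pstate *save)
{
    assert(cpu && save);
    cpu->pan = save->pan;
    cpu->uao = save->uao;
}
```

Calling save before the guest run and restore after it brings the host bits
back to their earlier values in this model, which is the effect the patch is
after.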

Signed-off-by: Dongjiu Geng 
Signed-off-by: Haibin Zhang 
---
 arch/arm64/include/asm/kvm_host.h |  8 +++
 arch/arm64/include/asm/kvm_hyp.h  |  2 ++
 arch/arm64/include/asm/sysreg.h   | 23 +++
 arch/arm64/kvm/hyp/entry.S|  2 --
 arch/arm64/kvm/hyp/switch.c   | 24 ++--
 arch/arm64/kvm/hyp/sysreg-sr.c| 48 ---
 6 files changed, 100 insertions(+), 7 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_host.h 
b/arch/arm64/include/asm/kvm_host.h
index e923b58..cba7d3e 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -193,6 +193,12 @@ struct kvm_cpu_context {
};
 };
 
+struct kvm_cpu_host_pstate {
+   u64 daif;
+   u64 uao;
+   u64 pan;
+};
+
 typedef struct kvm_cpu_context kvm_cpu_context_t;
 
 struct kvm_vcpu_arch {
@@ -227,6 +233,8 @@ struct kvm_vcpu_arch {
 
/* Pointer to host CPU context */
kvm_cpu_context_t *host_cpu_context;
+   /* Host PSTATE value */
+   struct kvm_cpu_host_pstate host_pstate;
struct {
/* {Break,watch}point registers */
struct kvm_guest_debug_arch regs;
diff --git a/arch/arm64/include/asm/kvm_hyp.h b/arch/arm64/include/asm/kvm_hyp.h
index 4572a9b..a75587a 100644
--- a/arch/arm64/include/asm/kvm_hyp.h
+++ b/arch/arm64/include/asm/kvm_hyp.h
@@ -134,6 +134,8 @@
 
 void __sysreg_save_host_state(struct kvm_cpu_context *ctxt);
 void __sysreg_restore_host_state(struct kvm_cpu_context *ctxt);
+void __sysreg_save_host_pstate(struct kvm_vcpu *vcpu);
+void __sysreg_restore_host_pstate(struct kvm_vcpu *vcpu);
 void __sysreg_save_guest_state(struct kvm_cpu_context *ctxt);
 void __sysreg_restore_guest_state(struct kvm_cpu_context *ctxt);
 void __sysreg32_save_state(struct kvm_vcpu *vcpu);
diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h
index 248339e..efdcf40 100644
--- a/arch/arm64/include/asm/sysreg.h
+++ b/arch/arm64/include/asm/sysreg.h
@@ -295,6 +295,29 @@
 #define SYS_ICH_LR14_EL2   __SYS__LR8_EL2(6)
 #define SYS_ICH_LR15_EL2   __SYS__LR8_EL2(7)
 
+#define REG_PSTATE_PAN sys_reg(3, 0, 4, 2, 3)
+#define REG_PSTATE_UAO sys_reg(3, 0, 4, 2, 4)
+
+#define GET_PSTATE_PAN \
+   ({  \
+   u64 reg;\
+   asm volatile(ALTERNATIVE("mov %0, #0",  \
+   "mrs_s %0, " __stringify(REG_PSTATE_PAN),\
+   ARM64_HAS_PAN)\
+   : "=r" (reg));\
+   reg;\
+   })
+
+#define GET_PSTATE_UAO \
+   ({  \
+   u64 reg;\
+   asm volatile(ALTERNATIVE("mov %0, #0",\
+   "mrs_s %0, " __stringify(REG_PSTATE_UAO),\
+   ARM64_HAS_UAO)\
+   : "=r" (reg));\
+   reg;\
+   })
+
 /* Common SCTLR_ELx flags. */
 #define SCTLR_ELx_EE(1 << 25)
 #define SCTLR_ELx_I(1 << 12)
diff --git a/arch/arm64/kvm/hyp/entry.S b/arch/arm64/kvm/hyp/entry.S
index 12ee62d..7662ef5 100644
--- a/arch/arm64/kvm/hyp/entry.S
+++ b/arch/arm64/kvm/hyp/entry.S
@@ -96,8 +96,6 @@ ENTRY(__guest_exit)
 
add x1, x1, #VCPU_CONTEXT
 
-   ALTERNATIVE(nop, SET_PSTATE_PAN(1), ARM64_HAS_PAN, CONFIG_ARM64_PAN)
-
// Store the guest regs x2 and x3
stp x2, x3,   [x1, #CPU_XREG_OFFSET(2)]
 
diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
index 945e79c..9b380a1 100644
--- a/arch/arm64/kvm/hyp/switch.c
+++ b/arch/arm64/kvm/hyp/switch.c
@@ -278,6 +278,26 @@ static void __hyp_text __skip_instr(struct kvm_vcpu *vcpu)
write_sysreg_el2(*vcpu_pc(vcpu), elr);
 }
 
+static void __hyp_text __save_host_state(struct kvm_vcpu *vcpu)
+{
+   struct kvm_cpu_context *host_ctxt;
+
+   host_ctxt = kern_hyp_va(vcpu->arch.host_cpu_context);
+
+   __sysreg_save_host_state(host_ctxt);
+   __sysreg_save_host_pstate(vcpu);
+}
+
+static void __hyp_text __restore_host_state(struct kvm_vcpu *vcpu)
+{
+   struct kvm_cpu_context *host_ctxt;
+
+   host_ctxt = kern_hyp_va(vcpu->arch.host_cpu_context);
+
+   __sysreg_restore_host_state(host_ctxt);
+   __sysreg_restore_host_pstate(vcpu);
+}
+
 int __hyp_text __kvm_vcpu_run(struct kvm_vcpu *vcpu)
 {
struct

Re: [PATCH] arm64: KVM: Skip PSTATE.PAN reest at EL2 in non-VHE

2017-09-12 Thread gengdongjiu

> On Mon, Sep 11 2017 at  7:16:52 pm BST, Dongjiu Geng  
> wrote:
> > PSTATE.PAN disables reading and/or writing to a userspace virtual
> > address from EL1 in non-VHE or from EL2 in VHE. In non-VHE, there is
> > no any userspace mapping at EL2, so no need to reest the PSTATE.PAN.
> >
> > Signed-off-by: Dongjiu Geng 
> > Signed-off-by: Haibin Zhang 
> > ---
> >  arch/arm64/kvm/hyp/entry.S | 6 +-
> >  1 file changed, 5 insertions(+), 1 deletion(-)
> >
> > diff --git a/arch/arm64/kvm/hyp/entry.S b/arch/arm64/kvm/hyp/entry.S
> > index 12ee62d6d410..86a7549b1b4c 100644
> > --- a/arch/arm64/kvm/hyp/entry.S
> > +++ b/arch/arm64/kvm/hyp/entry.S
> > @@ -96,8 +96,12 @@ ENTRY(__guest_exit)
> >
> > add x1, x1, #VCPU_CONTEXT
> >
> > -   ALTERNATIVE(nop, SET_PSTATE_PAN(1), ARM64_HAS_PAN, CONFIG_ARM64_PAN)
> > +alternative_if_not ARM64_HAS_VIRT_HOST_EXTN
> > +   b   2f  // skip PAN at EL2 in non-VHE
> > +alternative_else_nop_endif
> >
> > +   ALTERNATIVE(nop, SET_PSTATE_PAN(1), ARM64_HAS_PAN, CONFIG_ARM64_PAN)
> > +2:
> > // Store the guest regs x2 and x3
> > stp x2, x3,   [x1, #CPU_XREG_OFFSET(2)]
> 

Hi Marc,
  I should explain how I found this issue. In my test, I found that the PAN 
feature did not work on my platform with VHE, but it worked without VHE.
1. After checking the code, I found that I was missing the patch "arm64: KVM: VHE: 
reset PSTATE.PAN on entry to EL2", and then thought of the UAO
feature, because UAO is similar to PAN in that both control address access 
permissions. After discussing it with you, you thought that usually
we do not always call set_fs(KERNEL_DS), so it was left alone.
2. After checking PAN and discussing it with you, we think PAN is not needed 
at EL2 for non-VHE, so I made this change. 

> Aside from Vladimir's comment about why this may not be an important change 
> in practice (both features are v8.1, and expected to be
> implemented at the same time as VHE), I'm not sure this brings us much.

I agree that using VHE is the usual case if the CPU supports VHE, but we cannot 
be sure that other people never use non-VHE. Since pstate.PAN is not needed 
there, why do we still enable it?

> 
> We're just trading a write to PSTATE (which will have no effect other than 
> storing a bit in PSTATE) for a branch, and I'm not sure what is the
> worse. Your patch definitely makes the code less readable, and I value ease 
> of maintenance very much.

This place checks two capabilities, ARM64_HAS_PAN and 
ARM64_HAS_VIRT_HOST_EXTN. The best way would be to use the "ALTERNATIVE" macro, 
as shown below.
But I found that it reports an error at build time, so to avoid changing the 
"ALTERNATIVE" macro I used the 'branch' instruction.

alternative_if ARM64_HAS_VIRT_HOST_EXTN
ALTERNATIVE(nop, SET_PSTATE_PAN(1), ARM64_HAS_PAN, CONFIG_ARM64_PAN)
alternative_else_nop_endif

> 
> Do you have any data coming from a non-VHE, PAN-enabled system that shows a 
> measurable, significant performance improvement with
> this patch?
> Because that would be the only reason why I'd consider such a change.

Frankly speaking, I have not tested the performance.

> 
> Thanks,
> 
>   M.
> --
> Jazz is not dead. It just smells funny.
