Re: [PATCH V13 10/10] arm/arm64: KVM: add guest SEA support

2017-03-28 Thread gengdongjiu
Hi,

On 2017/3/22 6:47, Tyler Baicar wrote:
> + fault_ipa = kvm_vcpu_get_fault_ipa(vcpu);
> +
> + /* The host kernel will handle the synchronous external abort. There
> +  * is no need to pass the error into the guest.
> +  */
> + if (is_abort_synchronous(fault_status))
> + sea_status = handle_guest_sea((unsigned long)fault_ipa,
> + kvm_vcpu_get_hsr(vcpu));
>  
>   is_iabt = kvm_vcpu_trap_is_iabt(vcpu);
> - if (unlikely(!is_iabt && kvm_vcpu_dabt_isextabt(vcpu))) {
> + if (unlikely(!is_iabt && kvm_vcpu_dabt_isextabt(vcpu)) && sea_status) {
>   kvm_inject_vabt(vcpu);
>   return 1;
>   }
   After the host kernel correctly handle the synchronous external abort, 
the sea_status
   will return 0, so the code logical will be continue go-no, whether it is 
better directly return
   after correctly handle the SEA? such as below.

if (unlikely(!is_iabt && kvm_vcpu_dabt_isextabt(vcpu)) && sea_status) {
kvm_inject_vabt(vcpu);
return 1;
} else
return 1;

>  
> - fault_ipa = kvm_vcpu_get_fault_ipa(vcpu);
> -
>   trace_kvm_guest_fault(*vcpu_pc(vcpu), kvm_vcpu_get_hsr(vcpu),
> kvm_vcpu_get_hfar(vcpu), fault_ipa);
>  
> - /* Check the stage-2 fault is trans. fault or write fault */
> - fault_status = kvm_vcpu_trap_get_fault_type(vcpu);
>   if (fault_status != FSC_FAULT && fault_status != FSC_PERM &&

___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH V13 10/10] arm/arm64: KVM: add guest SEA support

2017-03-28 Thread James Morse
Hi Tyler,

On 21/03/17 22:47, Tyler Baicar wrote:
> Currently external aborts are unsupported by the guest abort
> handling. Add handling for SEAs so that the host kernel reports
> SEAs which occur in the guest kernel.

Looks good,

Can we squash the APEI changes into the patch that added them? This would be one
fewer patches that then need the ACPI maintainer to review.


> diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
> index 962616f..105b6ab 100644
> --- a/arch/arm/kvm/mmu.c
> +++ b/arch/arm/kvm/mmu.c
> @@ -1406,6 +1407,24 @@ static void handle_access_fault(struct kvm_vcpu *vcpu, 
> phys_addr_t fault_ipa)
>   kvm_set_pfn_accessed(pfn);
>  }
>  
> +static bool is_abort_synchronous(unsigned long fault_status) {

I missed kvm_vcpu_dabt_isextabt() when I suggested we would need a helper (my
fault). Can we use that instead?

(my argument that the unused encodings are reserved doesn't hold if KVM is
already doing this... )


> @@ -1426,23 +1445,31 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu, 
> struct kvm_run *run)
>   unsigned long hva;
>   bool is_iabt, write_fault, writable;
>   gfn_t gfn;
> - int ret, idx;
> + int ret, idx, sea_status = 1;
> +
> + /* Check the stage-2 fault is trans. fault or write fault */
> + fault_status = kvm_vcpu_trap_get_fault_type(vcpu);
> +
> + fault_ipa = kvm_vcpu_get_fault_ipa(vcpu);
> +
> + /* The host kernel will handle the synchronous external abort. There
> +  * is no need to pass the error into the guest.
> +  */
> + if (is_abort_synchronous(fault_status))
> + sea_status = handle_guest_sea((unsigned long)fault_ipa,
> + kvm_vcpu_get_hsr(vcpu));


Why not return from here if the error has been handled?

You use sea_status to skip the next two things that KVM might do, but it goes on
to try and process this, possibly calling user_mem_abort(), surely all this is
unnecessary?


>  
>   is_iabt = kvm_vcpu_trap_is_iabt(vcpu);
> - if (unlikely(!is_iabt && kvm_vcpu_dabt_isextabt(vcpu))) {
> + if (unlikely(!is_iabt && kvm_vcpu_dabt_isextabt(vcpu)) && sea_status) {
>   kvm_inject_vabt(vcpu);
>   return 1;
>   }
>  
> - fault_ipa = kvm_vcpu_get_fault_ipa(vcpu);
> -
>   trace_kvm_guest_fault(*vcpu_pc(vcpu), kvm_vcpu_get_hsr(vcpu),
> kvm_vcpu_get_hfar(vcpu), fault_ipa);
>  
> - /* Check the stage-2 fault is trans. fault or write fault */
> - fault_status = kvm_vcpu_trap_get_fault_type(vcpu);
>   if (fault_status != FSC_FAULT && fault_status != FSC_PERM &&
> - fault_status != FSC_ACCESS) {
> + fault_status != FSC_ACCESS && sea_status) {
>   kvm_err("Unsupported FSC: EC=%#x xFSC=%#lx ESR_EL2=%#lx\n",
>   kvm_vcpu_trap_get_class(vcpu),
>   (unsigned long)kvm_vcpu_trap_get_fault(vcpu),


Thanks,

James

___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Aarch64: Qemu master or 2.9.0-rc1 breaks compatibility with 4.10 kernel.

2017-03-28 Thread Prakash B
Hi ,

Qemu master or v2.9.0-rc1 doesn't launch guest  when host kernel
version 4.10 or  lower on Aarch64 (cavium ThunderX), its failing with
abort  "qemu-system-aarch64: KVM_GET_DEVICE_ATTR failed: No such
device or address"

"GICv3 live migration support"  patch breaks the  compatibility b/w
kernel and qemu.

Is this expected behaviour,  can we provide a fix without  breaking
compatibility b/w kernel and qemu .

For  more information  below is gdb back trace .
---

Thread 1 "qemu-system-aar" received signal SIGABRT, Aborted.
0xa4bd7528 in __GI_raise (sig=sig@entry=6) at
../sysdeps/unix/sysv/linux/raise.c:54
54../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) bt
#0  0xa4bd7528 in __GI_raise (sig=sig@entry=6) at
../sysdeps/unix/sysv/linux/raise.c:54
#1  0xa4bd89e0 in __GI_abort () at abort.c:89
#2  0x00479d3c in kvm_device_access (fd=23, group=6, attr=50788,
val=0x15c80d8, write=false)
at /home/ubuntu/workarea/sources/qemu-stable/qemu-2.9/kvm-all.c:2225
#3  0x004ed948 in arm_gicv3_icc_reset (env=0x13a8ff8, ri=0x1629ed0) at
/home/ubuntu/workarea/sources/qemu-stable/qemu-2.9/hw/intc/arm_gicv3_kvm.c:618
#4  0x00610414 in cp_reg_reset (key=0x1629cf0, value=0x1629ed0,
opaque=0x13a0cf0) at
/home/ubuntu/workarea/sources/qemu-stable/qemu-2.9/target/arm/cpu.c:75
#5  0xa4fb0878 in g_hash_table_foreach () from
/lib/aarch64-linux-gnu/libglib-2.0.so.0
Backtrace stopped: previous frame identical to this frame (corrupt stack?)



Verified guest  booting on host 4.10 kernel by reverting "GICv3:
support state save/restore" patches  and  guest booted without issues.

Regards,
Prakash B
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH] kvm: pass the virtual SEI syndrome to guest OS

2017-03-28 Thread James Morse
Hi Christoffer,

(CC: Leif and Achin who know more about how UEFI fits into this picture)

On 21/03/17 19:39, Christoffer Dall wrote:
> On Tue, Mar 21, 2017 at 07:11:44PM +, James Morse wrote:
>> On 21/03/17 11:34, Christoffer Dall wrote:
>>> On Tue, Mar 21, 2017 at 02:32:29PM +0800, gengdongjiu wrote:
 On 2017/3/20 23:08, James Morse wrote:
 On 20/03/17 07:55, Dongjiu Geng wrote:
> In the RAS implementation, hardware pass the virtual SEI
> syndrome information through the VSESR_EL2, so set the virtual
> SEI syndrome using physical SEI syndrome el2_elr to pass to
> the guest OS
>
> How does this work with firmware first?

 I explained it in previous mail about the work flow.
>>>
>>> When delivering and reporting SEIs to the VM, should this happen
>>> directly to the OS running in the VM, or to the guest firmware (e.g.
>>> UEFI) running in the VM as well?
>>
>> 'firmware first' is the ACPI specs name for x86's BIOS or management-mode
>> handling the error. On arm64 we have multiple things called firmware, so the
>> name might be more confusing than helpful.
>>
>> As far as I understand it, firmware here refers to the secure-world and EL3.
>> Something like ATF can use SCR_EL3.EA to claim SErrors and external aborts,
>> routing them to EL3 where secure platform specific firmware generates CPER 
>> records.
>> For a guest, Qemu takes the role of this EL3-firmware.
>>
> Thanks for the clarification.  So UEFI in the VM would not be involved
> in this at all?

On the host, part of UEFI is involved to generate the CPER records.
In a guest?, I don't know.
Qemu could generate the records, or drive some other component to do it.

Leif and Achin are the people with the UEFI/bigger picture.


> My confusion here comes from not thinking about QEMU or KVM as firmware,
> but as the machine, so it would be sort of like the functionality is
> baked into hardware rather than firmware.
> 
> Note that to the VM, the environment will look like hardware without EL3
> and without a secure world, so any software assuming there's something
> 'hidden' behind the available non-secure modes must not decide to
> disable features if discovering the lack of a secure world.


Thanks,

James

___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH v2 5/9] arm64: KVM: PMU: Inject UNDEF on read access to PMSWINC_EL0

2017-03-28 Thread Christoffer Dall
On Mon, Mar 27, 2017 at 05:03:41PM +0100, Marc Zyngier wrote:
> PMSWINC_EL0 is a WO register, so let's UNDEF when reading from it
> (in the highly hypothetical case where this doesn't UNDEF at EL1).
> 
> Signed-off-by: Marc Zyngier 

Reviewed-by: Christoffer Dall 

> ---
>  arch/arm64/kvm/sys_regs.c | 13 ++---
>  1 file changed, 6 insertions(+), 7 deletions(-)
> 
> diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
> index 19a036b4f6ac..f80a61af5e88 100644
> --- a/arch/arm64/kvm/sys_regs.c
> +++ b/arch/arm64/kvm/sys_regs.c
> @@ -772,16 +772,15 @@ static bool access_pmswinc(struct kvm_vcpu *vcpu, 
> struct sys_reg_params *p,
>   if (!kvm_arm_pmu_v3_ready(vcpu))
>   return trap_raz_wi(vcpu, p, r);
>  
> + if (!p->is_write)
> + return read_from_write_only(vcpu, p);
> +
>   if (pmu_write_swinc_el0_disabled(vcpu))
>   return false;
>  
> - if (p->is_write) {
> - mask = kvm_pmu_valid_counter_mask(vcpu);
> - kvm_pmu_software_increment(vcpu, p->regval & mask);
> - return true;
> - }
> -
> - return false;
> + mask = kvm_pmu_valid_counter_mask(vcpu);
> + kvm_pmu_software_increment(vcpu, p->regval & mask);
> + return true;
>  }
>  
>  static bool access_pmuserenr(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
> -- 
> 2.11.0
> 
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH v2 1/9] arm64: KVM: PMU: Refactor pmu_*_el0_disabled

2017-03-28 Thread Christoffer Dall
On Mon, Mar 27, 2017 at 05:03:37PM +0100, Marc Zyngier wrote:
> There is a lot of duplication in the pmu_*_el0_disabled helpers,
> and as we're going to modify them shortly, let's move all the
> common stuff in a single function.
> 
> No functionnal change.
> 
> Signed-off-by: Marc Zyngier 
> ---
>  arch/arm64/kvm/sys_regs.c | 25 +++--
>  1 file changed, 11 insertions(+), 14 deletions(-)
> 
> diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
> index 0e26f8c2b56f..7e1d673304d5 100644
> --- a/arch/arm64/kvm/sys_regs.c
> +++ b/arch/arm64/kvm/sys_regs.c
> @@ -460,35 +460,32 @@ static void reset_pmcr(struct kvm_vcpu *vcpu, const 
> struct sys_reg_desc *r)
>   vcpu_sys_reg(vcpu, PMCR_EL0) = val;
>  }
>  
> -static bool pmu_access_el0_disabled(struct kvm_vcpu *vcpu)
> +static bool check_disabled(struct kvm_vcpu *vcpu, u64 flags)
>  {
>   u64 reg = vcpu_sys_reg(vcpu, PMUSERENR_EL0);
> + bool cond = (reg & flags) || vcpu_mode_priv(vcpu);

nit: I would call this variable 'enabled' and then return !enabled to
make it clear what's going on.

(If you agree, I can fix this up when applying along with the typo and
rename pointed out by Suzuki).

>  
> - return !((reg & ARMV8_PMU_USERENR_EN) || vcpu_mode_priv(vcpu));
> + return !cond;
>  }
>  
> -static bool pmu_write_swinc_el0_disabled(struct kvm_vcpu *vcpu)
> +static bool pmu_access_el0_disabled(struct kvm_vcpu *vcpu)
>  {
> - u64 reg = vcpu_sys_reg(vcpu, PMUSERENR_EL0);
> + return check_disabled(vcpu, ARMV8_PMU_USERENR_EN);
> +}
>  
> - return !((reg & (ARMV8_PMU_USERENR_SW | ARMV8_PMU_USERENR_EN))
> -  || vcpu_mode_priv(vcpu));
> +static bool pmu_write_swinc_el0_disabled(struct kvm_vcpu *vcpu)
> +{
> + return check_disabled(vcpu, ARMV8_PMU_USERENR_SW | 
> ARMV8_PMU_USERENR_EN);
>  }
>  
>  static bool pmu_access_cycle_counter_el0_disabled(struct kvm_vcpu *vcpu)
>  {
> - u64 reg = vcpu_sys_reg(vcpu, PMUSERENR_EL0);
> -
> - return !((reg & (ARMV8_PMU_USERENR_CR | ARMV8_PMU_USERENR_EN))
> -  || vcpu_mode_priv(vcpu));
> + return check_disabled(vcpu, ARMV8_PMU_USERENR_CR | 
> ARMV8_PMU_USERENR_EN);
>  }
>  
>  static bool pmu_access_event_counter_el0_disabled(struct kvm_vcpu *vcpu)
>  {
> - u64 reg = vcpu_sys_reg(vcpu, PMUSERENR_EL0);
> -
> - return !((reg & (ARMV8_PMU_USERENR_ER | ARMV8_PMU_USERENR_EN))
> -  || vcpu_mode_priv(vcpu));
> + return check_disabled(vcpu, ARMV8_PMU_USERENR_ER | 
> ARMV8_PMU_USERENR_EN);
>  }
>  
>  static bool access_pmcr(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
> -- 
> 2.11.0
> 

Otherwise:

Reviewed-by: Christoffer Dall 
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH] kvm: pass the virtual SEI syndrome to guest OS

2017-03-28 Thread James Morse
Hi Peter,

On 28/03/17 12:33, Peter Maydell wrote:
> On 28 March 2017 at 12:23, Christoffer Dall  wrote:
>> On Tue, Mar 28, 2017 at 11:48:08AM +0100, James Morse wrote:
>>> On the host, part of UEFI is involved to generate the CPER records.
>>> In a guest?, I don't know.
>>> Qemu could generate the records, or drive some other component to do it.
>>
>> I think I am beginning to understand this a bit.  Since the guet UEFI
>> instance is specifically built for the machine it runs on, QEMU's virt
>> machine in this case, they could simply agree (by some contract) to
>> place the records at some specific location in memory, and if the guest
>> kernel asks its guest UEFI for that location, things should just work by
>> having logic in QEMU to process error reports and populate guest memory.
> 
> Is "write direct to guest memory" the best ABI here or would
> it be preferable to use the fw_cfg interface for the guest UEFI
> to retrieve the data items on demand?

As far as I understand the interaction between Qemu and UEFI isn't defined. The
eventual aim is to emulate ACPI's firmware first error handling for a guest.
This way the RAS behaviour for a host and the guest is the same.

The ABI is the guest OS gets a 'notification' (there is a list in acpi: 18.3.2.9
Hardware Error Notification), and then finds a pointer to the CPER records
(defined in the UEFI Spec Appendix N) at the address advertised by one of the
Generic Hardware Error Source (GHES) entries of the ACPI Hardware Error Source
Table (HEST).

How Qemu and UEFI conspire to make this happen is up for discussion.
My suggestion would be to try and closely mirror whatever happens on a physical
system so that UEFI only needs to test one set of code.


> Is there a pre-existing "this is how it works on x86" implementation?

I found [0], which on page 16 talks about Qemu injecting a Pseudo 'Software
Recoverable Action Required', which I assume is a flavour of NMI.

The ACPI/CPER stuff is arch agnostic and given Qemu is only ever likely to have
to handle memory errors it should be possible to use the same code for both x86
and arm64.


Thanks,

James

[0] https://events.linuxfoundation.org/images/stories/slides/lfcs2013_tanino.pdf
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH] kvm: pass the virtual SEI syndrome to guest OS

2017-03-28 Thread Christoffer Dall
On Tue, Mar 28, 2017 at 02:24:55PM +0100, Achin Gupta wrote:
> On Tue, Mar 28, 2017 at 02:22:29PM +0200, Christoffer Dall wrote:
> > On Tue, Mar 28, 2017 at 12:54:13PM +0100, Achin Gupta wrote:
> > > On Tue, Mar 28, 2017 at 01:23:28PM +0200, Christoffer Dall wrote:
> > > > On Tue, Mar 28, 2017 at 11:48:08AM +0100, James Morse wrote:
> > > > > Hi Christoffer,
> > > > >
> > > > > (CC: Leif and Achin who know more about how UEFI fits into this 
> > > > > picture)
> > > > >
> > > > > On 21/03/17 19:39, Christoffer Dall wrote:
> > > > > > On Tue, Mar 21, 2017 at 07:11:44PM +, James Morse wrote:
> > > > > >> On 21/03/17 11:34, Christoffer Dall wrote:
> > > > > >>> On Tue, Mar 21, 2017 at 02:32:29PM +0800, gengdongjiu wrote:
> > > > >  On 2017/3/20 23:08, James Morse wrote:
> > > > >  On 20/03/17 07:55, Dongjiu Geng wrote:
> > > > > > In the RAS implementation, hardware pass the virtual SEI
> > > > > > syndrome information through the VSESR_EL2, so set the 
> > > > > > virtual
> > > > > > SEI syndrome using physical SEI syndrome el2_elr to pass to
> > > > > > the guest OS
> > > > > >
> > > > > > How does this work with firmware first?
> > > > > 
> > > > >  I explained it in previous mail about the work flow.
> > > > > >>>
> > > > > >>> When delivering and reporting SEIs to the VM, should this happen
> > > > > >>> directly to the OS running in the VM, or to the guest firmware 
> > > > > >>> (e.g.
> > > > > >>> UEFI) running in the VM as well?
> > > > > >>
> > > > > >> 'firmware first' is the ACPI specs name for x86's BIOS or 
> > > > > >> management-mode
> > > > > >> handling the error. On arm64 we have multiple things called 
> > > > > >> firmware, so the
> > > > > >> name might be more confusing than helpful.
> > > > > >>
> > > > > >> As far as I understand it, firmware here refers to the 
> > > > > >> secure-world and EL3.
> > > > > >> Something like ATF can use SCR_EL3.EA to claim SErrors and 
> > > > > >> external aborts,
> > > > > >> routing them to EL3 where secure platform specific firmware 
> > > > > >> generates CPER records.
> > > > > >> For a guest, Qemu takes the role of this EL3-firmware.
> > >
> > > +1
> > >
> > > > > >>
> > > > > > Thanks for the clarification.  So UEFI in the VM would not be 
> > > > > > involved
> > > > > > in this at all?
> > > > >
> > > > > On the host, part of UEFI is involved to generate the CPER records.
> > > > > In a guest?, I don't know.
> > > > > Qemu could generate the records, or drive some other component to do 
> > > > > it.
> > > >
> > > > I think I am beginning to understand this a bit.  Since the guet UEFI
> > > > instance is specifically built for the machine it runs on, QEMU's virt
> > > > machine in this case, they could simply agree (by some contract) to
> > > > place the records at some specific location in memory, and if the guest
> > > > kernel asks its guest UEFI for that location, things should just work by
> > > > having logic in QEMU to process error reports and populate guest memory.
> > > >
> > > > Is this how others see the world too?
> > >
> > > I think so!
> > >
> > > AFAIU, the memory where CPERs will reside should be specified in a GHES 
> > > entry in
> > > the HEST. Is this not the case with a guest kernel i.e. the guest UEFI 
> > > creates a
> > > HEST for the guest Kernel?
> > >
> > > If so, then the question is how the guest UEFI finds out where QEMU 
> > > (acting as
> > > EL3 firmware) will populate the CPERs. This could either be a contract 
> > > between
> > > the two or a guest DXE driver uses the MM_COMMUNICATE call (see [1]) to 
> > > ask QEMU
> > > where the memory is.
> > >
> > > This is the way I expect it to work at the EL3/EL2 boundary. So I am
> > > extrapolating it to the guest/hypervisor boundary. Do shout if I am 
> > > missing
> > > anything.
> >
> > No that sounds like a resonable comparison.
> >
> > I'm not entirely sure what a HEST or GHES is, but I think the only place
> > where I'm still not clear is if when the guest kernel is notified of
> > errors does it (a) just traverse memory by following some pointers
> > (which it may have pre-loaded at boot from UEFI), or (b) run UEFI code
> > which can call into QEMU and generate error records on demand?
> 
> So HEST is the ACPI Harware Error Source Table. It has entries in it for 
> Generic
> HW Error Sources (GHES) amongst other types of error sources (x86 MCE etc). 
> Each
> Error source specifies an address where the address of the CPER created by
> firmware will be populated. OS upon receipt of an error reads the CPERs to 
> find
> the error source. It uses the addresses specified in the GHES entries of the
> HEST. This is closer to (a) above. HEST has the pointers preloaded at boot by
> UEFI.
> 
Thanks for the explanation.  Sounds to me like QEMU, through whatever
abstractions and proper methods they have to do that, must populate
memory more or less directly.

I guess this is up to whoever will 

Re: [PATCH] kvm: pass the virtual SEI syndrome to guest OS

2017-03-28 Thread James Morse
Hi gengdongjiu,

On 28/03/17 13:16, gengdongjiu wrote:
> On 2017/3/28 19:54, Achin Gupta wrote:
>> On Tue, Mar 28, 2017 at 01:23:28PM +0200, Christoffer Dall wrote:
>>> On Tue, Mar 28, 2017 at 11:48:08AM +0100, James Morse wrote:
 On the host, part of UEFI is involved to generate the CPER records.
 In a guest?, I don't know.
 Qemu could generate the records, or drive some other component to do it.
>>>
>>> I think I am beginning to understand this a bit.  Since the guet UEFI
>>> instance is specifically built for the machine it runs on, QEMU's virt
>>> machine in this case, they could simply agree (by some contract) to
>>> place the records at some specific location in memory, and if the guest
>>> kernel asks its guest UEFI for that location, things should just work by
>>> having logic in QEMU to process error reports and populate guest memory.
>>>
>>> Is this how others see the world too?
>>
>> I think so!
>>
>> AFAIU, the memory where CPERs will reside should be specified in a GHES 
>> entry in
>> the HEST. Is this not the case with a guest kernel i.e. the guest UEFI 
>> creates a
>> HEST for the guest Kernel?
>>
>> If so, then the question is how the guest UEFI finds out where QEMU (acting 
>> as
>> EL3 firmware) will populate the CPERs. This could either be a contract 
>> between
>> the two or a guest DXE driver uses the MM_COMMUNICATE call (see [1]) to ask 
>> QEMU
>> where the memory is.
> 
> whether invoke the guest UEFI will be complex? not see the advantage. it 
> seems x86 Qemu
> directly generate the ACPI table, but I am not sure, we are checking the qemu
logical.
> let Qemu generate CPER record may be clear.

At boot UEFI in the guest will need to make sure the areas of memory that may be
used for CPER records are reserved. Whether UEFI or Qemu decides where these are
needs deciding, (but probably not here)...

At runtime, when an error has occurred, I agree it would be simpler (fewer
components involved) if Qemu generates the CPER records. But if UEFI made the
memory choice above they need to interact and it gets complicated again. The
CPER records are defined in the UEFI spec, so I would expect UEFI to contain
code to generate/parse them.


Thanks,

James

___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH v2 1/9] arm64: KVM: PMU: Refactor pmu_*_el0_disabled

2017-03-28 Thread Marc Zyngier
On 28/03/17 13:46, Christoffer Dall wrote:
> On Mon, Mar 27, 2017 at 05:03:37PM +0100, Marc Zyngier wrote:
>> There is a lot of duplication in the pmu_*_el0_disabled helpers,
>> and as we're going to modify them shortly, let's move all the
>> common stuff in a single function.
>>
>> No functionnal change.
>>
>> Signed-off-by: Marc Zyngier 
>> ---
>>  arch/arm64/kvm/sys_regs.c | 25 +++--
>>  1 file changed, 11 insertions(+), 14 deletions(-)
>>
>> diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
>> index 0e26f8c2b56f..7e1d673304d5 100644
>> --- a/arch/arm64/kvm/sys_regs.c
>> +++ b/arch/arm64/kvm/sys_regs.c
>> @@ -460,35 +460,32 @@ static void reset_pmcr(struct kvm_vcpu *vcpu, const 
>> struct sys_reg_desc *r)
>>  vcpu_sys_reg(vcpu, PMCR_EL0) = val;
>>  }
>>  
>> -static bool pmu_access_el0_disabled(struct kvm_vcpu *vcpu)
>> +static bool check_disabled(struct kvm_vcpu *vcpu, u64 flags)
>>  {
>>  u64 reg = vcpu_sys_reg(vcpu, PMUSERENR_EL0);
>> +bool cond = (reg & flags) || vcpu_mode_priv(vcpu);
> 
> nit: I would call this variable 'enabled' and then return !enabled to
> make it clear what's going on.
> 
> (If you agree, I can fix this up when applying along with the typo and
> rename pointed out by Suzuki).

Yup, that'd be absolutely fine.

Thanks,

M.
-- 
Jazz is not dead. It just smells funny...
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH v2 0/9] arm64: KVM: Fix PMU exception generation

2017-03-28 Thread Christoffer Dall
On Mon, Mar 27, 2017 at 05:03:36PM +0100, Marc Zyngier wrote:
> Running the following code:
> 
> root@zomby-woof:~# cat test-pmu.c
> int main(int argc, char *argv[])
> {
>   unsigned int val;
>   asm ("mrc p15, 0, %0, c9, c13, 0\n" : "=r" (val));
>   return val;
> }
> 
> in a 32bit guest (or a 64bit guest with a 32bit userspace) results in
> this surprising result:
> 
> [  120.347497] kvm [1150]: Unsupported guest CP15 access at: ab0945ae
> [  120.353689] kvm [1142]:  { Op0( 0), Op1( 0), CRn( 9), CRm(13), Op2( 0), 
> func_read },
> 
> which is weird, because the guest behaves correctly:
> root@zomby-woof:~# ./test-pmu 
> [   16.184422] test-pmu[740]: undefined instruction: pc=ab0945ae
> [   16.186043] Code: 00340001 b480 af00b085 60396078 (3f1dee19) 
> Illegal instruction
> 
> It gets the expected UNDEF, and all is fine. So what?
> 
> It turns out that the PMU emulation code is a bit lazy, and tells the
> rest of KVM that the emulation has failed, so that an exception gets
> delivered. Subtle differences in the 32bit vs 64bit handling make it
> spit an "Unsupported..." error.
> 
> This series tries to set things straight:
> - Make all PMU illegal accesses inject an UNDEF
> - Make these illegal accesses a successful emulation w.r.t the rest of KVM.
> 
> In the process, we also squash an interesting bug in the 64bit CP
> access. Similar treatment is applied to the 32bit kernel, except that
> we don't ever inject an exception there (no PMU support yet).
> 

I have applied this series to queue (not next since I haven't tested
thoroughly yet), with the fixups agreed in the series.

Thanks,
-Christoffer
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH] kvm: pass the virtual SEI syndrome to guest OS

2017-03-28 Thread Achin Gupta
On Tue, Mar 28, 2017 at 01:23:28PM +0200, Christoffer Dall wrote:
> On Tue, Mar 28, 2017 at 11:48:08AM +0100, James Morse wrote:
> > Hi Christoffer,
> >
> > (CC: Leif and Achin who know more about how UEFI fits into this picture)
> >
> > On 21/03/17 19:39, Christoffer Dall wrote:
> > > On Tue, Mar 21, 2017 at 07:11:44PM +, James Morse wrote:
> > >> On 21/03/17 11:34, Christoffer Dall wrote:
> > >>> On Tue, Mar 21, 2017 at 02:32:29PM +0800, gengdongjiu wrote:
> >  On 2017/3/20 23:08, James Morse wrote:
> >  On 20/03/17 07:55, Dongjiu Geng wrote:
> > > In the RAS implementation, hardware pass the virtual SEI
> > > syndrome information through the VSESR_EL2, so set the virtual
> > > SEI syndrome using physical SEI syndrome el2_elr to pass to
> > > the guest OS
> > >
> > > How does this work with firmware first?
> > 
> >  I explained it in previous mail about the work flow.
> > >>>
> > >>> When delivering and reporting SEIs to the VM, should this happen
> > >>> directly to the OS running in the VM, or to the guest firmware (e.g.
> > >>> UEFI) running in the VM as well?
> > >>
> > >> 'firmware first' is the ACPI specs name for x86's BIOS or management-mode
> > >> handling the error. On arm64 we have multiple things called firmware, so 
> > >> the
> > >> name might be more confusing than helpful.
> > >>
> > >> As far as I understand it, firmware here refers to the secure-world and 
> > >> EL3.
> > >> Something like ATF can use SCR_EL3.EA to claim SErrors and external 
> > >> aborts,
> > >> routing them to EL3 where secure platform specific firmware generates 
> > >> CPER records.
> > >> For a guest, Qemu takes the role of this EL3-firmware.

+1

> > >>
> > > Thanks for the clarification.  So UEFI in the VM would not be involved
> > > in this at all?
> >
> > On the host, part of UEFI is involved to generate the CPER records.
> > In a guest?, I don't know.
> > Qemu could generate the records, or drive some other component to do it.
>
> I think I am beginning to understand this a bit.  Since the guet UEFI
> instance is specifically built for the machine it runs on, QEMU's virt
> machine in this case, they could simply agree (by some contract) to
> place the records at some specific location in memory, and if the guest
> kernel asks its guest UEFI for that location, things should just work by
> having logic in QEMU to process error reports and populate guest memory.
>
> Is this how others see the world too?

I think so!

AFAIU, the memory where CPERs will reside should be specified in a GHES entry in
the HEST. Is this not the case with a guest kernel i.e. the guest UEFI creates a
HEST for the guest Kernel?

If so, then the question is how the guest UEFI finds out where QEMU (acting as
EL3 firmware) will populate the CPERs. This could either be a contract between
the two or a guest DXE driver uses the MM_COMMUNICATE call (see [1]) to ask QEMU
where the memory is.

This is the way I expect it to work at the EL3/EL2 boundary. So I am
extrapolating it to the guest/hypervisor boundary. Do shout if I am missing
anything.

hth,
Achin

>
> >
> > Leif and Achin are the people with the UEFI/bigger picture.
> >
> >
>
> Thanks!
> -Christoffer

[1] 
http://infocenter.arm.com/help/topic/com.arm.doc.den0060a/DEN0060A_ARM_MM_Interface_Specification.pdf
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH v2 7/9] arm64: KVM: Do not corrupt registers on failed 64bit CP read

2017-03-28 Thread Marc Zyngier
On 28/03/17 13:46, Christoffer Dall wrote:
> On Mon, Mar 27, 2017 at 05:03:43PM +0100, Marc Zyngier wrote:
>> If we fail to emulate a mrrc instruction, we:
>> 1) deliver an exception,
>> 2) spit a nastygram on the console,
>> 3) write back some garbage to Rt/Rt2
>>
>> While 1) and 2) are perfectly acceptable, 3) is out of the scope of
>> the architecture... Let's mimick the code in kvm_handle_cp_32 and
>> be more cautious.
>>
>> Signed-off-by: Marc Zyngier 
>> ---
>>  arch/arm64/kvm/sys_regs.c | 20 +---
>>  1 file changed, 9 insertions(+), 11 deletions(-)
>>
>> diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
>> index 4e5d4eee8cec..1080a76e960f 100644
>> --- a/arch/arm64/kvm/sys_regs.c
>> +++ b/arch/arm64/kvm/sys_regs.c
>> @@ -1678,20 +1678,18 @@ static int kvm_handle_cp_64(struct kvm_vcpu *vcpu,
>>  params.regval |= vcpu_get_reg(vcpu, Rt2) << 32;
>>  }
>>  
>> -if (!emulate_cp(vcpu, , target_specific, nr_specific))
>> -goto out;
>> -if (!emulate_cp(vcpu, , global, nr_global))
>> -goto out;
>> -
>> -unhandled_cp_access(vcpu, );
>> +if (!emulate_cp(vcpu, , target_specific, nr_specific) ||
>> +!emulate_cp(vcpu, , global, nr_global)) {
> 
> super nit: I choked a bit on this contruct, any objections to adding a
> comment like the following above:
> 
>   /*
>* Try to emulate the coprocessor access using the target
>* specific table first, and using the global table aftwards.
>* If either of the tables contains a handler, handle the
>* potential register operation in the case of a read and return
>* with success.
>*/
> 
> Too much?
> 
> (If not, I can also add this when applying).

No, that's great. Thanks!

M.
-- 
Jazz is not dead. It just smells funny...
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH v2 8/9] arm: KVM: Make unexpected register accesses inject an undef

2017-03-28 Thread Christoffer Dall
On Mon, Mar 27, 2017 at 05:03:44PM +0100, Marc Zyngier wrote:
> Reads from write-only system registers are generally confined to
> EL1 and not propagated to EL2 (that's what the architecture
> mantates). In order to be sure that we have a sane behaviour
> even in the unlikely event that we have a broken system, we still
> handle it in KVM. Same goes for write to RO registers.
> 
> In that case, let's inject an undef into the guest.
> 

Reviewed-by: Christoffer Dall 

> Signed-off-by: Marc Zyngier 
> ---
>  arch/arm/kvm/coproc.c | 18 ++
>  arch/arm/kvm/coproc.h | 18 --
>  2 files changed, 18 insertions(+), 18 deletions(-)
> 
> diff --git a/arch/arm/kvm/coproc.c b/arch/arm/kvm/coproc.c
> index 3e5e4194ef86..519aac12b365 100644
> --- a/arch/arm/kvm/coproc.c
> +++ b/arch/arm/kvm/coproc.c
> @@ -40,6 +40,24 @@
>   * Co-processor emulation
>   
> */
>  
> +static bool write_to_read_only(struct kvm_vcpu *vcpu,
> +const struct coproc_params *params)
> +{
> + WARN_ONCE(1, "CP15 write to read-only register\n");
> + print_cp_instr(params);
> + kvm_inject_undefined(vcpu);
> + return false;
> +}
> +
> +static bool read_from_write_only(struct kvm_vcpu *vcpu,
> +  const struct coproc_params *params)
> +{
> + WARN_ONCE(1, "CP15 read to write-only register\n");
> + print_cp_instr(params);
> + kvm_inject_undefined(vcpu);
> + return false;
> +}
> +
>  /* 3 bits per cache level, as per CLIDR, but non-existent caches always 0 */
>  static u32 cache_levels;
>  
> diff --git a/arch/arm/kvm/coproc.h b/arch/arm/kvm/coproc.h
> index eef1759c2b65..3a41b7d1eb86 100644
> --- a/arch/arm/kvm/coproc.h
> +++ b/arch/arm/kvm/coproc.h
> @@ -81,24 +81,6 @@ static inline bool read_zero(struct kvm_vcpu *vcpu,
>   return true;
>  }
>  
> -static inline bool write_to_read_only(struct kvm_vcpu *vcpu,
> -   const struct coproc_params *params)
> -{
> - kvm_debug("CP15 write to read-only register at: %08lx\n",
> -   *vcpu_pc(vcpu));
> - print_cp_instr(params);
> - return false;
> -}
> -
> -static inline bool read_from_write_only(struct kvm_vcpu *vcpu,
> - const struct coproc_params *params)
> -{
> - kvm_debug("CP15 read to write-only register at: %08lx\n",
> -   *vcpu_pc(vcpu));
> - print_cp_instr(params);
> - return false;
> -}
> -
>  /* Reset functions */
>  static inline void reset_unknown(struct kvm_vcpu *vcpu,
>const struct coproc_reg *r)
> -- 
> 2.11.0
> 
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH v2 6/9] arm64: KVM: Treat sysreg accessors returning false as successful

2017-03-28 Thread Christoffer Dall
On Mon, Mar 27, 2017 at 05:03:42PM +0100, Marc Zyngier wrote:
> Instead of considering that a sysreg accessor has failed when
> returning false, let's consider that it is *always* successful
> (after all, we won't stand for an incomplete emulation).

That's right!

> 
> The return value now simply indicates whether we should skip
> the instruction (because it has now been emulated), or if we
> should leave the PC alone if the emulation has injected an
> exception.

Reviewed-by: Christoffer Dall 

(I especially enjoy the much cleaner flow of emulate_sys_reg())

> 
> Signed-off-by: Marc Zyngier 
> ---
>  arch/arm64/kvm/sys_regs.c | 49 
> +++
>  1 file changed, 20 insertions(+), 29 deletions(-)
> 
> diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
> index f80a61af5e88..4e5d4eee8cec 100644
> --- a/arch/arm64/kvm/sys_regs.c
> +++ b/arch/arm64/kvm/sys_regs.c
> @@ -1571,6 +1571,22 @@ int kvm_handle_cp14_load_store(struct kvm_vcpu *vcpu, 
> struct kvm_run *run)
>   return 1;
>  }
>  
> +static void perform_access(struct kvm_vcpu *vcpu,
> +struct sys_reg_params *params,
> +const struct sys_reg_desc *r)
> +{
> + /*
> +  * Not having an accessor means that we have configured a trap
> +  * that we don't know how to handle. This certainly qualifies
> +  * as a gross bug that should be fixed right away.
> +  */
> + BUG_ON(!r->access);
> +
> + /* Skip instruction if instructed so */
> + if (likely(r->access(vcpu, params, r)))
> + kvm_skip_instr(vcpu, kvm_vcpu_trap_il_is32bit(vcpu));
> +}
> +
>  /*
>   * emulate_cp --  tries to match a sys_reg access in a handling table, and
>   *call the corresponding trap handler.
> @@ -1594,20 +1610,8 @@ static int emulate_cp(struct kvm_vcpu *vcpu,
>   r = find_reg(params, table, num);
>  
>   if (r) {
> - /*
> -  * Not having an accessor means that we have
> -  * configured a trap that we don't know how to
> -  * handle. This certainly qualifies as a gross bug
> -  * that should be fixed right away.
> -  */
> - BUG_ON(!r->access);
> -
> - if (likely(r->access(vcpu, params, r))) {
> - /* Skip instruction, since it was emulated */
> - kvm_skip_instr(vcpu, kvm_vcpu_trap_il_is32bit(vcpu));
> - /* Handled */
> - return 0;
> - }
> + perform_access(vcpu, params, r);
> + return 0;
>   }
>  
>   /* Not handled */
> @@ -1777,26 +1781,13 @@ static int emulate_sys_reg(struct kvm_vcpu *vcpu,
>   r = find_reg(params, sys_reg_descs, ARRAY_SIZE(sys_reg_descs));
>  
>   if (likely(r)) {
> - /*
> -  * Not having an accessor means that we have
> -  * configured a trap that we don't know how to
> -  * handle. This certainly qualifies as a gross bug
> -  * that should be fixed right away.
> -  */
> - BUG_ON(!r->access);
> -
> - if (likely(r->access(vcpu, params, r))) {
> - /* Skip instruction, since it was emulated */
> - kvm_skip_instr(vcpu, kvm_vcpu_trap_il_is32bit(vcpu));
> - return 1;
> - }
> - /* If access function fails, it should complain. */
> + perform_access(vcpu, params, r);
>   } else {
>   kvm_err("Unsupported guest sys_reg access at: %lx\n",
>   *vcpu_pc(vcpu));
>   print_sys_reg_instr(params);
> + kvm_inject_undefined(vcpu);
>   }
> - kvm_inject_undefined(vcpu);
>   return 1;
>  }
>  
> -- 
> 2.11.0
> 
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH] kvm: pass the virtual SEI syndrome to guest OS

2017-03-28 Thread Achin Gupta
On Tue, Mar 28, 2017 at 02:22:29PM +0200, Christoffer Dall wrote:
> On Tue, Mar 28, 2017 at 12:54:13PM +0100, Achin Gupta wrote:
> > On Tue, Mar 28, 2017 at 01:23:28PM +0200, Christoffer Dall wrote:
> > > On Tue, Mar 28, 2017 at 11:48:08AM +0100, James Morse wrote:
> > > > Hi Christoffer,
> > > >
> > > > (CC: Leif and Achin who know more about how UEFI fits into this picture)
> > > >
> > > > On 21/03/17 19:39, Christoffer Dall wrote:
> > > > > On Tue, Mar 21, 2017 at 07:11:44PM +, James Morse wrote:
> > > > >> On 21/03/17 11:34, Christoffer Dall wrote:
> > > > >>> On Tue, Mar 21, 2017 at 02:32:29PM +0800, gengdongjiu wrote:
> > > >  On 2017/3/20 23:08, James Morse wrote:
> > > >  On 20/03/17 07:55, Dongjiu Geng wrote:
> > > > > In the RAS implementation, hardware pass the virtual SEI
> > > > > syndrome information through the VSESR_EL2, so set the virtual
> > > > > SEI syndrome using physical SEI syndrome el2_elr to pass to
> > > > > the guest OS
> > > > >
> > > > > How does this work with firmware first?
> > > > 
> > > >  I explained it in previous mail about the work flow.
> > > > >>>
> > > > >>> When delivering and reporting SEIs to the VM, should this happen
> > > > >>> directly to the OS running in the VM, or to the guest firmware (e.g.
> > > > >>> UEFI) running in the VM as well?
> > > > >>
> > > > >> 'firmware first' is the ACPI specs name for x86's BIOS or 
> > > > >> management-mode
> > > > >> handling the error. On arm64 we have multiple things called 
> > > > >> firmware, so the
> > > > >> name might be more confusing than helpful.
> > > > >>
> > > > >> As far as I understand it, firmware here refers to the secure-world 
> > > > >> and EL3.
> > > > >> Something like ATF can use SCR_EL3.EA to claim SErrors and external 
> > > > >> aborts,
> > > > >> routing them to EL3 where secure platform specific firmware 
> > > > >> generates CPER records.
> > > > >> For a guest, Qemu takes the role of this EL3-firmware.
> >
> > +1
> >
> > > > >>
> > > > > Thanks for the clarification.  So UEFI in the VM would not be involved
> > > > > in this at all?
> > > >
> > > > On the host, part of UEFI is involved to generate the CPER records.
> > > > In a guest?, I don't know.
> > > > Qemu could generate the records, or drive some other component to do it.
> > >
> > > I think I am beginning to understand this a bit.  Since the guet UEFI
> > > instance is specifically built for the machine it runs on, QEMU's virt
> > > machine in this case, they could simply agree (by some contract) to
> > > place the records at some specific location in memory, and if the guest
> > > kernel asks its guest UEFI for that location, things should just work by
> > > having logic in QEMU to process error reports and populate guest memory.
> > >
> > > Is this how others see the world too?
> >
> > I think so!
> >
> > AFAIU, the memory where CPERs will reside should be specified in a GHES 
> > entry in
> > the HEST. Is this not the case with a guest kernel i.e. the guest UEFI 
> > creates a
> > HEST for the guest Kernel?
> >
> > If so, then the question is how the guest UEFI finds out where QEMU (acting 
> > as
> > EL3 firmware) will populate the CPERs. This could either be a contract 
> > between
> > the two or a guest DXE driver uses the MM_COMMUNICATE call (see [1]) to ask 
> > QEMU
> > where the memory is.
> >
> > This is the way I expect it to work at the EL3/EL2 boundary. So I am
> > extrapolating it to the guest/hypervisor boundary. Do shout if I am missing
> > anything.
>
> No that sounds like a resonable comparison.
>
> I'm not entirely sure what a HEST or GHES is, but I think the only place
> where I'm still not clear is if when the guest kernel is notified of
> errors does it (a) just traverse memory by following some pointers
> (which it may have pre-loaded at boot from UEFI), or (b) run UEFI code
> which can call into QEMU and generate error records on demand?

So HEST is the ACPI Harware Error Source Table. It has entries in it for Generic
HW Error Sources (GHES) amongst other types of error sources (x86 MCE etc). Each
Error source specifies an address where the address of the CPER created by
firmware will be populated. OS upon receipt of an error reads the CPERs to find
the error source. It uses the addresses specified in the GHES entries of the
HEST. This is closer to (a) above. HEST has the pointers preloaded at boot by
UEFI.

hth,
Achin

>
> Thanks,
> -Christoffer
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH] KVM: arm/arm64: Signal SIGBUS when stage2 discovers hwpoison memory

2017-03-28 Thread Punit Agrawal
Christoffer Dall  writes:

> On Mon, Mar 27, 2017 at 02:31:44PM +0100, Punit Agrawal wrote:
>> Christoffer Dall  writes:
>> 
>> > On Mon, Mar 27, 2017 at 01:00:56PM +0100, James Morse wrote:
>> >> Hi guys,
>> >> 
>> >> On 27/03/17 12:20, Punit Agrawal wrote:
>> >> > Christoffer Dall  writes:
>> >> >> On Wed, Mar 15, 2017 at 04:07:27PM +, James Morse wrote:
>> >> >>> Once we enable ARCH_SUPPORTS_MEMORY_FAILURE on arm64[0], 
>> >> >>> notifications for
>> >> >>> broken memory can call memory_failure() in mm/memory-failure.c to 
>> >> >>> deliver
>> >> >>> SIGBUS to any user space process using the page, and notify all the
>> >> >>> in-kernel users.
>> >> >>>
>> >> >>> If the page corresponded with guest memory, KVM will unmap this page
>> >> >>> from its stage2 page tables. The user space process that allocated
>> >> >>> this memory may have never touched this page in which case it may not
>> >> >>> be mapped meaning SIGBUS won't be delivered.
>> >> >>>
>> >> >>> When this happens KVM discovers pfn == KVM_PFN_ERR_HWPOISON when it
>> >> >>> comes to process the stage2 fault.
>> >> >>>
>> >> >>> Do as x86 does, and deliver the SIGBUS when we discover
>> >> >>> KVM_PFN_ERR_HWPOISON. Use the stage2 mapping size as the si_addr_lsb
>> >> >>> as this matches the user space mapping size.
>> >> 
>> >> >>> diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
>> >> >>> index 962616fd4ddd..9d1aa294e88f 100644
>> >> >>> --- a/arch/arm/kvm/mmu.c
>> >> >>> +++ b/arch/arm/kvm/mmu.c
>> >> >>> @@ -20,8 +20,10 @@
>> >> >>>  #include 
>> >> >>>  #include 
>> >> >>>  #include 
>> >> >>> +#include 
>> >> >>>  #include 
>> >> >>>  #include 
>> >> >>> +#include 
>> >> >>>  #include 
>> >> >>>  #include 
>> >> >>>  #include 
>> >> >>> @@ -1237,6 +1239,23 @@ static void coherent_cache_guest_page(struct 
>> >> >>> kvm_vcpu *vcpu, kvm_pfn_t pfn,
>> >> >>>   __coherent_cache_guest_page(vcpu, pfn, size);
>> >> >>>  }
>> >> >>>  
>> >> >>> +static void kvm_send_hwpoison_signal(unsigned long address, bool 
>> >> >>> hugetlb)
>> >> >>> +{
>> >> >>> + siginfo_t info;
>> >> >>> +
>> >> >>> + info.si_signo   = SIGBUS;
>> >> >>> + info.si_errno   = 0;
>> >> >>> + info.si_code= BUS_MCEERR_AR;
>> >> >>> + info.si_addr= (void __user *)address;
>> >> >>> +
>> >> >>> + if (hugetlb)
>> >> >>> + info.si_addr_lsb = PMD_SHIFT;
>> >> >>> + else
>> >> >>> + info.si_addr_lsb = PAGE_SHIFT;
>> >> >>> +
>> >> >>> + send_sig_info(SIGBUS, , current);
>> >> >>> +}
>> >> >>> +
>> >> >>>  static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t 
>> >> >>> fault_ipa,
>> >> >>> struct kvm_memory_slot *memslot, unsigned 
>> >> >>> long hva,
>> >> >>> unsigned long fault_status)
>> >> >>> @@ -1306,6 +1325,10 @@ static int user_mem_abort(struct kvm_vcpu 
>> >> >>> *vcpu, phys_addr_t fault_ipa,
>> >> >>>   smp_rmb();
>> >> >>>  
>> >> >>>   pfn = gfn_to_pfn_prot(kvm, gfn, write_fault, );
>> >> >>> + if (pfn == KVM_PFN_ERR_HWPOISON) {
>> >> >>> + kvm_send_hwpoison_signal(hva, hugetlb);
>> >> >>
>> >> >> The way this is called means that we'll only notify userspace of a huge
>> >> >> mapping if userspace is mapping hugetlbfs, and not because the stage2
>> >> >> mapping may or may not have used transparent huge pages when the error
>> >> >> was discovered.  Is this the desired semantics?
>> >> 
>> >> No,
>> >> 
>> >> 
>> >> > I think so.
>> >> >
>> >> > AFAIUI, transparent hugepages are split before being poisoned while all
>> >> > the underlying pages of a hugepage are poisoned together, i.e., no
>> >> > splitting.
>> >> 
>> >> In which case I need to look into this some more!
>> >> 
>> >> My thinking was we should report the size that was knocked out of the 
>> >> stage2 to
>> >> avoid the guest repeatedly faulting until it has touched every 
>> >> guest-page-size
>> >> in the stage2 hole.
>> >
>> > By signaling something at the fault path, I think it's going to be very
>> > hard to backtrack how the stage 2 page tables looked like when faults
>> > started happening, because I think these are completely decoupled events
>> > (the mmu notifier and the later fault).
>> >
>> >> 
>> >> Reading the code in that kvm/mmu.c it looked like the mapping sizes would 
>> >> always
>> >> be the same as those used by userspace.
>> >
>> > I think the mapping sizes should be the same between userspace and KVM,
>> > but the mapping size of a particular page (and associated pages) may
>> > vary over time.
>> 
>> Stage 1 and Stage 2 support different hugepage sizes. A larger size
>> stage 1 page maps to multiple stage 2 page table entries. For stage 1,
>> we support PUD_SIZE, CONT_PMD_SIZE, PMD_SIZE and CONT_PTE_SIZE while
>> only PMD_SIZE is supported for Stage 2.
>> 
>> >
>> >> 
>> >> If the page was split before KVM could have taken this fault I assumed it 
>> >> would
>> >> fault on the 

Re: [PATCH] KVM: arm/arm64: Signal SIGBUS when stage2 discovers hwpoison memory

2017-03-28 Thread Christoffer Dall
On Tue, Mar 28, 2017 at 03:50:51PM +0100, Punit Agrawal wrote:
> Christoffer Dall  writes:
> 
> > On Mon, Mar 27, 2017 at 02:31:44PM +0100, Punit Agrawal wrote:
> >> Christoffer Dall  writes:
> >> 
> >> > On Mon, Mar 27, 2017 at 01:00:56PM +0100, James Morse wrote:
> >> >> Hi guys,
> >> >> 
> >> >> On 27/03/17 12:20, Punit Agrawal wrote:
> >> >> > Christoffer Dall  writes:
> >> >> >> On Wed, Mar 15, 2017 at 04:07:27PM +, James Morse wrote:
> >> >> >>> Once we enable ARCH_SUPPORTS_MEMORY_FAILURE on arm64[0], 
> >> >> >>> notifications for
> >> >> >>> broken memory can call memory_failure() in mm/memory-failure.c to 
> >> >> >>> deliver
> >> >> >>> SIGBUS to any user space process using the page, and notify all the
> >> >> >>> in-kernel users.
> >> >> >>>
> >> >> >>> If the page corresponded with guest memory, KVM will unmap this page
> >> >> >>> from its stage2 page tables. The user space process that allocated
> >> >> >>> this memory may have never touched this page in which case it may 
> >> >> >>> not
> >> >> >>> be mapped meaning SIGBUS won't be delivered.
> >> >> >>>
> >> >> >>> When this happens KVM discovers pfn == KVM_PFN_ERR_HWPOISON when it
> >> >> >>> comes to process the stage2 fault.
> >> >> >>>
> >> >> >>> Do as x86 does, and deliver the SIGBUS when we discover
> >> >> >>> KVM_PFN_ERR_HWPOISON. Use the stage2 mapping size as the si_addr_lsb
> >> >> >>> as this matches the user space mapping size.
> >> >> 
> >> >> >>> diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
> >> >> >>> index 962616fd4ddd..9d1aa294e88f 100644
> >> >> >>> --- a/arch/arm/kvm/mmu.c
> >> >> >>> +++ b/arch/arm/kvm/mmu.c
> >> >> >>> @@ -20,8 +20,10 @@
> >> >> >>>  #include 
> >> >> >>>  #include 
> >> >> >>>  #include 
> >> >> >>> +#include 
> >> >> >>>  #include 
> >> >> >>>  #include 
> >> >> >>> +#include 
> >> >> >>>  #include 
> >> >> >>>  #include 
> >> >> >>>  #include 
> >> >> >>> @@ -1237,6 +1239,23 @@ static void coherent_cache_guest_page(struct 
> >> >> >>> kvm_vcpu *vcpu, kvm_pfn_t pfn,
> >> >> >>> __coherent_cache_guest_page(vcpu, pfn, size);
> >> >> >>>  }
> >> >> >>>  
> >> >> >>> +static void kvm_send_hwpoison_signal(unsigned long address, bool 
> >> >> >>> hugetlb)
> >> >> >>> +{
> >> >> >>> +   siginfo_t info;
> >> >> >>> +
> >> >> >>> +   info.si_signo   = SIGBUS;
> >> >> >>> +   info.si_errno   = 0;
> >> >> >>> +   info.si_code= BUS_MCEERR_AR;
> >> >> >>> +   info.si_addr= (void __user *)address;
> >> >> >>> +
> >> >> >>> +   if (hugetlb)
> >> >> >>> +   info.si_addr_lsb = PMD_SHIFT;
> >> >> >>> +   else
> >> >> >>> +   info.si_addr_lsb = PAGE_SHIFT;
> >> >> >>> +
> >> >> >>> +   send_sig_info(SIGBUS, , current);
> >> >> >>> +}
> >> >> >>> +
> >> >> >>>  static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t 
> >> >> >>> fault_ipa,
> >> >> >>>   struct kvm_memory_slot *memslot, unsigned 
> >> >> >>> long hva,
> >> >> >>>   unsigned long fault_status)
> >> >> >>> @@ -1306,6 +1325,10 @@ static int user_mem_abort(struct kvm_vcpu 
> >> >> >>> *vcpu, phys_addr_t fault_ipa,
> >> >> >>> smp_rmb();
> >> >> >>>  
> >> >> >>> pfn = gfn_to_pfn_prot(kvm, gfn, write_fault, );
> >> >> >>> +   if (pfn == KVM_PFN_ERR_HWPOISON) {
> >> >> >>> +   kvm_send_hwpoison_signal(hva, hugetlb);
> >> >> >>
> >> >> >> The way this is called means that we'll only notify userspace of a 
> >> >> >> huge
> >> >> >> mapping if userspace is mapping hugetlbfs, and not because the stage2
> >> >> >> mapping may or may not have used transparent huge pages when the 
> >> >> >> error
> >> >> >> was discovered.  Is this the desired semantics?
> >> >> 
> >> >> No,
> >> >> 
> >> >> 
> >> >> > I think so.
> >> >> >
> >> >> > AFAIUI, transparent hugepages are split before being poisoned while 
> >> >> > all
> >> >> > the underlying pages of a hugepage are poisoned together, i.e., no
> >> >> > splitting.
> >> >> 
> >> >> In which case I need to look into this some more!
> >> >> 
> >> >> My thinking was we should report the size that was knocked out of the 
> >> >> stage2 to
> >> >> avoid the guest repeatedly faulting until it has touched every 
> >> >> guest-page-size
> >> >> in the stage2 hole.
> >> >
> >> > By signaling something at the fault path, I think it's going to be very
> >> > hard to backtrack how the stage 2 page tables looked like when faults
> >> > started happening, because I think these are completely decoupled events
> >> > (the mmu notifier and the later fault).
> >> >
> >> >> 
> >> >> Reading the code in that kvm/mmu.c it looked like the mapping sizes 
> >> >> would always
> >> >> be the same as those used by userspace.
> >> >
> >> > I think the mapping sizes should be the same between userspace and KVM,
> >> > but the mapping size of a particular page (and associated pages) may
> >> > vary over time.
> >> 
> >> Stage 1 and Stage 2 support different hugepage sizes. A larger size
> >> 

Re: [PATCH V13 10/10] arm/arm64: KVM: add guest SEA support

2017-03-28 Thread Baicar, Tyler

Hello Marc,


On 3/24/2017 8:03 AM, Marc Zyngier wrote:

On 21/03/17 22:47, Tyler Baicar wrote:

Currently external aborts are unsupported by the guest abort
handling. Add handling for SEAs so that the host kernel reports
SEAs which occur in the guest kernel.

Signed-off-by: Tyler Baicar 
---
  arch/arm/include/asm/kvm_arm.h   | 10 +
  arch/arm/include/asm/system_misc.h   |  5 +
  arch/arm/kvm/mmu.c   | 41 ++--
  arch/arm64/include/asm/kvm_arm.h | 10 +
  arch/arm64/include/asm/system_misc.h |  2 ++
  arch/arm64/mm/fault.c| 19 +++--
  drivers/acpi/apei/ghes.c | 13 ++--
  include/acpi/ghes.h  |  2 +-
  8 files changed, 86 insertions(+), 16 deletions(-)

diff --git a/arch/arm/include/asm/kvm_arm.h b/arch/arm/include/asm/kvm_arm.h
index a3f0b3d..ebf020b 100644
--- a/arch/arm/include/asm/kvm_arm.h
+++ b/arch/arm/include/asm/kvm_arm.h
@@ -187,6 +187,16 @@
  #define FSC_FAULT (0x04)
  #define FSC_ACCESS(0x08)
  #define FSC_PERM  (0x0c)
+#define FSC_SEA(0x10)
+#define FSC_SEA_TTW0   (0x14)
+#define FSC_SEA_TTW1   (0x15)
+#define FSC_SEA_TTW2   (0x16)
+#define FSC_SEA_TTW3   (0x17)
+#define FSC_SECC   (0x18)
+#define FSC_SECC_TTW0  (0x1c)
+#define FSC_SECC_TTW1  (0x1d)
+#define FSC_SECC_TTW2  (0x1e)
+#define FSC_SECC_TTW3  (0x1f)
  
  /* Hyp Prefetch Fault Address Register (HPFAR/HDFAR) */

  #define HPFAR_MASK(~0xf)
diff --git a/arch/arm/include/asm/system_misc.h 
b/arch/arm/include/asm/system_misc.h
index a3d61ad..ea45d94 100644
--- a/arch/arm/include/asm/system_misc.h
+++ b/arch/arm/include/asm/system_misc.h
@@ -24,4 +24,9 @@
  
  #endif /* !__ASSEMBLY__ */
  
+static inline int handle_guest_sea(unsigned long addr, unsigned int esr)

+{
+   return -1;
+}
+

Shouldn't this be in the #ifndef __ASSEMBLY__ block? The assembler is
definitely going to barf on that...

I will move this in the next patch set.

  #endif /* __ASM_ARM_SYSTEM_MISC_H */
diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
index 962616f..105b6ab 100644
--- a/arch/arm/kvm/mmu.c
+++ b/arch/arm/kvm/mmu.c
@@ -29,6 +29,7 @@
  #include 
  #include 
  #include 
+#include 
  
  #include "trace.h"
  
@@ -1406,6 +1407,24 @@ static void handle_access_fault(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa)

kvm_set_pfn_accessed(pfn);
  }
  
+static bool is_abort_synchronous(unsigned long fault_status) {

+   switch (fault_status) {
+   case FSC_SEA:
+   case FSC_SEA_TTW0:
+   case FSC_SEA_TTW1:
+   case FSC_SEA_TTW2:
+   case FSC_SEA_TTW3:
+   case FSC_SECC:
+   case FSC_SECC_TTW0:
+   case FSC_SECC_TTW1:
+   case FSC_SECC_TTW2:
+   case FSC_SECC_TTW3:
+   return true;
+   default:
+   return false;
+   }
+}
+
  /**
   * kvm_handle_guest_abort - handles all 2nd stage aborts
   * @vcpu: the VCPU pointer
@@ -1426,23 +1445,31 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu, 
struct kvm_run *run)
unsigned long hva;
bool is_iabt, write_fault, writable;
gfn_t gfn;
-   int ret, idx;
+   int ret, idx, sea_status = 1;
+
+   /* Check the stage-2 fault is trans. fault or write fault */
+   fault_status = kvm_vcpu_trap_get_fault_type(vcpu);
+
+   fault_ipa = kvm_vcpu_get_fault_ipa(vcpu);
+
+   /* The host kernel will handle the synchronous external abort. There
+* is no need to pass the error into the guest.
+*/
+   if (is_abort_synchronous(fault_status))
+   sea_status = handle_guest_sea((unsigned long)fault_ipa,
+   kvm_vcpu_get_hsr(vcpu));
  
  	is_iabt = kvm_vcpu_trap_is_iabt(vcpu);

-   if (unlikely(!is_iabt && kvm_vcpu_dabt_isextabt(vcpu))) {
+   if (unlikely(!is_iabt && kvm_vcpu_dabt_isextabt(vcpu)) && sea_status) {
kvm_inject_vabt(vcpu);
return 1;
}
  
-	fault_ipa = kvm_vcpu_get_fault_ipa(vcpu);

-
trace_kvm_guest_fault(*vcpu_pc(vcpu), kvm_vcpu_get_hsr(vcpu),
  kvm_vcpu_get_hfar(vcpu), fault_ipa);
  
-	/* Check the stage-2 fault is trans. fault or write fault */

-   fault_status = kvm_vcpu_trap_get_fault_type(vcpu);
if (fault_status != FSC_FAULT && fault_status != FSC_PERM &&
-   fault_status != FSC_ACCESS) {
+   fault_status != FSC_ACCESS && sea_status) {
kvm_err("Unsupported FSC: EC=%#x xFSC=%#lx ESR_EL2=%#lx\n",
kvm_vcpu_trap_get_class(vcpu),
(unsigned long)kvm_vcpu_trap_get_fault(vcpu),
diff --git a/arch/arm64/include/asm/kvm_arm.h b/arch/arm64/include/asm/kvm_arm.h
index 6e99978..61d694c 100644
--- a/arch/arm64/include/asm/kvm_arm.h
+++ b/arch/arm64/include/asm/kvm_arm.h
@@ -204,6 +204,16 @@
  #define FSC_FAULT ESR_ELx_FSC_FAULT
  #define FSC_ACCESS

Re: [PATCH V13 10/10] arm/arm64: KVM: add guest SEA support

2017-03-28 Thread Baicar, Tyler

Hello Christoffer,


On 3/24/2017 10:01 AM, Christoffer Dall wrote:

On Tue, Mar 21, 2017 at 04:47:05PM -0600, Tyler Baicar wrote:

Currently external aborts are unsupported by the guest abort
handling. Add handling for SEAs so that the host kernel reports
SEAs which occur in the guest kernel.

The logic in kvm_handle_guest_abort could deserve an explanation here.

Surely the whole logic of how KVM fits into the picture here and how
we're dealing with synchronous errors could be explained here, based on
the long conversations between you and James on many prior iterations on
this patch.

I will add additional info here.

Signed-off-by: Tyler Baicar 
---
  arch/arm/include/asm/kvm_arm.h   | 10 +
  arch/arm/include/asm/system_misc.h   |  5 +
  arch/arm/kvm/mmu.c   | 41 ++--
  arch/arm64/include/asm/kvm_arm.h | 10 +
  arch/arm64/include/asm/system_misc.h |  2 ++
  arch/arm64/mm/fault.c| 19 +++--
  drivers/acpi/apei/ghes.c | 13 ++--
  include/acpi/ghes.h  |  2 +-
  8 files changed, 86 insertions(+), 16 deletions(-)

diff --git a/arch/arm/include/asm/kvm_arm.h b/arch/arm/include/asm/kvm_arm.h
index a3f0b3d..ebf020b 100644
--- a/arch/arm/include/asm/kvm_arm.h
+++ b/arch/arm/include/asm/kvm_arm.h
@@ -187,6 +187,16 @@
  #define FSC_FAULT (0x04)
  #define FSC_ACCESS(0x08)
  #define FSC_PERM  (0x0c)
+#define FSC_SEA(0x10)
+#define FSC_SEA_TTW0   (0x14)
+#define FSC_SEA_TTW1   (0x15)
+#define FSC_SEA_TTW2   (0x16)
+#define FSC_SEA_TTW3   (0x17)
+#define FSC_SECC   (0x18)
+#define FSC_SECC_TTW0  (0x1c)
+#define FSC_SECC_TTW1  (0x1d)
+#define FSC_SECC_TTW2  (0x1e)
+#define FSC_SECC_TTW3  (0x1f)
  
  /* Hyp Prefetch Fault Address Register (HPFAR/HDFAR) */

  #define HPFAR_MASK(~0xf)
diff --git a/arch/arm/include/asm/system_misc.h 
b/arch/arm/include/asm/system_misc.h
index a3d61ad..ea45d94 100644
--- a/arch/arm/include/asm/system_misc.h
+++ b/arch/arm/include/asm/system_misc.h
@@ -24,4 +24,9 @@
  
  #endif /* !__ASSEMBLY__ */
  
+static inline int handle_guest_sea(unsigned long addr, unsigned int esr)

+{
+   return -1;
+}
+
  #endif /* __ASM_ARM_SYSTEM_MISC_H */
diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
index 962616f..105b6ab 100644
--- a/arch/arm/kvm/mmu.c
+++ b/arch/arm/kvm/mmu.c
@@ -29,6 +29,7 @@
  #include 
  #include 
  #include 
+#include 
  
  #include "trace.h"
  
@@ -1406,6 +1407,24 @@ static void handle_access_fault(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa)

kvm_set_pfn_accessed(pfn);
  }
  
+static bool is_abort_synchronous(unsigned long fault_status) {

this naming feels a bit weird to me.  For example, this will return
false for a normal translation fault, which is indeed synchronous.

Below you cann handle_guest_sea conditionally based on the result of
this function, so could it be called is_abort_sea ?

Yes, I will change the naming.

+   switch (fault_status) {
+   case FSC_SEA:
+   case FSC_SEA_TTW0:
+   case FSC_SEA_TTW1:
+   case FSC_SEA_TTW2:
+   case FSC_SEA_TTW3:
+   case FSC_SECC:
+   case FSC_SECC_TTW0:
+   case FSC_SECC_TTW1:
+   case FSC_SECC_TTW2:
+   case FSC_SECC_TTW3:
+   return true;
+   default:
+   return false;
+   }
+}
+
  /**
   * kvm_handle_guest_abort - handles all 2nd stage aborts
   * @vcpu: the VCPU pointer
@@ -1426,23 +1445,31 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu, 
struct kvm_run *run)
unsigned long hva;
bool is_iabt, write_fault, writable;
gfn_t gfn;
-   int ret, idx;
+   int ret, idx, sea_status = 1;
+
+   /* Check the stage-2 fault is trans. fault or write fault */

I think this comment was supposed to belong with the check for the three
types below, not for calling kvm_vcpu_trap_get_fault_type, and I also
think the comment is outdated, because it doesn't mention the access
flag.

I suggest either removing the comment, or keep it where it was and
change to say "trans. fault, write fault, acces flag fault, or
sea_status", with some insight into what sea_status means here.

Will remove.



+   fault_status = kvm_vcpu_trap_get_fault_type(vcpu);
+
+   fault_ipa = kvm_vcpu_get_fault_ipa(vcpu);
+
+   /* The host kernel will handle the synchronous external abort. There
+* is no need to pass the error into the guest.
+*/

nit: commenting style.

Will change.

+   if (is_abort_synchronous(fault_status))
+   sea_status = handle_guest_sea((unsigned long)fault_ipa,

will this ever be used on AArch32 in the future?

If so, casting a phys_addr_t for an IPA to an unsigned long is an
incredibly bad idea.  Why not just use phys_addr_t for handle_guest_sea?

I'll change it to phys_addr_t.



+   kvm_vcpu_get_hsr(vcpu));
  
  	is_iabt = 

Re: [PATCH V13 10/10] arm/arm64: KVM: add guest SEA support

2017-03-28 Thread Baicar, Tyler

Hello,


On 3/28/2017 3:53 AM, gengdongjiu wrote:

Hi,

On 2017/3/22 6:47, Tyler Baicar wrote:

+   fault_ipa = kvm_vcpu_get_fault_ipa(vcpu);
+
+   /* The host kernel will handle the synchronous external abort. There
+* is no need to pass the error into the guest.
+*/
+   if (is_abort_synchronous(fault_status))
+   sea_status = handle_guest_sea((unsigned long)fault_ipa,
+   kvm_vcpu_get_hsr(vcpu));
  
  	is_iabt = kvm_vcpu_trap_is_iabt(vcpu);

-   if (unlikely(!is_iabt && kvm_vcpu_dabt_isextabt(vcpu))) {
+   if (unlikely(!is_iabt && kvm_vcpu_dabt_isextabt(vcpu)) && sea_status) {
kvm_inject_vabt(vcpu);
return 1;
}

After the host kernel correctly handle the synchronous external abort, 
the sea_status
will return 0, so the code logical will be continue go-no, whether it 
is better directly return
after correctly handle the SEA? such as below.

if (unlikely(!is_iabt && kvm_vcpu_dabt_isextabt(vcpu)) && sea_status) {
kvm_inject_vabt(vcpu);
return 1;
} else
 return 1;

Yes, I will return after successful SEA handling in the next patch set.

Thanks,
Tyler


  
-	fault_ipa = kvm_vcpu_get_fault_ipa(vcpu);

-
trace_kvm_guest_fault(*vcpu_pc(vcpu), kvm_vcpu_get_hsr(vcpu),
  kvm_vcpu_get_hfar(vcpu), fault_ipa);
  
-	/* Check the stage-2 fault is trans. fault or write fault */

-   fault_status = kvm_vcpu_trap_get_fault_type(vcpu);
if (fault_status != FSC_FAULT && fault_status != FSC_PERM &&


--
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm 
Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project.

___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH V13 10/10] arm/arm64: KVM: add guest SEA support

2017-03-28 Thread Baicar, Tyler

Hello,


On 3/27/2017 10:19 PM, gengdongjiu wrote:

Hi Tyler,

I have a question for below code.

On 2017/3/25 0:01, Christoffer Dall wrote:

is_iabt = kvm_vcpu_trap_is_iabt(vcpu);
-   if (unlikely(!is_iabt && kvm_vcpu_dabt_isextabt(vcpu))) {
+   if (unlikely(!is_iabt && kvm_vcpu_dabt_isextabt(vcpu)) && sea_status) {
kvm_inject_vabt(vcpu);

when it is SEA synchronized abort, why here inject a asynchronous abort through 
kvm_inject_vabt(vcpu) instead of synchronized abort? seem the original kvm 
source code is also do that, so I am confused about that.
thanks .
If the SEA was properly handled, then sea_status will avoid injecting 
the vabt. I'm going to just return after the SEA handling in the next 
patch though.


Thanks,
Tyler



return 1;
}


--
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm 
Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project.

___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH V14 10/10] arm/arm64: KVM: add guest SEA support

2017-03-28 Thread Baicar, Tyler

Hello Christoffer,


On 3/28/2017 2:26 PM, Christoffer Dall wrote:

On Tue, Mar 28, 2017 at 01:30:40PM -0600, Tyler Baicar wrote:

Currently external aborts are unsupported by the guest abort
handling. Add handling for SEAs so that the host kernel reports
SEAs which occur in the guest kernel.

When an SEA occurs in the guest kernel, the guest exits and is
routed to kvm_handle_guest_abort(). Prior to this patch, a print
message of an unsupported FSC would be printed and nothing else
would happen. With this patch, the code gets routed to the APEI
handling of SEAs in the host kernel to report the SEA information.

Signed-off-by: Tyler Baicar 
Acked-by: Catalin Marinas 
Acked-by: Marc Zyngier 
---
  arch/arm/include/asm/kvm_arm.h   | 10 ++
  arch/arm/include/asm/system_misc.h   |  5 +
  arch/arm/kvm/mmu.c   | 34 +++---
  arch/arm64/include/asm/kvm_arm.h | 10 ++
  arch/arm64/include/asm/system_misc.h |  2 ++
  arch/arm64/mm/fault.c| 23 +--
  drivers/acpi/apei/ghes.c | 13 +++--
  include/acpi/ghes.h  |  2 +-
  8 files changed, 87 insertions(+), 12 deletions(-)

diff --git a/arch/arm/include/asm/kvm_arm.h b/arch/arm/include/asm/kvm_arm.h
index a3f0b3d..ebf020b 100644
--- a/arch/arm/include/asm/kvm_arm.h
+++ b/arch/arm/include/asm/kvm_arm.h
@@ -187,6 +187,16 @@
  #define FSC_FAULT (0x04)
  #define FSC_ACCESS(0x08)
  #define FSC_PERM  (0x0c)
+#define FSC_SEA(0x10)
+#define FSC_SEA_TTW0   (0x14)
+#define FSC_SEA_TTW1   (0x15)
+#define FSC_SEA_TTW2   (0x16)
+#define FSC_SEA_TTW3   (0x17)
+#define FSC_SECC   (0x18)
+#define FSC_SECC_TTW0  (0x1c)
+#define FSC_SECC_TTW1  (0x1d)
+#define FSC_SECC_TTW2  (0x1e)
+#define FSC_SECC_TTW3  (0x1f)
  
  /* Hyp Prefetch Fault Address Register (HPFAR/HDFAR) */

  #define HPFAR_MASK(~0xf)
diff --git a/arch/arm/include/asm/system_misc.h 
b/arch/arm/include/asm/system_misc.h
index a3d61ad..8c4a89f 100644
--- a/arch/arm/include/asm/system_misc.h
+++ b/arch/arm/include/asm/system_misc.h
@@ -22,6 +22,11 @@
  
  extern unsigned int user_debug;
  
+static inline int handle_guest_sea(phys_addr_t addr, unsigned int esr)

+{
+   return -1;
+}
+
  #endif /* !__ASSEMBLY__ */
  
  #endif /* __ASM_ARM_SYSTEM_MISC_H */

diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
index 962616f..9a977c8 100644
--- a/arch/arm/kvm/mmu.c
+++ b/arch/arm/kvm/mmu.c
@@ -29,6 +29,7 @@
  #include 
  #include 
  #include 
+#include 
  
  #include "trace.h"
  
@@ -1406,6 +1407,24 @@ static void handle_access_fault(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa)

kvm_set_pfn_accessed(pfn);
  }
  
+static bool is_abort_sea(unsigned long fault_status) {

+   switch (fault_status) {
+   case FSC_SEA:
+   case FSC_SEA_TTW0:
+   case FSC_SEA_TTW1:
+   case FSC_SEA_TTW2:
+   case FSC_SEA_TTW3:
+   case FSC_SECC:
+   case FSC_SECC_TTW0:
+   case FSC_SECC_TTW1:
+   case FSC_SECC_TTW2:
+   case FSC_SECC_TTW3:
+   return true;
+   default:
+   return false;
+   }
+}
+
  /**
   * kvm_handle_guest_abort - handles all 2nd stage aborts
   * @vcpu: the VCPU pointer
@@ -1428,19 +1447,28 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu, 
struct kvm_run *run)
gfn_t gfn;
int ret, idx;
  
+	fault_status = kvm_vcpu_trap_get_fault_type(vcpu);

+
+   fault_ipa = kvm_vcpu_get_fault_ipa(vcpu);
+
+   /*
+* The host kernel will handle the synchronous external abort. There
+* is no need to pass the error into the guest.
+*/
+   if (is_abort_sea(fault_status))
+   if (!handle_guest_sea(fault_ipa, kvm_vcpu_get_hsr(vcpu)))
+   return 1;
+

So what's the logic in presenting this as a vabt to the guest when the
host couldn't handle it?  What do you do in other parts of the kernel if
you see an abort that the host cannot handle?
The only cases that handle_guest_sea() will not return zero are there is 
no error to process
(firmware support isn't there) or the error populated is invalid. If 
there is no error to process,
we want to continue here with the handling that exists. If the error is 
invalid then GHES will
report that and since the error wasn't handled properly we should 
continue with existing

handling.

If there is an abort that GHES fails to handle in the host then GHES 
just reports that and we

also continue with the abort handling.


nit: I'd prefer to see braces around the the multi-line block above.

I will add braces.




is_iabt = kvm_vcpu_trap_is_iabt(vcpu);
if (unlikely(!is_iabt && kvm_vcpu_dabt_isextabt(vcpu))) {
kvm_inject_vabt(vcpu);
return 1;
}
  
-	fault_ipa = kvm_vcpu_get_fault_ipa(vcpu);

-
trace_kvm_guest_fault(*vcpu_pc(vcpu), 

Re: [PATCH V14 10/10] arm/arm64: KVM: add guest SEA support

2017-03-28 Thread Christoffer Dall
On Tue, Mar 28, 2017 at 03:33:17PM -0600, Baicar, Tyler wrote:
> Hello Christoffer,
> 
> 
> On 3/28/2017 2:26 PM, Christoffer Dall wrote:
> >On Tue, Mar 28, 2017 at 01:30:40PM -0600, Tyler Baicar wrote:
> >>Currently external aborts are unsupported by the guest abort
> >>handling. Add handling for SEAs so that the host kernel reports
> >>SEAs which occur in the guest kernel.
> >>
> >>When an SEA occurs in the guest kernel, the guest exits and is
> >>routed to kvm_handle_guest_abort(). Prior to this patch, a print
> >>message of an unsupported FSC would be printed and nothing else
> >>would happen. With this patch, the code gets routed to the APEI
> >>handling of SEAs in the host kernel to report the SEA information.
> >>
> >>Signed-off-by: Tyler Baicar 
> >>Acked-by: Catalin Marinas 
> >>Acked-by: Marc Zyngier 
> >>---
> >>  arch/arm/include/asm/kvm_arm.h   | 10 ++
> >>  arch/arm/include/asm/system_misc.h   |  5 +
> >>  arch/arm/kvm/mmu.c   | 34 
> >> +++---
> >>  arch/arm64/include/asm/kvm_arm.h | 10 ++
> >>  arch/arm64/include/asm/system_misc.h |  2 ++
> >>  arch/arm64/mm/fault.c| 23 +--
> >>  drivers/acpi/apei/ghes.c | 13 +++--
> >>  include/acpi/ghes.h  |  2 +-
> >>  8 files changed, 87 insertions(+), 12 deletions(-)
> >>
> >>diff --git a/arch/arm/include/asm/kvm_arm.h b/arch/arm/include/asm/kvm_arm.h
> >>index a3f0b3d..ebf020b 100644
> >>--- a/arch/arm/include/asm/kvm_arm.h
> >>+++ b/arch/arm/include/asm/kvm_arm.h
> >>@@ -187,6 +187,16 @@
> >>  #define FSC_FAULT (0x04)
> >>  #define FSC_ACCESS(0x08)
> >>  #define FSC_PERM  (0x0c)
> >>+#define FSC_SEA(0x10)
> >>+#define FSC_SEA_TTW0   (0x14)
> >>+#define FSC_SEA_TTW1   (0x15)
> >>+#define FSC_SEA_TTW2   (0x16)
> >>+#define FSC_SEA_TTW3   (0x17)
> >>+#define FSC_SECC   (0x18)
> >>+#define FSC_SECC_TTW0  (0x1c)
> >>+#define FSC_SECC_TTW1  (0x1d)
> >>+#define FSC_SECC_TTW2  (0x1e)
> >>+#define FSC_SECC_TTW3  (0x1f)
> >>  /* Hyp Prefetch Fault Address Register (HPFAR/HDFAR) */
> >>  #define HPFAR_MASK(~0xf)
> >>diff --git a/arch/arm/include/asm/system_misc.h 
> >>b/arch/arm/include/asm/system_misc.h
> >>index a3d61ad..8c4a89f 100644
> >>--- a/arch/arm/include/asm/system_misc.h
> >>+++ b/arch/arm/include/asm/system_misc.h
> >>@@ -22,6 +22,11 @@
> >>  extern unsigned int user_debug;
> >>+static inline int handle_guest_sea(phys_addr_t addr, unsigned int esr)
> >>+{
> >>+   return -1;
> >>+}
> >>+
> >>  #endif /* !__ASSEMBLY__ */
> >>  #endif /* __ASM_ARM_SYSTEM_MISC_H */
> >>diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
> >>index 962616f..9a977c8 100644
> >>--- a/arch/arm/kvm/mmu.c
> >>+++ b/arch/arm/kvm/mmu.c
> >>@@ -29,6 +29,7 @@
> >>  #include 
> >>  #include 
> >>  #include 
> >>+#include 
> >>  #include "trace.h"
> >>@@ -1406,6 +1407,24 @@ static void handle_access_fault(struct kvm_vcpu 
> >>*vcpu, phys_addr_t fault_ipa)
> >>kvm_set_pfn_accessed(pfn);
> >>  }
> >>+static bool is_abort_sea(unsigned long fault_status) {
> >>+   switch (fault_status) {
> >>+   case FSC_SEA:
> >>+   case FSC_SEA_TTW0:
> >>+   case FSC_SEA_TTW1:
> >>+   case FSC_SEA_TTW2:
> >>+   case FSC_SEA_TTW3:
> >>+   case FSC_SECC:
> >>+   case FSC_SECC_TTW0:
> >>+   case FSC_SECC_TTW1:
> >>+   case FSC_SECC_TTW2:
> >>+   case FSC_SECC_TTW3:
> >>+   return true;
> >>+   default:
> >>+   return false;
> >>+   }
> >>+}
> >>+
> >>  /**
> >>   * kvm_handle_guest_abort - handles all 2nd stage aborts
> >>   * @vcpu: the VCPU pointer
> >>@@ -1428,19 +1447,28 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu, 
> >>struct kvm_run *run)
> >>gfn_t gfn;
> >>int ret, idx;
> >>+   fault_status = kvm_vcpu_trap_get_fault_type(vcpu);
> >>+
> >>+   fault_ipa = kvm_vcpu_get_fault_ipa(vcpu);
> >>+
> >>+   /*
> >>+* The host kernel will handle the synchronous external abort. There
> >>+* is no need to pass the error into the guest.
> >>+*/
> >>+   if (is_abort_sea(fault_status))
> >>+   if (!handle_guest_sea(fault_ipa, kvm_vcpu_get_hsr(vcpu)))
> >>+   return 1;
> >>+
> >So what's the logic in presenting this as a vabt to the guest when the
> >host couldn't handle it?  What do you do in other parts of the kernel if
> >you see an abort that the host cannot handle?
> The only cases that handle_guest_sea() will not return zero are
> there is no error to process
> (firmware support isn't there) or the error populated is invalid. If
> there is no error to process,
> we want to continue here with the handling that exists. If the error
> is invalid then GHES will
> report that and since the error wasn't handled properly we should
> continue with existing
> handling.
> 
> If there is an abort that GHES fails to handle in the host then GHES
> just reports that and we

Re: [PATCH 00/15] arm64/kvm: use common sysreg definitions

2017-03-28 Thread Mark Rutland
On Wed, Mar 22, 2017 at 06:35:13PM +, Mark Rutland wrote:
> On Fri, Mar 10, 2017 at 06:35:55PM +, Will Deacon wrote:
> > On Fri, Mar 10, 2017 at 08:17:22AM +, Marc Zyngier wrote:
> 
> > > The next question is how do we merge this. Obviously, we can't split it
> > > between trees, and this is very likely to clash with anything that we
> > > will merge on the KVM side (the sysreg table is a popular place).
> > > 
> > > Will, Catalin: Would it make sense to create a stable branch with these
> > > patches, and merge it into both the arm64 and KVM trees? That'd make
> > > things easier...
> > 
> > I think the scope for conflict on our side is pretty high too, so a shared
> > branch might be the best way to go. I don't want to branch just yet though,
> > so I'll probably wait a week or so before setting something in stone.
> 
> Any further thoughts on this?
> 
> Christoffer has Acked the KVM bits, so if you're happy to do so for the
> arm64 bits I can make a stable branch.

Looking around, it doesn't look like there's anything outside of arm64
that'll conflict on the  changes, and git's happy to merge
my changes with Suzuki's changes currently queued in arm64's
for-next/core branch.

I think it would make sense for those to be in a common branch taken by
both the arm64 and KVM trees, with the KVM-specific parts being taken by
KVM alone atop of that.

Would everyone be happy with that?

For reference, I've updated my branches so that arm64/common-sysreg only
contains the common parts, with the KVM parts atop of that in
kvm/common-sysreg.

Thanks,
Mark.
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


[PATCH V14 07/10] efi: print unrecognized CPER section

2017-03-28 Thread Tyler Baicar
UEFI spec allows for non-standard section in Common Platform Error
Record. This is defined in section N.2.3 of UEFI version 2.5.

Currently if the CPER section's type (UUID) does not match with
one of the section types that the kernel knows how to parse, the
section is skipped. Therefore, user is not able to see
such CPER data, for instance, error record of non-standard section.

For above mentioned case, this change prints out the raw data in
hex in dmesg buffer. Data length is taken from Error Data length
field of Generic Error Data Entry.

The following is a sample output from dmesg:
[  140.739180] {1}[Hardware Error]: Hardware error from APEI Generic Hardware 
Error Source: 2
[  140.739182] {1}[Hardware Error]: It has been corrected by h/w and requires 
no further action
[  140.739191] {1}[Hardware Error]: event severity: corrected
[  140.739196] {1}[Hardware Error]:  time: precise 2017-03-15 20:37:35
[  140.739197] {1}[Hardware Error]:  Error 0, type: corrected
[  140.739203] {1}[Hardware Error]:   section type: unknown, 
d2e2621c-f936-468d-0d84-15a4ed015c8b
[  140.739205] {1}[Hardware Error]:   section length: 568 (0x238)
[  140.739210] {1}[Hardware Error]:   : 4d415201 4d492031 453a4d45 
435f4343  .RAM1 IMEM:ECC_C
[  140.739214] {1}[Hardware Error]:   0010: 53515f45 44525f42  
  E_QSB_RD
[  140.739217] {1}[Hardware Error]:   0020:    
  
[  140.739220] {1}[Hardware Error]:   0030:   0101 
0101  
[  140.739223] {1}[Hardware Error]:   0040:   0005 
  
[  140.739226] {1}[Hardware Error]:   0050: 0101  0001 
0000  
...

The raw data from the error can then be decoded using vendor
specific tools.

Signed-off-by: Tyler Baicar 
CC: Jonathan (Zhixiong) Zhang 
Reviewed-by: James Morse 
---
 drivers/firmware/efi/cper.c | 12 ++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/drivers/firmware/efi/cper.c b/drivers/firmware/efi/cper.c
index 56aa516..d263bc8 100644
--- a/drivers/firmware/efi/cper.c
+++ b/drivers/firmware/efi/cper.c
@@ -591,8 +591,16 @@ static void cper_estatus_print_section(
cper_print_proc_arm(newpfx, arm_err);
else
goto err_section_too_small;
-   } else
-   printk("%s""section type: unknown, %pUl\n", newpfx, sec_type);
+   } else {
+   const void *unknown_err;
+
+   unknown_err = acpi_hest_generic_data_payload(gdata);
+   printk("%ssection type: unknown, %pUl\n", newpfx, sec_type);
+   printk("%ssection length: %d (%#x)\n", newpfx,
+  gdata->error_data_length, gdata->error_data_length);
+   print_hex_dump(newpfx, "", DUMP_PREFIX_OFFSET, 16, 4,
+  unknown_err, gdata->error_data_length, true);
+   }
 
return;
 
-- 
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm 
Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project.

___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


[PATCH V14 08/10] ras: acpi / apei: generate trace event for unrecognized CPER section

2017-03-28 Thread Tyler Baicar
UEFI spec allows for non-standard section in Common Platform Error
Record. This is defined in section N.2.3 of UEFI version 2.5.

Currently if the CPER section's type (UUID) does not match with
any section type that the kernel knows how to parse, trace event
is not generated for such section. And thus user is not able to know
happening of such hardware error, including error record of
non-standard section.

This commit generates a trace event which contains raw error data
for unrecognized CPER section.

Signed-off-by: Tyler Baicar 
CC: Jonathan (Zhixiong) Zhang 
Tested-by: Shiju Jose 
---
 drivers/acpi/apei/ghes.c | 24 ++--
 drivers/ras/ras.c|  1 +
 include/ras/ras_event.h  | 45 +
 3 files changed, 68 insertions(+), 2 deletions(-)

diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index 7e3e5e0..3ecbacc 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -45,11 +45,13 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
 #include 
 #include 
+#include 
 
 #include "apei-internal.h"
 
@@ -454,11 +456,21 @@ static void ghes_do_proc(struct ghes *ghes,
 {
int sev, sec_sev;
struct acpi_hest_generic_data *gdata;
+   uuid_le sec_type;
+   uuid_le *fru_id = _UUID_LE;
+   char *fru_text = "";
 
sev = ghes_severity(estatus->error_severity);
apei_estatus_for_each_section(estatus, gdata) {
sec_sev = ghes_severity(gdata->error_severity);
-   if (!uuid_le_cmp(*(uuid_le *)gdata->section_type,
+   sec_type = *(uuid_le *)gdata->section_type;
+
+   if (gdata->validation_bits & CPER_SEC_VALID_FRU_ID)
+   fru_id = (uuid_le *)gdata->fru_id;
+   if (gdata->validation_bits & CPER_SEC_VALID_FRU_TEXT)
+   fru_text = gdata->fru_text;
+
+   if (!uuid_le_cmp(sec_type,
 CPER_SEC_PLATFORM_MEM)) {
struct cper_sec_mem_err *mem_err;
 
@@ -469,7 +481,7 @@ static void ghes_do_proc(struct ghes *ghes,
ghes_handle_memory_failure(gdata, sev);
}
 #ifdef CONFIG_ACPI_APEI_PCIEAER
-   else if (!uuid_le_cmp(*(uuid_le *)gdata->section_type,
+   else if (!uuid_le_cmp(sec_type,
  CPER_SEC_PCIE)) {
struct cper_sec_pcie *pcie_err;
 
@@ -502,6 +514,14 @@ static void ghes_do_proc(struct ghes *ghes,
 
}
 #endif
+#ifdef CONFIG_RAS
+   else if (trace_unknown_sec_event_enabled()) {
+   void *unknown_err = 
acpi_hest_generic_data_payload(gdata);
+   trace_unknown_sec_event(_type,
+   fru_id, fru_text, sec_sev,
+   unknown_err, gdata->error_data_length);
+   }
+#endif
}
 }
 
diff --git a/drivers/ras/ras.c b/drivers/ras/ras.c
index b67dd36..fb2500b 100644
--- a/drivers/ras/ras.c
+++ b/drivers/ras/ras.c
@@ -27,3 +27,4 @@ static int __init ras_init(void)
 EXPORT_TRACEPOINT_SYMBOL_GPL(extlog_mem_event);
 #endif
 EXPORT_TRACEPOINT_SYMBOL_GPL(mc_event);
+EXPORT_TRACEPOINT_SYMBOL_GPL(unknown_sec_event);
diff --git a/include/ras/ras_event.h b/include/ras/ras_event.h
index 1791a12..5861b6f 100644
--- a/include/ras/ras_event.h
+++ b/include/ras/ras_event.h
@@ -162,6 +162,51 @@
 );
 
 /*
+ * Unknown Section Report
+ *
+ * This event is generated when hardware detected a hardware
+ * error event, which may be of non-standard section as defined
+ * in UEFI spec appendix "Common Platform Error Record", or may
+ * be of sections for which TRACE_EVENT is not defined.
+ *
+ */
+TRACE_EVENT(unknown_sec_event,
+
+   TP_PROTO(const uuid_le *sec_type,
+const uuid_le *fru_id,
+const char *fru_text,
+const u8 sev,
+const u8 *err,
+const u32 len),
+
+   TP_ARGS(sec_type, fru_id, fru_text, sev, err, len),
+
+   TP_STRUCT__entry(
+   __array(char, sec_type, 16)
+   __array(char, fru_id, 16)
+   __string(fru_text, fru_text)
+   __field(u8, sev)
+   __field(u32, len)
+   __dynamic_array(u8, buf, len)
+   ),
+
+   TP_fast_assign(
+   memcpy(__entry->sec_type, sec_type, sizeof(uuid_le));
+   memcpy(__entry->fru_id, fru_id, sizeof(uuid_le));
+   __assign_str(fru_text, fru_text);
+   __entry->sev = sev;
+   __entry->len = len;
+   memcpy(__get_dynamic_array(buf), err, len);
+   ),
+
+   TP_printk("severity: %d; sec type:%pU; FRU: %pU %s; data len:%d; raw 
data:%s",
+ __entry->sev, __entry->sec_type,
+ __entry->fru_id, 

[PATCH V14 09/10] trace, ras: add ARM processor error trace event

2017-03-28 Thread Tyler Baicar
Currently there are trace events for the various RAS
errors with the exception of ARM processor type errors.
Add a new trace event for such errors so that the user
will know when they occur. These trace events are
consistent with the ARM processor error section type
defined in UEFI 2.6 spec section N.2.4.4.

Signed-off-by: Tyler Baicar 
Acked-by: Steven Rostedt 
Reviewed-by: Xie XiuQi 
---
 drivers/acpi/apei/ghes.c|  8 +++-
 drivers/firmware/efi/cper.c |  1 +
 drivers/ras/ras.c   |  1 +
 include/ras/ras_event.h | 45 +
 4 files changed, 54 insertions(+), 1 deletion(-)

diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index 3ecbacc..230b095 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -515,7 +515,13 @@ static void ghes_do_proc(struct ghes *ghes,
}
 #endif
 #ifdef CONFIG_RAS
-   else if (trace_unknown_sec_event_enabled()) {
+   else if (!uuid_le_cmp(sec_type, CPER_SEC_PROC_ARM) &&
+trace_arm_event_enabled()) {
+   struct cper_sec_proc_arm *arm_err;
+
+   arm_err = acpi_hest_generic_data_payload(gdata);
+   trace_arm_event(arm_err);
+   } else if (trace_unknown_sec_event_enabled()) {
void *unknown_err = 
acpi_hest_generic_data_payload(gdata);
trace_unknown_sec_event(_type,
fru_id, fru_text, sec_sev,
diff --git a/drivers/firmware/efi/cper.c b/drivers/firmware/efi/cper.c
index d263bc8..37a39af 100644
--- a/drivers/firmware/efi/cper.c
+++ b/drivers/firmware/efi/cper.c
@@ -35,6 +35,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #define INDENT_SP  " "
 
diff --git a/drivers/ras/ras.c b/drivers/ras/ras.c
index fb2500b..8ba5a94 100644
--- a/drivers/ras/ras.c
+++ b/drivers/ras/ras.c
@@ -28,3 +28,4 @@ static int __init ras_init(void)
 #endif
 EXPORT_TRACEPOINT_SYMBOL_GPL(mc_event);
 EXPORT_TRACEPOINT_SYMBOL_GPL(unknown_sec_event);
+EXPORT_TRACEPOINT_SYMBOL_GPL(arm_event);
diff --git a/include/ras/ras_event.h b/include/ras/ras_event.h
index 5861b6f..13befad 100644
--- a/include/ras/ras_event.h
+++ b/include/ras/ras_event.h
@@ -162,6 +162,51 @@
 );
 
 /*
+ * ARM Processor Events Report
+ *
+ * This event is generated when hardware detects an ARM processor error
+ * has occurred. UEFI 2.6 spec section N.2.4.4.
+ */
+TRACE_EVENT(arm_event,
+
+   TP_PROTO(const struct cper_sec_proc_arm *proc),
+
+   TP_ARGS(proc),
+
+   TP_STRUCT__entry(
+   __field(u64, mpidr)
+   __field(u64, midr)
+   __field(u32, running_state)
+   __field(u32, psci_state)
+   __field(u8, affinity)
+   ),
+
+   TP_fast_assign(
+   if (proc->validation_bits & CPER_ARM_VALID_AFFINITY_LEVEL)
+   __entry->affinity = proc->affinity_level;
+   else
+   __entry->affinity = ~0;
+   if (proc->validation_bits & CPER_ARM_VALID_MPIDR)
+   __entry->mpidr = proc->mpidr;
+   else
+   __entry->mpidr = 0ULL;
+   __entry->midr = proc->midr;
+   if (proc->validation_bits & CPER_ARM_VALID_RUNNING_STATE) {
+   __entry->running_state = proc->running_state;
+   __entry->psci_state = proc->psci_state;
+   } else {
+   __entry->running_state = ~0;
+   __entry->psci_state = ~0;
+   }
+   ),
+
+   TP_printk("affinity level: %d; MPIDR: %016llx; MIDR: %016llx; "
+ "running state: %d; PSCI state: %d",
+ __entry->affinity, __entry->mpidr, __entry->midr,
+ __entry->running_state, __entry->psci_state)
+);
+
+/*
  * Unknown Section Report
  *
  * This event is generated when hardware detected a hardware
-- 
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm 
Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project.

___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


[PATCH V14 01/10] acpi: apei: read ack upon ghes record consumption

2017-03-28 Thread Tyler Baicar
A RAS (Reliability, Availability, Serviceability) controller
may be a separate processor running in parallel with OS
execution, and may generate error records for consumption by
the OS. If the RAS controller produces multiple error records,
then they may be overwritten before the OS has consumed them.

The Generic Hardware Error Source (GHES) v2 structure
introduces the capability for the OS to acknowledge the
consumption of the error record generated by the RAS
controller. A RAS controller supporting GHESv2 shall wait for
the acknowledgment before writing a new error record, thus
eliminating the race condition.

Add support for parsing of GHESv2 sub-tables as well.

Signed-off-by: Tyler Baicar 
CC: Jonathan (Zhixiong) Zhang 
Reviewed-by: James Morse 
---
 drivers/acpi/apei/ghes.c | 49 +---
 drivers/acpi/apei/hest.c |  7 +--
 include/acpi/ghes.h  |  5 -
 3 files changed, 55 insertions(+), 6 deletions(-)

diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index b192b42..0241e36 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -46,6 +46,7 @@
 #include 
 #include 
 
+#include 
 #include 
 #include 
 #include 
@@ -80,6 +81,10 @@
((struct acpi_hest_generic_status *)\
 ((struct ghes_estatus_node *)(estatus_node) + 1))
 
+#define IS_HEST_TYPE_GENERIC_V2(ghes)  \
+   ((struct acpi_hest_header *)ghes->generic)->type == \
+ACPI_HEST_TYPE_GENERIC_ERROR_V2
+
 /*
  * This driver isn't really modular, however for the time being,
  * continuing to use module_param is the easiest way to remain
@@ -249,10 +254,18 @@ static struct ghes *ghes_new(struct acpi_hest_generic 
*generic)
ghes = kzalloc(sizeof(*ghes), GFP_KERNEL);
if (!ghes)
return ERR_PTR(-ENOMEM);
+
ghes->generic = generic;
+   if (IS_HEST_TYPE_GENERIC_V2(ghes)) {
+   rc = apei_map_generic_address(
+   >generic_v2->read_ack_register);
+   if (rc)
+   goto err_free;
+   }
+
rc = apei_map_generic_address(>error_status_address);
if (rc)
-   goto err_free;
+   goto err_unmap_read_ack_addr;
error_block_length = generic->error_block_length;
if (error_block_length > GHES_ESTATUS_MAX_SIZE) {
pr_warning(FW_WARN GHES_PFX
@@ -264,13 +277,17 @@ static struct ghes *ghes_new(struct acpi_hest_generic 
*generic)
ghes->estatus = kmalloc(error_block_length, GFP_KERNEL);
if (!ghes->estatus) {
rc = -ENOMEM;
-   goto err_unmap;
+   goto err_unmap_status_addr;
}
 
return ghes;
 
-err_unmap:
+err_unmap_status_addr:
apei_unmap_generic_address(>error_status_address);
+err_unmap_read_ack_addr:
+   if (IS_HEST_TYPE_GENERIC_V2(ghes))
+   apei_unmap_generic_address(
+   >generic_v2->read_ack_register);
 err_free:
kfree(ghes);
return ERR_PTR(rc);
@@ -280,6 +297,9 @@ static void ghes_fini(struct ghes *ghes)
 {
kfree(ghes->estatus);
apei_unmap_generic_address(>generic->error_status_address);
+   if (IS_HEST_TYPE_GENERIC_V2(ghes))
+   apei_unmap_generic_address(
+   >generic_v2->read_ack_register);
 }
 
 static inline int ghes_severity(int severity)
@@ -649,6 +669,23 @@ static void ghes_estatus_cache_add(
rcu_read_unlock();
 }
 
+static int ghes_ack_error(struct acpi_hest_generic_v2 *generic_v2)
+{
+   int rc;
+   u64 val = 0;
+
+   rc = apei_read(, _v2->read_ack_register);
+   if (rc)
+   return rc;
+   val &= generic_v2->read_ack_preserve <<
+   generic_v2->read_ack_register.bit_offset;
+   val |= generic_v2->read_ack_write <<
+   generic_v2->read_ack_register.bit_offset;
+   rc = apei_write(val, _v2->read_ack_register);
+
+   return rc;
+}
+
 static int ghes_proc(struct ghes *ghes)
 {
int rc;
@@ -661,6 +698,12 @@ static int ghes_proc(struct ghes *ghes)
ghes_estatus_cache_add(ghes->generic, ghes->estatus);
}
ghes_do_proc(ghes, ghes->estatus);
+
+   if (IS_HEST_TYPE_GENERIC_V2(ghes)) {
+   rc = ghes_ack_error(ghes->generic_v2);
+   if (rc)
+   return rc;
+   }
 out:
ghes_clear_estatus(ghes);
return rc;
diff --git a/drivers/acpi/apei/hest.c b/drivers/acpi/apei/hest.c
index 8f2a98e..456b488 100644
--- a/drivers/acpi/apei/hest.c
+++ b/drivers/acpi/apei/hest.c
@@ -52,6 +52,7 @@
[ACPI_HEST_TYPE_AER_ENDPOINT] = sizeof(struct acpi_hest_aer),
[ACPI_HEST_TYPE_AER_BRIDGE] = sizeof(struct acpi_hest_aer_bridge),
[ACPI_HEST_TYPE_GENERIC_ERROR] = sizeof(struct acpi_hest_generic),
+   

[PATCH V14 04/10] arm64: exception: handle Synchronous External Abort

2017-03-28 Thread Tyler Baicar
SEA exceptions are often caused by an uncorrected hardware
error, and are handled when data abort and instruction abort
exception classes have specific values for their Fault Status
Code.
When SEA occurs, before killing the process, report the error
in the kernel logs.
Update fault_info[] with specific SEA faults so that the
new SEA handler is used.

Signed-off-by: Tyler Baicar 
CC: Jonathan (Zhixiong) Zhang 
Reviewed-by: James Morse 
Acked-by: Catalin Marinas 
---
 arch/arm64/include/asm/esr.h |  1 +
 arch/arm64/mm/fault.c| 43 +--
 2 files changed, 34 insertions(+), 10 deletions(-)

diff --git a/arch/arm64/include/asm/esr.h b/arch/arm64/include/asm/esr.h
index d14c478..f20c64a 100644
--- a/arch/arm64/include/asm/esr.h
+++ b/arch/arm64/include/asm/esr.h
@@ -83,6 +83,7 @@
 #define ESR_ELx_WNR(UL(1) << 6)
 
 /* Shared ISS field definitions for Data/Instruction aborts */
+#define ESR_ELx_FnV(UL(1) << 10)
 #define ESR_ELx_EA (UL(1) << 9)
 #define ESR_ELx_S1PTW  (UL(1) << 7)
 
diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
index 4bf899f..e17633d 100644
--- a/arch/arm64/mm/fault.c
+++ b/arch/arm64/mm/fault.c
@@ -488,6 +488,29 @@ static int do_bad(unsigned long addr, unsigned int esr, 
struct pt_regs *regs)
return 1;
 }
 
+/*
+ * This abort handler deals with Synchronous External Abort.
+ * It calls notifiers, and then returns "fault".
+ */
+static int do_sea(unsigned long addr, unsigned int esr, struct pt_regs *regs)
+{
+   struct siginfo info;
+
+   pr_err("Synchronous External Abort: %s (0x%08x) at 0x%016lx\n",
+fault_name(esr), esr, addr);
+
+   info.si_signo = SIGBUS;
+   info.si_errno = 0;
+   info.si_code  = 0;
+   if (esr & ESR_ELx_FnV)
+   info.si_addr = 0;
+   else
+   info.si_addr  = (void __user *)addr;
+   arm64_notify_die("", regs, , esr);
+
+   return 0;
+}
+
 static const struct fault_info {
int (*fn)(unsigned long addr, unsigned int esr, struct pt_regs 
*regs);
int sig;
@@ -510,22 +533,22 @@ static int do_bad(unsigned long addr, unsigned int esr, 
struct pt_regs *regs)
{ do_page_fault,SIGSEGV, SEGV_ACCERR,   "level 1 permission 
fault"  },
{ do_page_fault,SIGSEGV, SEGV_ACCERR,   "level 2 permission 
fault"  },
{ do_page_fault,SIGSEGV, SEGV_ACCERR,   "level 3 permission 
fault"  },
-   { do_bad,   SIGBUS,  0, "synchronous external 
abort"},
+   { do_sea,   SIGBUS,  0, "synchronous external 
abort"},
{ do_bad,   SIGBUS,  0, "unknown 17"
},
{ do_bad,   SIGBUS,  0, "unknown 18"
},
{ do_bad,   SIGBUS,  0, "unknown 19"
},
-   { do_bad,   SIGBUS,  0, "synchronous external 
abort (translation table walk)" },
-   { do_bad,   SIGBUS,  0, "synchronous external 
abort (translation table walk)" },
-   { do_bad,   SIGBUS,  0, "synchronous external 
abort (translation table walk)" },
-   { do_bad,   SIGBUS,  0, "synchronous external 
abort (translation table walk)" },
-   { do_bad,   SIGBUS,  0, "synchronous parity 
error"  },
+   { do_sea,   SIGBUS,  0, "level 0 (translation 
table walk)"  },
+   { do_sea,   SIGBUS,  0, "level 1 (translation 
table walk)"  },
+   { do_sea,   SIGBUS,  0, "level 2 (translation 
table walk)"  },
+   { do_sea,   SIGBUS,  0, "level 3 (translation 
table walk)"  },
+   { do_sea,   SIGBUS,  0, "synchronous parity or 
ECC error" },
{ do_bad,   SIGBUS,  0, "unknown 25"
},
{ do_bad,   SIGBUS,  0, "unknown 26"
},
{ do_bad,   SIGBUS,  0, "unknown 27"
},
-   { do_bad,   SIGBUS,  0, "synchronous parity 
error (translation table walk)" },
-   { do_bad,   SIGBUS,  0, "synchronous parity 
error (translation table walk)" },
-   { do_bad,   SIGBUS,  0, "synchronous parity 
error (translation table walk)" },
-   { do_bad,   SIGBUS,  0, "synchronous parity 
error (translation table walk)" },
+   { do_sea,   SIGBUS,  0, "level 0 synchronous 
parity error (translation table walk)" },
+   { do_sea,   SIGBUS,  0,  

[PATCH V14 00/10] Add UEFI 2.6 and ACPI 6.1 updates for RAS on ARM64

2017-03-28 Thread Tyler Baicar
When a memory error, CPU error, PCIe error, or other type of hardware error
that's covered by RAS occurs, firmware should populate the shared GHES memory
location with the proper GHES structures to notify the OS of the error.
For example, platforms that implement firmware first handling may implement
separate GHES sources for corrected errors and uncorrected errors. If the
error is an uncorrectable error, then the firmware will notify the OS
immediately since the error needs to be handled ASAP. The OS will then be able
to take the appropriate action needed such as offlining a page. If the error
is a corrected error, then the firmware will not interrupt the OS immediately.
Instead, the OS will see and report the error the next time it's GHES timer
expires. The kernel will first parse the GHES structures and report the errors
through the kernel logs and then notify the user space through RAS trace
events. This allows user space applications such as RAS Daemon to see the
errors and report them however the user desires. This patchset extends the
kernel functionality for RAS errors based on updates in the UEFI 2.6 and
ACPI 6.1 specifications.

An example flow from firmware to user space could be:

 +---+
   +>|   |
   | |  GHES polling |--+
+-+  |source |  |   +---+   ++
| |  +---+  |   |  Kernel GHES  |   ||
|  Firmware   | +-->|  CPER AER and |-->|  RAS trace |
| |  +---+  |   |  EDAC drivers |   |   event|
+-+  |   |  |   +---+   ++
   | |  GHES sci |--+
   +>|   source  |
 +---+

Add support for Generic Hardware Error Source (GHES) v2, which introduces the
capability for the OS to acknowledge the consumption of the error record
generated by the Reliability, Availability and Serviceability (RAS) controller.
This eliminates potential race conditions between the OS and the RAS controller.

Add support for the timestamp field added to the Generic Error Data Entry v3,
allowing the OS to log the time that the error is generated by the firmware,
rather than the time the error is consumed. This improves the correctness of
event sequences when analyzing error logs. The timestamp is added in
ACPI 6.1, reference Table 18-343 Generic Error Data Entry.

Add support for ARMv8 Common Platform Error Record (CPER) per UEFI 2.6
specification. ARMv8 specific processor error information is reported as part of
the CPER records.  This provides more detail on for processor error logs. This
can help describe ARMv8 cache, tlb, and bus errors.

Synchronous External Abort (SEA) represents a specific processor error condition
in ARM systems. A handler is added to recognize SEA errors, and a notifier is
added to parse and report the errors before the process is killed. Refer to
section N.2.1.1 in the Common Platform Error Record appendix of the UEFI 2.6
specification.

Currently the kernel ignores CPER records that are unrecognized.
On the other hand, UEFI spec allows for non-standard (eg. vendor
proprietary) error section type in CPER (Common Platform Error Record),
as defined in section N2.3 of UEFI version 2.5. Therefore, user
is not able to see hardware error data of non-standard section.

If section Type field of Generic Error Data Entry is unrecognized,
prints out the raw data in dmesg buffer, and also adds a tracepoint
for reporting such hardware errors.

Currently even if an error status block's severity is fatal, the kernel
does not honor the severity level and panic. With the firmware first
model, the platform could inform the OS about a fatal hardware error
through the non-NMI GHES notification type. The OS should panic when a
hardware error record is received with this severity.

Add support to handle SEAs that occur while a KVM guest kernel is
running. Currently these are unsupported by the guest abort handling.

V14:Make sure function prototypes are in the __ASSEMBLY__ block
Change is_abort_synchronous to is_abort_sea
Use phys_addr_t for SEA address
Return after successful SEA handling in handle_guest_abort()

V13:Rebase on 4.11rc2
Print decimal and hex sizes for unknown CPER section errors
Use proper CONFIG_* when using IS_ENABLED
Move handle_guest_sea call prior to SEI check
Add a return value to handle_guest_sea
Move RCU locking into ghes_notify_sea
Add valid bit checks to ARM trace event
Remove GPIO, SEI, and GSIV cases in GHES
Add ARCH_HAVE_NMI_SAFE_CMPXCHG since we added NMI usage

V12:Remove double quotes from CPER code
Add helper function to check all SEA cases in KVM patch
Replace nmi_enter/exit with rcu_read_lock/unlock for KVM SEA
Change HAVE_ACPI_APEI_SEA to ACPI_APEI_SEA in KVM SEA case

V11:Change print_hex_dump calls to include ASCII output
Change 

Re: host stalls when qemu-system-aarch64 with kvm and pflash

2017-03-28 Thread Radha Mohan
On Tue, Mar 28, 2017 at 1:16 PM, Christoffer Dall  wrote:
> Hi Radha,
>
> On Tue, Mar 28, 2017 at 12:58:24PM -0700, Radha Mohan wrote:
>> Hi,
>> I am seeing an issue with qemu-system-aarch64 when using pflash
>> (booting kernel via UEFI bios).
>>
>> Host kernel: 4.11.0-rc3-next-20170323
>> Qemu version: v2.9.0-rc1
>>
>> Command used:
>> ./aarch64-softmmu/qemu-system-aarch64 -cpu host -enable-kvm -M
>> virt,gic_version=3 -nographic -smp 1 -m 2048 -drive
>> if=none,id=hd0,file=/root/zesty-server-cloudimg-arm64.img,id=0 -device
>> virtio-blk-device,drive=hd0 -pflash /root/flash0.img -pflash
>> /root/flash1.img
>>
>>
>> As soon as the guest kernel boots the host starts to stall and prints
>> the below messages. And the system never recovers. I can neither
>> poweroff the guest nor the host. So I have resort to external power
>> reset of the host.
>>
>> ==
>> [  116.199077] NMI watchdog: BUG: soft lockup - CPU#25 stuck for 23s!
>> [kworker/25:1:454]
>> [  116.206901] Modules linked in: binfmt_misc nls_iso8859_1 aes_ce_blk
>> shpchp crypto_simd gpio_keys cryptd aes_ce_cipher ghash_ce sha2_ce
>> sha1_ce uio_pdrv_genirq uio autofs4 btrfs raid10 rai
>> d456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor
>> raid6_pq libcrc32c raid1 raid0 multipath linear ast i2c_algo_bit ttm
>> drm_kms_helper syscopyarea sysfillrect sysimgblt fb_s
>> ys_fops drm nicvf ahci nicpf libahci thunder_bgx thunder_xcv
>> mdio_thunder mdio_cavium
>>
>> [  116.206995] CPU: 25 PID: 454 Comm: kworker/25:1 Not tainted
>> 4.11.0-rc3-next-20170323 #1
>> [  116.206997] Hardware name: www.cavium.com crb-1s/crb-1s, BIOS 0.3 Feb 23 
>> 2017
>> [  116.207010] Workqueue: events netstamp_clear
>> [  116.207015] task: 801f906b5400 task.stack: 801f901a4000
>> [  116.207020] PC is at smp_call_function_many+0x284/0x2e8
>> [  116.207023] LR is at smp_call_function_many+0x244/0x2e8
>> [  116.207026] pc : [] lr : []
>> pstate: 8145
>> [  116.207028] sp : 801f901a7be0
>> [  116.207030] x29: 801f901a7be0 x28: 09139000
>> [  116.207036] x27: 09139434 x26: 0080
>> [  116.207041] x25:  x24: 081565d0
>> [  116.207047] x23: 0001 x22: 08e11e00
>> [  116.207052] x21: 801f6d5cff00 x20: 801f6d5cff08
>> [  116.207057] x19: 09138e38 x18: 0a03
>> [  116.207063] x17: b77c9028 x16: 082e81d8
>> [  116.207068] x15: 3d0d6dd44d08 x14: 0036312196549b4a
>> [  116.207073] x13: 58dabe4c x12: 0018
>> [  116.207079] x11: 366e2f04 x10: 09f0
>> [  116.207084] x9 : 801f901a7d30 x8 : 0002
>> [  116.207089] x7 :  x6 : 
>> [  116.207095] x5 :  x4 : 0020
>> [  116.207100] x3 : 0020 x2 : 
>> [  116.207105] x1 : 801f6d682578 x0 : 0003
>>
>> [  150.443116] INFO: rcu_sched self-detected stall on CPU
>> [  150.448261]  25-...: (14997 ticks this GP)
>> idle=47a/141/0 softirq=349/349 fqs=7495
>> [  150.451115] INFO: rcu_sched detected stalls on CPUs/tasks:
>> [  150.451123]  25-...: (14997 ticks this GP)
>> idle=47a/141/0 softirq=349/349 fqs=7495
>> [  150.451124]  (detected by 13, t=15002 jiffies, g=805, c=804, q=8384)
>> [  150.451136] Task dump for CPU 25:
>> [  150.451138] kworker/25:1R  running task0   454  2 
>> 0x0002
>> [  150.451155] Workqueue: events netstamp_clear
>> [  150.451158] Call trace:
>> [  150.451164] [] __switch_to+0x90/0xa8
>> [  150.451172] [] static_key_slow_inc+0x128/0x138
>> [  150.451175] [] static_key_enable+0x34/0x60
>> [  150.451178] [] netstamp_clear+0x68/0x80
>> [  150.451181] [] process_one_work+0x158/0x478
>> [  150.451183] [] worker_thread+0x50/0x4a8
>> [  150.451187] [] kthread+0x108/0x138
>> [  150.451190] [] ret_from_fork+0x10/0x50
>> [  150.477451]   (t=15008 jiffies g=805 c=804 q=8384)
>> [  150.482242] Task dump for CPU 25:
>> [  150.482245] kworker/25:1R  running task0   454  2 
>> 0x0002
>> [  150.482259] Workqueue: events netstamp_clear
>> [  150.482264] Call trace:
>> [  150.482271] [] dump_backtrace+0x0/0x2b0
>> [  150.482277] [] show_stack+0x24/0x30
>> [  150.482281] [] sched_show_task+0x128/0x178
>> [  150.482285] [] dump_cpu_task+0x48/0x58
>> [  150.482288] [] rcu_dump_cpu_stacks+0xa0/0xe8
>> [  150.482297] [] rcu_check_callbacks+0x774/0x938
>> [  150.482305] [] update_process_times+0x34/0x60
>> [  150.482314] [] tick_sched_handle.isra.7+0x38/0x70
>> [  150.482319] [] tick_sched_timer+0x4c/0x98
>> [  150.482324] [] __hrtimer_run_queues+0xd8/0x2b8
>> [  150.482328] [] hrtimer_interrupt+0xa8/0x228
>> [  150.482334] [] arch_timer_handler_phys+0x3c/0x50
>> [  150.482341] [] handle_percpu_devid_irq+0x8c/0x230
>> [  150.482344] [] generic_handle_irq+0x34/0x50
>> [  150.482347] [] __handle_domain_irq+0x68/0xc0
>> [  150.482351] [] 

Re: [PATCH V14 10/10] arm/arm64: KVM: add guest SEA support

2017-03-28 Thread Christoffer Dall
On Tue, Mar 28, 2017 at 01:30:40PM -0600, Tyler Baicar wrote:
> Currently external aborts are unsupported by the guest abort
> handling. Add handling for SEAs so that the host kernel reports
> SEAs which occur in the guest kernel.
> 
> When an SEA occurs in the guest kernel, the guest exits and is
> routed to kvm_handle_guest_abort(). Prior to this patch, a print
> message of an unsupported FSC would be printed and nothing else
> would happen. With this patch, the code gets routed to the APEI
> handling of SEAs in the host kernel to report the SEA information.
> 
> Signed-off-by: Tyler Baicar 
> Acked-by: Catalin Marinas 
> Acked-by: Marc Zyngier 
> ---
>  arch/arm/include/asm/kvm_arm.h   | 10 ++
>  arch/arm/include/asm/system_misc.h   |  5 +
>  arch/arm/kvm/mmu.c   | 34 +++---
>  arch/arm64/include/asm/kvm_arm.h | 10 ++
>  arch/arm64/include/asm/system_misc.h |  2 ++
>  arch/arm64/mm/fault.c| 23 +--
>  drivers/acpi/apei/ghes.c | 13 +++--
>  include/acpi/ghes.h  |  2 +-
>  8 files changed, 87 insertions(+), 12 deletions(-)
> 
> diff --git a/arch/arm/include/asm/kvm_arm.h b/arch/arm/include/asm/kvm_arm.h
> index a3f0b3d..ebf020b 100644
> --- a/arch/arm/include/asm/kvm_arm.h
> +++ b/arch/arm/include/asm/kvm_arm.h
> @@ -187,6 +187,16 @@
>  #define FSC_FAULT(0x04)
>  #define FSC_ACCESS   (0x08)
>  #define FSC_PERM (0x0c)
> +#define FSC_SEA  (0x10)
> +#define FSC_SEA_TTW0 (0x14)
> +#define FSC_SEA_TTW1 (0x15)
> +#define FSC_SEA_TTW2 (0x16)
> +#define FSC_SEA_TTW3 (0x17)
> +#define FSC_SECC (0x18)
> +#define FSC_SECC_TTW0(0x1c)
> +#define FSC_SECC_TTW1(0x1d)
> +#define FSC_SECC_TTW2(0x1e)
> +#define FSC_SECC_TTW3(0x1f)
>  
>  /* Hyp Prefetch Fault Address Register (HPFAR/HDFAR) */
>  #define HPFAR_MASK   (~0xf)
> diff --git a/arch/arm/include/asm/system_misc.h 
> b/arch/arm/include/asm/system_misc.h
> index a3d61ad..8c4a89f 100644
> --- a/arch/arm/include/asm/system_misc.h
> +++ b/arch/arm/include/asm/system_misc.h
> @@ -22,6 +22,11 @@
>  
>  extern unsigned int user_debug;
>  
> +static inline int handle_guest_sea(phys_addr_t addr, unsigned int esr)
> +{
> + return -1;
> +}
> +
>  #endif /* !__ASSEMBLY__ */
>  
>  #endif /* __ASM_ARM_SYSTEM_MISC_H */
> diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
> index 962616f..9a977c8 100644
> --- a/arch/arm/kvm/mmu.c
> +++ b/arch/arm/kvm/mmu.c
> @@ -29,6 +29,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  #include "trace.h"
>  
> @@ -1406,6 +1407,24 @@ static void handle_access_fault(struct kvm_vcpu *vcpu, 
> phys_addr_t fault_ipa)
>   kvm_set_pfn_accessed(pfn);
>  }
>  
> +static bool is_abort_sea(unsigned long fault_status) {
> + switch (fault_status) {
> + case FSC_SEA:
> + case FSC_SEA_TTW0:
> + case FSC_SEA_TTW1:
> + case FSC_SEA_TTW2:
> + case FSC_SEA_TTW3:
> + case FSC_SECC:
> + case FSC_SECC_TTW0:
> + case FSC_SECC_TTW1:
> + case FSC_SECC_TTW2:
> + case FSC_SECC_TTW3:
> + return true;
> + default:
> + return false;
> + }
> +}
> +
>  /**
>   * kvm_handle_guest_abort - handles all 2nd stage aborts
>   * @vcpu:the VCPU pointer
> @@ -1428,19 +1447,28 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu, 
> struct kvm_run *run)
>   gfn_t gfn;
>   int ret, idx;
>  
> + fault_status = kvm_vcpu_trap_get_fault_type(vcpu);
> +
> + fault_ipa = kvm_vcpu_get_fault_ipa(vcpu);
> +
> + /*
> +  * The host kernel will handle the synchronous external abort. There
> +  * is no need to pass the error into the guest.
> +  */
> + if (is_abort_sea(fault_status))
> + if (!handle_guest_sea(fault_ipa, kvm_vcpu_get_hsr(vcpu)))
> + return 1;
> +

So what's the logic in presenting this as a vabt to the guest when the
host couldn't handle it?  What do you do in other parts of the kernel if
you see an abort that the host cannot handle?

nit: I'd prefer to see braces around the the multi-line block above.


>   is_iabt = kvm_vcpu_trap_is_iabt(vcpu);
>   if (unlikely(!is_iabt && kvm_vcpu_dabt_isextabt(vcpu))) {
>   kvm_inject_vabt(vcpu);
>   return 1;
>   }
>  
> - fault_ipa = kvm_vcpu_get_fault_ipa(vcpu);
> -
>   trace_kvm_guest_fault(*vcpu_pc(vcpu), kvm_vcpu_get_hsr(vcpu),
> kvm_vcpu_get_hfar(vcpu), fault_ipa);
>  
>   /* Check the stage-2 fault is trans. fault or write fault */
> - fault_status = kvm_vcpu_trap_get_fault_type(vcpu);
>   if (fault_status != FSC_FAULT && fault_status != FSC_PERM &&
>   fault_status != FSC_ACCESS) {
>   kvm_err("Unsupported FSC: EC=%#x xFSC=%#lx ESR_EL2=%#lx\n",
> diff --git a/arch/arm64/include/asm/kvm_arm.h 
> 

Re: [PATCH 00/15] arm64/kvm: use common sysreg definitions

2017-03-28 Thread Christoffer Dall
On Tue, Mar 28, 2017 at 07:48:28PM +0100, Mark Rutland wrote:
> On Wed, Mar 22, 2017 at 06:35:13PM +, Mark Rutland wrote:
> > On Fri, Mar 10, 2017 at 06:35:55PM +, Will Deacon wrote:
> > > On Fri, Mar 10, 2017 at 08:17:22AM +, Marc Zyngier wrote:
> > 
> > > > The next question is how do we merge this. Obviously, we can't split it
> > > > between trees, and this is very likely to clash with anything that we
> > > > will merge on the KVM side (the sysreg table is a popular place).
> > > > 
> > > > Will, Catalin: Would it make sense to create a stable branch with these
> > > > patches, and merge it into both the arm64 and KVM trees? That'd make
> > > > things easier...
> > > 
> > > I think the scope for conflict on our side is pretty high too, so a shared
> > > branch might be the best way to go. I don't want to branch just yet 
> > > though,
> > > so I'll probably wait a week or so before setting something in stone.
> > 
> > Any further thoughts on this?
> > 
> > Christoffer has Acked the KVM bits, so if you're happy to do so for the
> > arm64 bits I can make a stable branch.
> 
> Looking around, it doesn't look like there's anything outside of arm64
> that'll conflict on the  changes, and git's happy to merge
> my changes with Suzuki's changes currently queued in arm64's
> for-next/core branch.
> 
> I think it would make sense for those to be in a common branch taken by
> both the arm64 and KVM trees, with the KVM-specific parts being taken by
> KVM alone atop of that.
> 
> Would everyone be happy with that?

I'm happy with that.

> 
> For reference, I've updated my branches so that arm64/common-sysreg only
> contains the common parts, with the KVM parts atop of that in
> kvm/common-sysreg.
> 

Will, Catalin:  Let me know if you're going to pull from common-sysreg
and I'll do the same and add the kvm patches above.

Thanks for preparing the patches.
-Christoffer
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm