Re: [PATCH 02/27] arm64: KVM: Hide unsupported AArch64 CPU features from guests
On Thu, Aug 17, 2017 at 09:45:51AM +0100, Marc Zyngier wrote:
> On 16/08/17 21:32, Dave Martin wrote:
> > On Wed, Aug 16, 2017 at 12:10:38PM +0100, Marc Zyngier wrote:
> >> On 09/08/17 13:05, Dave Martin wrote:
> >>> Currently, a guest kernel sees the true CPU feature registers
> >>> (ID_*_EL1) when it reads them using MRS instructions.  This means
> >>> that the guest will observe features that are present in the
> >>> hardware but the host doesn't understand or doesn't provide support
> >>> for.  A guest may legitimately try to use such a feature as per the
> >>> architecture, but use of the feature may trap instead of working
> >>> normally, triggering undef injection into the guest.
> >>>
> >>> This is not a problem for the host, but the guest may go wrong when
> >>> running on newer hardware than the host knows about.
> >>>
> >>> This patch hides from guest VMs any AArch64-specific CPU features
> >>> that the host doesn't support, by exposing to the guest the
> >>> sanitised versions of the registers computed by the cpufeatures
> >>> framework, instead of the true hardware registers.  To achieve
> >>> this, HCR_EL2.TID3 is now set for AArch64 guests, and emulation
> >>> code is added to KVM to report the sanitised versions of the
> >>> affected registers in response to MRS and register reads from
> >>> userspace.
> >>>
> >>> The affected registers are removed from invariant_sys_regs[] (since
> >>> the invariant_sys_regs handling is no longer quite correct for
> >>> them) and added to sys_reg_descs[], with appropriate access(),
> >>> get_user() and set_user() methods.  No runtime vcpu storage is
> >>> allocated for the registers: instead, they are read on demand from
> >>> the cpufeatures framework.  This may need modification in the
> >>> future if there is a need for userspace to customise the features
> >>> visible to the guest.
> >>>
> >>> Attempts by userspace to write the registers are handled similarly
> >>> to the current invariant_sys_regs handling: writes are permitted,
> >>> but only if they don't attempt to change the value.  This is
> >>> sufficient to support VM snapshot/restore from userspace.
> >>>
> >>> Because of the additional registers, restoring a VM on an older
> >>> kernel may not work unless userspace knows how to handle the extra
> >>> VM registers exposed to the KVM user ABI by this patch.
> >>>
> >>> Under the principle of least damage, this patch makes no attempt to
> >>> handle any of the other registers currently in
> >>> invariant_sys_regs[], or to emulate registers for AArch32: however,
> >>> these could be handled in a similar way in future, as necessary.
> >>>
> >>> Signed-off-by: Dave Martin
> >>> ---
> >>>  arch/arm64/kvm/hyp/switch.c |   6 ++
> >>>  arch/arm64/kvm/sys_regs.c   | 224
> >>> +++-
> >>>  2 files changed, 185 insertions(+), 45 deletions(-)

[...]

> >>> +static bool __access_id_reg(struct kvm_vcpu *vcpu,
> >>> +			    struct sys_reg_params *p,
> >>> +			    const struct sys_reg_desc const *r,
> >>> +			    bool raz)
> >>> +{
> >>> +	if (p->is_write) {
> >>> +		kvm_inject_undefined(vcpu);
> >>> +		return false;
> >>> +	}
> >>
> >> I don't think this is supposed to happen (should have UNDEF-ed at EL1).
> >> You can call write_to_read_only() in that case, which will spit out a
> >> warning and inject the exception.
> >
> > I'll check this -- sounds about right.
> >
> > If it should never happen, should I just delete that code or BUG()?  I
> > notice a BUG_ON() for a similar situation in access_vm_reg() for example.
> >
> > Or do we not quite trust hardware not to get this wrong?
> > (It feels like the kind of thing that could slip through validation
> > and/or would be considered not worth a respin, but it seems wrong to
> > work around a theoretical hardware bug before it's confirmed to exist,
> > unless we think for some reason that it's really likely.)
>
> That's the way we handle this for the rest of the accessors. We used to
> have a BUG_ON(), but it is pretty silly to kill the whole system for
> such a small deviation from the architecture. And maybe it is useless,
> but it doesn't hurt either.

OK, that makes sense -- I'll follow the precedent here and call
write_to_read_only() if this happens.

> >>> +
> >>> +	p->regval = read_id_reg(r, raz);
> >>> +	return true;
> >>> +}
> >
> > [...]
> >
> >>> @@ -944,6 +1073,32 @@ static const struct sys_reg_desc sys_reg_descs[] = {
> >>>  	{ SYS_DESC(SYS_DBGVCR32_EL2), NULL, reset_val, DBGVCR32_EL2, 0 },
> >>>
> >>>  	{ SYS_DESC(SYS_MPIDR_EL1), NULL, reset_mpidr, MPIDR_EL1 },
> >>> +
> >>> +	/*
> >>> +	 * All non-RAZ feature registers listed here must also be
> >>> +	 * present in arm64_ftr_regs[].
> >>> +	 */
> >>> +
> >>> +	/* AArch64 mappings of the AArch32 ID registers */
> >>> +	/* ID_AFR0_EL1 not exposed to guests for now */
> >>> +	ID(PFR0), ID(PFR1), ID(DFR0),
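
The outcome agreed in this message (warn and inject an exception into the guest on an unexpected write, rather than BUG()) can be modelled in plain userspace C. This is only a sketch under stated assumptions: the model_* types and functions below are hypothetical stand-ins, not the kernel's kvm_vcpu, sys_reg_params or write_to_read_only(), and the warning is a simple fprintf().

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-ins for the kernel structures (assumption). */
struct model_vcpu {
	unsigned int undef_injected;	/* UNDEF exceptions queued for the guest */
};

struct model_params {
	bool is_write;
	uint64_t regval;
};

/* Models "spit out a warning and inject the exception"; the host keeps running. */
static bool model_write_to_read_only(struct model_vcpu *vcpu)
{
	fprintf(stderr, "warning: unexpected write to read-only ID register\n");
	vcpu->undef_injected++;
	return false;
}

/* Models the trap handler: writes are rejected, reads return 0 (RAZ) or the sanitised value. */
static bool model_access_id_reg(struct model_vcpu *vcpu,
				struct model_params *p,
				uint64_t sanitised, bool raz)
{
	if (p->is_write)
		return model_write_to_read_only(vcpu);

	p->regval = raz ? 0 : sanitised;
	return true;
}
```

The point of the design is that an architecturally impossible write costs one warning and one guest exception instead of taking down the whole host.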
Re: [PATCH 02/27] arm64: KVM: Hide unsupported AArch64 CPU features from guests
On 16/08/17 21:32, Dave Martin wrote:
> On Wed, Aug 16, 2017 at 12:10:38PM +0100, Marc Zyngier wrote:
>> On 09/08/17 13:05, Dave Martin wrote:
>>> Currently, a guest kernel sees the true CPU feature registers
>>> (ID_*_EL1) when it reads them using MRS instructions.  This means
>>> that the guest will observe features that are present in the
>>> hardware but the host doesn't understand or doesn't provide support
>>> for.  A guest may legitimately try to use such a feature as per the
>>> architecture, but use of the feature may trap instead of working
>>> normally, triggering undef injection into the guest.
>>>
>>> This is not a problem for the host, but the guest may go wrong when
>>> running on newer hardware than the host knows about.
>>>
>>> This patch hides from guest VMs any AArch64-specific CPU features
>>> that the host doesn't support, by exposing to the guest the
>>> sanitised versions of the registers computed by the cpufeatures
>>> framework, instead of the true hardware registers.  To achieve
>>> this, HCR_EL2.TID3 is now set for AArch64 guests, and emulation
>>> code is added to KVM to report the sanitised versions of the
>>> affected registers in response to MRS and register reads from
>>> userspace.
>>>
>>> The affected registers are removed from invariant_sys_regs[] (since
>>> the invariant_sys_regs handling is no longer quite correct for
>>> them) and added to sys_reg_descs[], with appropriate access(),
>>> get_user() and set_user() methods.  No runtime vcpu storage is
>>> allocated for the registers: instead, they are read on demand from
>>> the cpufeatures framework.  This may need modification in the
>>> future if there is a need for userspace to customise the features
>>> visible to the guest.
>>>
>>> Attempts by userspace to write the registers are handled similarly
>>> to the current invariant_sys_regs handling: writes are permitted,
>>> but only if they don't attempt to change the value.  This is
>>> sufficient to support VM snapshot/restore from userspace.
>>>
>>> Because of the additional registers, restoring a VM on an older
>>> kernel may not work unless userspace knows how to handle the extra
>>> VM registers exposed to the KVM user ABI by this patch.
>>>
>>> Under the principle of least damage, this patch makes no attempt to
>>> handle any of the other registers currently in
>>> invariant_sys_regs[], or to emulate registers for AArch32: however,
>>> these could be handled in a similar way in future, as necessary.
>>>
>>> Signed-off-by: Dave Martin
>>> ---
>>>  arch/arm64/kvm/hyp/switch.c |   6 ++
>>>  arch/arm64/kvm/sys_regs.c   | 224
>>> +++-
>>>  2 files changed, 185 insertions(+), 45 deletions(-)
>>>
>
> [...]
>
>>> diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
>>> index 2e070d3..6583dd7 100644
>>> --- a/arch/arm64/kvm/sys_regs.c
>>> +++ b/arch/arm64/kvm/sys_regs.c
>>> @@ -892,6 +892,135 @@ static bool access_cntp_cval(struct kvm_vcpu *vcpu,
>>>  	return true;
>>>  }
>>>
>>> +/* Read a sanitised cpufeature ID register by sys_reg_desc */
>>> +static u64 read_id_reg(struct sys_reg_desc const *r, bool raz)
>>> +{
>>> +	u32 id = sys_reg((u32)r->Op0, (u32)r->Op1,
>>> +			 (u32)r->CRn, (u32)r->CRm, (u32)r->Op2);
>>> +
>>> +	return raz ? 0 : read_sanitised_ftr_reg(id);
>>> +}
>>> +
>>> +/* cpufeature ID register access trap handlers */
>>> +
>>> +static bool __access_id_reg(struct kvm_vcpu *vcpu,
>>> +			    struct sys_reg_params *p,
>>> +			    const struct sys_reg_desc const *r,
>>> +			    bool raz)
>>> +{
>>> +	if (p->is_write) {
>>> +		kvm_inject_undefined(vcpu);
>>> +		return false;
>>> +	}
>>
>> I don't think this is supposed to happen (should have UNDEF-ed at EL1).
>> You can call write_to_read_only() in that case, which will spit out a
>> warning and inject the exception.
>
> I'll check this -- sounds about right.
>
> If it should never happen, should I just delete that code or BUG()?  I
> notice a BUG_ON() for a similar situation in access_vm_reg() for example.
>
> Or do we not quite trust hardware not to get this wrong?
> (It feels like the kind of thing that could slip through validation
> and/or would be considered not worth a respin, but it seems wrong to
> work around a theoretical hardware bug before it's confirmed to exist,
> unless we think for some reason that it's really likely.)

That's the way we handle this for the rest of the accessors. We used to
have a BUG_ON(), but it is pretty silly to kill the whole system for
such a small deviation from the architecture. And maybe it is useless,
but it doesn't hurt either.

>>> +
>>> +	p->regval = read_id_reg(r, raz);
>>> +	return true;
>>> +}
>
> [...]
>
>>> @@ -944,6 +1073,32 @@ static const struct sys_reg_desc sys_reg_descs[] = {
>>>  	{ SYS_DESC(SYS_DBGVCR32_EL2), NULL, reset_val, DBGVCR32_EL2, 0 },
>>>
>>>  	{
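
The quoted read_id_reg() packs the descriptor's Op0/Op1/CRn/CRm/Op2 fields into a single id and then either returns zero (RAZ) or looks the id up in the cpufeatures framework. A rough userspace model of that flow is sketched below. The shift values are what I recall from the arm64 sysreg encoding and should be treated as assumptions, and the lookup function with its return value is entirely made up for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed field positions of the arm64 system register encoding. */
static uint32_t model_sys_reg(uint32_t op0, uint32_t op1, uint32_t crn,
			      uint32_t crm, uint32_t op2)
{
	return (op0 << 19) | (op1 << 16) | (crn << 12) | (crm << 8) | (op2 << 5);
}

/* Hypothetical stand-in for the sanitised-register lookup; the value is invented. */
static uint64_t model_sanitised_lookup(uint32_t id)
{
	if (id == model_sys_reg(3, 0, 0, 4, 0))	/* ID_AA64PFR0_EL1 encoding */
		return 0x11;
	return 0;
}

/* Models read_id_reg(): RAZ registers read as zero, others use the lookup. */
static uint64_t model_read_id_reg(uint32_t op0, uint32_t op1, uint32_t crn,
				  uint32_t crm, uint32_t op2, bool raz)
{
	uint32_t id = model_sys_reg(op0, op1, crn, crm, op2);

	return raz ? 0 : model_sanitised_lookup(id);
}
```

Because nothing is stored per vcpu, every read is recomputed from the encoding, which matches the "read on demand" wording in the commit message.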
Re: [PATCH 02/27] arm64: KVM: Hide unsupported AArch64 CPU features from guests
On Wed, Aug 16, 2017 at 12:10:38PM +0100, Marc Zyngier wrote:
> On 09/08/17 13:05, Dave Martin wrote:
> > Currently, a guest kernel sees the true CPU feature registers
> > (ID_*_EL1) when it reads them using MRS instructions.  This means
> > that the guest will observe features that are present in the
> > hardware but the host doesn't understand or doesn't provide support
> > for.  A guest may legitimately try to use such a feature as per the
> > architecture, but use of the feature may trap instead of working
> > normally, triggering undef injection into the guest.
> >
> > This is not a problem for the host, but the guest may go wrong when
> > running on newer hardware than the host knows about.
> >
> > This patch hides from guest VMs any AArch64-specific CPU features
> > that the host doesn't support, by exposing to the guest the
> > sanitised versions of the registers computed by the cpufeatures
> > framework, instead of the true hardware registers.  To achieve
> > this, HCR_EL2.TID3 is now set for AArch64 guests, and emulation
> > code is added to KVM to report the sanitised versions of the
> > affected registers in response to MRS and register reads from
> > userspace.
> >
> > The affected registers are removed from invariant_sys_regs[] (since
> > the invariant_sys_regs handling is no longer quite correct for
> > them) and added to sys_reg_descs[], with appropriate access(),
> > get_user() and set_user() methods.  No runtime vcpu storage is
> > allocated for the registers: instead, they are read on demand from
> > the cpufeatures framework.  This may need modification in the
> > future if there is a need for userspace to customise the features
> > visible to the guest.
> >
> > Attempts by userspace to write the registers are handled similarly
> > to the current invariant_sys_regs handling: writes are permitted,
> > but only if they don't attempt to change the value.  This is
> > sufficient to support VM snapshot/restore from userspace.
> >
> > Because of the additional registers, restoring a VM on an older
> > kernel may not work unless userspace knows how to handle the extra
> > VM registers exposed to the KVM user ABI by this patch.
> >
> > Under the principle of least damage, this patch makes no attempt to
> > handle any of the other registers currently in
> > invariant_sys_regs[], or to emulate registers for AArch32: however,
> > these could be handled in a similar way in future, as necessary.
> >
> > Signed-off-by: Dave Martin
> > ---
> >  arch/arm64/kvm/hyp/switch.c |   6 ++
> >  arch/arm64/kvm/sys_regs.c   | 224
> > +++-
> >  2 files changed, 185 insertions(+), 45 deletions(-)
> >

[...]

> > diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
> > index 2e070d3..6583dd7 100644
> > --- a/arch/arm64/kvm/sys_regs.c
> > +++ b/arch/arm64/kvm/sys_regs.c
> > @@ -892,6 +892,135 @@ static bool access_cntp_cval(struct kvm_vcpu *vcpu,
> >  	return true;
> >  }
> >
> > +/* Read a sanitised cpufeature ID register by sys_reg_desc */
> > +static u64 read_id_reg(struct sys_reg_desc const *r, bool raz)
> > +{
> > +	u32 id = sys_reg((u32)r->Op0, (u32)r->Op1,
> > +			 (u32)r->CRn, (u32)r->CRm, (u32)r->Op2);
> > +
> > +	return raz ? 0 : read_sanitised_ftr_reg(id);
> > +}
> > +
> > +/* cpufeature ID register access trap handlers */
> > +
> > +static bool __access_id_reg(struct kvm_vcpu *vcpu,
> > +			    struct sys_reg_params *p,
> > +			    const struct sys_reg_desc const *r,
> > +			    bool raz)
> > +{
> > +	if (p->is_write) {
> > +		kvm_inject_undefined(vcpu);
> > +		return false;
> > +	}
>
> I don't think this is supposed to happen (should have UNDEF-ed at EL1).
> You can call write_to_read_only() in that case, which will spit out a
> warning and inject the exception.

I'll check this -- sounds about right.

If it should never happen, should I just delete that code or BUG()?  I
notice a BUG_ON() for a similar situation in access_vm_reg() for example.

Or do we not quite trust hardware not to get this wrong?
(It feels like the kind of thing that could slip through validation
and/or would be considered not worth a respin, but it seems wrong to
work around a theoretical hardware bug before it's confirmed to exist,
unless we think for some reason that it's really likely.)

> > +
> > +	p->regval = read_id_reg(r, raz);
> > +	return true;
> > +}

[...]

> > @@ -944,6 +1073,32 @@ static const struct sys_reg_desc sys_reg_descs[] = {
> >  	{ SYS_DESC(SYS_DBGVCR32_EL2), NULL, reset_val, DBGVCR32_EL2, 0 },
> >
> >  	{ SYS_DESC(SYS_MPIDR_EL1), NULL, reset_mpidr, MPIDR_EL1 },
> > +
> > +	/*
> > +	 * All non-RAZ feature registers listed here must also be
> > +	 * present in arm64_ftr_regs[].
> > +	 */
> > +
> > +	/* AArch64 mappings of the AArch32 ID registers */
> > +	/* ID_AFR0_EL1 not exposed to guests for now */
> > +
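
The commit message quoted above says userspace writes to these registers are permitted only when they do not change the value, which is what lets snapshot/restore replay the saved registers. A minimal model of that rule follows. The function names are hypothetical, and returning -EINVAL for a mismatching value is my assumption about the error code, since the relevant part of the patch is not shown in this thread.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Value the guest would observe: zero for RAZ registers, else the sanitised value. */
static uint64_t model_visible_value(uint64_t sanitised, bool raz)
{
	return raz ? 0 : sanitised;
}

/*
 * Models a userspace write to an immutable ID register: nothing is stored,
 * the write only succeeds if it matches what the guest would read anyway.
 * The -EINVAL error code is an assumption.
 */
static int model_set_id_reg(uint64_t sanitised, bool raz, uint64_t user_val)
{
	if (user_val != model_visible_value(sanitised, raz))
		return -EINVAL;

	return 0;
}
```

Restoring a snapshot on the same host therefore succeeds, while restoring onto a host whose sanitised values differ is rejected instead of silently changing what the guest sees.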
Re: [PATCH 02/27] arm64: KVM: Hide unsupported AArch64 CPU features from guests
On 09/08/17 13:05, Dave Martin wrote:
> Currently, a guest kernel sees the true CPU feature registers
> (ID_*_EL1) when it reads them using MRS instructions.  This means
> that the guest will observe features that are present in the
> hardware but the host doesn't understand or doesn't provide support
> for.  A guest may legitimately try to use such a feature as per the
> architecture, but use of the feature may trap instead of working
> normally, triggering undef injection into the guest.
>
> This is not a problem for the host, but the guest may go wrong when
> running on newer hardware than the host knows about.
>
> This patch hides from guest VMs any AArch64-specific CPU features
> that the host doesn't support, by exposing to the guest the
> sanitised versions of the registers computed by the cpufeatures
> framework, instead of the true hardware registers.  To achieve
> this, HCR_EL2.TID3 is now set for AArch64 guests, and emulation
> code is added to KVM to report the sanitised versions of the
> affected registers in response to MRS and register reads from
> userspace.
>
> The affected registers are removed from invariant_sys_regs[] (since
> the invariant_sys_regs handling is no longer quite correct for
> them) and added to sys_reg_descs[], with appropriate access(),
> get_user() and set_user() methods.  No runtime vcpu storage is
> allocated for the registers: instead, they are read on demand from
> the cpufeatures framework.  This may need modification in the
> future if there is a need for userspace to customise the features
> visible to the guest.
>
> Attempts by userspace to write the registers are handled similarly
> to the current invariant_sys_regs handling: writes are permitted,
> but only if they don't attempt to change the value.  This is
> sufficient to support VM snapshot/restore from userspace.
>
> Because of the additional registers, restoring a VM on an older
> kernel may not work unless userspace knows how to handle the extra
> VM registers exposed to the KVM user ABI by this patch.
>
> Under the principle of least damage, this patch makes no attempt to
> handle any of the other registers currently in
> invariant_sys_regs[], or to emulate registers for AArch32: however,
> these could be handled in a similar way in future, as necessary.
>
> Signed-off-by: Dave Martin
> ---
>  arch/arm64/kvm/hyp/switch.c |   6 ++
>  arch/arm64/kvm/sys_regs.c   | 224
> +++-
>  2 files changed, 185 insertions(+), 45 deletions(-)
>
> diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
> index 945e79c..35a90b8 100644
> --- a/arch/arm64/kvm/hyp/switch.c
> +++ b/arch/arm64/kvm/hyp/switch.c
> @@ -81,11 +81,17 @@ static void __hyp_text __activate_traps(struct kvm_vcpu
> *vcpu)
>  	 * it will cause an exception.
>  	 */
>  	val = vcpu->arch.hcr_el2;
> +
>  	if (!(val & HCR_RW) && system_supports_fpsimd()) {
>  		write_sysreg(1 << 30, fpexc32_el2);
>  		isb();
>  	}
> +
> +	if (val & HCR_RW) /* for AArch64 only: */
> +		val |= HCR_TID3; /* TID3: trap feature register accesses */
> +
>  	write_sysreg(val, hcr_el2);
> +
>  	/* Trap on AArch32 cp15 c15 accesses (EL1 or EL0) */
>  	write_sysreg(1 << 15, hstr_el2);
>  	/*
> diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
> index 2e070d3..6583dd7 100644
> --- a/arch/arm64/kvm/sys_regs.c
> +++ b/arch/arm64/kvm/sys_regs.c
> @@ -892,6 +892,135 @@ static bool access_cntp_cval(struct kvm_vcpu *vcpu,
>  	return true;
>  }
>
> +/* Read a sanitised cpufeature ID register by sys_reg_desc */
> +static u64 read_id_reg(struct sys_reg_desc const *r, bool raz)
> +{
> +	u32 id = sys_reg((u32)r->Op0, (u32)r->Op1,
> +			 (u32)r->CRn, (u32)r->CRm, (u32)r->Op2);
> +
> +	return raz ? 0 : read_sanitised_ftr_reg(id);
> +}
> +
> +/* cpufeature ID register access trap handlers */
> +
> +static bool __access_id_reg(struct kvm_vcpu *vcpu,
> +			    struct sys_reg_params *p,
> +			    const struct sys_reg_desc const *r,
> +			    bool raz)
> +{
> +	if (p->is_write) {
> +		kvm_inject_undefined(vcpu);
> +		return false;
> +	}

I don't think this is supposed to happen (should have UNDEF-ed at EL1).
You can call write_to_read_only() in that case, which will spit out a
warning and inject the exception.

> +
> +	p->regval = read_id_reg(r, raz);
> +	return true;
> +}
> +
> +static bool access_id_reg(struct kvm_vcpu *vcpu,
> +			  struct sys_reg_params *p,
> +			  const struct sys_reg_desc *r)
> +{
> +	return __access_id_reg(vcpu, p, r, false);
> +}
> +
> +static bool access_raz_id_reg(struct kvm_vcpu *vcpu,
> +			      struct sys_reg_params *p,
> +			      const struct sys_reg_desc *r)
> +{
> +	return
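
The switch.c hunk quoted in this message enables the ID register trap only for AArch64 guests, that is, only when HCR_RW is set in the vcpu's HCR_EL2 value. The decision can be modelled in isolation as below. This is a sketch: the bit positions are what I understand the ARMv8 HCR_EL2 layout to be (RW at bit 31, TID3 at bit 18) and are labelled as assumptions, and model_guest_hcr() is a hypothetical name.

```c
#include <stdint.h>

/* Assumed HCR_EL2 bit positions; check against the architecture manual. */
#define MODEL_HCR_TID3	(UINT64_C(1) << 18)	/* trap ID group 3 register reads */
#define MODEL_HCR_RW	(UINT64_C(1) << 31)	/* EL1 is AArch64 */

/* Computes the HCR_EL2 value written on guest entry, as in the quoted hunk. */
static uint64_t model_guest_hcr(uint64_t vcpu_hcr)
{
	uint64_t val = vcpu_hcr;

	if (val & MODEL_HCR_RW)		/* for AArch64 guests only */
		val |= MODEL_HCR_TID3;	/* trap feature register accesses */

	return val;
}
```

AArch32 guests keep their original HCR_EL2 value, consistent with the commit message's statement that AArch32 register emulation is left for later.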
[PATCH 02/27] arm64: KVM: Hide unsupported AArch64 CPU features from guests
Currently, a guest kernel sees the true CPU feature registers
(ID_*_EL1) when it reads them using MRS instructions.  This means
that the guest will observe features that are present in the
hardware but the host doesn't understand or doesn't provide support
for.  A guest may legitimately try to use such a feature as per the
architecture, but use of the feature may trap instead of working
normally, triggering undef injection into the guest.

This is not a problem for the host, but the guest may go wrong when
running on newer hardware than the host knows about.

This patch hides from guest VMs any AArch64-specific CPU features
that the host doesn't support, by exposing to the guest the
sanitised versions of the registers computed by the cpufeatures
framework, instead of the true hardware registers.  To achieve
this, HCR_EL2.TID3 is now set for AArch64 guests, and emulation
code is added to KVM to report the sanitised versions of the
affected registers in response to MRS and register reads from
userspace.

The affected registers are removed from invariant_sys_regs[] (since
the invariant_sys_regs handling is no longer quite correct for
them) and added to sys_reg_descs[], with appropriate access(),
get_user() and set_user() methods.  No runtime vcpu storage is
allocated for the registers: instead, they are read on demand from
the cpufeatures framework.  This may need modification in the
future if there is a need for userspace to customise the features
visible to the guest.

Attempts by userspace to write the registers are handled similarly
to the current invariant_sys_regs handling: writes are permitted,
but only if they don't attempt to change the value.  This is
sufficient to support VM snapshot/restore from userspace.

Because of the additional registers, restoring a VM on an older
kernel may not work unless userspace knows how to handle the extra
VM registers exposed to the KVM user ABI by this patch.

Under the principle of least damage, this patch makes no attempt to
handle any of the other registers currently in
invariant_sys_regs[], or to emulate registers for AArch32: however,
these could be handled in a similar way in future, as necessary.

Signed-off-by: Dave Martin
---
 arch/arm64/kvm/hyp/switch.c |   6 ++
 arch/arm64/kvm/sys_regs.c   | 224 +++-
 2 files changed, 185 insertions(+), 45 deletions(-)

diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
index 945e79c..35a90b8 100644
--- a/arch/arm64/kvm/hyp/switch.c
+++ b/arch/arm64/kvm/hyp/switch.c
@@ -81,11 +81,17 @@ static void __hyp_text __activate_traps(struct kvm_vcpu *vcpu)
 	 * it will cause an exception.
 	 */
 	val = vcpu->arch.hcr_el2;
+
 	if (!(val & HCR_RW) && system_supports_fpsimd()) {
 		write_sysreg(1 << 30, fpexc32_el2);
 		isb();
 	}
+
+	if (val & HCR_RW) /* for AArch64 only: */
+		val |= HCR_TID3; /* TID3: trap feature register accesses */
+
 	write_sysreg(val, hcr_el2);
+
 	/* Trap on AArch32 cp15 c15 accesses (EL1 or EL0) */
 	write_sysreg(1 << 15, hstr_el2);
 	/*
diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index 2e070d3..6583dd7 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -892,6 +892,135 @@ static bool access_cntp_cval(struct kvm_vcpu *vcpu,
 	return true;
 }
 
+/* Read a sanitised cpufeature ID register by sys_reg_desc */
+static u64 read_id_reg(struct sys_reg_desc const *r, bool raz)
+{
+	u32 id = sys_reg((u32)r->Op0, (u32)r->Op1,
+			 (u32)r->CRn, (u32)r->CRm, (u32)r->Op2);
+
+	return raz ? 0 : read_sanitised_ftr_reg(id);
+}
+
+/* cpufeature ID register access trap handlers */
+
+static bool __access_id_reg(struct kvm_vcpu *vcpu,
+			    struct sys_reg_params *p,
+			    const struct sys_reg_desc const *r,
+			    bool raz)
+{
+	if (p->is_write) {
+		kvm_inject_undefined(vcpu);
+		return false;
+	}
+
+	p->regval = read_id_reg(r, raz);
+	return true;
+}
+
+static bool access_id_reg(struct kvm_vcpu *vcpu,
+			  struct sys_reg_params *p,
+			  const struct sys_reg_desc *r)
+{
+	return __access_id_reg(vcpu, p, r, false);
+}
+
+static bool access_raz_id_reg(struct kvm_vcpu *vcpu,
+			      struct sys_reg_params *p,
+			      const struct sys_reg_desc *r)
+{
+	return __access_id_reg(vcpu, p, r, true);
+}
+
+static int reg_from_user(u64 *val, const void __user *uaddr, u64 id);
+static int reg_to_user(void __user *uaddr, const u64 *val, u64 id);
+static u64 sys_reg_to_index(const struct sys_reg_desc *reg);
+
+/*
+ * cpufeature ID register user accessors
+ *
+ * For now, these registers are immutable for userspace, so no values
+ * are stored, and for set_id_reg() we