Re: [PATCH 1/3] KVM: arm64: Fix S1PTW handling on RO memslots

2022-12-24 Thread Ard Biesheuvel
On Sat, 24 Dec 2022 at 13:19, Marc Zyngier  wrote:
>
> On Thu, 22 Dec 2022 13:01:55 +0000,
> Ard Biesheuvel  wrote:
> >
> > On Tue, 20 Dec 2022 at 21:09, Marc Zyngier  wrote:
> > >
> > > A recent development on the EFI front has resulted in guests having
> > > their page tables baked into the firmware binary, and mapped into
> > > the IPA space as part of a read-only memslot.
> > >
> > > Not only is this legitimate, but it also results in added security,
> > > so thumbs up. However, this clashes mildly with our handling of a S1PTW
> > > as a write to correctly handle AF/DB updates to the S1 PTs, and results
> > > in the guest taking an abort it won't recover from (the PTs mapping the
> > > vectors will suffer from the same problem...).
> > >
> > > So clearly our handling is... wrong.
> > >
> > > Instead, switch to a two-pronged approach:
> > >
> > > - On S1PTW translation fault, handle the fault as a read
> > >
> > > - On S1PTW permission fault, handle the fault as a write
> > >
> > > This is of no consequence to SW that *writes* to its PTs (the write
> > > will trigger a non-S1PTW fault), and SW that uses RO PTs will not
> > > use AF/DB anyway, as that'd be wrong.
> > >
> > > Only in the case described in c4ad98e4b72c ("KVM: arm64: Assume write
> > > fault on S1PTW permission fault on instruction fetch") do we end up
> > > with two back-to-back faults (page being evicted and faulted back).
> > > I don't think this is a case worth optimising for.
> > >
> > > Fixes: c4ad98e4b72c ("KVM: arm64: Assume write fault on S1PTW permission 
> > > fault on instruction fetch")
> > > Signed-off-by: Marc Zyngier 
> > > Cc: sta...@vger.kernel.org
> >
> > Reviewed-by: Ard Biesheuvel 
> >
I have tested this patch on my TX2 with one of the EFI builds in
question, and everything works as before (I never observed the issue
itself).
>
> If you get the chance, could you try with non-4kB page sizes? Here, I
> could only reproduce it with 16kB pages. It was firing like clockwork
> on Cortex-A55 with that.
>

I'll try with 64k, but I don't have access to a 16k-capable machine that
runs KVM atm (I'm still enjoying working wifi, GPU etc. on my M1
MacBook Air).

> >
> > Regression-tested-by: Ard Biesheuvel 
> >
> > For the record, the EFI build in question targets QEMU/mach-virt and
> > switches to a set of read-only page tables in emulated NOR flash
> > straight out of reset, so it can create and populate the real page
> > tables with MMU and caches enabled. EFI does not use virtual memory or
> > paging so managing access flags or dirty bits in hardware is unlikely
> > to add any value, and it is not being used at the moment. And given
> > that this is emulated NOR flash, any ordinary write to it tears down
> > the r/o memslot altogether, and kicks the NOR flash emulation in QEMU
> > into programming mode, which is fully based on MMIO emulation and does
> > not use a memslot at all. IOW, even if we could figure out what store
> > the PTW was attempting to do, it is always going to be rejected since
> > the r/o page tables can only be modified by 'programming' the NOR
> > flash sector.
>
> Indeed, and this would be a pretty dodgy setup anyway.
>
> Thanks for having had a look,
>
> M.
>
> --
> Without deviation from the norm, progress is not possible.


Re: [PATCH 1/3] KVM: arm64: Fix S1PTW handling on RO memslots

2022-12-22 Thread Ard Biesheuvel
On Tue, 20 Dec 2022 at 21:09, Marc Zyngier  wrote:
>
> A recent development on the EFI front has resulted in guests having
> their page tables baked into the firmware binary, and mapped into
> the IPA space as part of a read-only memslot.
>
> Not only is this legitimate, but it also results in added security,
> so thumbs up. However, this clashes mildly with our handling of a S1PTW
> as a write to correctly handle AF/DB updates to the S1 PTs, and results
> in the guest taking an abort it won't recover from (the PTs mapping the
> vectors will suffer from the same problem...).
>
> So clearly our handling is... wrong.
>
> Instead, switch to a two-pronged approach:
>
> - On S1PTW translation fault, handle the fault as a read
>
> - On S1PTW permission fault, handle the fault as a write
>
> This is of no consequence to SW that *writes* to its PTs (the write
> will trigger a non-S1PTW fault), and SW that uses RO PTs will not
> use AF/DB anyway, as that'd be wrong.
>
> Only in the case described in c4ad98e4b72c ("KVM: arm64: Assume write
> fault on S1PTW permission fault on instruction fetch") do we end up
> with two back-to-back faults (page being evicted and faulted back).
> I don't think this is a case worth optimising for.
>
> Fixes: c4ad98e4b72c ("KVM: arm64: Assume write fault on S1PTW permission 
> fault on instruction fetch")
> Signed-off-by: Marc Zyngier 
> Cc: sta...@vger.kernel.org

Reviewed-by: Ard Biesheuvel 

I have tested this patch on my TX2 with one of the EFI builds in
question, and everything works as before (I never observed the issue
itself).

Regression-tested-by: Ard Biesheuvel 

For the record, the EFI build in question targets QEMU/mach-virt and
switches to a set of read-only page tables in emulated NOR flash
straight out of reset, so it can create and populate the real page
tables with MMU and caches enabled. EFI does not use virtual memory or
paging so managing access flags or dirty bits in hardware is unlikely
to add any value, and it is not being used at the moment. And given
that this is emulated NOR flash, any ordinary write to it tears down
the r/o memslot altogether, and kicks the NOR flash emulation in QEMU
into programming mode, which is fully based on MMIO emulation and does
not use a memslot at all. IOW, even if we could figure out what store
the PTW was attempting to do, it is always going to be rejected since
the r/o page tables can only be modified by 'programming' the NOR
flash sector.


> ---
>  arch/arm64/include/asm/kvm_emulate.h | 22 --
>  1 file changed, 20 insertions(+), 2 deletions(-)
>
> diff --git a/arch/arm64/include/asm/kvm_emulate.h 
> b/arch/arm64/include/asm/kvm_emulate.h
> index 9bdba47f7e14..fd6ad8b21f85 100644
> --- a/arch/arm64/include/asm/kvm_emulate.h
> +++ b/arch/arm64/include/asm/kvm_emulate.h
> @@ -373,8 +373,26 @@ static __always_inline int kvm_vcpu_sys_get_rt(struct 
> kvm_vcpu *vcpu)
>
>  static inline bool kvm_is_write_fault(struct kvm_vcpu *vcpu)
>  {
> -   if (kvm_vcpu_abt_iss1tw(vcpu))
> -   return true;
> +   if (kvm_vcpu_abt_iss1tw(vcpu)) {
> +   /*
> +* Only a permission fault on a S1PTW should be
> +* considered as a write. Otherwise, page tables baked
> +* in a read-only memslot will result in an exception
> +* being delivered in the guest.
> +*
> +* The drawback is that we end up faulting twice if the
> +* guest is using any of HW AF/DB: a translation fault
> +* to map the page containing the PT (read only at
> +* first), then a permission fault to allow the flags
> +* to be set.
> +*/
> +   switch (kvm_vcpu_trap_get_fault_type(vcpu)) {
> +   case ESR_ELx_FSC_PERM:
> +   return true;
> +   default:
> +   return false;
> +   }
> +   }
>
> if (kvm_vcpu_trap_is_iabt(vcpu))
> return false;
> --
> 2.34.1
>


[PATCH] arm64: kvm: avoid unnecessary absolute addressing via literals

2022-04-28 Thread Ard Biesheuvel
There are a few cases in the nVHE code where we take the absolute
address of a symbol via a literal pool entry, and subsequently translate
it to another address space (PA, kimg VA, kernel linear VA, etc).
Originally, this literal was needed because we relied on a different
translation for absolute references, but this is no longer the case, so
we can simply use relative addressing instead. This removes a couple of
RELA entries pointing into the .text segment.

Signed-off-by: Ard Biesheuvel 
---
 arch/arm64/kvm/hyp/nvhe/host.S | 8 +++-
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/arch/arm64/kvm/hyp/nvhe/host.S b/arch/arm64/kvm/hyp/nvhe/host.S
index 3d613e721a75..366551594417 100644
--- a/arch/arm64/kvm/hyp/nvhe/host.S
+++ b/arch/arm64/kvm/hyp/nvhe/host.S
@@ -80,7 +80,7 @@ SYM_FUNC_START(__hyp_do_panic)
mov lr, #(PSR_F_BIT | PSR_I_BIT | PSR_A_BIT | PSR_D_BIT |\
  PSR_MODE_EL1h)
msr spsr_el2, lr
-   ldr lr, =nvhe_hyp_panic_handler
+   adr_l   lr, nvhe_hyp_panic_handler
hyp_kimg_va lr, x6
msr elr_el2, lr
 
@@ -125,13 +125,11 @@ alternative_else_nop_endif
add sp, sp, #16
/*
 * Compute the idmap address of __kvm_handle_stub_hvc and
-* jump there. Since we use kimage_voffset, do not use the
-* HYP VA for __kvm_handle_stub_hvc, but the kernel VA instead
-* (by loading it from the constant pool).
+* jump there.
 *
 * Preserve x0-x4, which may contain stub parameters.
 */
-   ldr x5, =__kvm_handle_stub_hvc
+   adr_l   x5, __kvm_handle_stub_hvc
hyp_pa  x5, x6
br  x5
 SYM_FUNC_END(__host_hvc)
-- 
2.30.2



Re: [kbuild-all] Re: [PATCH v2 6/9] KVM: arm64: Detect and handle hypervisor stack overflows

2022-02-25 Thread Ard Biesheuvel
On Fri, 25 Feb 2022 at 03:12, Chen, Rong A  wrote:
>
>
>

> Hi Marc, Ard,
>
> We have ignored the warning related to asmlinkage according to the below
> advice:
>
> https://lore.kernel.org/lkml/CAMj1kXHrRYagSVniSetHdG15rkQS+fm4zVOtN=zda3w0qae...@mail.gmail.com/
>

Excellent! Thanks for implementing this - I wasn't aware that you
had adopted this suggestion.

> do you want the bot to ignore such warnings if asmlinkage is not specified?
>

Even though I think this warning has little value, marking symbols that
are exported for use by assembler code as asmlinkage is sufficient for
us to avoid it.

So I don't think this additional change is needed.

-- 
Ard.


Re: [kbuild-all] Re: [PATCH v2 6/9] KVM: arm64: Detect and handle hypervisor stack overflows

2022-02-23 Thread Ard Biesheuvel
On Wed, 23 Feb 2022 at 13:54, Marc Zyngier  wrote:
>
> On 2022-02-23 12:34, Philip Li wrote:
> > On Wed, Feb 23, 2022 at 09:16:59AM +, Marc Zyngier wrote:
> >> On Wed, 23 Feb 2022 09:05:18 +,
> >> kernel test robot  wrote:
> >> >
> >> > Hi Kalesh,
> >> >
> >> > Thank you for the patch! Perhaps something to improve:
> >> >
> >> > [auto build test WARNING on cfb92440ee71adcc2105b0890bb01ac3cddb8507]
> >> >
> >> > url:
> >> > https://github.com/0day-ci/linux/commits/Kalesh-Singh/KVM-arm64-Hypervisor-stack-enhancements/20220223-010522
> >> > base:   cfb92440ee71adcc2105b0890bb01ac3cddb8507
> >> > config: arm64-randconfig-r011-20220221 
> >> > (https://download.01.org/0day-ci/archive/20220223/202202231727.l621fvgd-...@intel.com/config)
> >> > compiler: clang version 15.0.0 (https://github.com/llvm/llvm-project 
> >> > d271fc04d5b97b12e6b797c6067d3c96a8d7470e)
> >> > reproduce (this is a W=1 build):
> >> > wget 
> >> > https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross 
> >> > -O ~/bin/make.cross
> >> > chmod +x ~/bin/make.cross
> >> > # install arm64 cross compiling tool for clang build
> >> > # apt-get install binutils-aarch64-linux-gnu
> >> > # 
> >> > https://github.com/0day-ci/linux/commit/7fe99fd40f7c4b2973218045ca5b9c9160524db1
> >> > git remote add linux-review https://github.com/0day-ci/linux
> >> > git fetch --no-tags linux-review 
> >> > Kalesh-Singh/KVM-arm64-Hypervisor-stack-enhancements/20220223-010522
> >> > git checkout 7fe99fd40f7c4b2973218045ca5b9c9160524db1
> >> > # save the config file to linux build tree
> >> > mkdir build_dir
> >> > COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 
> >> > O=build_dir ARCH=arm64 SHELL=/bin/bash arch/arm64/
> >> >
> >> > If you fix the issue, kindly add following tag as appropriate
> >> > Reported-by: kernel test robot 
> >> >
> >> > All warnings (new ones prefixed by >>):
> >> >
> >> >include/linux/stddef.h:8:14: note: expanded from macro 'NULL'
> >> >#define NULL ((void *)0)
> >> > ^~~
> >> >arch/arm64/kvm/hyp/nvhe/switch.c:200:27: warning: initializer 
> >> > overrides prior initialization of this subobject 
> >> > [-Winitializer-overrides]
> >> >[ESR_ELx_EC_FP_ASIMD]   = kvm_hyp_handle_fpsimd,
> >> >  ^
> >> >arch/arm64/kvm/hyp/nvhe/switch.c:196:28: note: previous 
> >> > initialization is here
> >> >[0 ... ESR_ELx_EC_MAX]  = NULL,
> >> >  ^~~~
> >> >include/linux/stddef.h:8:14: note: expanded from macro 'NULL'
> >> >#define NULL ((void *)0)
> >> > ^~~
> >>
> >> Kalesh, please ignore this nonsense. There may be things to improve,
> >> but this is *NOT* one of them.
> >>
> >> These reports are pretty useless, and just lead people to ignore real
> >> bug reports.
> >
> > Hi Kalesh, sorry, there are some irrelevant issues mixed into the report;
> > kindly ignore them. The valuable ones are the new ones prefixed by >>,
> > such as the one below from the original report.
> >
> >>> arch/arm64/kvm/hyp/nvhe/switch.c:372:17: warning: no previous
> >>> prototype for function 'hyp_panic_bad_stack' [-Wmissing-prototypes]
> >void __noreturn hyp_panic_bad_stack(void)
> >^
>
> This is only called from assembly code, so a prototype wouldn't bring
> much.
>

Should probably be marked as 'asmlinkage' then. I've suggested many
times already that this bogus diagnostic should either be disabled or
taught to disregard 'asmlinkage' symbols.
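
For reference, the pattern under discussion looks like this -- a minimal
sketch only, not the actual hyp_panic_bad_stack() implementation. A
function that is only entered from assembly gets an asmlinkage
declaration, which doubles as the prototype that -Wmissing-prototypes
wants to see:

#include <linux/linkage.h>   /* asmlinkage */
#include <linux/compiler.h>  /* __noreturn */

/* Declaration visible at the definition: silences -Wmissing-prototypes
 * and marks the symbol as one that is called from assembly code. */
asmlinkage void __noreturn hyp_panic_bad_stack(void);

asmlinkage void __noreturn hyp_panic_bad_stack(void)
{
	/* Illustrative body only: report the overflow and park the CPU. */
	for (;;)
		;
}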


Re: [PATCH][kvmtool] virtio/pci: Signal INTx interrupts as level instead of edge

2022-01-31 Thread Ard Biesheuvel
On Mon, 31 Jan 2022 at 17:03, Marc Zyngier  wrote:
>
> It appears that the way INTx is emulated is "slightly" out of spec
> in kvmtool. We happily inject an edge interrupt, even if the spec
> mandates a level.
>
> This doesn't change much for either the guest or userspace (only
> KVM will have a bit more work tracking the EOI), but at least
> this is correct.
>
> Reported-by: Pierre Gondois 
> Signed-off-by: Marc Zyngier 
> Cc: Ard Biesheuvel 
> Cc: Sami Mujawar 
> Cc: Will Deacon 

Acked-by: Ard Biesheuvel 

> ---
>  pci.c| 2 +-
>  virtio/pci.c | 2 +-
>  2 files changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/pci.c b/pci.c
> index e5930331..a769ae27 100644
> --- a/pci.c
> +++ b/pci.c
> @@ -61,7 +61,7 @@ int pci__assign_irq(struct pci_device_header *pci_hdr)
> pci_hdr->irq_line   = irq__alloc_line();
>
> if (!pci_hdr->irq_type)
> -   pci_hdr->irq_type = IRQ_TYPE_EDGE_RISING;
> +   pci_hdr->irq_type = IRQ_TYPE_LEVEL_HIGH;
>
> return pci_hdr->irq_line;
>  }
> diff --git a/virtio/pci.c b/virtio/pci.c
> index 41085291..2777d1c8 100644
> --- a/virtio/pci.c
> +++ b/virtio/pci.c
> @@ -413,7 +413,7 @@ int virtio_pci__signal_vq(struct kvm *kvm, struct 
> virtio_device *vdev, u32 vq)
> kvm__irq_trigger(kvm, vpci->gsis[vq]);
> } else {
> vpci->isr = VIRTIO_IRQ_HIGH;
> -   kvm__irq_trigger(kvm, vpci->legacy_irq_line);
> +   kvm__irq_line(kvm, vpci->legacy_irq_line, VIRTIO_IRQ_HIGH);
> }
> return 0;
>  }
> --
> 2.34.1
>
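
As context for the level-triggered semantics above: with
IRQ_TYPE_LEVEL_HIGH, the line must also be deasserted once the reason
for the interrupt goes away. kvmtool does this when the guest reads the
ISR register; the sketch below follows virtio/pci.c but is not a
verbatim excerpt, and the helper name is illustrative:

/* A read of ISR returns the status, clears it, and lowers the
 * (now level-triggered) INTx line again. */
static void virtio_pci__isr_read(struct kvm *kvm, struct virtio_pci *vpci,
				 u8 *data)
{
	*data = vpci->isr;
	vpci->isr = VIRTIO_IRQ_LOW;
	kvm__irq_line(kvm, vpci->legacy_irq_line, VIRTIO_IRQ_LOW);
}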


[RFC PATCH 12/12] arm64: hugetlb: use set_pte_at() not set_pte() to provide mm pointer

2022-01-26 Thread Ard Biesheuvel
Switch to set_pte_at() so we can provide the mm pointer to the code that
performs the page table update.

Signed-off-by: Ard Biesheuvel 
---
 arch/arm64/mm/hugetlbpage.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/mm/hugetlbpage.c b/arch/arm64/mm/hugetlbpage.c
index ffb9c229610a..099b28b00f4c 100644
--- a/arch/arm64/mm/hugetlbpage.c
+++ b/arch/arm64/mm/hugetlbpage.c
@@ -252,8 +252,8 @@ void set_huge_swap_pte_at(struct mm_struct *mm, unsigned 
long addr,
 
ncontig = num_contig_ptes(sz, &pgsize);
 
-   for (i = 0; i < ncontig; i++, ptep++)
-   set_pte(ptep, pte);
+   for (i = 0; i < ncontig; i++, ptep++, addr += pgsize)
+   set_pte_at(mm, addr, ptep, pte);
 }
 
 pte_t *huge_pte_alloc(struct mm_struct *mm, struct vm_area_struct *vma,
-- 
2.30.2



[RFC PATCH 11/12] arm64: efi: use set_pte_at() not set_pte() in order to pass mm pointer

2022-01-26 Thread Ard Biesheuvel
The set_pte() helper does not carry the struct mm pointer, which makes
it difficult for the implementation to reason about the context in which
the set_pte() call is taking place. So switch to set_pte_at() instead.

Signed-off-by: Ard Biesheuvel 
---
 arch/arm64/kernel/efi.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/arm64/kernel/efi.c b/arch/arm64/kernel/efi.c
index e1be6c429810..e3e50adfae18 100644
--- a/arch/arm64/kernel/efi.c
+++ b/arch/arm64/kernel/efi.c
@@ -92,7 +92,7 @@ static int __init set_permissions(pte_t *ptep, unsigned long 
addr, void *data)
pte = set_pte_bit(pte, __pgprot(PTE_RDONLY));
if (md->attribute & EFI_MEMORY_XP)
pte = set_pte_bit(pte, __pgprot(PTE_PXN));
-   set_pte(ptep, pte);
+   set_pte_at(&init_mm, addr, ptep, pte);
return 0;
 }
 
-- 
2.30.2



[RFC PATCH 10/12] mm: add default definition of p4d_index()

2022-01-26 Thread Ard Biesheuvel
Implement a default version of p4d_index(), similar to how pud_index()
and pmd_index() are defined.

Signed-off-by: Ard Biesheuvel 
---
 include/linux/pgtable.h | 8 
 1 file changed, 8 insertions(+)

diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
index bc8713a76e03..e8aacf6ea207 100644
--- a/include/linux/pgtable.h
+++ b/include/linux/pgtable.h
@@ -79,6 +79,14 @@ static inline unsigned long pud_index(unsigned long address)
 #define pud_index pud_index
 #endif
 
+#ifndef p4d_index
+static inline unsigned long p4d_index(unsigned long address)
+{
+   return (address >> P4D_SHIFT) & (PTRS_PER_P4D - 1);
+}
+#define p4d_index p4d_index
+#endif
+
 #ifndef pgd_index
 /* Must be a compile-time constant, so implement it as a macro */
 #define pgd_index(a)  (((a) >> PGDIR_SHIFT) & (PTRS_PER_PGD - 1))
-- 
2.30.2



[RFC PATCH 09/12] arm64: mm: remap kernel page tables read-only at end of init

2022-01-26 Thread Ard Biesheuvel
Now that all the handling is in place to deal with read-only page tables
at runtime, do a pass over the kernel page tables at boot and remap
read-only all the page table pages that were allocated early.

Signed-off-by: Ard Biesheuvel 
---
 arch/arm64/mm/mmu.c | 20 
 1 file changed, 20 insertions(+)

diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index 971501535757..b1212f6d48f2 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -559,8 +559,23 @@ static void __init map_mem(pgd_t *pgdp)
memblock_clear_nomap(kernel_start, kernel_end - kernel_start);
 }
 
+static void mark_pgtables_ro(const pmd_t *pmdp, int level, int num_entries)
+{
+   while (num_entries--) {
+   if (pmd_valid(*pmdp) && pmd_table(*pmdp)) {
+   pmd_t *next = __va(__pmd_to_phys(*pmdp));
+
+   if (level < 2)
+   mark_pgtables_ro(next, level + 1, PTRS_PER_PMD);
+   set_pgtable_ro(next);
+   }
+   pmdp++;
+   }
+}
+
 void mark_rodata_ro(void)
 {
+   int pgd_level = 4 - CONFIG_PGTABLE_LEVELS;
unsigned long section_size;
 
/*
@@ -571,6 +586,11 @@ void mark_rodata_ro(void)
update_mapping_prot(__pa_symbol(__start_rodata), (unsigned 
long)__start_rodata,
section_size, PAGE_KERNEL_RO);
 
+#ifdef CONFIG_UNMAP_KERNEL_AT_EL0
+   mark_pgtables_ro((pmd_t *)tramp_pg_dir, pgd_level, PTRS_PER_PGD);
+#endif
+   mark_pgtables_ro((pmd_t *)swapper_pg_dir, pgd_level, PTRS_PER_PGD);
+
debug_checkwx();
 }
 
-- 
2.30.2



[RFC PATCH 08/12] arm64: mm: remap kernel PTE level page tables r/o in the linear region

2022-01-26 Thread Ard Biesheuvel
Now that all kernel page table manipulations are routed through the
fixmap API if r/o page tables are enabled, we can remove write access
from the linear mapping of those pages.

Signed-off-by: Ard Biesheuvel 
---
 arch/arm64/include/asm/pgalloc.h |  6 +
 arch/arm64/mm/mmu.c  | 24 +++-
 2 files changed, 29 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/include/asm/pgalloc.h b/arch/arm64/include/asm/pgalloc.h
index 18a5bb0c9ee4..073482634e74 100644
--- a/arch/arm64/include/asm/pgalloc.h
+++ b/arch/arm64/include/asm/pgalloc.h
@@ -20,6 +20,9 @@
 #define __HAVE_ARCH_PMD_FREE
 #define __HAVE_ARCH_PTE_ALLOC_ONE
 #define __HAVE_ARCH_PTE_FREE
+#define __HAVE_ARCH_PTE_ALLOC_ONE_KERNEL
+#define __HAVE_ARCH_PTE_FREE_KERNEL
+
 #include 
 
 #define PGD_SIZE   (PTRS_PER_PGD * sizeof(pgd_t))
@@ -27,6 +30,9 @@
 pgtable_t pte_alloc_one(struct mm_struct *mm);
 void pte_free(struct mm_struct *mm, struct page *pte_page);
 
+pte_t *pte_alloc_one_kernel(struct mm_struct *mm);
+void pte_free_kernel(struct mm_struct *mm, pte_t *pte);
+
 #if CONFIG_PGTABLE_LEVELS > 2
 
 pmd_t *pmd_alloc_one(struct mm_struct *mm, unsigned long addr);
diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index 949846654797..971501535757 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -1402,7 +1402,7 @@ int pmd_free_pte_page(pmd_t *pmdp, unsigned long addr)
table = pte_offset_kernel(pmdp, addr);
pmd_clear(pmdp);
__flush_tlb_kernel_pgtable(addr);
-   pte_free_kernel(NULL, table);
+   pte_free_kernel(&init_mm, table);
return 1;
 }
 
@@ -1709,3 +1709,25 @@ void pte_free(struct mm_struct *mm, struct page 
*pte_page)
pgtable_pte_page_dtor(pte_page);
__free_page(pte_page);
 }
+
+pte_t *pte_alloc_one_kernel(struct mm_struct *mm)
+{
+   pte_t *pte = __pte_alloc_one_kernel(mm);
+
+   VM_BUG_ON(mm != &init_mm);
+
+   if (!pte)
+   return NULL;
+   if (page_tables_are_ro())
+   set_pgtable_ro(pte);
+   return pte;
+}
+
+void pte_free_kernel(struct mm_struct *mm, pte_t *pte)
+{
+   VM_BUG_ON(mm != &init_mm);
+
+   if (page_tables_are_ro())
+   set_pgtable_rw(pte);
+   free_page((u64)pte);
+}
-- 
2.30.2



[RFC PATCH 07/12] arm64: mm: remap PTE level user page tables r/o in the linear region

2022-01-26 Thread Ard Biesheuvel
Now that all PTE manipulations for user space tables go via the fixmap,
we can remap these tables read-only in the linear region so they cannot
be corrupted inadvertently.

Signed-off-by: Ard Biesheuvel 
---
 arch/arm64/include/asm/pgalloc.h |  5 +
 arch/arm64/include/asm/tlb.h |  2 ++
 arch/arm64/mm/mmu.c  | 23 
 3 files changed, 30 insertions(+)

diff --git a/arch/arm64/include/asm/pgalloc.h b/arch/arm64/include/asm/pgalloc.h
index 63f9ae9e96fe..18a5bb0c9ee4 100644
--- a/arch/arm64/include/asm/pgalloc.h
+++ b/arch/arm64/include/asm/pgalloc.h
@@ -18,10 +18,15 @@
 #define __HAVE_ARCH_PUD_FREE
 #define __HAVE_ARCH_PMD_ALLOC_ONE
 #define __HAVE_ARCH_PMD_FREE
+#define __HAVE_ARCH_PTE_ALLOC_ONE
+#define __HAVE_ARCH_PTE_FREE
 #include 
 
 #define PGD_SIZE   (PTRS_PER_PGD * sizeof(pgd_t))
 
+pgtable_t pte_alloc_one(struct mm_struct *mm);
+void pte_free(struct mm_struct *mm, struct page *pte_page);
+
 #if CONFIG_PGTABLE_LEVELS > 2
 
 pmd_t *pmd_alloc_one(struct mm_struct *mm, unsigned long addr);
diff --git a/arch/arm64/include/asm/tlb.h b/arch/arm64/include/asm/tlb.h
index 0f54fbb59bba..e69a44160cce 100644
--- a/arch/arm64/include/asm/tlb.h
+++ b/arch/arm64/include/asm/tlb.h
@@ -75,6 +75,8 @@ static inline void tlb_flush(struct mmu_gather *tlb)
 static inline void __pte_free_tlb(struct mmu_gather *tlb, pgtable_t pte,
  unsigned long addr)
 {
+   if (page_tables_are_ro())
+   set_pgtable_rw(page_address(pte));
pgtable_pte_page_dtor(pte);
tlb_remove_table(tlb, pte);
 }
diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index e55d91a5f1ed..949846654797 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -1686,3 +1686,26 @@ void pmd_free(struct mm_struct *mm, pmd_t *pmd)
free_page((u64)pmd);
 }
 #endif
+
+pgtable_t pte_alloc_one(struct mm_struct *mm)
+{
+   pgtable_t pgt = __pte_alloc_one(mm, GFP_PGTABLE_USER);
+
+   VM_BUG_ON(mm == &init_mm);
+
+   if (!pgt)
+   return NULL;
+   if (page_tables_are_ro())
+   set_pgtable_ro(page_address(pgt));
+   return pgt;
+}
+
+void pte_free(struct mm_struct *mm, struct page *pte_page)
+{
+   VM_BUG_ON(mm == &init_mm);
+
+   if (page_tables_are_ro())
+   set_pgtable_rw(page_address(pte_page));
+   pgtable_pte_page_dtor(pte_page);
+   __free_page(pte_page);
+}
-- 
2.30.2



[RFC PATCH 06/12] arm64: mm: remap PMD pages r/o in linear region

2022-01-26 Thread Ard Biesheuvel
PMD modifications all go through the fixmap update routine, so there is
no longer a need to keep PMD pages mapped read/write in the linear region.

Signed-off-by: Ard Biesheuvel 
---
 arch/arm64/include/asm/pgalloc.h |  5 +
 arch/arm64/include/asm/tlb.h |  2 ++
 arch/arm64/mm/mmu.c  | 21 
 3 files changed, 28 insertions(+)

diff --git a/arch/arm64/include/asm/pgalloc.h b/arch/arm64/include/asm/pgalloc.h
index 737e9f32b199..63f9ae9e96fe 100644
--- a/arch/arm64/include/asm/pgalloc.h
+++ b/arch/arm64/include/asm/pgalloc.h
@@ -16,12 +16,17 @@
 #define __HAVE_ARCH_PGD_FREE
 #define __HAVE_ARCH_PUD_ALLOC_ONE
 #define __HAVE_ARCH_PUD_FREE
+#define __HAVE_ARCH_PMD_ALLOC_ONE
+#define __HAVE_ARCH_PMD_FREE
 #include 
 
 #define PGD_SIZE   (PTRS_PER_PGD * sizeof(pgd_t))
 
 #if CONFIG_PGTABLE_LEVELS > 2
 
+pmd_t *pmd_alloc_one(struct mm_struct *mm, unsigned long addr);
+void pmd_free(struct mm_struct *mm, pmd_t *pmd);
+
 static inline void __pud_populate(pud_t *pudp, phys_addr_t pmdp, pudval_t prot)
 {
set_pud(pudp, __pud(__phys_to_pud_val(pmdp) | prot));
diff --git a/arch/arm64/include/asm/tlb.h b/arch/arm64/include/asm/tlb.h
index 6557626752fc..0f54fbb59bba 100644
--- a/arch/arm64/include/asm/tlb.h
+++ b/arch/arm64/include/asm/tlb.h
@@ -85,6 +85,8 @@ static inline void __pmd_free_tlb(struct mmu_gather *tlb, 
pmd_t *pmdp,
 {
struct page *page = virt_to_page(pmdp);
 
+   if (page_tables_are_ro())
+   set_pgtable_rw(pmdp);
pgtable_pmd_page_dtor(page);
tlb_remove_table(tlb, page);
 }
diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index 03d77c4c3570..e55d91a5f1ed 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -1665,3 +1665,24 @@ void pud_free(struct mm_struct *mm, pud_t *pud)
free_page((u64)pud);
 }
 #endif
+
+#ifndef __PAGETABLE_PMD_FOLDED
+pmd_t *pmd_alloc_one(struct mm_struct *mm, unsigned long addr)
+{
+   pmd_t *pmd = __pmd_alloc_one(mm, addr);
+
+   if (!pmd)
+   return NULL;
+   if (page_tables_are_ro())
+   set_pgtable_ro(pmd);
+   return pmd;
+}
+
+void pmd_free(struct mm_struct *mm, pmd_t *pmd)
+{
+   if (page_tables_are_ro())
+   set_pgtable_rw(pmd);
+   pgtable_pmd_page_dtor(virt_to_page(pmd));
+   free_page((u64)pmd);
+}
+#endif
-- 
2.30.2



[RFC PATCH 05/12] arm64: mm: remap PUD pages r/o in linear region

2022-01-26 Thread Ard Biesheuvel
Implement the arch specific PUD alloc/free helpers by wrapping the
generic code, and remapping the page read-only on allocation and
read-write on free.

Signed-off-by: Ard Biesheuvel 
---
 arch/arm64/include/asm/pgalloc.h |  5 +
 arch/arm64/include/asm/tlb.h |  2 ++
 arch/arm64/mm/mmu.c  | 20 
 3 files changed, 27 insertions(+)

diff --git a/arch/arm64/include/asm/pgalloc.h b/arch/arm64/include/asm/pgalloc.h
index d54ac9f8d6c7..737e9f32b199 100644
--- a/arch/arm64/include/asm/pgalloc.h
+++ b/arch/arm64/include/asm/pgalloc.h
@@ -14,6 +14,8 @@
 #include 
 
 #define __HAVE_ARCH_PGD_FREE
+#define __HAVE_ARCH_PUD_ALLOC_ONE
+#define __HAVE_ARCH_PUD_FREE
 #include 
 
 #define PGD_SIZE   (PTRS_PER_PGD * sizeof(pgd_t))
@@ -45,6 +47,9 @@ static inline void __pud_populate(pud_t *pudp, phys_addr_t 
pmdp, pudval_t prot)
 
 #if CONFIG_PGTABLE_LEVELS > 3
 
+pud_t *pud_alloc_one(struct mm_struct *mm, unsigned long addr);
+void pud_free(struct mm_struct *mm, pud_t *pud);
+
 static inline void __p4d_populate(p4d_t *p4dp, phys_addr_t pudp, p4dval_t prot)
 {
set_p4d(p4dp, __p4d(__phys_to_p4d_val(pudp) | prot));
diff --git a/arch/arm64/include/asm/tlb.h b/arch/arm64/include/asm/tlb.h
index c995d1f4594f..6557626752fc 100644
--- a/arch/arm64/include/asm/tlb.h
+++ b/arch/arm64/include/asm/tlb.h
@@ -94,6 +94,8 @@ static inline void __pmd_free_tlb(struct mmu_gather *tlb, 
pmd_t *pmdp,
 static inline void __pud_free_tlb(struct mmu_gather *tlb, pud_t *pudp,
  unsigned long addr)
 {
+   if (page_tables_are_ro())
+   set_pgtable_rw(pudp);
tlb_remove_table(tlb, virt_to_page(pudp));
 }
 #endif
diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index a52c3162beae..03d77c4c3570 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -1645,3 +1645,23 @@ static int __init prevent_bootmem_remove_init(void)
 }
 early_initcall(prevent_bootmem_remove_init);
 #endif
+
+#ifndef __PAGETABLE_PUD_FOLDED
+pud_t *pud_alloc_one(struct mm_struct *mm, unsigned long addr)
+{
+   pud_t *pud = __pud_alloc_one(mm, addr);
+
+   if (!pud)
+   return NULL;
+   if (page_tables_are_ro())
+   set_pgtable_ro(pud);
+   return pud;
+}
+
+void pud_free(struct mm_struct *mm, pud_t *pud)
+{
+   if (page_tables_are_ro())
+   set_pgtable_rw(pud);
+   free_page((u64)pud);
+}
+#endif
-- 
2.30.2



[RFC PATCH 04/12] arm64: mm: remap PGD pages r/o in the linear region after allocation

2022-01-26 Thread Ard Biesheuvel
As the first step in restricting write access to all page tables via the
linear mapping, remap the page at the root PGD level of a user space
page table hierarchy read-only after allocation, so that it can only be
manipulated using the dedicated fixmap based API.

Signed-off-by: Ard Biesheuvel 
---
 arch/arm64/mm/mmu.c |  7 --
 arch/arm64/mm/pgd.c | 25 ++--
 2 files changed, 23 insertions(+), 9 deletions(-)

diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index acfae9b41cc8..a52c3162beae 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -394,8 +394,11 @@ static phys_addr_t __pgd_pgtable_alloc(int shift)
void *ptr = (void *)__get_free_page(GFP_PGTABLE_KERNEL);
BUG_ON(!ptr);
 
-   /* Ensure the zeroed page is visible to the page table walker */
-   dsb(ishst);
+   if (page_tables_are_ro())
+   set_pgtable_ro(ptr);
+   else
+   /* Ensure the zeroed page is visible to the page table walker */
+   dsb(ishst);
return __pa(ptr);
 }
 
diff --git a/arch/arm64/mm/pgd.c b/arch/arm64/mm/pgd.c
index 4a64089e5771..637d6eceeada 100644
--- a/arch/arm64/mm/pgd.c
+++ b/arch/arm64/mm/pgd.c
@@ -9,8 +9,10 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
+#include 
 #include 
 #include 
 #include 
@@ -20,24 +22,33 @@ static struct kmem_cache *pgd_cache __ro_after_init;
 pgd_t *pgd_alloc(struct mm_struct *mm)
 {
gfp_t gfp = GFP_PGTABLE_USER;
+   pgd_t *pgd;
 
-   if (PGD_SIZE == PAGE_SIZE)
-   return (pgd_t *)__get_free_page(gfp);
-   else
+   if (PGD_SIZE < PAGE_SIZE && !page_tables_are_ro())
return kmem_cache_alloc(pgd_cache, gfp);
+
+   pgd = (pgd_t *)__get_free_page(gfp);
+   if (!pgd)
+   return NULL;
+   if (page_tables_are_ro())
+   set_pgtable_ro(pgd);
+   return pgd;
 }
 
 void pgd_free(struct mm_struct *mm, pgd_t *pgd)
 {
-   if (PGD_SIZE == PAGE_SIZE)
-   free_page((unsigned long)pgd);
-   else
+   if (PGD_SIZE < PAGE_SIZE && !page_tables_are_ro()) {
kmem_cache_free(pgd_cache, pgd);
+   } else {
+   if (page_tables_are_ro())
+   set_pgtable_rw(pgd);
+   free_page((unsigned long)pgd);
+   }
 }
 
 void __init pgtable_cache_init(void)
 {
-   if (PGD_SIZE == PAGE_SIZE)
+   if (PGD_SIZE == PAGE_SIZE || page_tables_are_ro())
return;
 
 #ifdef CONFIG_ARM64_PA_BITS_52
-- 
2.30.2



[RFC PATCH 03/12] arm64: mm: use a fixmap slot for user page table modifications

2022-01-26 Thread Ard Biesheuvel
To prepare for user and kernel page tables being remapped read-only in
the linear region, define a new fixmap slot and use it to apply all page
table descriptor updates that target page tables other than swapper.

Fortunately for us, the fixmap descriptors themselves are always
manipulated via their kernel mapping in .bss, so there is no special
exception required to avoid circular logic here.

Signed-off-by: Ard Biesheuvel 
---
 arch/arm64/Kconfig   |  11 +++
 arch/arm64/include/asm/fixmap.h  |   1 +
 arch/arm64/include/asm/pgalloc.h |  28 +-
 arch/arm64/include/asm/pgtable.h |  79 +---
 arch/arm64/mm/Makefile   |   2 +
 arch/arm64/mm/fault.c|   8 +-
 arch/arm64/mm/ro_page_tables.c   | 100 
 7 files changed, 209 insertions(+), 20 deletions(-)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 6978140edfa4..a3e98286b074 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -1311,6 +1311,17 @@ config RODATA_FULL_DEFAULT_ENABLED
  This requires the linear region to be mapped down to pages,
  which may adversely affect performance in some cases.
 
+config ARM64_RO_PAGE_TABLES
+   bool "Remap page tables read-only in the kernel VA space"
+   select RODATA_FULL_DEFAULT_ENABLED
+   help
+ Remap linear mappings of page table pages read-only as long as they
+ are being used as such, and use a fixmap API to manipulate all page
+ table descriptors, instead of manipulating them directly via their
+ writable mappings in the direct map. This is intended as a debug
+ and/or hardening feature, as it removes the ability for stray writes
+ to be exploited to bypass permission restrictions.
+
 config ARM64_SW_TTBR0_PAN
bool "Emulate Privileged Access Never using TTBR0_EL1 switching"
help
diff --git a/arch/arm64/include/asm/fixmap.h b/arch/arm64/include/asm/fixmap.h
index 4335800201c9..71dfbe0452bb 100644
--- a/arch/arm64/include/asm/fixmap.h
+++ b/arch/arm64/include/asm/fixmap.h
@@ -50,6 +50,7 @@ enum fixed_addresses {
 
FIX_EARLYCON_MEM_BASE,
FIX_TEXT_POKE0,
+   FIX_TEXT_POKE_PTE,
 
 #ifdef CONFIG_ACPI_APEI_GHES
/* Used for GHES mapping from assorted contexts */
diff --git a/arch/arm64/include/asm/pgalloc.h b/arch/arm64/include/asm/pgalloc.h
index 237224484d0f..d54ac9f8d6c7 100644
--- a/arch/arm64/include/asm/pgalloc.h
+++ b/arch/arm64/include/asm/pgalloc.h
@@ -30,7 +30,11 @@ static inline void pud_populate(struct mm_struct *mm, pud_t 
*pudp, pmd_t *pmdp)
pudval_t pudval = PUD_TYPE_TABLE;
 
pudval |= (mm == &init_mm) ? PUD_TABLE_UXN : PUD_TABLE_PXN;
-   __pud_populate(pudp, __pa(pmdp), pudval);
+   if (page_tables_are_ro())
+   xchg_ro_pte(mm, (pte_t *)pudp,
+   __pte(__phys_to_pud_val(__pa(pmdp) | pudval)));
+   else
+   __pud_populate(pudp, __pa(pmdp), pudval);
 }
 #else
 static inline void __pud_populate(pud_t *pudp, phys_addr_t pmdp, pudval_t prot)
@@ -51,7 +55,11 @@ static inline void p4d_populate(struct mm_struct *mm, p4d_t 
*p4dp, pud_t *pudp)
p4dval_t p4dval = P4D_TYPE_TABLE;
 
p4dval |= (mm == &init_mm) ? P4D_TABLE_UXN : P4D_TABLE_PXN;
-   __p4d_populate(p4dp, __pa(pudp), p4dval);
+   if (page_tables_are_ro())
+   xchg_ro_pte(mm, (pte_t *)p4dp,
+   __pte(__phys_to_p4d_val(__pa(pudp) | p4dval)));
+   else
+   __p4d_populate(p4dp, __pa(pudp), p4dval);
 }
 #else
 static inline void __p4d_populate(p4d_t *p4dp, phys_addr_t pudp, p4dval_t prot)
@@ -76,15 +84,27 @@ static inline void __pmd_populate(pmd_t *pmdp, phys_addr_t 
ptep,
 static inline void
 pmd_populate_kernel(struct mm_struct *mm, pmd_t *pmdp, pte_t *ptep)
 {
+   pmdval_t pmdval = PMD_TYPE_TABLE | PMD_TABLE_UXN;
+
VM_BUG_ON(mm && mm != &init_mm);
-   __pmd_populate(pmdp, __pa(ptep), PMD_TYPE_TABLE | PMD_TABLE_UXN);
+   if (page_tables_are_ro())
+   xchg_ro_pte(mm, (pte_t *)pmdp,
+   __pte(__phys_to_pmd_val(__pa(ptep) | pmdval)));
+   else
+   __pmd_populate(pmdp, __pa(ptep), pmdval);
 }
 
 static inline void
 pmd_populate(struct mm_struct *mm, pmd_t *pmdp, pgtable_t ptep)
 {
+   pmdval_t pmdval = PMD_TYPE_TABLE | PMD_TABLE_PXN;
+
VM_BUG_ON(mm == &init_mm);
-   __pmd_populate(pmdp, page_to_phys(ptep), PMD_TYPE_TABLE | 
PMD_TABLE_PXN);
+   if (page_tables_are_ro())
+   xchg_ro_pte(mm, (pte_t *)pmdp,
+   __pte(__phys_to_pmd_val(page_to_phys(ptep) | 
pmdval)));
+   else
+   __pmd_populate(pmdp, page_to_phys(ptep), pmdval);
 }
 
 #endif
diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index 8d3806c68687..a8daea6b4ac9 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -30,6 +30,7 @@
 
 #include 
 #i
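
(The quoted diff is truncated above by the archive. For context, the
fixmap-based update primitive that the descriptor helpers call into
looks roughly like the sketch below -- reconstructed from the
descriptions in this series, not the literal ro_page_tables.c contents.
set_fixmap_offset()/clear_fixmap() are the generic kernel fixmap
helpers; FIX_TEXT_POKE_PTE is the slot added above.)

/* Map the (read-only) page table page writable at a dedicated fixmap
 * slot, perform the update through that alias, then tear the alias
 * down again. */
static pte_t xchg_ro_pte(struct mm_struct *mm, pte_t *ptep, pte_t pte)
{
	pte_t *alias, ret;

	/* mm is kept for parity with the callers above; the real
	 * implementation may also use it for TLB upkeep. */

	/* writable alias of the page containing ptep */
	alias = (pte_t *)set_fixmap_offset(FIX_TEXT_POKE_PTE, __pa(ptep));
	ret = __pte(xchg_relaxed(&pte_val(*alias), pte_val(pte)));
	clear_fixmap(FIX_TEXT_POKE_PTE);

	return ret;
}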

[RFC PATCH 02/12] arm64: mm: add helpers to remap page tables read-only/read-write

2022-01-26 Thread Ard Biesheuvel
Add a couple of helpers to remap a single page read-only or read-write
via its linear address. This will be used for mappings of page table
pages in the linear region.

Note that set_memory_ro/set_memory_rw operate on addresses in the
vmalloc space only, so they cannot be used here.

Signed-off-by: Ard Biesheuvel 
---
 arch/arm64/include/asm/pgtable.h |  3 +++
 arch/arm64/mm/pageattr.c | 14 ++
 2 files changed, 17 insertions(+)

diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index c4ba047a82d2..8d3806c68687 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -34,6 +34,9 @@
 #include 
 #include 
 
+int set_pgtable_ro(void *addr);
+int set_pgtable_rw(void *addr);
+
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
 #define __HAVE_ARCH_FLUSH_PMD_TLB_RANGE
 
diff --git a/arch/arm64/mm/pageattr.c b/arch/arm64/mm/pageattr.c
index a3bacd79507a..61f4aca08b95 100644
--- a/arch/arm64/mm/pageattr.c
+++ b/arch/arm64/mm/pageattr.c
@@ -153,6 +153,20 @@ int set_memory_valid(unsigned long addr, int numpages, int 
enable)
__pgprot(PTE_VALID));
 }
 
+int set_pgtable_ro(void *addr)
+{
+   return __change_memory_common((u64)addr, PAGE_SIZE,
+ __pgprot(PTE_RDONLY),
+ __pgprot(PTE_WRITE));
+}
+
+int set_pgtable_rw(void *addr)
+{
+   return __change_memory_common((u64)addr, PAGE_SIZE,
+ __pgprot(PTE_WRITE),
+ __pgprot(PTE_RDONLY));
+}
+
 int set_direct_map_invalid_noflush(struct page *page)
 {
struct page_change_data data = {
-- 
2.30.2



[RFC PATCH 00/12] arm64: implement read-only page tables

2022-01-26 Thread Ard Biesheuvel
This RFC series implements support for mapping all user and kernel page
tables read-only in the linear map, and using a special fixmap slot to
make any modifications.

The purpose is to prevent page tables from being manipulated
inadvertently, which is becoming more and more important on arm64, as
many new hardening features such as BTI and MTE are controlled via
attributes in the page tables.

This series is only half of the work that is underway to implement this
in terms of hypervisor services rather than fixmap pokes, as this will
allow the hypervisor to remove all write permissions from pages used as
page tables. This work is being done in the context of the pKVM project,
which defines a clear boundary between the hypervisor executing at EL2,
and the [untrusted] host running at EL1. In this context, managing the
host's page tables at HYP level should increase the robustness of the
entire system substantially.

This series is posted separately for discussion, as it introduces the
changes that are necessary to route all page table updates via a small
set of helpers, allowing us to choose between unprotected, fixmap or HYP
protection straightforwardly, as sketched below.
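
As an illustration of that routing -- a minimal sketch only, assuming
the page_tables_are_ro() predicate and the xchg_ro_pte() fixmap helper
from patch 3; the function name here is made up, and a HYP backend
would simply slot in as another branch:

/* Every descriptor update funnels through one place, so the
 * protection backend can be chosen centrally. */
static inline void set_pte_protected(struct mm_struct *mm, pte_t *ptep,
				     pte_t pte)
{
	if (page_tables_are_ro())
		xchg_ro_pte(mm, ptep, pte);	/* fixmap-based poke */
	else
		set_pte(ptep, pte);		/* direct write via linear map */
}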

The pKVM specific changes will be posted as a followup series.

Cc: Will Deacon 
Cc: Marc Zyngier 
Cc: Fuad Tabba 
Cc: Quentin Perret 
Cc: Mark Rutland 
Cc: James Morse 
Cc: Catalin Marinas 

Ard Biesheuvel (12):
  asm-generic/pgalloc: allow arch to override PMD alloc/free routines
  arm64: mm: add helpers to remap page tables read-only/read-write
  arm64: mm: use a fixmap slot for user page table modifications
  arm64: mm: remap PGD pages r/o in the linear region after allocation
  arm64: mm: remap PUD pages r/o in linear region
  arm64: mm: remap PMD pages r/o in linear region
  arm64: mm: remap PTE level user page tables r/o in the linear region
  arm64: mm: remap kernel PTE level page tables r/o in the linear region
  arm64: mm: remap kernel page tables read-only at end of init
  mm: add default definition of p4d_index()
  arm64: efi: use set_pte_at() not set_pte() in order to pass mm pointer
  arm64: hugetlb: use set_pte_at() not set_pte() to provide mm pointer

 arch/arm64/Kconfig   |  11 ++
 arch/arm64/include/asm/fixmap.h  |   1 +
 arch/arm64/include/asm/pgalloc.h |  49 -
 arch/arm64/include/asm/pgtable.h |  82 +++---
 arch/arm64/include/asm/tlb.h |   6 +
 arch/arm64/kernel/efi.c  |   2 +-
 arch/arm64/mm/Makefile   |   2 +
 arch/arm64/mm/fault.c|   8 +-
 arch/arm64/mm/hugetlbpage.c  |   4 +-
 arch/arm64/mm/mmu.c  | 115 +++-
 arch/arm64/mm/pageattr.c |  14 +++
 arch/arm64/mm/pgd.c  |  25 +++--
 arch/arm64/mm/ro_page_tables.c   | 100 +
 include/asm-generic/pgalloc.h|  13 ++-
 include/linux/pgtable.h  |   8 ++
 15 files changed, 405 insertions(+), 35 deletions(-)
 create mode 100644 arch/arm64/mm/ro_page_tables.c


base-commit: e783362eb54cd99b2cac8b3a9aeac942e6f6ac07
-- 
2.30.2



[RFC PATCH 01/12] asm-generic/pgalloc: allow arch to override PMD alloc/free routines

2022-01-26 Thread Ard Biesheuvel
Extend the existing CPP macro based hooks that allow architectures to
specialize the code that allocates and frees pages to be used as page
tables.

Signed-off-by: Ard Biesheuvel 
---
 include/asm-generic/pgalloc.h | 13 +++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/include/asm-generic/pgalloc.h b/include/asm-generic/pgalloc.h
index 977bea16cf1b..65f31f615d99 100644
--- a/include/asm-generic/pgalloc.h
+++ b/include/asm-generic/pgalloc.h
@@ -34,6 +34,7 @@ static inline pte_t *pte_alloc_one_kernel(struct mm_struct 
*mm)
 }
 #endif
 
+#ifndef __HAVE_ARCH_PTE_FREE_KERNEL
 /**
  * pte_free_kernel - free PTE-level kernel page table page
  * @mm: the mm_struct of the current context
@@ -43,6 +44,7 @@ static inline void pte_free_kernel(struct mm_struct *mm, 
pte_t *pte)
 {
free_page((unsigned long)pte);
 }
+#endif
 
 /**
  * __pte_alloc_one - allocate a page for PTE-level user page table
@@ -91,6 +93,7 @@ static inline pgtable_t pte_alloc_one(struct mm_struct *mm)
  * done with a reference count in struct page.
  */
 
+#ifndef __HAVE_ARCH_PTE_FREE
 /**
  * pte_free - free PTE-level user page table page
  * @mm: the mm_struct of the current context
@@ -101,11 +104,11 @@ static inline void pte_free(struct mm_struct *mm, struct 
page *pte_page)
pgtable_pte_page_dtor(pte_page);
__free_page(pte_page);
 }
+#endif
 
 
 #if CONFIG_PGTABLE_LEVELS > 2
 
-#ifndef __HAVE_ARCH_PMD_ALLOC_ONE
 /**
  * pmd_alloc_one - allocate a page for PMD-level page table
  * @mm: the mm_struct of the current context
@@ -116,7 +119,7 @@ static inline void pte_free(struct mm_struct *mm, struct 
page *pte_page)
  *
  * Return: pointer to the allocated memory or %NULL on error
  */
-static inline pmd_t *pmd_alloc_one(struct mm_struct *mm, unsigned long addr)
+static inline pmd_t *__pmd_alloc_one(struct mm_struct *mm, unsigned long addr)
 {
struct page *page;
gfp_t gfp = GFP_PGTABLE_USER;
@@ -132,6 +135,12 @@ static inline pmd_t *pmd_alloc_one(struct mm_struct *mm, 
unsigned long addr)
}
return (pmd_t *)page_address(page);
 }
+
+#ifndef __HAVE_ARCH_PMD_ALLOC_ONE
+static inline pmd_t *pmd_alloc_one(struct mm_struct *mm, unsigned long addr)
+{
+   return __pmd_alloc_one(mm, addr);
+}
 #endif
 
 #ifndef __HAVE_ARCH_PMD_FREE
-- 
2.30.2



Re: [PATCH] KVM: arm64: vgic-v3: Restrict SEIS workaround to known broken systems

2022-01-22 Thread Ard Biesheuvel
On Sat, 22 Jan 2022 at 11:39, Marc Zyngier  wrote:
>
> Contrary to what df652bcf1136 ("KVM: arm64: vgic-v3: Work around GICv3
> locally generated SErrors") was asserting, there is at least one other
> system out there (Cavium ThunderX2) implementing SEIS, and not in
> an obviously broken way.
>
> So instead of imposing the M1 workaround on an innocent bystander,
> let's limit it to the two known broken Apple implementations.
>
> Fixes: df652bcf1136 ("KVM: arm64: vgic-v3: Work around GICv3 locally 
> generated SErrors")
> Reported-by: Ard Biesheuvel 
> Signed-off-by: Marc Zyngier 

Thanks for the fix.

Tested-by: Ard Biesheuvel 
Acked-by: Ard Biesheuvel 

One nit below.

> ---
>  arch/arm64/kvm/hyp/vgic-v3-sr.c |  3 +++
>  arch/arm64/kvm/vgic/vgic-v3.c   | 17 +++--
>  2 files changed, 18 insertions(+), 2 deletions(-)
>
> diff --git a/arch/arm64/kvm/hyp/vgic-v3-sr.c b/arch/arm64/kvm/hyp/vgic-v3-sr.c
> index 20db2f281cf2..4fb419f7b8b6 100644
> --- a/arch/arm64/kvm/hyp/vgic-v3-sr.c
> +++ b/arch/arm64/kvm/hyp/vgic-v3-sr.c
> @@ -983,6 +983,9 @@ static void __vgic_v3_read_ctlr(struct kvm_vcpu *vcpu, 
> u32 vmcr, int rt)
> val = ((vtr >> 29) & 7) << ICC_CTLR_EL1_PRI_BITS_SHIFT;
> /* IDbits */
> val |= ((vtr >> 23) & 7) << ICC_CTLR_EL1_ID_BITS_SHIFT;
> +   /* SEIS */
> +   if (kvm_vgic_global_state.ich_vtr_el2 & ICH_VTR_SEIS_MASK)
> +   val |= BIT(ICC_CTLR_EL1_SEIS_SHIFT);
> /* A3V */
> val |= ((vtr >> 21) & 1) << ICC_CTLR_EL1_A3V_SHIFT;
> /* EOImode */
> diff --git a/arch/arm64/kvm/vgic/vgic-v3.c b/arch/arm64/kvm/vgic/vgic-v3.c
> index 78cf674c1230..d34a795f730c 100644
> --- a/arch/arm64/kvm/vgic/vgic-v3.c
> +++ b/arch/arm64/kvm/vgic/vgic-v3.c
> @@ -609,6 +609,18 @@ static int __init early_gicv4_enable(char *buf)
>  }
>  early_param("kvm-arm.vgic_v4_enable", early_gicv4_enable);
>
> +static struct midr_range broken_seis[] = {

Can this be const?

> +   MIDR_ALL_VERSIONS(MIDR_APPLE_M1_ICESTORM),
> +   MIDR_ALL_VERSIONS(MIDR_APPLE_M1_FIRESTORM),
> +   {},
> +};
> +
> +static bool vgic_v3_broken_seis(void)
> +{
> +   return ((kvm_vgic_global_state.ich_vtr_el2 & ICH_VTR_SEIS_MASK) &&
> +   is_midr_in_range_list(read_cpuid_id(), broken_seis));
> +}
> +
>  /**
>   * vgic_v3_probe - probe for a VGICv3 compatible interrupt controller
>   * @info:  pointer to the GIC description
> @@ -676,9 +688,10 @@ int vgic_v3_probe(const struct gic_kvm_info *info)
> group1_trap = true;
> }
>
> -   if (kvm_vgic_global_state.ich_vtr_el2 & ICH_VTR_SEIS_MASK) {
> -   kvm_info("GICv3 with locally generated SEI\n");
> +   if (vgic_v3_broken_seis()) {
> +   kvm_info("GICv3 with broken locally generated SEI\n");
>
> +   kvm_vgic_global_state.ich_vtr_el2 &= ~ICH_VTR_SEIS_MASK;
> group0_trap = true;
> group1_trap = true;
> if (ich_vtr_el2 & ICH_VTR_TDS_MASK)
> --
> 2.34.1
>


Re: [PATCH] Documentation, dt, numa: Add note to empty NUMA node

2021-09-22 Thread Ard Biesheuvel
On Tue, 21 Sept 2021 at 21:45, Rob Herring  wrote:
>
> On Sun, Sep 5, 2021 at 11:16 PM Gavin Shan  wrote:
> >
> > Empty memory nodes, in which no memory resides, are allowed.
> > For these empty memory nodes, the 'len' of the 'reg' property is zero.
> > The NUMA node IDs are still valid and parsed, but memory may be
> > added to them through hotplug afterwards. Currently, QEMU fails
> > to boot when multiple empty memory nodes are specified; this is
> > caused by a device-tree population failure and duplicated memory
> > node names.

Those memory regions are known in advance, right? So wouldn't it be
better to use something like 'status = "disabled"' here?

>
> I still don't like the fake addresses. I can't really give suggestions
> on alternative ways to fix this with you just presenting a solution.
>

Agreed. Please try to explain what the problem is, and why this is the
best way to solve it. Please include other solutions that were
considered and rejected if any exist.

> What is the failure you see? Can we relax the kernel's expectations?
> What about UEFI boot, where the memory nodes aren't used (or maybe they
> are, for NUMA)? How does this work with ACPI?
>

The EFI memory map only needs to describe the memory that was present
at boot. More memory can be represented as ACPI objects, including
coldplugged memory that is already present at boot. None of this
involves the memory nodes in DT.

> > As the devicetree specification indicates, the 'unit-address' of
> > these empty memory nodes, which is part of their names, is the
> > equivalent of the 'base-address'. Unfortunately, I find it difficult
> > to determine where the assignment of the 'base-address' is properly
> > documented for these empty memory nodes. So let's add a section on
> > empty memory nodes to cover this in the NUMA binding document. The
> > 'unit-address', equivalent to the 'base-address' in the 'reg' property
> > of these empty memory nodes, is specified to be the sum of the highest
> > memory address and the NUMA node ID.
> >
> > Signed-off-by: Gavin Shan 
> > Acked-by: Randy Dunlap 
> > ---
> >  Documentation/devicetree/bindings/numa.txt | 60 +-
> >  1 file changed, 59 insertions(+), 1 deletion(-)
> >
> > diff --git a/Documentation/devicetree/bindings/numa.txt 
> > b/Documentation/devicetree/bindings/numa.txt
> > index 21b35053ca5a..82f047bc8dd6 100644
> > --- a/Documentation/devicetree/bindings/numa.txt
> > +++ b/Documentation/devicetree/bindings/numa.txt
> > @@ -103,7 +103,65 @@ Example:
> > };
> >
> >  
> > ==
> > -4 - Example dts
> > +4 - Empty memory nodes
> > +==
> > +
> > +Empty memory nodes, in which no memory resides, are allowed. The 'length'
> > +field of the 'reg' property is zero. However, the 'base-address' is a
> > +dummy, invalid address: the sum of the highest memory address and the
> > +NUMA node ID. The NUMA node IDs and distance maps are still valid, and
> > +memory may be added to them through hotplug afterwards.
> > +
> > +Example:
> > +
> > +   memory@0 {
> > +   device_type = "memory";
> > +   reg = <0x0 0x0 0x0 0x80000000>;
> > +   numa-node-id = <0>;
> > +   };
> > +
> > +   memory@80000000 {
> > +   device_type = "memory";
> > +   reg = <0x0 0x80000000 0x0 0x80000000>;
> > +   numa-node-id = <1>;
> > +   };
> > +
> > +   /* Empty memory node */
> > +   memory@100000002 {
> > +   device_type = "memory";
> > +   reg = <0x1 0x2 0x0 0x0>;
> > +   numa-node-id = <2>;
> > +   };
> > +
> > +   /* Empty memory node */
> > +   memory@100000003 {
> > +   device_type = "memory";
> > +   reg = <0x1 0x3 0x0 0x0>;
> > +   numa-node-id = <3>;
> > +   };
>
> Do you really need the memory nodes here or just some way to define
> NUMA node IDs 2 and 3 as valid?
>
>
> > +
> > +   distance-map {
> > +   compatible = "numa-distance-map-v1";
> > +   distance-matrix = <0 0  10>,
> > + <0 1  20>,
> > + <0 2  40>,
> > + <0 3  20>,
> > + <1 0  20>,
> > + <1 1  10>,
> > + <1 2  20>,
> > + <1 3  40>,
> > + <2 0  40>,
> > + <2 1  20>,
> > + <2 2  10>,
> > + <2 3  20>,
> > + <3 0  20>,
> > + <3 1  40>,
> > + <3 2  20>,
> > + <3 3  10>;
> > +   };
> > +
> > 

Re: [PATCH v4 0/4] arm64: drop pfn_valid_within() and simplify pfn_valid()

2021-05-12 Thread Ard Biesheuvel
On Wed, 12 May 2021 at 09:34, Mike Rapoport  wrote:
>
> On Wed, May 12, 2021 at 09:00:02AM +0200, Ard Biesheuvel wrote:
> > On Tue, 11 May 2021 at 12:05, Mike Rapoport  wrote:
> > >
> > > From: Mike Rapoport 
> > >
> > > Hi,
> > >
> > > These patches aim to remove CONFIG_HOLES_IN_ZONE and essentially hardwire
> > > pfn_valid_within() to 1.
> > >
> > > The idea is to mark NOMAP pages as reserved in the memory map and restore
> > > the intended semantics of pfn_valid() to designate availability of struct
> > > page for a pfn.
> > >
> > > With this the core mm will be able to cope with the fact that it cannot 
> > > use
> > > NOMAP pages and the holes created by NOMAP ranges within MAX_ORDER blocks
> > > will be treated correctly even without the need for pfn_valid_within.
> > >
> > > The patches are boot tested on qemu-system-aarch64.
> > >
> >
> > Did you use EFI boot when testing this? The memory map is much more
> > fragmented in that case, so this would be a good data point.
>
> Right, something like this:
>

Yes, although it is not always that bad.

> [0.00] Early memory node ranges
> [0.00]   node   0: [mem 0x4000-0xbfff]
> [0.00]   node   0: [mem 0xc000-0x]

This is allocated below 4 GB by the firmware, for reasons that are
only valid on x86 (where some of the early boot chain is IA32 only).

> [0.00]   node   0: [mem 0x0001-0x0004386f]
> [0.00]   node   0: [mem 0x00043870-0x00043899]
> [0.00]   node   0: [mem 0x0004389a-0x0004389b]
> [0.00]   node   0: [mem 0x0004389c-0x000438b5]
> [0.00]   node   0: [mem 0x000438b6-0x00043be3]
> [0.00]   node   0: [mem 0x00043be4-0x00043bec]
> [0.00]   node   0: [mem 0x00043bed-0x00043bed]
> [0.00]   node   0: [mem 0x00043bee-0x00043bff]
> [0.00]   node   0: [mem 0x00043c00-0x00043fff]
>
> This is a pity really, because I don't see a fundamental reason for those
> tiny holes all over the place.
>

There is a config option in the firmware build that allows these
regions to be preallocated using larger windows, which greatly reduces
the fragmentation.
> I know that EFI/ACPI mandates "IO style" memory access for those regions,
> but I fail to get why...
>

Not sure what you mean by 'IO style memory access'.



> > > I believe it would be best to route these via the mmotm tree.
> > >
> > > v4:
> > > * rebase on v5.13-rc1
> > >
> > > v3: Link: 
> > > https://lore.kernel.org/lkml/20210422061902.21614-1-r...@kernel.org
> > > * Fix minor issues found by Anshuman
> > > * Freshen up the declaration of pfn_valid() to make it consistent with
> > >   pfn_is_map_memory()
> > > * Add more Acked-by and Reviewed-by tags, thanks Anshuman and David
> > >
> > > v2: Link: 
> > > https://lore.kernel.org/lkml/20210421065108.1987-1-r...@kernel.org
> > > * Add check for PFN overflow in pfn_is_map_memory()
> > > * Add Acked-by and Reviewed-by tags, thanks David.
> > >
> > > v1: Link: 
> > > https://lore.kernel.org/lkml/20210420090925.7457-1-r...@kernel.org
> > > * Add comment about the semantics of pfn_valid() as Anshuman suggested
> > > * Extend comments about MEMBLOCK_NOMAP, per Anshuman
> > > * Use pfn_is_map_memory() name for the exported wrapper for
> > >   memblock_is_map_memory(). It is still local to arch/arm64 in the end
> > >   because of header dependency issues.
> > >
> > > rfc: Link: 
> > > https://lore.kernel.org/lkml/20210407172607.8812-1-r...@kernel.org
> > >
> > > Mike Rapoport (4):
> > >   include/linux/mmzone.h: add documentation for pfn_valid()
> > >   memblock: update initialization of reserved pages
> > >   arm64: decouple check whether pfn is in linear map from pfn_valid()
> > >   arm64: drop pfn_valid_within() and simplify pfn_valid()
> > >
> > >  arch/arm64/Kconfig  |  3 ---
> > >  arch/arm64/include/asm/memory.h |  2 +-
> > >  arch/arm64/include/asm/page.h   |  3 ++-
> > >  arch/arm64/kvm/mmu.c|  2 +-
> > >  arch/arm64/mm/init.c| 14 +-
> > >  arch/arm64/mm/ioremap.c |  4 ++--
> > >  arch/arm64/mm/mmu.c |  2 +-
> > >  include/linux/memblock.h|  4 +++-
> > >  include/linux/mmzone.h  | 11 +++
> > >  mm/memblock.c   | 28 ++--
> > >  10 files changed, 60 insertions(+), 13 deletions(-)
> > >
> > >
> > > base-commit: 6efb943b8616ec53a5e444193dccf1af9ad627b5
> > > --
> > > 2.28.0
> > >
>
> --
> Sincerely yours,
> Mike.


Re: [PATCH v4 0/4] arm64: drop pfn_valid_within() and simplify pfn_valid()

2021-05-12 Thread Ard Biesheuvel
On Tue, 11 May 2021 at 12:05, Mike Rapoport  wrote:
>
> From: Mike Rapoport 
>
> Hi,
>
> These patches aim to remove CONFIG_HOLES_IN_ZONE and essentially hardwire
> pfn_valid_within() to 1.
>
> The idea is to mark NOMAP pages as reserved in the memory map and restore
> the intended semantics of pfn_valid() to designate availability of struct
> page for a pfn.
>
> With this the core mm will be able to cope with the fact that it cannot use
> NOMAP pages and the holes created by NOMAP ranges within MAX_ORDER blocks
> will be treated correctly even without the need for pfn_valid_within.
>
> The patches are boot tested on qemu-system-aarch64.
>

Did you use EFI boot when testing this? The memory map is much more
fragmented in that case, so this would be a good data point.


> I believe it would be best to route these via the mmotm tree.
>
> v4:
> * rebase on v5.13-rc1
>
> v3: Link: https://lore.kernel.org/lkml/20210422061902.21614-1-r...@kernel.org
> * Fix minor issues found by Anshuman
> * Freshen up the declaration of pfn_valid() to make it consistent with
>   pfn_is_map_memory()
> * Add more Acked-by and Reviewed-by tags, thanks Anshuman and David
>
> v2: Link: https://lore.kernel.org/lkml/20210421065108.1987-1-r...@kernel.org
> * Add check for PFN overflow in pfn_is_map_memory()
> * Add Acked-by and Reviewed-by tags, thanks David.
>
> v1: Link: https://lore.kernel.org/lkml/20210420090925.7457-1-r...@kernel.org
> * Add comment about the semantics of pfn_valid() as Anshuman suggested
> * Extend comments about MEMBLOCK_NOMAP, per Anshuman
> * Use pfn_is_map_memory() name for the exported wrapper for
>   memblock_is_map_memory(). It is still local to arch/arm64 in the end
>   because of header dependency issues.
>
> rfc: Link: https://lore.kernel.org/lkml/20210407172607.8812-1-r...@kernel.org
>
> Mike Rapoport (4):
>   include/linux/mmzone.h: add documentation for pfn_valid()
>   memblock: update initialization of reserved pages
>   arm64: decouple check whether pfn is in linear map from pfn_valid()
>   arm64: drop pfn_valid_within() and simplify pfn_valid()
>
>  arch/arm64/Kconfig  |  3 ---
>  arch/arm64/include/asm/memory.h |  2 +-
>  arch/arm64/include/asm/page.h   |  3 ++-
>  arch/arm64/kvm/mmu.c|  2 +-
>  arch/arm64/mm/init.c| 14 +-
>  arch/arm64/mm/ioremap.c |  4 ++--
>  arch/arm64/mm/mmu.c |  2 +-
>  include/linux/memblock.h|  4 +++-
>  include/linux/mmzone.h  | 11 +++
>  mm/memblock.c   | 28 ++--
>  10 files changed, 60 insertions(+), 13 deletions(-)
>
>
> base-commit: 6efb943b8616ec53a5e444193dccf1af9ad627b5
> --
> 2.28.0
>


Re: [PATCH v4 4/4] arm64: drop pfn_valid_within() and simplify pfn_valid()

2021-05-11 Thread Ard Biesheuvel
On Tue, 11 May 2021 at 12:06, Mike Rapoport  wrote:
>
> From: Mike Rapoport 
>
> The arm64's version of pfn_valid() differs from the generic because of two
> reasons:
>
> * Parts of the memory map are freed during boot. This makes it necessary to
>   verify that there is actual physical memory that corresponds to a pfn
>   which is done by querying memblock.
>
> * There are NOMAP memory regions. These regions are not mapped in the
>   linear map and until the previous commit the struct pages representing
>   these areas had default values.
>
> As the consequence of absence of the special treatment of NOMAP regions in
> the memory map it was necessary to use memblock_is_map_memory() in
> pfn_valid() and to have pfn_valid_within() aliased to pfn_valid() so that
> generic mm functionality would not treat a NOMAP page as a normal page.
>
> Since the NOMAP regions are now marked as PageReserved(), pfn walkers and
> the rest of core mm will treat them as unusable memory and thus
> pfn_valid_within() is no longer required at all and can be disabled by
> removing CONFIG_HOLES_IN_ZONE on arm64.
>
> pfn_valid() can be slightly simplified by replacing
> memblock_is_map_memory() with memblock_is_memory().
>
> Signed-off-by: Mike Rapoport 
> Acked-by: David Hildenbrand 

Acked-by: Ard Biesheuvel 

... and many thanks for cleaning this up.


> ---
>  arch/arm64/Kconfig   | 3 ---
>  arch/arm64/mm/init.c | 2 +-
>  2 files changed, 1 insertion(+), 4 deletions(-)
>
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index 9f1d8566bbf9..d7dc8698cf8e 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -1052,9 +1052,6 @@ config NEED_PER_CPU_EMBED_FIRST_CHUNK
> def_bool y
> depends on NUMA
>
> -config HOLES_IN_ZONE
> -   def_bool y
> -
>  source "kernel/Kconfig.hz"
>
>  config ARCH_SPARSEMEM_ENABLE
> diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
> index 798f74f501d5..fb07218da2c0 100644
> --- a/arch/arm64/mm/init.c
> +++ b/arch/arm64/mm/init.c
> @@ -251,7 +251,7 @@ int pfn_valid(unsigned long pfn)
> if (!early_section(ms))
> return pfn_section_valid(ms, pfn);
>
> -   return memblock_is_map_memory(addr);
> +   return memblock_is_memory(addr);
>  }
>  EXPORT_SYMBOL(pfn_valid);
>
> --
> 2.28.0
>


Re: [PATCH v4 3/4] arm64: decouple check whether pfn is in linear map from pfn_valid()

2021-05-11 Thread Ard Biesheuvel
On Tue, 11 May 2021 at 12:06, Mike Rapoport  wrote:
>
> From: Mike Rapoport 
>
> The intended semantics of pfn_valid() is to verify whether there is a
> struct page for the pfn in question and nothing else.
>
> Yet, on arm64 it is used to distinguish memory areas that are mapped in the
> linear map vs those that require ioremap() to access them.
>
> Introduce a dedicated pfn_is_map_memory() wrapper for
> memblock_is_map_memory() to perform such check and use it where
> appropriate.
>
> Using a wrapper allows to avoid cyclic include dependencies.
>
> While here also update style of pfn_valid() so that both pfn_valid() and
> pfn_is_map_memory() declarations will be consistent.
>
> Signed-off-by: Mike Rapoport 
> Acked-by: David Hildenbrand 

Acked-by: Ard Biesheuvel 

> ---
>  arch/arm64/include/asm/memory.h |  2 +-
>  arch/arm64/include/asm/page.h   |  3 ++-
>  arch/arm64/kvm/mmu.c|  2 +-
>  arch/arm64/mm/init.c| 12 
>  arch/arm64/mm/ioremap.c |  4 ++--
>  arch/arm64/mm/mmu.c |  2 +-
>  6 files changed, 19 insertions(+), 6 deletions(-)
>
> diff --git a/arch/arm64/include/asm/memory.h b/arch/arm64/include/asm/memory.h
> index 87b90dc27a43..9027b7e16c4c 100644
> --- a/arch/arm64/include/asm/memory.h
> +++ b/arch/arm64/include/asm/memory.h
> @@ -369,7 +369,7 @@ static inline void *phys_to_virt(phys_addr_t x)
>
>  #define virt_addr_valid(addr)  ({  \
> __typeof__(addr) __addr = __tag_reset(addr);\
> -   __is_lm_address(__addr) && pfn_valid(virt_to_pfn(__addr));  \
> +   __is_lm_address(__addr) && pfn_is_map_memory(virt_to_pfn(__addr));
>   \
>  })
>
>  void dump_mem_limit(void);
> diff --git a/arch/arm64/include/asm/page.h b/arch/arm64/include/asm/page.h
> index 012cffc574e8..75ddfe671393 100644
> --- a/arch/arm64/include/asm/page.h
> +++ b/arch/arm64/include/asm/page.h
> @@ -37,7 +37,8 @@ void copy_highpage(struct page *to, struct page *from);
>
>  typedef struct page *pgtable_t;
>
> -extern int pfn_valid(unsigned long);
> +int pfn_valid(unsigned long pfn);
> +int pfn_is_map_memory(unsigned long pfn);
>
>  #include 
>
> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> index c5d1f3c87dbd..470070073085 100644
> --- a/arch/arm64/kvm/mmu.c
> +++ b/arch/arm64/kvm/mmu.c
> @@ -85,7 +85,7 @@ void kvm_flush_remote_tlbs(struct kvm *kvm)
>
>  static bool kvm_is_device_pfn(unsigned long pfn)
>  {
> -   return !pfn_valid(pfn);
> +   return !pfn_is_map_memory(pfn);
>  }
>
>  static void *stage2_memcache_zalloc_page(void *arg)
> diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
> index 16a2b2b1c54d..798f74f501d5 100644
> --- a/arch/arm64/mm/init.c
> +++ b/arch/arm64/mm/init.c
> @@ -255,6 +255,18 @@ int pfn_valid(unsigned long pfn)
>  }
>  EXPORT_SYMBOL(pfn_valid);
>
> +int pfn_is_map_memory(unsigned long pfn)
> +{
> +   phys_addr_t addr = PFN_PHYS(pfn);
> +
> +   /* avoid false positives for bogus PFNs, see comment in pfn_valid() */
> +   if (PHYS_PFN(addr) != pfn)
> +   return 0;
> +
> +   return memblock_is_map_memory(addr);
> +}
> +EXPORT_SYMBOL(pfn_is_map_memory);
> +
>  static phys_addr_t memory_limit = PHYS_ADDR_MAX;
>
>  /*
> diff --git a/arch/arm64/mm/ioremap.c b/arch/arm64/mm/ioremap.c
> index b5e83c46b23e..b7c81dacabf0 100644
> --- a/arch/arm64/mm/ioremap.c
> +++ b/arch/arm64/mm/ioremap.c
> @@ -43,7 +43,7 @@ static void __iomem *__ioremap_caller(phys_addr_t 
> phys_addr, size_t size,
> /*
>  * Don't allow RAM to be mapped.
>  */
> -   if (WARN_ON(pfn_valid(__phys_to_pfn(phys_addr
> +   if (WARN_ON(pfn_is_map_memory(__phys_to_pfn(phys_addr
> return NULL;
>
> area = get_vm_area_caller(size, VM_IOREMAP, caller);
> @@ -84,7 +84,7 @@ EXPORT_SYMBOL(iounmap);
>  void __iomem *ioremap_cache(phys_addr_t phys_addr, size_t size)
>  {
> /* For normal memory we already have a cacheable mapping. */
> -   if (pfn_valid(__phys_to_pfn(phys_addr)))
> +   if (pfn_is_map_memory(__phys_to_pfn(phys_addr)))
> return (void __iomem *)__phys_to_virt(phys_addr);
>
> return __ioremap_caller(phys_addr, size, __pgprot(PROT_NORMAL),
> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
> index 6dd9369e3ea0..ab5914cebd3c 100644
> --- a/arch/arm64/mm/mmu.c
> +++ b/arch/arm64/mm/mmu.c
> @@ -82,7 +82,7 @@ void set_swapper_pgd(pgd_t *pgdp, pgd_t pgd)
>  pgprot_t phys_mem_access_prot(struct file *file, unsigned long pfn,
>   unsigned long size, pgprot_t vma_prot)
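
For illustration only, the split this patch makes can be sketched as below.
The helper name is made up; the two predicates are the ones the patch leaves
behind.

/* Sketch, not part of the series: pfn_valid() now only promises a
 * struct page; pfn_is_map_memory() is what gates linear-map access. */
static bool can_use_linear_alias(unsigned long pfn)
{
	/* a struct page exists (the pfn may still be a NOMAP hole) */
	if (!pfn_valid(pfn))
		return false;

	/* only pfns in the linear map have a kernel VA via phys_to_virt() */
	return pfn_is_map_memory(pfn);
}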

Re: [PATCH v4 2/4] memblock: update initialization of reserved pages

2021-05-11 Thread Ard Biesheuvel
On Tue, 11 May 2021 at 12:06, Mike Rapoport  wrote:
>
> From: Mike Rapoport 
>
> The struct pages representing a reserved memory region are initialized
> using reserve_bootmem_range() function. This function is called for each
> reserved region just before the memory is freed from memblock to the buddy
> page allocator.
>
> The struct pages for MEMBLOCK_NOMAP regions are kept with the default
> values set by the memory map initialization which makes it necessary to
> have a special treatment for such pages in pfn_valid() and
> pfn_valid_within().
>
> Split out initialization of the reserved pages to a function with a
> meaningful name and treat the MEMBLOCK_NOMAP regions the same way as the
> reserved regions and mark struct pages for the NOMAP regions as
> PageReserved.
>
> Signed-off-by: Mike Rapoport 
> Reviewed-by: David Hildenbrand 
> Reviewed-by: Anshuman Khandual 

Acked-by: Ard Biesheuvel 

> ---
>  include/linux/memblock.h |  4 +++-
>  mm/memblock.c| 28 ++--
>  2 files changed, 29 insertions(+), 3 deletions(-)
>
> diff --git a/include/linux/memblock.h b/include/linux/memblock.h
> index 5984fff3f175..1b4c97c151ae 100644
> --- a/include/linux/memblock.h
> +++ b/include/linux/memblock.h
> @@ -30,7 +30,9 @@ extern unsigned long long max_possible_pfn;
>   * @MEMBLOCK_NONE: no special request
>   * @MEMBLOCK_HOTPLUG: hotpluggable region
>   * @MEMBLOCK_MIRROR: mirrored region
> - * @MEMBLOCK_NOMAP: don't add to kernel direct mapping
> + * @MEMBLOCK_NOMAP: don't add to kernel direct mapping and treat as
> + * reserved in the memory map; refer to memblock_mark_nomap() description
> + * for further details
>   */
>  enum memblock_flags {
> MEMBLOCK_NONE   = 0x0,  /* No special request */
> diff --git a/mm/memblock.c b/mm/memblock.c
> index afaefa8fc6ab..3abf2c3fea7f 100644
> --- a/mm/memblock.c
> +++ b/mm/memblock.c
> @@ -906,6 +906,11 @@ int __init_memblock memblock_mark_mirror(phys_addr_t 
> base, phys_addr_t size)
>   * @base: the base phys addr of the region
>   * @size: the size of the region
>   *
> + * The memory regions marked with %MEMBLOCK_NOMAP will not be added to the
> + * direct mapping of the physical memory. These regions will still be
> + * covered by the memory map. The struct page representing NOMAP memory
> + * frames in the memory map will be PageReserved()
> + *
>   * Return: 0 on success, -errno on failure.
>   */
>  int __init_memblock memblock_mark_nomap(phys_addr_t base, phys_addr_t size)
> @@ -2002,6 +2007,26 @@ static unsigned long __init 
> __free_memory_core(phys_addr_t start,
> return end_pfn - start_pfn;
>  }
>
> +static void __init memmap_init_reserved_pages(void)
> +{
> +   struct memblock_region *region;
> +   phys_addr_t start, end;
> +   u64 i;
> +
> +   /* initialize struct pages for the reserved regions */
> +   for_each_reserved_mem_range(i, &start, &end)
> +   reserve_bootmem_region(start, end);
> +
> +   /* and also treat struct pages for the NOMAP regions as PageReserved 
> */
> +   for_each_mem_region(region) {
> +   if (memblock_is_nomap(region)) {
> +   start = region->base;
> +   end = start + region->size;
> +   reserve_bootmem_region(start, end);
> +   }
> +   }
> +}
> +
>  static unsigned long __init free_low_memory_core_early(void)
>  {
> unsigned long count = 0;
> @@ -2010,8 +2035,7 @@ static unsigned long __init 
> free_low_memory_core_early(void)
>
> memblock_clear_hotplug(0, -1);
>
> -   for_each_reserved_mem_range(i, &start, &end)
> -   reserve_bootmem_region(start, end);
> +   memmap_init_reserved_pages();
>
> /*
>  * We need to use NUMA_NO_NODE instead of NODE_DATA(0)->node_id
> --
> 2.28.0
>
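
As a hedged illustration of the semantics documented above (the wrapper name
and the fw_base/fw_size values are hypothetical): a platform that wants a
firmware-owned range kept out of the linear map could do something like the
following at early boot.

/* Sketch only: after this patch, a NOMAP range stays out of the linear
 * map *and* its struct pages come up as PageReserved(). */
static void __init reserve_fw_region(phys_addr_t fw_base, phys_addr_t fw_size)
{
	memblock_reserve(fw_base, fw_size);	/* keep the allocator away */
	memblock_mark_nomap(fw_base, fw_size);	/* keep it out of the linear map */
}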


Re: [PATCH v4 1/4] include/linux/mmzone.h: add documentation for pfn_valid()

2021-05-11 Thread Ard Biesheuvel
On Tue, 11 May 2021 at 12:06, Mike Rapoport  wrote:
>
> From: Mike Rapoport 
>
> Add comment describing the semantics of pfn_valid() that clarifies that
> pfn_valid() only checks for availability of a memory map entry (i.e. struct
> page) for a PFN rather than availability of usable memory backing that PFN.
>
> The most "generic" version of pfn_valid() used by the configurations with
> SPARSEMEM enabled resides in include/linux/mmzone.h so this is the most
> suitable place for documentation about semantics of pfn_valid().
>
> Suggested-by: Anshuman Khandual 
> Signed-off-by: Mike Rapoport 
> Reviewed-by: Anshuman Khandual 

Acked-by: Ard Biesheuvel 

> ---
>  include/linux/mmzone.h | 11 +++
>  1 file changed, 11 insertions(+)
>
> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> index 0d53eba1c383..e5945ca24df7 100644
> --- a/include/linux/mmzone.h
> +++ b/include/linux/mmzone.h
> @@ -1427,6 +1427,17 @@ static inline int pfn_section_valid(struct mem_section 
> *ms, unsigned long pfn)
>  #endif
>
>  #ifndef CONFIG_HAVE_ARCH_PFN_VALID
> +/**
> + * pfn_valid - check if there is a valid memory map entry for a PFN
> + * @pfn: the page frame number to check
> + *
> + * Check if there is a valid memory map entry aka struct page for the @pfn.
> + * Note, that availability of the memory map entry does not imply that
> + * there is actual usable memory at that @pfn. The struct page may
> + * represent a hole or an unusable page frame.
> + *
> + * Return: 1 for PFNs that have memory map entries and 0 otherwise
> + */
>  static inline int pfn_valid(unsigned long pfn)
>  {
> struct mem_section *ms;
> --
> 2.28.0
>


Re: [RFC/RFT PATCH 1/3] memblock: update initialization of reserved pages

2021-04-14 Thread Ard Biesheuvel
On Wed, 14 Apr 2021 at 17:14, David Hildenbrand  wrote:
>
> On 07.04.21 19:26, Mike Rapoport wrote:
> > From: Mike Rapoport 
> >
> > The struct pages representing a reserved memory region are initialized
> > using reserve_bootmem_range() function. This function is called for each
> > reserved region just before the memory is freed from memblock to the buddy
> > page allocator.
> >
> > The struct pages for MEMBLOCK_NOMAP regions are kept with the default
> > values set by the memory map initialization which makes it necessary to
> > have a special treatment for such pages in pfn_valid() and
> > pfn_valid_within().
>
> I assume these pages are never given to the buddy, because we don't have
> a direct mapping. So to the kernel, it's essentially just like a memory
> hole with benefits.
>
> I can spot that we want to export such memory like any special memory
> thingy/hole in /proc/iomem -- "reserved", which makes sense.
>
> I would assume that MEMBLOCK_NOMAP is a special type of *reserved*
> memory. IOW, that for_each_reserved_mem_range() should already succeed
> on these as well -- we should mark anything that is MEMBLOCK_NOMAP
> implicitly as reserved. Or are there valid reasons not to do so? What
> can anyone do with that memory?
>
> I assume they are pretty much useless for the kernel, right? Like other
> reserved memory ranges.
>

On ARM, we need to know whether any physical regions that do not
contain system memory contain something with device semantics or not.
One of the examples is ACPI tables: these are in reserved memory, and
so they are not covered by the linear region. However, when the ACPI
core ioremap()s an arbitrary memory region, we don't know whether it
is mapping a memory region or a device region unless we keep track of
this in some way. (Device mappings require device attributes, but
firmware tables require memory attributes, as they might be accessed
using misaligned reads)
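
A minimal sketch of the kind of check described here, assuming memblock is
the source of truth (the helper name is made up, and arm64's real
acpi_os_ioremap() logic is more involved than this):

/* Sketch: map with memory attributes when the region is (NOMAP) RAM,
 * so that misaligned table reads work, and with device attributes
 * otherwise. */
static void __iomem *acpi_table_map(phys_addr_t phys, size_t size)
{
	pgprot_t prot = memblock_is_memory(phys) ? PAGE_KERNEL
						 : __pgprot(PROT_DEVICE_nGnRnE);

	return __ioremap(phys, size, prot);
}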


>
> >
> > Split out initialization of the reserved pages to a function with a
> > meaningful name and treat the MEMBLOCK_NOMAP regions the same way as the
> > reserved regions and mark struct pages for the NOMAP regions as
> > PageReserved.
> >
> > Signed-off-by: Mike Rapoport 
> > ---
> >   mm/memblock.c | 23 +--
> >   1 file changed, 21 insertions(+), 2 deletions(-)
> >
> > diff --git a/mm/memblock.c b/mm/memblock.c
> > index afaefa8fc6ab..6b7ea9d86310 100644
> > --- a/mm/memblock.c
> > +++ b/mm/memblock.c
> > @@ -2002,6 +2002,26 @@ static unsigned long __init 
> > __free_memory_core(phys_addr_t start,
> >   return end_pfn - start_pfn;
> >   }
> >
> > +static void __init memmap_init_reserved_pages(void)
> > +{
> > + struct memblock_region *region;
> > + phys_addr_t start, end;
> > + u64 i;
> > +
> > + /* initialize struct pages for the reserved regions */
> > + for_each_reserved_mem_range(i, &start, &end)
> > + reserve_bootmem_region(start, end);
> > +
> > + /* and also treat struct pages for the NOMAP regions as PageReserved 
> > */
> > + for_each_mem_region(region) {
> > + if (memblock_is_nomap(region)) {
> > + start = region->base;
> > + end = start + region->size;
> > + reserve_bootmem_region(start, end);
> > + }
> > + }
> > +}
> > +
> >   static unsigned long __init free_low_memory_core_early(void)
> >   {
> >   unsigned long count = 0;
> > @@ -2010,8 +2030,7 @@ static unsigned long __init 
> > free_low_memory_core_early(void)
> >
> >   memblock_clear_hotplug(0, -1);
> >
> > - for_each_reserved_mem_range(i, &start, &end)
> > - reserve_bootmem_region(start, end);
> > + memmap_init_reserved_pages();
> >
> >   /*
> >* We need to use NUMA_NO_NODE instead of NODE_DATA(0)->node_id
> >
>
>
> --
> Thanks,
>
> David / dhildenb
>
>


Re: [PATCH] arm64: kvm: handle 52-bit VA regions correctly under nVHE

2021-03-30 Thread Ard Biesheuvel
On Tue, 30 Mar 2021 at 15:56, Marc Zyngier  wrote:
>
> On Tue, 30 Mar 2021 14:15:19 +0100,
> Ard Biesheuvel  wrote:
> >
> > On Tue, 30 Mar 2021 at 15:04, Marc Zyngier  wrote:
> > >
> > > On Tue, 30 Mar 2021 13:49:18 +0100,
> > > Ard Biesheuvel  wrote:
> > > >
> > > > On Tue, 30 Mar 2021 at 14:44, Marc Zyngier  wrote:
> > > > >
> > > > > On Tue, 30 Mar 2021 12:21:26 +0100,
> > > > > Ard Biesheuvel  wrote:
> > > > > >
> > > > > > Commit f4693c2716b35d08 ("arm64: mm: extend linear region for 
> > > > > > 52-bit VA
> > > > > > configurations") introduced a new layout for the 52-bit VA space, in
> > > > > > order to maximize the space available to the linear region. After 
> > > > > > this
> > > > > > change, the kernel VA space is no longer split 1:1 down the middle, 
> > > > > > and
> > > > > > as it turns out, this violates an assumption in the KVM init code 
> > > > > > when
> > > > > > it chooses the layout for the nVHE EL2 mapping.
> > > > > >
> > > > > > Given that EFI does not support 52-bit VA addressing (as it only
> > > > > > supports 4k pages), and that in general, loaders cannot assume that 
> > > > > > the
> > > > > > kernel being loaded supports 52-bit VA/PA addressing in the first 
> > > > > > place,
> > > > > > we can safely assume that the kernel, and therefore the .idmap 
> > > > > > section,
> > > > > > will be 48-bit addressable on 52-bit VA capable systems.
> > > > > >
> > > > > > So in this case, organize the nVHE EL2 address space as a 2^48 byte
> > > > > > window starting at address 0x0, containing the ID map and the
> > > > > > hypervisor's private mappings, followed by a contiguous 2^52 - 2^48 
> > > > > > byte
> > > > > > linear region. (Note that EL1's linear region is 2^52 - 2^47 bytes 
> > > > > > in
> > > > > > size, so it is slightly larger, but this only matters on systems 
> > > > > > where
> > > > > > the DRAM footprint in the physical memory map exceeds 3968 TB)
> > > > >
> > > > > So if I have memory in the [2^52 - 2^48, 2^52 - 2^47] range, not
> > > > > necessarily because I have that much memory, but because my system has
> > > > > multiple memory banks, one of which lands on that spot, I cannot map
> > > > > such memory at EL2. We'll explode at run time.
> > > > >
> > > > > Can we keep the private mapping to 47 bits and restore the missing
> > > > > chunk to the linear mapping? Of course, it means that the linear map
> > > > > is now potentially non-linear, so we'd have to guarantee that the
> > > > > kernel lands in the first 2^47 bits instead. Crap.
> > > > >
> > > >
> > > > Yeah. The linear region needs to be contiguous. Alternatively, we
> > > > could restrict the upper address limit for loading the kernel to 47
> > > > bits.
> > >
> > > Is that something we can do retroactively? We could mandate it for
> > > LVA systems only, but that's a bit odd.
> > >
> >
> > Yeah, especially given the fact that LVA systems will be VHE capable
> > and may therefore not care in the first place.
> >
> > On systems that have memory that high, EFI is likely to load the
> > kernel there, as it usually allocates from the top down, and it tries
> > to avoid having to move it around unless asked to (via KASLR), in
> > which case it will currently randomize over the entire available
> > memory space.
> >
> > So it is going to add a special case for a corner^2 case, i.e., nVHE
> > on 52-bit/64k pages with more than 3968 TB distance between the start
> > and end of DRAM. Ugh.
>
> Yeah. I'd rather we ignore that memory altogether, but I don't think
> we can.
>
> > It seems to me that the only way to solve this is to permit the idmap
> > and the hyp linear region to overlap, and use the 2^47 byte window at
> > the top of the address space for the hyp private mappings instead of
> > the one at the bottom.
>
> But that's the hard problem I want to avoid thinking of.
>
> We need to ensure that there is no EL1 VA that is congruent with the
> idmap over the kern_hyp_va() transformation. It means imposing
> restrictions over the EL1 linear map, and prevent any allocation that
> would result in this overlap (and that is including text).
>
> How do we do that?
>

A phys to virt offset of 0x0 is perfectly acceptable, no? The only
difference is that the idmapped bits are in another part of the VA
space.

> Frankly, I think we need to start looking into enabling VHE for the
> nVHE /behaviour/. Having a single TTBR on these systems is just
> insane.
>
> M.
>
> --
> Without deviation from the norm, progress is not possible.


Re: [PATCH] arm64: kvm: handle 52-bit VA regions correctly under nVHE

2021-03-30 Thread Ard Biesheuvel
On Tue, 30 Mar 2021 at 15:04, Marc Zyngier  wrote:
>
> On Tue, 30 Mar 2021 13:49:18 +0100,
> Ard Biesheuvel  wrote:
> >
> > On Tue, 30 Mar 2021 at 14:44, Marc Zyngier  wrote:
> > >
> > > On Tue, 30 Mar 2021 12:21:26 +0100,
> > > Ard Biesheuvel  wrote:
> > > >
> > > > Commit f4693c2716b35d08 ("arm64: mm: extend linear region for 52-bit VA
> > > > configurations") introduced a new layout for the 52-bit VA space, in
> > > > order to maximize the space available to the linear region. After this
> > > > change, the kernel VA space is no longer split 1:1 down the middle, and
> > > > as it turns out, this violates an assumption in the KVM init code when
> > > > it chooses the layout for the nVHE EL2 mapping.
> > > >
> > > > Given that EFI does not support 52-bit VA addressing (as it only
> > > > supports 4k pages), and that in general, loaders cannot assume that the
> > > > kernel being loaded supports 52-bit VA/PA addressing in the first place,
> > > > we can safely assume that the kernel, and therefore the .idmap section,
> > > > will be 48-bit addressable on 52-bit VA capable systems.
> > > >
> > > > So in this case, organize the nVHE EL2 address space as a 2^48 byte
> > > > window starting at address 0x0, containing the ID map and the
> > > > hypervisor's private mappings, followed by a contiguous 2^52 - 2^48 byte
> > > > linear region. (Note that EL1's linear region is 2^52 - 2^47 bytes in
> > > > size, so it is slightly larger, but this only matters on systems where
> > > > the DRAM footprint in the physical memory map exceeds 3968 TB)
> > >
> > > So if I have memory in the [2^52 - 2^48, 2^52 - 2^47] range, not
> > > necessarily because I have that much memory, but because my system has
> > > multiple memory banks, one of which lands on that spot, I cannot map
> > > such memory at EL2. We'll explode at run time.
> > >
> > > Can we keep the private mapping to 47 bits and restore the missing
> > > chunk to the linear mapping? Of course, it means that the linear map
> > > is now potentially non-linear, so we'd have to guarantee that the
> > > kernel lands in the first 2^47 bits instead. Crap.
> > >
> >
> > Yeah. The linear region needs to be contiguous. Alternatively, we
> > could restrict the upper address limit for loading the kernel to 47
> > bits.
>
> Is that something we can do retroactively? We could mandate it for
> LVA systems only, but that's a bit odd.
>

Yeah, especially given the fact that LVA systems will be VHE capable
and may therefore not care in the first place.

On systems that have memory that high, EFI is likely to load the
kernel there, as it usually allocates from the top down, and it tries
to avoid having to move it around unless asked to (via KASLR), in
which case it will currently randomize over the entire available
memory space.

So it is going to add a special case for a corner^2 case, i.e., nVHE
on 52-bit/64k pages with more than 3968 TB distance between the start
and end of DRAM. Ugh.

It seems to me that the only way to solve this is to permit the idmap
and the hyp linear region to overlap, and use the 2^47 byte window at
the top of the address space for the hyp private mappings instead of
the one at the bottom.
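
For reference, the transformation at the heart of the overlap question is
the one computed in kvm_compute_layout(); stripped of the random tag, it
boils down to the following (sketch only, using the variable names from the
quoted code):

/* Sketch of kern_hyp_va(): keep the low tag_lsb bits of the EL1 VA and
 * replace everything above them with tag_val. The overlap concern is
 * whether some EL1 VA can map onto the idmap's EL2 address this way. */
static u64 kern_hyp_va_sketch(u64 el1_va, u64 va_mask, u64 tag_val)
{
	return (el1_va & va_mask) | tag_val;
}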


Re: [PATCH] arm64: kvm: handle 52-bit VA regions correctly under nVHE

2021-03-30 Thread Ard Biesheuvel
On Tue, 30 Mar 2021 at 14:44, Marc Zyngier  wrote:
>
> On Tue, 30 Mar 2021 12:21:26 +0100,
> Ard Biesheuvel  wrote:
> >
> > Commit f4693c2716b35d08 ("arm64: mm: extend linear region for 52-bit VA
> > configurations") introduced a new layout for the 52-bit VA space, in
> > order to maximize the space available to the linear region. After this
> > change, the kernel VA space is no longer split 1:1 down the middle, and
> > as it turns out, this violates an assumption in the KVM init code when
> > it chooses the layout for the nVHE EL2 mapping.
> >
> > Given that EFI does not support 52-bit VA addressing (as it only
> > supports 4k pages), and that in general, loaders cannot assume that the
> > kernel being loaded supports 52-bit VA/PA addressing in the first place,
> > we can safely assume that the kernel, and therefore the .idmap section,
> > will be 48-bit addressable on 52-bit VA capable systems.
> >
> > So in this case, organize the nVHE EL2 address space as a 2^48 byte
> > window starting at address 0x0, containing the ID map and the
> > hypervisor's private mappings, followed by a contiguous 2^52 - 2^48 byte
> > linear region. (Note that EL1's linear region is 2^52 - 2^47 bytes in
> > size, so it is slightly larger, but this only matters on systems where
> > the DRAM footprint in the physical memory map exceeds 3968 TB)
>
> So if I have memory in the [2^52 - 2^48, 2^52 - 2^47] range, not
> necessarily because I have that much memory, but because my system has
> multiple memory banks, one of which lands on that spot, I cannot map
> such memory at EL2. We'll explode at run time.
>
> Can we keep the private mapping to 47 bits and restore the missing
> chunk to the linear mapping? Of course, it means that the linear map
> is now potentially non-linear, so we'd have to guarantee that the
> kernel lands in the first 2^47 bits instead. Crap.
>

Yeah. The linear region needs to be contiguous. Alternatively, we
could restrict the upper address limit for loading the kernel to 47
bits.

> >
> > Fixes: f4693c2716b35d08 ("arm64: mm: extend linear region for 52-bit VA 
> > configurations")
> > Signed-off-by: Ard Biesheuvel 
> > ---
> >  Documentation/arm64/booting.rst |  6 +++---
> >  arch/arm64/kvm/va_layout.c  | 18 ++
> >  2 files changed, 17 insertions(+), 7 deletions(-)
> >
> > diff --git a/Documentation/arm64/booting.rst 
> > b/Documentation/arm64/booting.rst
> > index 7552dbc1cc54..418ec9b63d2c 100644
> > --- a/Documentation/arm64/booting.rst
> > +++ b/Documentation/arm64/booting.rst
> > @@ -121,8 +121,8 @@ Header notes:
> > to the base of DRAM, since memory below it is not
> > accessible via the linear mapping
> >   1
> > -   2MB aligned base may be anywhere in physical
> > -   memory
> > +   2MB aligned base may be anywhere in the 48-bit
> > +   addressable physical memory region
> >Bits 4-63  Reserved.
> >= 
> > ===
> >
> > @@ -132,7 +132,7 @@ Header notes:
> >depending on selected features, and is effectively unbound.
> >
> >  The Image must be placed text_offset bytes from a 2MB aligned base
> > -address anywhere in usable system RAM and called there. The region
> > +address in 48-bit addressable system RAM and called there. The region
> >  between the 2 MB aligned base address and the start of the image has no
> >  special significance to the kernel, and may be used for other purposes.
> >  At least image_size bytes from the start of the image must be free for
> > diff --git a/arch/arm64/kvm/va_layout.c b/arch/arm64/kvm/va_layout.c
> > index 978301392d67..e9ab449de197 100644
> > --- a/arch/arm64/kvm/va_layout.c
> > +++ b/arch/arm64/kvm/va_layout.c
> > @@ -62,9 +62,19 @@ __init void kvm_compute_layout(void)
> >   phys_addr_t idmap_addr = __pa_symbol(__hyp_idmap_text_start);
> >   u64 hyp_va_msb;
> >
> > - /* Where is my RAM region? */
> > - hyp_va_msb  = idmap_addr & BIT(vabits_actual - 1);
> > - hyp_va_msb ^= BIT(vabits_actual - 1);
> > + /*
> > +  * On LVA capable hardware, the kernel is guaranteed to reside
> > +  * in the 48-bit addressable part of physical memory, and so
> > +  * the idmap will be located there as well. Put the EL2 linear
> > +  * region right after it, where it can grow upward to fill the
> > +  * entire 52-bit VA region.
> > +  */

[PATCH] arm64: kvm: handle 52-bit VA regions correctly under nVHE

2021-03-30 Thread Ard Biesheuvel
Commit f4693c2716b35d08 ("arm64: mm: extend linear region for 52-bit VA
configurations") introduced a new layout for the 52-bit VA space, in
order to maximize the space available to the linear region. After this
change, the kernel VA space is no longer split 1:1 down the middle, and
as it turns out, this violates an assumption in the KVM init code when
it chooses the layout for the nVHE EL2 mapping.

Given that EFI does not support 52-bit VA addressing (as it only
supports 4k pages), and that in general, loaders cannot assume that the
kernel being loaded supports 52-bit VA/PA addressing in the first place,
we can safely assume that the kernel, and therefore the .idmap section,
will be 48-bit addressable on 52-bit VA capable systems.

So in this case, organize the nVHE EL2 address space as a 2^48 byte
window starting at address 0x0, containing the ID map and the
hypervisor's private mappings, followed by a contiguous 2^52 - 2^48 byte
linear region. (Note that EL1's linear region is 2^52 - 2^47 bytes in
size, so it is slightly larger, but this only matters on systems where
the DRAM footprint in the physical memory map exceeds 3968 TB)

Fixes: f4693c2716b35d08 ("arm64: mm: extend linear region for 52-bit VA 
configurations")
Signed-off-by: Ard Biesheuvel 
---
 Documentation/arm64/booting.rst |  6 +++---
 arch/arm64/kvm/va_layout.c  | 18 ++
 2 files changed, 17 insertions(+), 7 deletions(-)

diff --git a/Documentation/arm64/booting.rst b/Documentation/arm64/booting.rst
index 7552dbc1cc54..418ec9b63d2c 100644
--- a/Documentation/arm64/booting.rst
+++ b/Documentation/arm64/booting.rst
@@ -121,8 +121,8 @@ Header notes:
  to the base of DRAM, since memory below it is not
  accessible via the linear mapping
1
- 2MB aligned base may be anywhere in physical
- memory
+ 2MB aligned base may be anywhere in the 48-bit
+ addressable physical memory region
   Bits 4-63Reserved.
   = ===
 
@@ -132,7 +132,7 @@ Header notes:
   depending on selected features, and is effectively unbound.
 
 The Image must be placed text_offset bytes from a 2MB aligned base
-address anywhere in usable system RAM and called there. The region
+address in 48-bit addressable system RAM and called there. The region
 between the 2 MB aligned base address and the start of the image has no
 special significance to the kernel, and may be used for other purposes.
 At least image_size bytes from the start of the image must be free for
diff --git a/arch/arm64/kvm/va_layout.c b/arch/arm64/kvm/va_layout.c
index 978301392d67..e9ab449de197 100644
--- a/arch/arm64/kvm/va_layout.c
+++ b/arch/arm64/kvm/va_layout.c
@@ -62,9 +62,19 @@ __init void kvm_compute_layout(void)
phys_addr_t idmap_addr = __pa_symbol(__hyp_idmap_text_start);
u64 hyp_va_msb;
 
-   /* Where is my RAM region? */
-   hyp_va_msb  = idmap_addr & BIT(vabits_actual - 1);
-   hyp_va_msb ^= BIT(vabits_actual - 1);
+   /*
+* On LVA capable hardware, the kernel is guaranteed to reside
+* in the 48-bit addressable part of physical memory, and so
+* the idmap will be located there as well. Put the EL2 linear
+* region right after it, where it can grow upward to fill the
+* entire 52-bit VA region.
+*/
+   if (vabits_actual > VA_BITS_MIN) {
+   hyp_va_msb = BIT(VA_BITS_MIN);
+   } else {
+   hyp_va_msb  = idmap_addr & BIT(vabits_actual - 1);
+   hyp_va_msb ^= BIT(vabits_actual - 1);
+   }
 
tag_lsb = fls64((u64)phys_to_virt(memblock_start_of_DRAM()) ^
(u64)(high_memory - 1));
@@ -72,7 +82,7 @@ __init void kvm_compute_layout(void)
va_mask = GENMASK_ULL(tag_lsb - 1, 0);
tag_val = hyp_va_msb;
 
-   if (IS_ENABLED(CONFIG_RANDOMIZE_BASE) && tag_lsb != (vabits_actual - 1)) {
+   if (IS_ENABLED(CONFIG_RANDOMIZE_BASE) && tag_lsb < (vabits_actual - 1)) {
/* We have some free bits to insert a random tag. */
tag_val |= get_random_long() & GENMASK_ULL(vabits_actual - 2, tag_lsb);
}
-- 
2.31.0.291.g576ba9dcdaf-goog



Re: [PATCH v6 3/5] ARM: implement support for SMCCC TRNG entropy source

2021-03-15 Thread Ard Biesheuvel
On Wed, 6 Jan 2021 at 11:35, Andre Przywara  wrote:
>
> From: Ard Biesheuvel 
>
> Implement arch_get_random_seed_*() for ARM based on the firmware
> or hypervisor provided entropy source described in ARM DEN0098.
>
> This will make the kernel's random number generator consume entropy
> provided by this interface, at early boot, and periodically at
> runtime when reseeding.
>
> Cc: Linus Walleij 
> Cc: Russell King 
> Signed-off-by: Ard Biesheuvel 
> [Andre: rework to be initialised by the SMCCC firmware driver]
> Signed-off-by: Andre Przywara 
> Reviewed-by: Linus Walleij 

I think this one could be dropped into rmk's patch tracker now, right?


> ---
>  arch/arm/Kconfig  |  4 ++
>  arch/arm/include/asm/archrandom.h | 64 +++
>  2 files changed, 68 insertions(+)
>
> diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
> index 138248999df7..bfe642510b0a 100644
> --- a/arch/arm/Kconfig
> +++ b/arch/arm/Kconfig
> @@ -1644,6 +1644,10 @@ config STACKPROTECTOR_PER_TASK
>   Enable this option to switch to a different method that uses a
>   different canary value for each task.
>
> +config ARCH_RANDOM
> +   def_bool y
> +   depends on HAVE_ARM_SMCCC_DISCOVERY
> +
>  endmenu
>
>  menu "Boot options"
> diff --git a/arch/arm/include/asm/archrandom.h 
> b/arch/arm/include/asm/archrandom.h
> index a8e84ca5c2ee..f3e96a5b65f8 100644
> --- a/arch/arm/include/asm/archrandom.h
> +++ b/arch/arm/include/asm/archrandom.h
> @@ -2,9 +2,73 @@
>  #ifndef _ASM_ARCHRANDOM_H
>  #define _ASM_ARCHRANDOM_H
>
> +#ifdef CONFIG_ARCH_RANDOM
> +
> +#include 
> +#include 
> +
> +#define ARM_SMCCC_TRNG_MIN_VERSION 0x1UL
> +
> +extern bool smccc_trng_available;
> +
> +static inline bool __init smccc_probe_trng(void)
> +{
> +   struct arm_smccc_res res;
> +
> +   arm_smccc_1_1_invoke(ARM_SMCCC_TRNG_VERSION, &res);
> +   if ((s32)res.a0 < 0)
> +   return false;
> +   if (res.a0 >= ARM_SMCCC_TRNG_MIN_VERSION) {
> +   /* double check that the 32-bit flavor is available */
> +   arm_smccc_1_1_invoke(ARM_SMCCC_TRNG_FEATURES,
> +ARM_SMCCC_TRNG_RND32,
> +&res);
> +   if ((s32)res.a0 >= 0)
> +   return true;
> +   }
> +
> +   return false;
> +}
> +
> +static inline bool __must_check arch_get_random_long(unsigned long *v)
> +{
> +   return false;
> +}
> +
> +static inline bool __must_check arch_get_random_int(unsigned int *v)
> +{
> +   return false;
> +}
> +
> +static inline bool __must_check arch_get_random_seed_long(unsigned long *v)
> +{
> +   struct arm_smccc_res res;
> +
> +   if (smccc_trng_available) {
> +   arm_smccc_1_1_invoke(ARM_SMCCC_TRNG_RND32, 8 * sizeof(*v), &res);
> +
> +   if (res.a0 != 0)
> +   return false;
> +
> +   *v = res.a3;
> +   return true;
> +   }
> +
> +   return false;
> +}
> +
> +static inline bool __must_check arch_get_random_seed_int(unsigned int *v)
> +{
> +   return arch_get_random_seed_long((unsigned long *)v);
> +}
> +
> +
> +#else /* !CONFIG_ARCH_RANDOM */
> +
>  static inline bool __init smccc_probe_trng(void)
>  {
> return false;
>  }
>
> +#endif /* CONFIG_ARCH_RANDOM */
>  #endif /* _ASM_ARCHRANDOM_H */
> --
> 2.17.1
>
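
A hypothetical early-boot caller, to show how the pieces are meant to fit
together (the init function itself is made up; smccc_trng_available and
smccc_probe_trng() are from the patch):

/* Sketch: probe the interface once, then hand an initial seed to the
 * RNG core; later reseeds go through arch_get_random_seed_long(). */
static int __init smccc_trng_init_sketch(void)
{
	unsigned long seed;

	smccc_trng_available = smccc_probe_trng();
	if (smccc_trng_available && arch_get_random_seed_long(&seed))
		add_device_randomness(&seed, sizeof(seed));

	return 0;
}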


Re: [PATCH] KVM: arm64: Disable LTO in hyp

2021-03-05 Thread Ard Biesheuvel
On Fri, 5 Mar 2021 at 12:38, Marc Zyngier  wrote:
>
> On Fri, 05 Mar 2021 02:38:17 +,
> Sami Tolvanen  wrote:
> >
> > On Thu, Mar 4, 2021 at 2:34 PM Sami Tolvanen  
> > wrote:
> > >
> > > On Thu, Mar 4, 2021 at 2:17 PM Marc Zyngier  wrote:
> > > >
> > > > On Thu, 04 Mar 2021 21:25:41 +,
> > > > Sami Tolvanen  wrote:
>
> [...]
>
> > > > > I assume hyp_panic() ends up being placed too far from __guest_enter()
> > > > > when the kernel is large enough. Possibly something to do with LLVM
> > > > > always splitting functions into separate sections with LTO. I'm not
> > > > > sure why the linker cannot shuffle things around to make everyone
> > > > > happy in this case, but I confirmed that this patch also fixes the
> > > > > build issue for me:
> > > > >
> > > > > diff --git a/arch/arm64/kvm/hyp/vhe/switch.c 
> > > > > b/arch/arm64/kvm/hyp/vhe/switch.c
> > > > > index af8e940d0f03..128197b7c794 100644
> > > > > --- a/arch/arm64/kvm/hyp/vhe/switch.c
> > > > > +++ b/arch/arm64/kvm/hyp/vhe/switch.c
> > > > > @@ -214,7 +214,7 @@ static void __hyp_call_panic(u64 spsr, u64 elr, 
> > > > > u64 par)
> > > > >  }
> > > > >  NOKPROBE_SYMBOL(__hyp_call_panic);
> > > > >
> > > > > -void __noreturn hyp_panic(void)
> > > > > +void __noreturn hyp_panic(void) __section(".text")
> > > > >  {
> > > > > u64 spsr = read_sysreg_el2(SYS_SPSR);
> > > > > u64 elr = read_sysreg_el2(SYS_ELR);
> > > > >
> > > >
> > > > We're getting into black-magic territory here. Why wouldn't hyp_panic
> > > > be in the .text section already?
> > >
> > > It's not quite black magic. LLVM essentially flips on
> > > -ffunction-sections with LTO and therefore, hyp_panic() will be in
> > > .text.hyp_panic in vmlinux.o, while __guest_enter() will be in .text.
> > > Everything ends up in .text when we link vmlinux, of course.
> > >
> > > $ readelf --sections vmlinux.o | grep hyp_panic
> > >   [3936] .text.hyp_panic   PROGBITS   004b56e4
> >
> > Note that disabling LTO here has essentially the same effect as using
> > __section(".text"). It stops the compiler from splitting these
> > functions into .text.* sections and makes it less likely that
> > hyp_panic() ends up too far away from __guest_enter().
> >
> > If neither of these workarounds sound appealing, I suppose we could
> > alternatively change hyp/entry.S to use adr_l for hyp_panic. Thoughts?
>
> That would be an actual fix instead of a workaround, as it would
> remove existing assumptions about the relative locations of the two
> objects. I guess you need to fix both instances with something such
> as:
>
> diff --git a/arch/arm64/kvm/hyp/entry.S b/arch/arm64/kvm/hyp/entry.S
> index b0afad7a99c6..a43e1f7ee354 100644
> --- a/arch/arm64/kvm/hyp/entry.S
> +++ b/arch/arm64/kvm/hyp/entry.S
> @@ -85,8 +85,10 @@ SYM_INNER_LABEL(__guest_exit_panic, SYM_L_GLOBAL)
>
> // If the hyp context is loaded, go straight to hyp_panic
> get_loaded_vcpu x0, x1
> -   cbz x0, hyp_panic
> -
> +   cbnzx0, 1f
> +   adr_l   x0, hyp_panic
> +   br  x0
> +1:

Agree with replacing the conditional branches that refer to external
symbols: the compiler never emits those, for the reason we are seeing
here, i.e., the range is simply insufficient.

But let's just use 'b hyp_panic' instead, no?
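
For reference, the reach difference (illustrative snippet, not from the
patch): a conditional branch such as cbz encodes a 19-bit signed word
offset, hence R_AARCH64_CONDBR19 and roughly +/-1 MiB of reach, while a
plain b encodes 26 bits, i.e. roughly +/-128 MiB:

	cbz	x0, hyp_panic		// R_AARCH64_CONDBR19: +/-1 MiB
	b	hyp_panic		// R_AARCH64_JUMP26:   +/-128 MiB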



> // The hyp context is saved so make sure it is restored to allow
> // hyp_panic to run at hyp and, subsequently, panic to run in the 
> host.
> // This makes use of __guest_exit to avoid duplication but sets the
> @@ -94,7 +96,7 @@ SYM_INNER_LABEL(__guest_exit_panic, SYM_L_GLOBAL)
> // current state is saved to the guest context but it will only be
> // accurate if the guest had been completely restored.
> adr_this_cpu x0, kvm_hyp_ctxt, x1
> -   adr x1, hyp_panic
> +   adr_l   x1, hyp_panic
> str x1, [x0, #CPU_XREG_OFFSET(30)]
>
> get_vcpu_ptrx1, x0
>
> which is completely untested. I wouldn't be surprised if there were
> more of these somewhere.
>

A quick grep gives me

$ objdump -r vmlinux.o |grep BR19
0005b6e0 R_AARCH64_CONDBR19  hyp_panic
00418e08 R_AARCH64_CONDBR19  __memcpy
00418e14 R_AARCH64_CONDBR19  __memcpy
3818 R_AARCH64_CONDBR19  __kvm_nvhe___guest_exit_panic
3898 R_AARCH64_CONDBR19  __kvm_nvhe___guest_exit_panic
3918 R_AARCH64_CONDBR19  __kvm_nvhe___guest_exit_panic
3998 R_AARCH64_CONDBR19  __kvm_nvhe___guest_exit_panic
3a18 R_AARCH64_CONDBR19  __kvm_nvhe___guest_exit_panic
3a98 R_AARCH64_CONDBR19  __kvm_nvhe___guest_exit_panic
3b18 R_AARCH64_CONDBR19  __kvm_nvhe___guest_exit_panic
3b98 R_AARCH64_CONDBR19  __kvm_nvhe___guest_exit_panic
3c10 R_AARCH64_CONDBR19  __kvm_nvhe___host_exit
3c1c R_AARCH64_CONDBR19  __kvm_nvhe___host_exit
64f0 R_AARCH64_CONDBR19  __kvm_nvhe_hyp_panic
078c 

Re: [PATCH v7 00/23] arm64: Early CPU feature override, and applications to VHE, BTI and PAuth

2021-02-08 Thread Ard Biesheuvel
On Mon, 8 Feb 2021 at 15:32, Will Deacon  wrote:
>
> Hi Marc,
>
> On Mon, Feb 08, 2021 at 09:57:09AM +, Marc Zyngier wrote:
> > It recently came to light that there is a need to be able to override
> > some CPU features very early on, before the kernel is fully up and
> > running. The reasons for this range from specific feature support
> > (such as using Protected KVM on VHE HW, which is the main motivation
> > for this work) to errata workaround (a feature is broken on a CPU and
> > needs to be turned off, or rather not enabled).
> >
> > This series tries to offer a limited framework for this kind of
> > problems, by allowing a set of options to be passed on the
> > command-line and altering the feature set that the cpufeature
> > subsystem exposes to the rest of the kernel. Note that this doesn't
> > change anything for code that directly uses the CPU ID registers.
>
> I applied this locally, but I'm seeing consistent boot failure under QEMU when
> KASAN is enabled. I tried sprinkling some __no_sanitize_address annotations
> around (see below) but it didn't help. The culprit appears to be
> early_fdt_map(), but looking a bit more closely, I'm really nervous about the
> way we call into C functions from __primary_switched. Remember -- this code
> runs _twice_ when KASLR is active: before and after the randomization. This
> also means that any memory writes the first time around can be lost due to
> the D-cache invalidation when (re-)creating the kernel page-tables.
>

Not just cache invalidation - BSS gets wiped again as well.

-- 
Ard.


Re: [PATCH v5 18/21] arm64: Move "nokaslr" over to the early cpufeature infrastructure

2021-01-25 Thread Ard Biesheuvel
On Mon, 25 Jan 2021 at 15:28, Marc Zyngier  wrote:
>
> On 2021-01-25 14:19, Ard Biesheuvel wrote:
> > On Mon, 25 Jan 2021 at 14:54, Marc Zyngier  wrote:
> >>
> >> On 2021-01-25 12:54, Ard Biesheuvel wrote:
>
> [...]
>
> >> > This struct now takes up
> >> > - ~100 bytes for the characters themselves (which btw are not emitted
> >> > into __initdata or __initconst)
> >> > - 6x8 bytes for the char pointers
> >> > - 6x24 bytes for the RELA relocations that annotate these pointers as
> >> > quantities that need to be relocated at boot (on a kernel built with
> >> > KASLR)
> >> >
> >> > I know it's only a drop in the ocean, but in this case, where the
> >> > struct is statically declared and defined only once, and in the same
> >> > place, we could easily turn this into
> >> >
> >> > static const struct {
> >> >char alias[24];
> >> >char param[20];
> >> > };
> >> >
> >> > and get rid of all the overhead. The only slightly annoying thing is
> >> > that the array sizes need to be kept in sync with the largest instance
> >> > appearing in the array, but this is easy when the struct type is
> >> > declared in the same place where its only instance is defined.
> >>
> >> Fair enough. I personally find the result butt-ugly, but I agree
> >> that it certainly saves some memory. Does the following work for
> >> you? I can even give symbolic names to the various constants (how
> >> generous of me! ;-).
> >>
> >
> > To be honest, I was anticipating more of a discussion, but this looks
> > reasonable to me.
>
> It looked like a reasonable ask: all the strings are completely useless
> once the kernel has booted, and I'm the first to moan that I can't boot
> an arm64 kernel with less than 60MB of RAM (OK, it's a pretty bloated
> kernel...).
>
> > Does 'char feature[80];' really need 80 bytes though?
>
> It really needs 75 bytes, because of this:
>
> { "arm64.nopauth",
>   "id_aa64isar1.gpi=0 id_aa64isar1.gpa=0 "
>   "id_aa64isar1.api=0 id_aa64isar1.apa=0"  },
>
> 80 is a round enough number.
>

Fair enough. This will inflate the struct substantially, but at least
it's all __initconst data now, and it's all NUL bytes so it compresses
much better than the pointers and RELA entries.
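
Concretely, the two shapes under discussion look like this (sketch only;
entries abridged to a single alias):

/* Pointer members: 8 bytes each, plus a 24-byte RELA entry per pointer
 * under KASLR, and the string literals land outside __initconst. */
static const struct {
	const char *alias;
	const char *param;
} aliases_ptr[] __initconst = {
	{ "nokaslr", "kaslr.disabled=1" },
};

/* Inline arrays: everything lives in __initconst with no relocations;
 * the trailing NUL padding compresses well in the kernel image. */
static const struct {
	char alias[24];
	char param[20];
} aliases_arr[] __initconst = {
	{ "nokaslr", "kaslr.disabled=1" },
};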


Re: [PATCH v5 18/21] arm64: Move "nokaslr" over to the early cpufeature infrastructure

2021-01-25 Thread Ard Biesheuvel
On Mon, 25 Jan 2021 at 14:54, Marc Zyngier  wrote:
>
> On 2021-01-25 12:54, Ard Biesheuvel wrote:
> > On Mon, 25 Jan 2021 at 11:53, Marc Zyngier  wrote:
> >>
> >> Given that the early cpufeature infrastructure has borrowed quite
> >> a lot of code from the kaslr implementation, let's reimplement
> >> the matching of the "nokaslr" option with it.
> >>
> >> Signed-off-by: Marc Zyngier 
> >> Acked-by: Catalin Marinas 
> >> Acked-by: David Brazdil 
> >> ---
> >>  arch/arm64/kernel/idreg-override.c | 15 +
> >>  arch/arm64/kernel/kaslr.c  | 36
> >> ++
> >>  2 files changed, 17 insertions(+), 34 deletions(-)
> >>
> >> diff --git a/arch/arm64/kernel/idreg-override.c
> >> b/arch/arm64/kernel/idreg-override.c
> >> index cbb8eaa48742..3ccf51b84ba4 100644
> >> --- a/arch/arm64/kernel/idreg-override.c
> >> +++ b/arch/arm64/kernel/idreg-override.c
> >> @@ -31,8 +31,22 @@ static const struct ftr_set_desc mmfr1 __initdata =
> >> {
> >> },
> >>  };
> >>
> >> +extern struct arm64_ftr_override kaslr_feature_override;
> >> +
> >> +static const struct ftr_set_desc kaslr __initdata = {
> >
> > This should be __initconst not __initdata (below too)
> >
> >> +   .name   = "kaslr",
> >> +#ifdef CONFIG_RANDOMIZE_BASE
> >> +   .override   = &kaslr_feature_override,
> >> +#endif
> >> +   .fields = {
> >> +   { "disabled", 0 },
> >> +   {}
> >> +   },
> >> +};
> >> +
> >>  static const struct ftr_set_desc * const regs[] __initdata = {
> >> &mmfr1,
> >> +   &kaslr,
> >>  };
> >>
> >>  static const struct {
> >> @@ -41,6 +55,7 @@ static const struct {
> >>  } aliases[] __initdata = {
> >> { "kvm-arm.mode=nvhe",  "id_aa64mmfr1.vh=0" },
> >> { "kvm-arm.mode=protected", "id_aa64mmfr1.vh=0" },
> >> +   { "nokaslr","kaslr.disabled=1" },
> >>  };
> >>
> >
> > This struct now takes up
> > - ~100 bytes for the characters themselves (which btw are not emitted
> > into __initdata or __initconst)
> > - 6x8 bytes for the char pointers
> > - 6x24 bytes for the RELA relocations that annotate these pointers as
> > quantities that need to be relocated at boot (on a kernel built with
> > KASLR)
> >
> > I know it's only a drop in the ocean, but in this case, where the
> > struct is statically declared and defined only once, and in the same
> > place, we could easily turn this into
> >
> > static const struct {
> >char alias[24];
> >char param[20];
> > };
> >
> > and get rid of all the overhead. The only slightly annoying thing is
> > that the array sizes need to be kept in sync with the largest instance
> > appearing in the array, but this is easy when the struct type is
> > declared in the same place where its only instance is defined.
>
> Fair enough. I personally find the result butt-ugly, but I agree
> that it certainly saves some memory. Does the following work for
> you? I can even give symbolic names to the various constants (how
> generous of me! ;-).
>

To be honest, I was anticipating more of a discussion, but this looks
reasonable to me. Does 'char feature[80];' really need 80 bytes
though?

> diff --git a/arch/arm64/kernel/idreg-override.c
> b/arch/arm64/kernel/idreg-override.c
> index d1310438d95c..9e7043bdc808 100644
> --- a/arch/arm64/kernel/idreg-override.c
> +++ b/arch/arm64/kernel/idreg-override.c
> @@ -14,15 +14,15 @@
>   #include 
>
>   struct ftr_set_desc {
> -   const char  *name;
> +   charname[20];
> struct arm64_ftr_override   *override;
> struct {
> -   const char  *name;
> +   charname[20];
> u8  shift;
> }   fields[];
>   };
>
> -static const struct ftr_set_desc mmfr1 __initdata = {
> +static const struct ftr_set_desc mmfr1 __initconst = {
> .name   = "id_aa64mmfr1",
> .override   = &id_aa64mmfr1_override,
> .fields = {
> @@ -31,7 +31,7 @@ static const struct ftr_set_desc mmfr1 __initdata = {
> },
>   };
>

Re: [PATCH v5 18/21] arm64: Move "nokaslr" over to the early cpufeature infrastructure

2021-01-25 Thread Ard Biesheuvel
On Mon, 25 Jan 2021 at 11:53, Marc Zyngier  wrote:
>
> Given that the early cpufeature infrastructure has borrowed quite
> a lot of code from the kaslr implementation, let's reimplement
> the matching of the "nokaslr" option with it.
>
> Signed-off-by: Marc Zyngier 
> Acked-by: Catalin Marinas 
> Acked-by: David Brazdil 
> ---
>  arch/arm64/kernel/idreg-override.c | 15 +
>  arch/arm64/kernel/kaslr.c  | 36 ++
>  2 files changed, 17 insertions(+), 34 deletions(-)
>
> diff --git a/arch/arm64/kernel/idreg-override.c 
> b/arch/arm64/kernel/idreg-override.c
> index cbb8eaa48742..3ccf51b84ba4 100644
> --- a/arch/arm64/kernel/idreg-override.c
> +++ b/arch/arm64/kernel/idreg-override.c
> @@ -31,8 +31,22 @@ static const struct ftr_set_desc mmfr1 __initdata = {
> },
>  };
>
> +extern struct arm64_ftr_override kaslr_feature_override;
> +
> +static const struct ftr_set_desc kaslr __initdata = {

This should be __initconst not __initdata (below too)

> +   .name   = "kaslr",
> +#ifdef CONFIG_RANDOMIZE_BASE
> +   .override   = &kaslr_feature_override,
> +#endif
> +   .fields = {
> +   { "disabled", 0 },
> +   {}
> +   },
> +};
> +
>  static const struct ftr_set_desc * const regs[] __initdata = {
> &mmfr1,
> +   &kaslr,
>  };
>
>  static const struct {
> @@ -41,6 +55,7 @@ static const struct {
>  } aliases[] __initdata = {
> { "kvm-arm.mode=nvhe",  "id_aa64mmfr1.vh=0" },
> { "kvm-arm.mode=protected", "id_aa64mmfr1.vh=0" },
> +   { "nokaslr","kaslr.disabled=1" },
>  };
>

This struct now takes up
- ~100 bytes for the characters themselves (which btw are not emitted
into __initdata or __initconst)
- 6x8 bytes for the char pointers
- 6x24 bytes for the RELA relocations that annotate these pointers as
quantities that need to be relocated at boot (on a kernel built with
KASLR)

I know it's only a drop in the ocean, but in this case, where the
struct is statically declared and defined only once, and in the same
place, we could easily turn this into

static const struct {
   char alias[24];
   char param[20];
};

and get rid of all the overhead. The only slightly annoying thing is
that the array sizes need to be kept in sync with the largest instance
appearing in the array, but this is easy when the struct type is
declared in the same place where its only instance is defined.


>  static char *cmdline_contains_option(const char *cmdline, const char *option)
> diff --git a/arch/arm64/kernel/kaslr.c b/arch/arm64/kernel/kaslr.c
> index 5fc86e7d01a1..27f8939deb1b 100644
> --- a/arch/arm64/kernel/kaslr.c
> +++ b/arch/arm64/kernel/kaslr.c
> @@ -51,39 +51,7 @@ static __init u64 get_kaslr_seed(void *fdt)
> return ret;
>  }
>
> -static __init bool cmdline_contains_nokaslr(const u8 *cmdline)
> -{
> -   const u8 *str;
> -
> -   str = strstr(cmdline, "nokaslr");
> -   return str == cmdline || (str > cmdline && *(str - 1) == ' ');
> -}
> -
> -static __init bool is_kaslr_disabled_cmdline(void *fdt)
> -{
> -   if (!IS_ENABLED(CONFIG_CMDLINE_FORCE)) {
> -   int node;
> -   const u8 *prop;
> -
> -   node = fdt_path_offset(fdt, "/chosen");
> -   if (node < 0)
> -   goto out;
> -
> -   prop = fdt_getprop(fdt, node, "bootargs", NULL);
> -   if (!prop)
> -   goto out;
> -
> -   if (cmdline_contains_nokaslr(prop))
> -   return true;
> -
> -   if (IS_ENABLED(CONFIG_CMDLINE_EXTEND))
> -   goto out;
> -
> -   return false;
> -   }
> -out:
> -   return cmdline_contains_nokaslr(CONFIG_CMDLINE);
> -}
> +struct arm64_ftr_override kaslr_feature_override __initdata;
>
>  /*
>   * This routine will be executed with the kernel mapped at its default 
> virtual
> @@ -126,7 +94,7 @@ u64 __init kaslr_early_init(void)
>  * Check if 'nokaslr' appears on the command line, and
>  * return 0 if that is the case.
>  */
> -   if (is_kaslr_disabled_cmdline(fdt)) {
> +   if (kaslr_feature_override.val & kaslr_feature_override.mask & 0xf) {
> kaslr_status = KASLR_DISABLED_CMDLINE;
> return 0;
> }
> --
> 2.29.2
>


Re: [PATCH v6 0/5] ARM: arm64: Add SMCCC TRNG entropy service

2021-01-20 Thread Ard Biesheuvel
On Wed, 20 Jan 2021 at 14:01, Will Deacon  wrote:
>
> On Wed, 6 Jan 2021 10:34:48 +, Andre Przywara wrote:
> > a fix to v5, now *really* fixing the wrong priority of SMCCC vs. RNDR
> > in arch_get_random_seed_long_early(). Apologies for messing this up
> > in v5 and thanks to broonie for being on the watch!
> >
> > Will, Catalin: it would be much appreciated if you could consider taking
> > patch 1/5. This contains the common definitions, and is a prerequisite
> > for every other patch, although they are somewhat independent and likely
> > will need to go through different subsystems.
> >
> > [...]
>
> Applied the first patch only to arm64 (for-next/rng), thanks!
>
> [1/5] firmware: smccc: Add SMCCC TRNG function call IDs
>   https://git.kernel.org/arm64/c/67c6bb56b649
>
> What's the plan for the rest of the series, and I think the related
> change over at [1]?
>

Given that Ted seems to have lost interest in /dev/random patches, I
was hoping [1] could be taken via the arm64 tree instead. Without this
patch, I don't think we should expose the SMCCC RNG interface via
arch_get_random_seed(), given how insanely often it will be called in
that case.

Note that the KVM patch implements the opposite end of this interface,
and is not affected by [1] at all, so that can be taken at any time.


Re: [PATCH v2 2/2] KVM: arm64: Workaround firmware wrongly advertising GICv2-on-v3 compatibility

2021-01-15 Thread Ard Biesheuvel
On Fri, 15 Jan 2021 at 15:03, Marc Zyngier  wrote:
>
> It looks like we have broken firmware out there that wrongly advertises
> a GICv2 compatibility interface, despite the CPUs not being able to deal
> with it.
>
> To work around this, check that the CPU initialising KVM is actually able
> to switch to MMIO instead of system registers, and use that as a
> precondition to enable GICv2 compatibility in KVM.
>
> Note that the detection happens on a single CPU. If the firmware is
> lying *and* the CPUs are asymmetric, all hope is lost anyway.
>
> Reported-by: Shameerali Kolothum Thodi 
> Signed-off-by: Marc Zyngier 
> ---
>  arch/arm64/kvm/hyp/vgic-v3-sr.c | 35 +++--
>  arch/arm64/kvm/vgic/vgic-v3.c   |  8 ++--
>  2 files changed, 39 insertions(+), 4 deletions(-)
>
> diff --git a/arch/arm64/kvm/hyp/vgic-v3-sr.c b/arch/arm64/kvm/hyp/vgic-v3-sr.c
> index 005daa0c9dd7..ee3682b9873c 100644
> --- a/arch/arm64/kvm/hyp/vgic-v3-sr.c
> +++ b/arch/arm64/kvm/hyp/vgic-v3-sr.c
> @@ -408,11 +408,42 @@ void __vgic_v3_init_lrs(void)
>  /*
>   * Return the GIC CPU configuration:
>   * - [31:0]  ICH_VTR_EL2
> - * - [63:32] RES0
> + * - [62:32] RES0
> + * - [63]MMIO (GICv2) capable
>   */
>  u64 __vgic_v3_get_gic_config(void)
>  {
> -   return read_gicreg(ICH_VTR_EL2);
> +   u64 val, sre = read_gicreg(ICC_SRE_EL1);
> +   unsigned long flags = 0;
> +
> +   /*
> +* To check whether we have an MMIO-based (GICv2 compatible)
> +* CPU interface, we need to disable the system register
> +* view. To do that safely, we have to prevent any interrupt
> +* from firing (which would be deadly).
> +*
> +* Note that this only makes sense on VHE, as interrupts are
> +* already masked for nVHE as part of the exception entry to
> +* EL2.
> +*/
> +   if (has_vhe())
> +   flags = local_daif_save();
> +
> +   write_gicreg(0, ICC_SRE_EL1);
> +   isb();
> +
> +   val = read_gicreg(ICC_SRE_EL1);
> +
> +   write_gicreg(sre, ICC_SRE_EL1);
> +   isb();
> +
> +   if (has_vhe())
> +   local_daif_restore(flags);
> +
> +   val  = (val & ICC_SRE_EL1_SRE) ? 0 : (1ULL << 63);
> +   val |= read_gicreg(ICH_VTR_EL2);
> +
> +   return val;
>  }
>
>  u64 __vgic_v3_read_vmcr(void)
> diff --git a/arch/arm64/kvm/vgic/vgic-v3.c b/arch/arm64/kvm/vgic/vgic-v3.c
> index 8e7bf3151057..67b27b47312b 100644
> --- a/arch/arm64/kvm/vgic/vgic-v3.c
> +++ b/arch/arm64/kvm/vgic/vgic-v3.c
> @@ -584,8 +584,10 @@ early_param("kvm-arm.vgic_v4_enable", 
> early_gicv4_enable);
>  int vgic_v3_probe(const struct gic_kvm_info *info)
>  {
> u64 ich_vtr_el2 = kvm_call_hyp_ret(__vgic_v3_get_gic_config);
> +   bool has_v2;
> int ret;
>
> +   has_v2 = ich_vtr_el2 >> 63;
> ich_vtr_el2 = (u32)ich_vtr_el2;
>
> /*
> @@ -605,13 +607,15 @@ int vgic_v3_probe(const struct gic_kvm_info *info)
>  gicv4_enable ? "en" : "dis");
> }
>
> +   kvm_vgic_global_state.vcpu_base = 0;
> +
> if (!info->vcpu.start) {
> kvm_info("GICv3: no GICV resource entry\n");
> -   kvm_vgic_global_state.vcpu_base = 0;
> +   } else if (!has_v2) {
> +   pr_warn("CPU interface incapable of MMIO access\n");

Could we include FW_BUG here to stress that this is a firmware problem?
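
I.e., just to illustrate what I mean (FW_BUG is the standard
"[Firmware Bug]:" printk prefix):

	pr_warn(FW_BUG "CPU interface incapable of MMIO access\n");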

> } else if (!PAGE_ALIGNED(info->vcpu.start)) {
> pr_warn("GICV physical address 0x%llx not page aligned\n",
> (unsigned long long)info->vcpu.start);
> -   kvm_vgic_global_state.vcpu_base = 0;
> } else {
> kvm_vgic_global_state.vcpu_base = info->vcpu.start;
> kvm_vgic_global_state.can_emulate_gicv2 = true;
> --
> 2.29.2
>


Re: [PATCH 2/2] KVM: arm64: Workaround firmware wrongly advertising GICv2-on-v3 compatibility

2021-01-08 Thread Ard Biesheuvel
On Fri, 8 Jan 2021 at 19:13, Marc Zyngier  wrote:
>
> On 2021-01-08 17:59, Ard Biesheuvel wrote:
> > On Fri, 8 Jan 2021 at 18:12, Marc Zyngier  wrote:
> >>
> >> It looks like we have broken firmware out there that wrongly
> >> advertises
> >> a GICv2 compatibility interface, despite the CPUs not being able to
> >> deal
> >> with it.
> >>
> >> To work around this, check that the CPU initialising KVM is actually
> >> able
> >> to switch to MMIO instead of system registers, and use that as a
> >> precondition to enable GICv2 compatibility in KVM.
> >>
> >> Note that the detection happens on a single CPU. If the firmware is
> >> lying *and* the CPUs are asymmetric, all hope is lost anyway.
> >>
> >> Reported-by: Shameerali Kolothum Thodi
> >> 
> >> Signed-off-by: Marc Zyngier 
> >> ---
> >>  arch/arm64/kvm/hyp/vgic-v3-sr.c | 34
> >> +++--
> >>  arch/arm64/kvm/vgic/vgic-v3.c   |  8 ++--
> >>  2 files changed, 38 insertions(+), 4 deletions(-)
> >>
> >> diff --git a/arch/arm64/kvm/hyp/vgic-v3-sr.c
> >> b/arch/arm64/kvm/hyp/vgic-v3-sr.c
> >> index 005daa0c9dd7..d504499ab917 100644
> >> --- a/arch/arm64/kvm/hyp/vgic-v3-sr.c
> >> +++ b/arch/arm64/kvm/hyp/vgic-v3-sr.c
> >> @@ -408,11 +408,41 @@ void __vgic_v3_init_lrs(void)
> >>  /*
> >>   * Return the GIC CPU configuration:
> >>   * - [31:0]  ICH_VTR_EL2
> >> - * - [63:32] RES0
> >> + * - [62:32] RES0
> >> + * - [63]MMIO (GICv2) capable
> >>   */
> >>  u64 __vgic_v3_get_gic_config(void)
> >>  {
> >> -   return read_gicreg(ICH_VTR_EL2);
> >> +   u64 sre = read_gicreg(ICC_SRE_EL1);
> >> +   unsigned long flags = 0;
> >> +   bool v2_capable;
> >> +
> >> +   /*
> >> +* To check whether we have an MMIO-based (GICv2 compatible)
> >> +* CPU interface, we need to disable the system register
> >> +* view. To do that safely, we have to prevent any interrupt
> >> +* from firing (which would be deadly).
> >> +*
> >> +* Note that this only makes sense on VHE, as interrupts are
> >> +* already masked for nVHE as part of the exception entry to
> >> +* EL2.
> >> +*/
> >> +   if (has_vhe())
> >> +   flags = local_daif_save();
> >> +
> >> +   write_gicreg(0, ICC_SRE_EL1);
> >> +   isb();
> >> +
> >> +   v2_capable = !(read_gicreg(ICC_SRE_EL1) & ICC_SRE_EL1_SRE);
> >> +
> >> +   write_gicreg(sre, ICC_SRE_EL1);
> >> +   isb();
> >> +
> >> +   if (has_vhe())
> >> +   local_daif_restore(flags);
> >> +
> >> +   return (read_gicreg(ICH_VTR_EL2) |
> >> +   v2_capable ? (1ULL << 63) : 0);
> >>  }
> >>
> >
> > Is it necessary to perform this check unconditionally? We only care
> > about this if the firmware claims v2 compat support.
>
> Indeed. But this is done exactly once per boot, and I see it as
> a way to extract the CPU configuration more than anything else.
>
> Extracting it *only* when we have some v2 compat info would mean
> sharing that information with EL2 (in the nVHE case), and it felt
> more hassle than it is worth.
>
> Do you foresee any issue with this, other than the whole thing
> being disgusting (which I wilfully admit)?
>

No, I don't think it's a problem per se. Just a bit disappointing that
every system will be burdened with this for as long as the last v2
compat capable system is still being supported.


Re: [PATCH 2/2] KVM: arm64: Workaround firmware wrongly advertising GICv2-on-v3 compatibility

2021-01-08 Thread Ard Biesheuvel
On Fri, 8 Jan 2021 at 18:12, Marc Zyngier  wrote:
>
> It looks like we have broken firmware out there that wrongly advertises
> a GICv2 compatibility interface, despite the CPUs not being able to deal
> with it.
>
> To work around this, check that the CPU initialising KVM is actually able
> to switch to MMIO instead of system registers, and use that as a
> precondition to enable GICv2 compatibility in KVM.
>
> Note that the detection happens on a single CPU. If the firmware is
> lying *and* the CPUs are asymmetric, all hope is lost anyway.
>
> Reported-by: Shameerali Kolothum Thodi 
> Signed-off-by: Marc Zyngier 
> ---
>  arch/arm64/kvm/hyp/vgic-v3-sr.c | 34 +++--
>  arch/arm64/kvm/vgic/vgic-v3.c   |  8 ++--
>  2 files changed, 38 insertions(+), 4 deletions(-)
>
> diff --git a/arch/arm64/kvm/hyp/vgic-v3-sr.c b/arch/arm64/kvm/hyp/vgic-v3-sr.c
> index 005daa0c9dd7..d504499ab917 100644
> --- a/arch/arm64/kvm/hyp/vgic-v3-sr.c
> +++ b/arch/arm64/kvm/hyp/vgic-v3-sr.c
> @@ -408,11 +408,41 @@ void __vgic_v3_init_lrs(void)
>  /*
>   * Return the GIC CPU configuration:
>   * - [31:0]  ICH_VTR_EL2
> - * - [63:32] RES0
> + * - [62:32] RES0
> + * - [63]MMIO (GICv2) capable
>   */
>  u64 __vgic_v3_get_gic_config(void)
>  {
> -   return read_gicreg(ICH_VTR_EL2);
> +   u64 sre = read_gicreg(ICC_SRE_EL1);
> +   unsigned long flags = 0;
> +   bool v2_capable;
> +
> +   /*
> +* To check whether we have an MMIO-based (GICv2 compatible)
> +* CPU interface, we need to disable the system register
> +* view. To do that safely, we have to prevent any interrupt
> +* from firing (which would be deadly).
> +*
> +* Note that this only makes sense on VHE, as interrupts are
> +* already masked for nVHE as part of the exception entry to
> +* EL2.
> +*/
> +   if (has_vhe())
> +   flags = local_daif_save();
> +
> +   write_gicreg(0, ICC_SRE_EL1);
> +   isb();
> +
> +   v2_capable = !(read_gicreg(ICC_SRE_EL1) & ICC_SRE_EL1_SRE);
> +
> +   write_gicreg(sre, ICC_SRE_EL1);
> +   isb();
> +
> +   if (has_vhe())
> +   local_daif_restore(flags);
> +
> +   return (read_gicreg(ICH_VTR_EL2) |
> +   v2_capable ? (1ULL << 63) : 0);
>  }
>

Is it necessary to perform this check unconditionally? We only care
about this if the firmware claims v2 compat support.

>  u64 __vgic_v3_read_vmcr(void)
> diff --git a/arch/arm64/kvm/vgic/vgic-v3.c b/arch/arm64/kvm/vgic/vgic-v3.c
> index 8e7bf3151057..67b27b47312b 100644
> --- a/arch/arm64/kvm/vgic/vgic-v3.c
> +++ b/arch/arm64/kvm/vgic/vgic-v3.c
> @@ -584,8 +584,10 @@ early_param("kvm-arm.vgic_v4_enable", 
> early_gicv4_enable);
>  int vgic_v3_probe(const struct gic_kvm_info *info)
>  {
> u64 ich_vtr_el2 = kvm_call_hyp_ret(__vgic_v3_get_gic_config);
> +   bool has_v2;
> int ret;
>
> +   has_v2 = ich_vtr_el2 >> 63;
> ich_vtr_el2 = (u32)ich_vtr_el2;
>
> /*
> @@ -605,13 +607,15 @@ int vgic_v3_probe(const struct gic_kvm_info *info)
>  gicv4_enable ? "en" : "dis");
> }
>
> +   kvm_vgic_global_state.vcpu_base = 0;
> +
> if (!info->vcpu.start) {
> kvm_info("GICv3: no GICV resource entry\n");
> -   kvm_vgic_global_state.vcpu_base = 0;
> +   } else if (!has_v2) {
> +   pr_warn("CPU interface incapable of MMIO access\n");
> } else if (!PAGE_ALIGNED(info->vcpu.start)) {
> pr_warn("GICV physical address 0x%llx not page aligned\n",
> (unsigned long long)info->vcpu.start);
> -   kvm_vgic_global_state.vcpu_base = 0;
> } else {
> kvm_vgic_global_state.vcpu_base = info->vcpu.start;
> kvm_vgic_global_state.can_emulate_gicv2 = true;
> --
> 2.29.2
>


Re: [PATCH] arm64: Work around broken GCC 4.9 handling of "S" constraint

2020-12-17 Thread Ard Biesheuvel
On Thu, 17 Dec 2020 at 12:11, Marc Zyngier  wrote:
>
> GCC 4.9 seems to have a problem with the "S" asm constraint
> when the symbol lives in the same compilation unit, and pretends
> the constraint is impossible:
>
> $ cat x.c
> void *foo(void)
> {
> static int x;
> int *addr;
> asm("adrp %0, %1" : "=r" (addr) : "S" (&x));
> return addr;
> }
>
> $ 
> ~/Work/gcc-linaro-aarch64-linux-gnu-4.9-2014.09_linux/bin/aarch64-linux-gnu-gcc
>  -S -x c -O2 x.c
> x.c: In function ‘foo’:
> x.c:5:2: error: impossible constraint in ‘asm’
>   asm("adrp %0, %1" : "=r" (addr) : "S" (&x));
>   ^
>
> Boo. Following revisions of the compiler work just fine, though.
>
> We can fall back to the "i" constraint for GCC versions prior to 5.0,
> which *seems* to do the right thing. Hopefully we will be able to
> remove this at some point, but in the meantime this gets us going.
>
> Signed-off-by: Marc Zyngier 

Acked-by: Ard Biesheuvel 

> ---
> * From v1: Dropped the detection hack and rely on GCC_VERSION
>
>  arch/arm64/include/asm/kvm_asm.h | 8 +++-
>  1 file changed, 7 insertions(+), 1 deletion(-)
>
> diff --git a/arch/arm64/include/asm/kvm_asm.h 
> b/arch/arm64/include/asm/kvm_asm.h
> index 7ccf770c53d9..8a33d83ea843 100644
> --- a/arch/arm64/include/asm/kvm_asm.h
> +++ b/arch/arm64/include/asm/kvm_asm.h
> @@ -199,6 +199,12 @@ extern void __vgic_v3_init_lrs(void);
>
>  extern u32 __kvm_get_mdcr_el2(void);
>
> +#if defined(GCC_VERSION) && GCC_VERSION < 5
> +#define SYM_CONSTRAINT "i"
> +#else
> +#define SYM_CONSTRAINT "S"
> +#endif
> +
>  /*
>   * Obtain the PC-relative address of a kernel symbol
>   * s: symbol
> @@ -215,7 +221,7 @@ extern u32 __kvm_get_mdcr_el2(void);
> typeof(s) *addr;\
> asm("adrp   %0, %1\n"   \
> "add%0, %0, :lo12:%1\n" \
> -   : "=r" (addr) : "S" (&s));  \
> +   : "=r" (addr) : SYM_CONSTRAINT (&s));   \
> addr;   \
> })
>
> --
> 2.29.2
>


Re: [PATCH] arm64: Work around broken GCC handling of "S" constraint

2020-12-07 Thread Ard Biesheuvel
On Mon, 7 Dec 2020 at 18:41, Marc Zyngier  wrote:
>
> On 2020-12-07 17:19, Ard Biesheuvel wrote:
> > (resend with David's email address fixed)
>
> Irk. Thanks for that.
>
> >> > +#ifdef CONFIG_CC_HAS_BROKEN_S_CONSTRAINT
> >> > +#define SYM_CONSTRAINT "i"
> >> > +#else
> >> > +#define SYM_CONSTRAINT "S"
> >> > +#endif
> >> > +
> >>
> >> Could we just check GCC_VERSION here?
>
> I guess we could. But I haven't investigated which exact range of
> compilers is broken (GCC 6.3 seems fixed, but that's the oldest
> I have apart from the offending 4.9).
>

I tried 5.4 on godbolt, and it seems happy. And the failure will be
obvious, so we can afford to get it slightly wrong and refine it
later.


Re: [PATCH] arm64: Work around broken GCC handling of "S" constraint

2020-12-07 Thread Ard Biesheuvel
(resend with David's email address fixed)

On Mon, 7 Dec 2020 at 18:17, Ard Biesheuvel  wrote:
>
> On Mon, 7 Dec 2020 at 16:43, Marc Zyngier  wrote:
> >
> > GCC 4.9 seems to have a problem with the "S" asm constraint
> > when the symbol lives in the same compilation unit, and pretends
> > the constraint is impossible:
> >
> > $ cat x.c
> > void *foo(void)
> > {
> > static int x;
> > int *addr;
> > asm("adrp %0, %1" : "=r" (addr) : "S" (&x));
> > return addr;
> > }
> >
> > $ 
> > ~/Work/gcc-linaro-aarch64-linux-gnu-4.9-2014.09_linux/bin/aarch64-linux-gnu-gcc
> >  -S -x c -O2 x.c
> > x.c: In function ‘foo’:
> > x.c:5:2: error: impossible constraint in ‘asm’
> >   asm("adrp %0, %1" : "=r" (addr) : "S" (&x));
> >   ^
> >
> > Boo. Following revisions of the compiler work just fine, though.
> >
> > We can fall back to the "i" constraint in that case, which
> > *seems* to do the right thing. Hopefully we will be able to
> > remove this at some point, but in the meantime this gets us going.
> >
> > Signed-off-by: Marc Zyngier 
>
> > ---
> >  arch/arm64/Makefile  | 9 +
> >  arch/arm64/include/asm/kvm_asm.h | 8 +++-
> >  2 files changed, 16 insertions(+), 1 deletion(-)
> >
> > diff --git a/arch/arm64/Makefile b/arch/arm64/Makefile
> > index 5789c2d18d43..c4ee8e64ad1a 100644
> > --- a/arch/arm64/Makefile
> > +++ b/arch/arm64/Makefile
> > @@ -44,12 +44,21 @@ cc_has_k_constraint := $(call try-run,echo  
> > \
> > return 0;   \
> > }' | $(CC) -S -x c -o "$$TMP" -,,-DCONFIG_CC_HAS_K_CONSTRAINT=1)
> >
> > +cc_has_broken_s_constraint := $(call try-run,echo  \
> > +   'void *foo(void) {  \
> > +   static int x;   \
> > +   int *addr;  \
> > +   asm("adrp %0, %1" : "=r" (addr) : "S" (&x));\
> > +   return addr;\
> > +   }' | $(CC) -S -x c -c -O2 -o "$$TMP" 
> > -,,-DCONFIG_CC_HAS_BROKEN_S_CONSTRAINT=1)
> > +
> >  ifeq ($(CONFIG_BROKEN_GAS_INST),y)
> >  $(warning Detected assembler with broken .inst; disassembly will be 
> > unreliable)
> >  endif
> >
> >  KBUILD_CFLAGS  += -mgeneral-regs-only  \
> >$(compat_vdso) $(cc_has_k_constraint)
> > +KBUILD_CFLAGS  += $(cc_has_broken_s_constraint)
> >  KBUILD_CFLAGS  += $(call cc-disable-warning, psabi)
> >  KBUILD_AFLAGS  += $(compat_vdso)
> >
> > diff --git a/arch/arm64/include/asm/kvm_asm.h 
> > b/arch/arm64/include/asm/kvm_asm.h
> > index 7ccf770c53d9..fa8e886998a3 100644
> > --- a/arch/arm64/include/asm/kvm_asm.h
> > +++ b/arch/arm64/include/asm/kvm_asm.h
> > @@ -199,6 +199,12 @@ extern void __vgic_v3_init_lrs(void);
> >
> >  extern u32 __kvm_get_mdcr_el2(void);
> >
> > +#ifdef CONFIG_CC_HAS_BROKEN_S_CONSTRAINT
> > +#define SYM_CONSTRAINT "i"
> > +#else
> > +#define SYM_CONSTRAINT "S"
> > +#endif
> > +
>
> Could we just check GCC_VERSION here?
>
> >  /*
> >   * Obtain the PC-relative address of a kernel symbol
> >   * s: symbol
> > @@ -215,7 +221,7 @@ extern u32 __kvm_get_mdcr_el2(void);
> > typeof(s) *addr;\
> > asm("adrp   %0, %1\n"   \
> > "add%0, %0, :lo12:%1\n" \
> > -   : "=r" (addr) : "S" (&s));  \
> > +   : "=r" (addr) : SYM_CONSTRAINT (&s));   \
> > addr;   \
> > })
> >
> > --
> > 2.29.2
> >


Re: [PATCH] arm64: Work around broken GCC handling of "S" constraint

2020-12-07 Thread Ard Biesheuvel
On Mon, 7 Dec 2020 at 16:43, Marc Zyngier  wrote:
>
> GCC 4.9 seems to have a problem with the "S" asm constraint
> when the symbol lives in the same compilation unit, and pretends
> the constraint is impossible:
>
> $ cat x.c
> void *foo(void)
> {
> static int x;
> int *addr;
> asm("adrp %0, %1" : "=r" (addr) : "S" (&x));
> return addr;
> }
>
> $ 
> ~/Work/gcc-linaro-aarch64-linux-gnu-4.9-2014.09_linux/bin/aarch64-linux-gnu-gcc
>  -S -x c -O2 x.c
> x.c: In function ‘foo’:
> x.c:5:2: error: impossible constraint in ‘asm’
>   asm("adrp %0, %1" : "=r" (addr) : "S" (&x));
>   ^
>
> Boo. Following revisions of the compiler work just fine, though.
>
> We can fall back to the "i" constraint in that case, which
> *seems* to do the right thing. Hopefully we will be able to
> remove this at some point, but in the meantime this gets us going.
>
> Signed-off-by: Marc Zyngier 

> ---
>  arch/arm64/Makefile  | 9 +
>  arch/arm64/include/asm/kvm_asm.h | 8 +++-
>  2 files changed, 16 insertions(+), 1 deletion(-)
>
> diff --git a/arch/arm64/Makefile b/arch/arm64/Makefile
> index 5789c2d18d43..c4ee8e64ad1a 100644
> --- a/arch/arm64/Makefile
> +++ b/arch/arm64/Makefile
> @@ -44,12 +44,21 @@ cc_has_k_constraint := $(call try-run,echo
>   \
> return 0;   \
> }' | $(CC) -S -x c -o "$$TMP" -,,-DCONFIG_CC_HAS_K_CONSTRAINT=1)
>
> +cc_has_broken_s_constraint := $(call try-run,echo  \
> +   'void *foo(void) {  \
> +   static int x;   \
> +   int *addr;  \
> +   asm("adrp %0, %1" : "=r" (addr) : "S" (&x));\
> +   return addr;\
> +   }' | $(CC) -S -x c -c -O2 -o "$$TMP" 
> -,,-DCONFIG_CC_HAS_BROKEN_S_CONSTRAINT=1)
> +
>  ifeq ($(CONFIG_BROKEN_GAS_INST),y)
>  $(warning Detected assembler with broken .inst; disassembly will be 
> unreliable)
>  endif
>
>  KBUILD_CFLAGS  += -mgeneral-regs-only  \
>$(compat_vdso) $(cc_has_k_constraint)
> +KBUILD_CFLAGS  += $(cc_has_broken_s_constraint)
>  KBUILD_CFLAGS  += $(call cc-disable-warning, psabi)
>  KBUILD_AFLAGS  += $(compat_vdso)
>
> diff --git a/arch/arm64/include/asm/kvm_asm.h 
> b/arch/arm64/include/asm/kvm_asm.h
> index 7ccf770c53d9..fa8e886998a3 100644
> --- a/arch/arm64/include/asm/kvm_asm.h
> +++ b/arch/arm64/include/asm/kvm_asm.h
> @@ -199,6 +199,12 @@ extern void __vgic_v3_init_lrs(void);
>
>  extern u32 __kvm_get_mdcr_el2(void);
>
> +#ifdef CONFIG_CC_HAS_BROKEN_S_CONSTRAINT
> +#define SYM_CONSTRAINT "i"
> +#else
> +#define SYM_CONSTRAINT "S"
> +#endif
> +

Could we just check GCC_VERSION here?

>  /*
>   * Obtain the PC-relative address of a kernel symbol
>   * s: symbol
> @@ -215,7 +221,7 @@ extern u32 __kvm_get_mdcr_el2(void);
> typeof(s) *addr;\
> asm("adrp   %0, %1\n"   \
> "add%0, %0, :lo12:%1\n" \
> -   : "=r" (addr) : "S" ());  \
> +   : "=r" (addr) : SYM_CONSTRAINT ());   \
> addr;   \
> })
>
> --
> 2.29.2
>


Re: [RFC PATCH 6/6] kvm: arm64: Remove hyp_symbol_addr

2020-11-24 Thread Ard Biesheuvel
On Thu, 19 Nov 2020 at 17:26, David Brazdil  wrote:
>
> The helper was used to force PC-relative addressing in hyp code because
> absolute addressing via constant-pools used to generate kernel VAs. This
> was cumbersome and required programmers to remember to use the helper
> whenever they wanted to take a pointer.
>
> Now that hyp relocations are fixed up, there is no need for the helper
> any longer. Remove it.
>
> Signed-off-by: David Brazdil 

Acked-by: Ard Biesheuvel 

> ---
>  arch/arm64/include/asm/kvm_asm.h | 20 
>  arch/arm64/kvm/hyp/include/hyp/switch.h  |  4 ++--
>  arch/arm64/kvm/hyp/nvhe/hyp-smp.c|  4 ++--
>  arch/arm64/kvm/hyp/nvhe/psci-relay.c |  4 ++--
>  arch/arm64/kvm/hyp/vgic-v2-cpuif-proxy.c |  2 +-
>  5 files changed, 7 insertions(+), 27 deletions(-)
>
> diff --git a/arch/arm64/include/asm/kvm_asm.h 
> b/arch/arm64/include/asm/kvm_asm.h
> index 1a86581e581e..1961d23c0c40 100644
> --- a/arch/arm64/include/asm/kvm_asm.h
> +++ b/arch/arm64/include/asm/kvm_asm.h
> @@ -203,26 +203,6 @@ extern void __vgic_v3_init_lrs(void);
>
>  extern u32 __kvm_get_mdcr_el2(void);
>
> -/*
> - * Obtain the PC-relative address of a kernel symbol
> - * s: symbol
> - *
> - * The goal of this macro is to return a symbol's address based on a
> - * PC-relative computation, as opposed to loading the VA from a
> - * constant pool or something similar. This works well for HYP, as an
> - * absolute VA is guaranteed to be wrong. Only use this if trying to
> - * obtain the address of a symbol (i.e. not something you obtained by
> - * following a pointer).
> - */
> -#define hyp_symbol_addr(s) \
> -   ({  \
> -   typeof(s) *addr;\
> -   asm("adrp   %0, %1\n"   \
> -   "add%0, %0, :lo12:%1\n" \
> -   : "=r" (addr) : "S" (&s));  \
> -   addr;   \
> -   })
> -
>  #define __KVM_EXTABLE(from, to)  
>   \
> "   .pushsection__kvm_ex_table, \"a\"\n"\
> "   .align  3\n"\
> diff --git a/arch/arm64/kvm/hyp/include/hyp/switch.h 
> b/arch/arm64/kvm/hyp/include/hyp/switch.h
> index 84473574c2e7..54f4860cd87c 100644
> --- a/arch/arm64/kvm/hyp/include/hyp/switch.h
> +++ b/arch/arm64/kvm/hyp/include/hyp/switch.h
> @@ -505,8 +505,8 @@ static inline void __kvm_unexpected_el2_exception(void)
> struct exception_table_entry *entry, *end;
> unsigned long elr_el2 = read_sysreg(elr_el2);
>
> -   entry = hyp_symbol_addr(__start___kvm_ex_table);
> -   end = hyp_symbol_addr(__stop___kvm_ex_table);
> +   entry = &__start___kvm_ex_table;
> +   end = &__stop___kvm_ex_table;
>
> while (entry < end) {
> addr = (unsigned long)&entry->insn + entry->insn;
> diff --git a/arch/arm64/kvm/hyp/nvhe/hyp-smp.c 
> b/arch/arm64/kvm/hyp/nvhe/hyp-smp.c
> index ceb427aabb91..6870d9f3d4b7 100644
> --- a/arch/arm64/kvm/hyp/nvhe/hyp-smp.c
> +++ b/arch/arm64/kvm/hyp/nvhe/hyp-smp.c
> @@ -33,8 +33,8 @@ unsigned long __hyp_per_cpu_offset(unsigned int cpu)
> if (cpu >= ARRAY_SIZE(kvm_arm_hyp_percpu_base))
> hyp_panic();
>
> -   cpu_base_array = (unsigned 
> long*)hyp_symbol_addr(kvm_arm_hyp_percpu_base);
> +   cpu_base_array = (unsigned long*)(&kvm_arm_hyp_percpu_base[0]);
> this_cpu_base = kern_hyp_va(cpu_base_array[cpu]);
> -   elf_base = (unsigned long)hyp_symbol_addr(__per_cpu_start);
> +   elf_base = (unsigned long)&__per_cpu_start;
> return this_cpu_base - elf_base;
>  }
> diff --git a/arch/arm64/kvm/hyp/nvhe/psci-relay.c 
> b/arch/arm64/kvm/hyp/nvhe/psci-relay.c
> index 313ef42f0eab..f64380a49a72 100644
> --- a/arch/arm64/kvm/hyp/nvhe/psci-relay.c
> +++ b/arch/arm64/kvm/hyp/nvhe/psci-relay.c
> @@ -147,7 +147,7 @@ static int psci_cpu_suspend(u64 func_id, struct 
> kvm_cpu_context *host_ctxt)
>  * point if it is a deep sleep state.
>  */
> ret = psci_call(func_id, power_state,
> -   __hyp_pa(hyp_symbol_addr(__kvm_hyp_cpu_entry)),
> +   __hyp_pa(__kvm_hyp_cpu_entry),
> __hyp_pa(cpu_params));
>
> release_reset_state(cpu_state);
> @@ -182,7 +182,7 @@ static int psci_cpu_on(u64 fun

Re: [RFC PATCH 5/6] kvm: arm64: Fix constant-pool users in hyp

2020-11-24 Thread Ard Biesheuvel
On Thu, 19 Nov 2020 at 17:26, David Brazdil  wrote:
>
> Hyp code used to use absolute addressing via a constant pool to obtain
> the kernel VA of 3 symbols - panic, __hyp_panic_string and
> __kvm_handle_stub_hvc. This used to work because the kernel would
> relocate the addresses in the constant pool to kernel VA at boot and hyp
> would simply load them from there.
>
> Now that relocations are fixed up to point to hyp VAs, this does not
> work any longer. Rework the helpers to convert hyp VA to kernel VA / PA
> as needed.
>

Ok, so the reason for the problem is that the literal exists inside
the HYP text, and all literals are fixed up using the HYP mapping,
even if they don't point to something that is mapped at HYP. Would it
make sense to simply disregard literals that point outside of the HYP
VA mapping?
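
I.e., something along these lines in __fixup_hyp_rel() (a rough sketch,
reusing the __is_in_hyp_section() helper from patch 2, but applied to
the relocation *target* instead of the relocation address):

	/* keep the kernel VA if the target is not mapped at HYP */
	if (!__is_in_hyp_section(kern_va))
		return;

Then the panic/__hyp_panic_string/__kvm_handle_stub_hvc literals would
simply keep their kernel VAs, and these conversion helpers would not
be needed.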

> Signed-off-by: David Brazdil 
> ---
>  arch/arm64/include/asm/kvm_mmu.h | 29 +++--
>  arch/arm64/kvm/hyp/nvhe/host.S   | 29 +++--
>  2 files changed, 34 insertions(+), 24 deletions(-)
>
> diff --git a/arch/arm64/include/asm/kvm_mmu.h 
> b/arch/arm64/include/asm/kvm_mmu.h
> index 8cb8974ec9cc..0676ff2105bb 100644
> --- a/arch/arm64/include/asm/kvm_mmu.h
> +++ b/arch/arm64/include/asm/kvm_mmu.h
> @@ -72,9 +72,14 @@ alternative_cb kvm_update_va_mask
>  alternative_cb_end
>  .endm
>
> +.macro hyp_pa reg, tmp
> +   ldr_l   \tmp, hyp_physvirt_offset
> +   add \reg, \reg, \tmp
> +.endm
> +
>  /*
> - * Convert a kernel image address to a PA
> - * reg: kernel address to be converted in place
> + * Convert a hypervisor VA to a kernel image address
> + * reg: hypervisor address to be converted in place
>   * tmp: temporary register
>   *
>   * The actual code generation takes place in kvm_get_kimage_voffset, and
> @@ -82,18 +87,22 @@ alternative_cb_end
>   * perform the register allocation (kvm_get_kimage_voffset uses the
>   * specific registers encoded in the instructions).
>   */
> -.macro kimg_pa reg, tmp
> +.macro hyp_kimg reg, tmp
> +   /* Convert hyp VA -> PA. */
> +   hyp_pa  \reg, \tmp
> +
> +   /* Load kimage_voffset. */
>  alternative_cb kvm_get_kimage_voffset
> -   movz\tmp, #0
> -   movk\tmp, #0, lsl #16
> -   movk\tmp, #0, lsl #32
> -   movk\tmp, #0, lsl #48
> +   movz\tmp, #0
> +   movk\tmp, #0, lsl #16
> +   movk\tmp, #0, lsl #32
> +   movk\tmp, #0, lsl #48
>  alternative_cb_end
>
> -   /* reg = __pa(reg) */
> -   sub \reg, \reg, \tmp
> +   /* Convert PA -> kimg VA. */
> +   add \reg, \reg, \tmp
>  .endm
> -
> +
>  #else
>
>  #include 
> diff --git a/arch/arm64/kvm/hyp/nvhe/host.S b/arch/arm64/kvm/hyp/nvhe/host.S
> index 596dd5ae8e77..bcb80d525d8c 100644
> --- a/arch/arm64/kvm/hyp/nvhe/host.S
> +++ b/arch/arm64/kvm/hyp/nvhe/host.S
> @@ -74,27 +74,28 @@ SYM_FUNC_END(__host_enter)
>   * void __noreturn __hyp_do_panic(bool restore_host, u64 spsr, u64 elr, u64 
> par);
>   */
>  SYM_FUNC_START(__hyp_do_panic)
> -   /* Load the format arguments into x1-7 */
> -   mov x6, x3
> -   get_vcpu_ptr x7, x3
> -
> -   mrs x3, esr_el2
> -   mrs x4, far_el2
> -   mrs x5, hpfar_el2
> -
> /* Prepare and exit to the host's panic function. */
> mov lr, #(PSR_F_BIT | PSR_I_BIT | PSR_A_BIT | PSR_D_BIT |\
>   PSR_MODE_EL1h)
> msr spsr_el2, lr
> ldr lr, =panic
> +   hyp_kimg lr, x6
> msr elr_el2, lr
>
> -   /*
> -* Set the panic format string and enter the host, conditionally
> -* restoring the host context.
> -*/
> +   /* Set the panic format string. Use the, now free, LR as scratch. */
> +   ldr lr, =__hyp_panic_string
> +   hyp_kimg lr, x6
> +
> +   /* Load the format arguments into x1-7. */
> +   mov x6, x3
> +   get_vcpu_ptr x7, x3
> +   mrs x3, esr_el2
> +   mrs x4, far_el2
> +   mrs x5, hpfar_el2
> +
> +   /* Enter the host, conditionally restoring the host context. */
> cmp x0, xzr
> -   ldr x0, =__hyp_panic_string
> +   mov x0, lr
> b.eq__host_enter_without_restoring
> b   __host_enter_for_panic
>  SYM_FUNC_END(__hyp_do_panic)
> @@ -124,7 +125,7 @@ SYM_FUNC_END(__hyp_do_panic)
>  * Preserve x0-x4, which may contain stub parameters.
>  */
> ldr x5, =__kvm_handle_stub_hvc
> -   kimg_pa x5, x6
> +   hyp_pa  x5, x6
> br  x5
>  .L__vect_end\@:
>  .if ((.L__vect_end\@ - .L__vect_start\@) > 0x80)
> --
> 2.29.2.299.gdc1121823c-goog
>


Re: [RFC PATCH 4/6] kvm: arm64: Remove patching of fn pointers in hyp

2020-11-24 Thread Ard Biesheuvel
On Thu, 19 Nov 2020 at 17:25, David Brazdil  wrote:
>
> Taking a function pointer will now generate a R_AARCH64_RELATIVE that is
> fixed up at early boot. Remove the alternative-based mechanism for
> converting the address from a kernel VA.
>
> Signed-off-by: David Brazdil 

Acked-by: Ard Biesheuvel 

> ---
>  arch/arm64/include/asm/kvm_mmu.h   | 18 --
>  arch/arm64/kernel/image-vars.h |  1 -
>  arch/arm64/kvm/hyp/nvhe/hyp-main.c | 11 ---
>  arch/arm64/kvm/va_layout.c |  6 --
>  4 files changed, 4 insertions(+), 32 deletions(-)
>
> diff --git a/arch/arm64/include/asm/kvm_mmu.h 
> b/arch/arm64/include/asm/kvm_mmu.h
> index e5226f7e4732..8cb8974ec9cc 100644
> --- a/arch/arm64/include/asm/kvm_mmu.h
> +++ b/arch/arm64/include/asm/kvm_mmu.h
> @@ -121,24 +121,6 @@ static __always_inline unsigned long 
> __kern_hyp_va(unsigned long v)
>
>  #define kern_hyp_va(v) ((typeof(v))(__kern_hyp_va((unsigned long)(v))))
>
> -static __always_inline unsigned long __kimg_hyp_va(unsigned long v)
> -{
> -   unsigned long offset;
> -
> -   asm volatile(ALTERNATIVE_CB("movz %0, #0\n"
> -   "movk %0, #0, lsl #16\n"
> -   "movk %0, #0, lsl #32\n"
> -   "movk %0, #0, lsl #48\n",
> -   kvm_update_kimg_phys_offset)
> -: "=r" (offset));
> -
> -   return __kern_hyp_va((v - offset) | PAGE_OFFSET);
> -}
> -
> -#define kimg_fn_hyp_va(v)  ((typeof(*v))(__kimg_hyp_va((unsigned long)(v))))
> -
> -#define kimg_fn_ptr(x) (typeof(x) **)(x)
> -
>  /*
>   * We currently support using a VM-specified IPA size. For backward
>   * compatibility, the default IPA size is fixed to 40bits.
> diff --git a/arch/arm64/kernel/image-vars.h b/arch/arm64/kernel/image-vars.h
> index 8539f34d7538..6379721236cf 100644
> --- a/arch/arm64/kernel/image-vars.h
> +++ b/arch/arm64/kernel/image-vars.h
> @@ -64,7 +64,6 @@ __efistub__ctype  = _ctype;
>  /* Alternative callbacks for init-time patching of nVHE hyp code. */
>  KVM_NVHE_ALIAS(kvm_patch_vector_branch);
>  KVM_NVHE_ALIAS(kvm_update_va_mask);
> -KVM_NVHE_ALIAS(kvm_update_kimg_phys_offset);
>  KVM_NVHE_ALIAS(kvm_get_kimage_voffset);
>
>  /* Global kernel state accessed by nVHE hyp code. */
> diff --git a/arch/arm64/kvm/hyp/nvhe/hyp-main.c 
> b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
> index b3db5f4eea27..7998eff5f0a2 100644
> --- a/arch/arm64/kvm/hyp/nvhe/hyp-main.c
> +++ b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
> @@ -110,9 +110,9 @@ static void handle___vgic_v3_restore_aprs(struct 
> kvm_cpu_context *host_ctxt)
>
>  typedef void (*hcall_t)(struct kvm_cpu_context *);
>
> -#define HANDLE_FUNC(x) [__KVM_HOST_SMCCC_FUNC_##x] = kimg_fn_ptr(handle_##x)
> +#define HANDLE_FUNC(x) [__KVM_HOST_SMCCC_FUNC_##x] = (hcall_t)handle_##x
>
> -static const hcall_t *host_hcall[] = {
> +static const hcall_t host_hcall[] = {
> HANDLE_FUNC(__kvm_vcpu_run),
> HANDLE_FUNC(__kvm_flush_vm_context),
> HANDLE_FUNC(__kvm_tlb_flush_vmid_ipa),
> @@ -132,7 +132,6 @@ static const hcall_t *host_hcall[] = {
>  static void handle_host_hcall(struct kvm_cpu_context *host_ctxt)
>  {
> DECLARE_REG(unsigned long, id, host_ctxt, 0);
> -   const hcall_t *kfn;
> hcall_t hfn;
>
> id -= KVM_HOST_SMCCC_ID(0);
> @@ -140,13 +139,11 @@ static void handle_host_hcall(struct kvm_cpu_context 
> *host_ctxt)
> if (unlikely(id >= ARRAY_SIZE(host_hcall)))
> goto inval;
>
> -   kfn = host_hcall[id];
> -   if (unlikely(!kfn))
> +   hfn = host_hcall[id];
> +   if (unlikely(!hfn))
> goto inval;
>
> cpu_reg(host_ctxt, 0) = SMCCC_RET_SUCCESS;
> -
> -   hfn = kimg_fn_hyp_va(kfn);
> hfn(host_ctxt);
>
> return;
> diff --git a/arch/arm64/kvm/va_layout.c b/arch/arm64/kvm/va_layout.c
> index 7f45a98eacfd..0494315f71f2 100644
> --- a/arch/arm64/kvm/va_layout.c
> +++ b/arch/arm64/kvm/va_layout.c
> @@ -373,12 +373,6 @@ static void generate_mov_q(u64 val, __le32 *origptr, 
> __le32 *updptr, int nr_inst
> *updptr++ = cpu_to_le32(insn);
>  }
>
> -void kvm_update_kimg_phys_offset(struct alt_instr *alt,
> -__le32 *origptr, __le32 *updptr, int nr_inst)
> -{
> -   generate_mov_q(kimage_voffset + PHYS_OFFSET, origptr, updptr, 
> nr_inst);
> -}
> -
>  void kvm_get_kimage_voffset(struct alt_instr *alt,
> __le32 *origptr, __le32 *updptr, int nr_inst)
>  {
> --
> 2.29.2.299.gdc1121823c-goog
>


Re: [RFC PATCH 3/6] kvm: arm64: Fix up RELR relocation in hyp code/data

2020-11-24 Thread Ard Biesheuvel
On Thu, 19 Nov 2020 at 17:25, David Brazdil  wrote:
>
> The arm64 kernel also supports packing of relocation data using the RELR
> format. Implement a parser of RELR data and fixup the relocations using
> the same infra as RELA relocs.
>
> Signed-off-by: David Brazdil 
> ---
>  arch/arm64/kvm/va_layout.c | 41 ++
>  1 file changed, 41 insertions(+)
>
> diff --git a/arch/arm64/kvm/va_layout.c b/arch/arm64/kvm/va_layout.c
> index b80fab974896..7f45a98eacfd 100644
> --- a/arch/arm64/kvm/va_layout.c
> +++ b/arch/arm64/kvm/va_layout.c
> @@ -145,6 +145,43 @@ static void __fixup_hyp_rela(void)
> __fixup_hyp_rel(rel[i].r_offset);
>  }
>
> +#ifdef CONFIG_RELR

Please prefer IS_ENABLED() [below] if the code in question can compile
(but perhaps not link) correctly when the symbol is not set.
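
I.e., at the call site, something like (sketch):

	__fixup_hyp_rela();

	if (IS_ENABLED(CONFIG_RELR))
		__fixup_hyp_relr();

which lets the compiler always see (and type-check) the RELR path, and
discard it as dead code when CONFIG_RELR is not set.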

> +static void __fixup_hyp_relr(void)

__init ?

> +{
> +   u64 *rel, *end;
> +
> +   rel = (u64*)(kimage_vaddr + __load_elf_u64(__relr_offset));
> +   end = rel + (__load_elf_u64(__relr_size) / sizeof(*rel));
> +

The reason for this little dance with the offset and size is that the
initial relocation routine runs from the ID mapping, but
the relocation fixups are performed via the kernel's VA mapping, as
the ID mapping does not cover the entire image. So simple adrp/add
pairs aren't suitable there.

In this case (as well as in the previous patch, btw), that problem
does not exist, and so I think we should be able to simply define
start and end markers inside the .rela sections, and reference them
here as symbols with external linkage (which ensures that they are
referenced relatively, although you could add in a
__attribute__((visibility("hidden"))) for good measure)
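
Roughly like this (an untested sketch; the marker symbols are invented
and would need matching definitions in the hyp linker script):

	extern const u64 __hyp_relr_start[] __attribute__((visibility("hidden")));
	extern const u64 __hyp_relr_end[] __attribute__((visibility("hidden")));

	static void __init __fixup_hyp_relr(void)
	{
		const u64 *rel = __hyp_relr_start;
		const u64 *end = __hyp_relr_end;
		...
	}

No loading of offsets from the ELF header needed, and the references
are emitted PC-relatively.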



> +   while (rel < end) {
> +   unsigned n;
> +   u64 addr = *(rel++);

Parens are redundant here (and below)

> +
> +   /* Address must not have the LSB set. */
> +   BUG_ON(addr & BIT(0));
> +
> +   /* Fix up the first address of the chain. */
> +   __fixup_hyp_rel(addr);
> +
> +   /*
> +* Loop over bitmaps, i.e. as long as words' LSB is 1.
> +* Each bit (ordered from LSB to MSB) represents one word from
> +* the last full address (exclusive). If the corresponding bit
> +* is 1, there is a relative relocation on that word.
> +*/
> +   for (n = 0; rel < end && (*rel & BIT(0)); n++) {
> +   unsigned i;
> +   u64 bitmap = *(rel++);
> +
> +   for (i = 1; i < 64; ++i) {
> +   if ((bitmap & BIT(i)))
> +   __fixup_hyp_rel(addr + 8 * (63 * n + 
> i));
> +   }
> +   }
> +   }
> +}
> +#endif
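
(To make the bitmap decoding concrete, a worked example of my own, not
part of the patch: after an address entry A, bit i (for i in 1..63) of
the first bitmap word (n = 0) marks a fixup at A + 8 * i, and every
further bitmap word advances that window by 63 words, hence the
63 * n + i term above.)
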
> +
>  /*
>   * The kernel relocated pointers to kernel VA. Iterate over relocations in
>   * the hypervisor ELF sections and convert them to hyp VA. This avoids the
> @@ -156,6 +193,10 @@ __init void kvm_fixup_hyp_relocations(void)
> return;
>
> __fixup_hyp_rela();
> +
> +#ifdef CONFIG_RELR
> +   __fixup_hyp_relr();
> +#endif
>  }
>
>  static u32 compute_instruction(int n, u32 rd, u32 rn)
> --
> 2.29.2.299.gdc1121823c-goog
>


Re: [RFC PATCH 2/6] kvm: arm64: Fix up RELA relocations in hyp code/data

2020-11-24 Thread Ard Biesheuvel
On Thu, 19 Nov 2020 at 17:25, David Brazdil  wrote:
>
> KVM nVHE code runs under a different VA mapping than the kernel, hence
> so far it relied only on PC-relative addressing to avoid accidentally
> using a relocated kernel VA from a constant pool (see hyp_symbol_addr).
>
> So as to reduce the possibility of a programmer error, fixup the
> relocated addresses instead. Let the kernel relocate them to kernel VA
> first, but then iterate over them again, filter those that point to hyp
> code/data and convert the kernel VA to hyp VA.
>
> This is done after kvm_compute_layout and before apply_alternatives.
>

If this is significant enough to call out, please include the reason for it.

> Signed-off-by: David Brazdil 
> ---
>  arch/arm64/include/asm/kvm_mmu.h |  1 +
>  arch/arm64/kernel/smp.c  |  4 +-
>  arch/arm64/kvm/va_layout.c   | 76 
>  3 files changed, 80 insertions(+), 1 deletion(-)
>
> diff --git a/arch/arm64/include/asm/kvm_mmu.h 
> b/arch/arm64/include/asm/kvm_mmu.h
> index 5168a0c516ae..e5226f7e4732 100644
> --- a/arch/arm64/include/asm/kvm_mmu.h
> +++ b/arch/arm64/include/asm/kvm_mmu.h
> @@ -105,6 +105,7 @@ alternative_cb_end
>  void kvm_update_va_mask(struct alt_instr *alt,
> __le32 *origptr, __le32 *updptr, int nr_inst);
>  void kvm_compute_layout(void);
> +void kvm_fixup_hyp_relocations(void);
>
>  static __always_inline unsigned long __kern_hyp_va(unsigned long v)
>  {
> diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
> index 18e9727d3f64..30241afc2c93 100644
> --- a/arch/arm64/kernel/smp.c
> +++ b/arch/arm64/kernel/smp.c
> @@ -434,8 +434,10 @@ static void __init hyp_mode_check(void)
>"CPU: CPUs started in inconsistent modes");
> else
> pr_info("CPU: All CPU(s) started at EL1\n");
> -   if (IS_ENABLED(CONFIG_KVM))
> +   if (IS_ENABLED(CONFIG_KVM)) {
> kvm_compute_layout();
> +   kvm_fixup_hyp_relocations();
> +   }
>  }
>
>  void __init smp_cpus_done(unsigned int max_cpus)
> diff --git a/arch/arm64/kvm/va_layout.c b/arch/arm64/kvm/va_layout.c
> index d8cc51bd60bf..b80fab974896 100644
> --- a/arch/arm64/kvm/va_layout.c
> +++ b/arch/arm64/kvm/va_layout.c
> @@ -10,6 +10,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>
> @@ -82,6 +83,81 @@ __init void kvm_compute_layout(void)
> init_hyp_physvirt_offset();
>  }
>
> +#define __load_elf_u64(s)  \
> +   ({  \
> +   extern u64 s;   \
> +   u64 val;\
> +   \
> +   asm ("ldr %0, =%1" : "=r"(val) : "S"(&s));  \
> +   val;\
> +   })
> +

Do you need this to ensure that the reference is absolute? There may
be more elegant ways to achieve that, using weak references for
instance.

Also, in the relocation startup code, I deliberately used a 32-bit
quantity here, as it won't get confused for an absolute virtual
address that needs relocation.


> +static bool __is_within_bounds(u64 addr, char *start, char *end)
> +{
> +   return start <= (char*)addr && (char*)addr < end;
> +}
> +
> +static bool __is_in_hyp_section(u64 addr)
> +{
> +   return __is_within_bounds(addr, __hyp_text_start, __hyp_text_end) ||
> +  __is_within_bounds(addr, __hyp_rodata_start, __hyp_rodata_end) 
> ||
> +  __is_within_bounds(addr,
> + CHOOSE_NVHE_SYM(__per_cpu_start),
> + CHOOSE_NVHE_SYM(__per_cpu_end));
> +}
> +

It is slightly disappointing that we need to filter these one by one
like this, but I don't think there are any guarantees about the order
in which the R_AARCH64_RELATIVE entries appear.

> +static void __fixup_hyp_rel(u64 addr)

__init ?

> +{
> +   u64 *ptr, kern_va, hyp_va;
> +
> +   /* Adjust the relocation address taken from ELF for KASLR. */
> +   addr += kaslr_offset();
> +
> +   /* Skip addresses not in any of the hyp sections. */
> +   if (!__is_in_hyp_section(addr))
> +   return;
> +
> +   /* Get the LM alias of the relocation address. */
> +   ptr = (u64*)kvm_ksym_ref((void*)addr);
> +
> +   /*
> +* Read the value at the relocation address. It has already been
> +* relocated to the actual kernel kimg VA.
> +*/
> +   kern_va = (u64)kvm_ksym_ref((void*)*ptr);
> +
> +   /* Convert to hyp VA. */
> +   hyp_va = __early_kern_hyp_va(kern_va);
> +
> +   /* Store hyp VA at the relocation address. */
> +   *ptr = __early_kern_hyp_va(kern_va);
> +}
> +
> +static void __fixup_hyp_rela(void)

__init ?

> +{
> +   Elf64_Rela *rel;
> +   size_t i, n;
> +
> +  

Re: [RFC PATCH 1/6] kvm: arm64: Set up .hyp.rodata ELF section

2020-11-24 Thread Ard Biesheuvel
On Thu, 19 Nov 2020 at 17:25, David Brazdil  wrote:
>
> We will need to recognize pointers in .rodata specific to hyp,

Why?

> so
> establish a .hyp.rodata ELF section. Merge it with the existing
> .hyp.data..ro_after_init as they are treated the same at runtime.
>

Does this mean HYP .text, .rodata etc are all writable some time after
the kernel .text/.rodata have been mapped read-only? That is not a
problem per se, but it deserves being called out.


> Signed-off-by: David Brazdil 
> ---
>  arch/arm64/include/asm/sections.h | 2 +-
>  arch/arm64/kernel/vmlinux.lds.S   | 7 ---
>  arch/arm64/kvm/arm.c  | 7 +++
>  arch/arm64/kvm/hyp/nvhe/hyp.lds.S | 1 +
>  4 files changed, 9 insertions(+), 8 deletions(-)
>
> diff --git a/arch/arm64/include/asm/sections.h 
> b/arch/arm64/include/asm/sections.h
> index 8ff579361731..a6f3557d1ab2 100644
> --- a/arch/arm64/include/asm/sections.h
> +++ b/arch/arm64/include/asm/sections.h
> @@ -11,7 +11,7 @@ extern char __alt_instructions[], __alt_instructions_end[];
>  extern char __hibernate_exit_text_start[], __hibernate_exit_text_end[];
>  extern char __hyp_idmap_text_start[], __hyp_idmap_text_end[];
>  extern char __hyp_text_start[], __hyp_text_end[];
> -extern char __hyp_data_ro_after_init_start[], __hyp_data_ro_after_init_end[];
> +extern char __hyp_rodata_start[], __hyp_rodata_end[];
>  extern char __idmap_text_start[], __idmap_text_end[];
>  extern char __initdata_begin[], __initdata_end[];
>  extern char __inittext_begin[], __inittext_end[];
> diff --git a/arch/arm64/kernel/vmlinux.lds.S b/arch/arm64/kernel/vmlinux.lds.S
> index 4382b5d0645d..6f2fd9734d63 100644
> --- a/arch/arm64/kernel/vmlinux.lds.S
> +++ b/arch/arm64/kernel/vmlinux.lds.S
> @@ -31,10 +31,11 @@ jiffies = jiffies_64;
> __stop___kvm_ex_table = .;
>
>  #define HYPERVISOR_DATA_SECTIONS   \
> -   HYP_SECTION_NAME(.data..ro_after_init) : {  \
> -   __hyp_data_ro_after_init_start = .; \
> +   HYP_SECTION_NAME(.rodata) : {   \
> +   __hyp_rodata_start = .; \
> *(HYP_SECTION_NAME(.data..ro_after_init))   \
> -   __hyp_data_ro_after_init_end = .;   \
> +   *(HYP_SECTION_NAME(.rodata))\
> +   __hyp_rodata_end = .;   \
> }
>
>  #define HYPERVISOR_PERCPU_SECTION  \
> diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
> index d6d5211653b7..119c97e8900a 100644
> --- a/arch/arm64/kvm/arm.c
> +++ b/arch/arm64/kvm/arm.c
> @@ -1688,11 +1688,10 @@ static int init_hyp_mode(void)
> goto out_err;
> }
>
> -   err = 
> create_hyp_mappings(kvm_ksym_ref(__hyp_data_ro_after_init_start),
> - kvm_ksym_ref(__hyp_data_ro_after_init_end),
> - PAGE_HYP_RO);
> +   err = create_hyp_mappings(kvm_ksym_ref(__hyp_rodata_start),
> + kvm_ksym_ref(__hyp_rodata_end), 
> PAGE_HYP_RO);
> if (err) {
> -   kvm_err("Cannot map .hyp.data..ro_after_init section\n");
> +   kvm_err("Cannot map .hyp.rodata section\n");
> goto out_err;
> }
>
> diff --git a/arch/arm64/kvm/hyp/nvhe/hyp.lds.S 
> b/arch/arm64/kvm/hyp/nvhe/hyp.lds.S
> index 5d76ff2ba63e..b0789183d49d 100644
> --- a/arch/arm64/kvm/hyp/nvhe/hyp.lds.S
> +++ b/arch/arm64/kvm/hyp/nvhe/hyp.lds.S
> @@ -17,4 +17,5 @@ SECTIONS {
> PERCPU_INPUT(L1_CACHE_BYTES)
> }
> HYP_SECTION(.data..ro_after_init)
> +   HYP_SECTION(.rodata)
>  }
> --
> 2.29.2.299.gdc1121823c-goog
>


Re: [PATCH v3 4/5] arm64: Add support for SMCCC TRNG entropy source

2020-11-20 Thread Ard Biesheuvel
On Fri, 20 Nov 2020 at 11:52, André Przywara  wrote:
>
> On 19/11/2020 13:41, Ard Biesheuvel wrote:
>
> Hi,
>
> > On Fri, 13 Nov 2020 at 19:24, Andre Przywara  wrote:
> >>
> >> The ARM architected TRNG firmware interface, described in ARM spec
> >> DEN0098, defines an ARM SMCCC based interface to a true random number
> >> generator, provided by firmware.
> >> This can be discovered via the SMCCC >=v1.1 interface, and provides
> >> up to 192 bits of entropy per call.
> >>
> >> Hook this SMC call into arm64's arch_get_random_*() implementation,
> >> coming to the rescue when the CPU does not implement the ARM v8.5 RNG
> >> system registers.
> >>
> >> For the detection, we piggy back on the PSCI/SMCCC discovery (which gives
> >> us the conduit to use (hvc/smc)), then try to call the
> >> ARM_SMCCC_TRNG_VERSION function, which returns -1 if this interface is
> >> not implemented.
> >>
> >> Signed-off-by: Andre Przywara 
> >> ---
> >>  arch/arm64/include/asm/archrandom.h | 69 -
> >>  1 file changed, 58 insertions(+), 11 deletions(-)
> >>
> >> diff --git a/arch/arm64/include/asm/archrandom.h 
> >> b/arch/arm64/include/asm/archrandom.h
> >> index abe07c21da8e..fe34bfd30caa 100644
> >> --- a/arch/arm64/include/asm/archrandom.h
> >> +++ b/arch/arm64/include/asm/archrandom.h
> >> @@ -4,13 +4,24 @@
> >>
> >>  #ifdef CONFIG_ARCH_RANDOM
> >>
> >> +#include 
> >>  #include 
> >>  #include 
> >>  #include 
> >>
> >> +#define ARM_SMCCC_TRNG_MIN_VERSION 0x1UL
> >> +
> >> +extern bool smccc_trng_available;
> >> +
> >>  static inline bool __init smccc_probe_trng(void)
> >>  {
> >> -   return false;
> >> +   struct arm_smccc_res res;
> >> +
> >> +   arm_smccc_1_1_invoke(ARM_SMCCC_TRNG_VERSION, &res);
> >> +   if ((s32)res.a0 < 0)
> >> +   return false;
> >> +
> >> +   return res.a0 >= ARM_SMCCC_TRNG_MIN_VERSION;
> >>  }
> >>
> >>  static inline bool __arm64_rndr(unsigned long *v)
> >> @@ -43,26 +54,52 @@ static inline bool __must_check 
> >> arch_get_random_int(unsigned int *v)
> >>
> >>  static inline bool __must_check arch_get_random_seed_long(unsigned long 
> >> *v)
> >>  {
> >> +   struct arm_smccc_res res;
> >> +
> >> /*
> >>  * Only support the generic interface after we have detected
> >>  * the system wide capability, avoiding complexity with the
> >>  * cpufeature code and with potential scheduling between CPUs
> >>  * with and without the feature.
> >>  */
> >> -   if (!cpus_have_const_cap(ARM64_HAS_RNG))
> >> -   return false;
> >> +   if (cpus_have_const_cap(ARM64_HAS_RNG))
> >> +   return __arm64_rndr(v);
> >>
> >> -   return __arm64_rndr(v);
> >> -}
> >> +   if (smccc_trng_available) {
> >> +   arm_smccc_1_1_invoke(ARM_SMCCC_TRNG_RND64, 64, &res);
> >> +   if ((int)res.a0 < 0)
> >> +   return false;
> >>
> >> +   *v = res.a3;
> >> +   return true;
> >> +   }
> >> +
> >> +   return false;
> >> +}
> >>
> >
> > I think we should be more rigorous here in how we map the concepts of
> > random seeds and random numbers onto the various sources.
> >
> > First of all, assuming my patch dropping the call to
> > arch_get_random_seed_long() from add_interrupt_randomness() gets
> > accepted, we should switch to RNDRRS here, and implement the non-seed
> > variants using RNDR.
>
> I agree (and have a patch ready), but that seems independent from this
> series.
>

Well, it will conflict, but other than that, I agree it is orthogonal.

> > However, this is still semantically inaccurate: RNDRRS does not return
> > a random *seed*, it returns a number drawn from a freshly seeded
> > pseudo-random sequence. This means that the TRNG interface, if
> > implemented, is a better choice, and so we should try it first. Note
> > that on platforms that don't implement both, only one of these will be
> > available in the first place. But on platforms that *do* implement
> > both, the firmware interface may actually be less wasteful 

Re: [PATCH v3 4/5] arm64: Add support for SMCCC TRNG entropy source

2020-11-19 Thread Ard Biesheuvel
On Fri, 13 Nov 2020 at 19:24, Andre Przywara  wrote:
>
> The ARM architected TRNG firmware interface, described in ARM spec
> DEN0098, defines an ARM SMCCC based interface to a true random number
> generator, provided by firmware.
> This can be discovered via the SMCCC >=v1.1 interface, and provides
> up to 192 bits of entropy per call.
>
> Hook this SMC call into arm64's arch_get_random_*() implementation,
> coming to the rescue when the CPU does not implement the ARM v8.5 RNG
> system registers.
>
> For the detection, we piggy back on the PSCI/SMCCC discovery (which gives
> us the conduit to use (hvc/smc)), then try to call the
> ARM_SMCCC_TRNG_VERSION function, which returns -1 if this interface is
> not implemented.
>
> Signed-off-by: Andre Przywara 
> ---
>  arch/arm64/include/asm/archrandom.h | 69 -
>  1 file changed, 58 insertions(+), 11 deletions(-)
>
> diff --git a/arch/arm64/include/asm/archrandom.h 
> b/arch/arm64/include/asm/archrandom.h
> index abe07c21da8e..fe34bfd30caa 100644
> --- a/arch/arm64/include/asm/archrandom.h
> +++ b/arch/arm64/include/asm/archrandom.h
> @@ -4,13 +4,24 @@
>
>  #ifdef CONFIG_ARCH_RANDOM
>
> +#include 
>  #include 
>  #include 
>  #include 
>
> +#define ARM_SMCCC_TRNG_MIN_VERSION 0x1UL
> +
> +extern bool smccc_trng_available;
> +
>  static inline bool __init smccc_probe_trng(void)
>  {
> -   return false;
> +   struct arm_smccc_res res;
> +
> +   arm_smccc_1_1_invoke(ARM_SMCCC_TRNG_VERSION, &res);
> +   if ((s32)res.a0 < 0)
> +   return false;
> +
> +   return res.a0 >= ARM_SMCCC_TRNG_MIN_VERSION;
>  }
>
>  static inline bool __arm64_rndr(unsigned long *v)
> @@ -43,26 +54,52 @@ static inline bool __must_check 
> arch_get_random_int(unsigned int *v)
>
>  static inline bool __must_check arch_get_random_seed_long(unsigned long *v)
>  {
> +   struct arm_smccc_res res;
> +
> /*
>  * Only support the generic interface after we have detected
>  * the system wide capability, avoiding complexity with the
>  * cpufeature code and with potential scheduling between CPUs
>  * with and without the feature.
>  */
> -   if (!cpus_have_const_cap(ARM64_HAS_RNG))
> -   return false;
> +   if (cpus_have_const_cap(ARM64_HAS_RNG))
> +   return __arm64_rndr(v);
>
> -   return __arm64_rndr(v);
> -}
> +   if (smccc_trng_available) {
> +   arm_smccc_1_1_invoke(ARM_SMCCC_TRNG_RND64, 64, &res);
> +   if ((int)res.a0 < 0)
> +   return false;
>
> +   *v = res.a3;
> +   return true;
> +   }
> +
> +   return false;
> +}
>

I think we should be more rigorous here in how we map the concepts of
random seeds and random numbers onto the various sources.

First of all, assuming my patch dropping the call to
arch_get_random_seed_long() from add_interrupt_randomness() gets
accepted, we should switch to RNDRRS here, and implement the non-seed
variants using RNDR.

However, this is still semantically inaccurate: RNDRRS does not return
a random *seed*, it returns a number drawn from a freshly seeded
pseudo-random sequence. This means that the TRNG interface, if
implemented, is a better choice, and so we should try it first. Note
that on platforms that don't implement both, only one of these will be
available in the first place. But on platforms that *do* implement
both, the firmware interface may actually be less wasteful in terms of
resources: the TRNG interface returns every bit drawn from the
underlying entropy source, whereas RNDRRS uses ~500 bits of entropy to
reseed a DRBG that gets used only once to draw a single 64-bit number.
And the cost of the SMCCC call in terms of CPU time is charged to the
caller, which is appropriate here.

Then, if the SMCCC invocation fails, I don't think we should ever
return false without at least trying RNDRRS when it is available.

Something like this perhaps?

if (smccc_trng_available) {
  arm_smccc_1_1_invoke(ARM_SMCCC_TRNG_RND64, 64, &res);
  if ((int)res.a0 >= 0) {
*v = res.a3;
return true;
  }
}

if (cpus_have_const_cap(ARM64_HAS_RNG))
   return __arm64_rndrrs(v);

return false;

(and something similar 2x below)


>  static inline bool __must_check arch_get_random_seed_int(unsigned int *v)
>  {
> +   struct arm_smccc_res res;
> unsigned long val;
> -   bool ok = arch_get_random_seed_long(&val);
>
> -   *v = val;
> -   return ok;
> +   if (cpus_have_const_cap(ARM64_HAS_RNG)) {
> +   if (arch_get_random_seed_long(&val)) {
> +   *v = val;
> +   return true;
> +   }
> +   return false;
> +   }
> +
> +   if (smccc_trng_available) {
> +   arm_smccc_1_1_invoke(ARM_SMCCC_TRNG_RND64, 32, );
> +   if ((int)res.a0 < 0)
> +   return false;
> +
> +   *v = res.a3 & GENMASK(31, 0);
> +   return true;

Re: [PATCH v3 0/5] ARM: arm64: Add SMCCC TRNG entropy service

2020-11-13 Thread Ard Biesheuvel
On Fri, 13 Nov 2020 at 19:24, Andre Przywara  wrote:
>
> Hi,
>
> an update to v2 with some fixes and a few tweaks. Ard's patch [1] should
> significantly reduce the frequency of arch_get_random_seed_long() calls,
> not sure if that is enough to appease the concerns about the
> potentially long latency of SMC calls. I also dropped the direct
> arch_get_random() call in KVM for the same reason. An alternative could
> be to just use the SMC in the _early() versions, but then we would lose
> the SMCCC entropy source for the periodic reseeds. This could be mitigated
> by using a hwrng driver [2] and rngd.
> The only other non-minor change to v2 is the addition of using the SMCCC
> call in the _early() variant. For a changelog see below.
>
> Sudeep: patch 1/5 is a prerequisite for all other patches, which
> themselves could be considered separate and need to go via different trees.
> If we could agree on that one now and get that merged, it would help the
> handling of the other patches going forward.
>
> Cheers,
> Andre
> ==
>
> The ARM architected TRNG firmware interface, described in ARM spec
> DEN0098[3], defines an ARM SMCCC based interface to a true random number
> generator, provided by firmware.
>
> This series collects all the patches implementing this in various
> places: as a user feeding into the ARCH_RANDOM pool, both for ARM and
> arm64, and as a service provider for KVM guests.
>
> Patch 1 introduces the interface definition used by all three entities.
> Patch 2 prepares the Arm SMCCC firmware driver to probe for the
> interface. This patch is needed to avoid a later dependency on *two*
> patches (there might be a better solution to this problem).
>
> Patch 3 implements the ARM part, patch 4 is the arm64 version.
> The final patch 5 adds support to provide random numbers to KVM guests.
>
> This was tested on:
> - QEMU -kernel (no SMCCC, regression test)
> - Juno w/ prototype of the h/w Trusted RNG support
> - mainline KVM (SMCCC, but no TRNG: regression test)
> - ARM and arm64 KVM guests, using the KVM service in patch 5/5
>
> Based on v5.10-rc3, please let me know if I should rebase on something
> else. A git repo is accessible at:
> https://gitlab.arm.com/linux-arm/linux-ap/-/commits/smccc-trng/v3/
>
> Cheers,
> Andre
>
> [1] 
> http://lists.infradead.org/pipermail/linux-arm-kernel/2020-November/615446.html
> [2] https://gitlab.arm.com/linux-arm/linux-ap/-/commit/87e3722f437
> [3] https://developer.arm.com/documentation/den0098/latest/
>
> Changelog v2 ... v3:
> - ARM: fix compilation with randconfig
> - arm64: use SMCCC call also in arch_get_random_seed_long_early()
> - KVM: comment on return value usage
> - KVM: use more interesting UUID (enjoy, Marc!)

UUIDs are constructed using certain rules, so probably better to
refrain from playing games with them here.

If Marc wants an easter egg, he will have to wait until Easter.

> - KVM: use bitmaps instead of open coded long arrays
> - KVM: drop direct usage of arch_get_random() interface
>
> Changelog "v1" ... v2:
> - trigger ARCH_RANDOM initialisation from the SMCCC firmware driver
> - use a single bool in smccc.c to hold the initialisation state for arm64
> - handle endianess correctly in the KVM provider
>
> Andre Przywara (2):
>   firmware: smccc: Introduce SMCCC TRNG framework
>   arm64: Add support for SMCCC TRNG entropy source
>
> Ard Biesheuvel (3):
>   firmware: smccc: Add SMCCC TRNG function call IDs
>   ARM: implement support for SMCCC TRNG entropy source
>   KVM: arm64: implement the TRNG hypervisor call
>
>  arch/arm/Kconfig|  4 ++
>  arch/arm/include/asm/archrandom.h   | 74 +
>  arch/arm64/include/asm/archrandom.h | 79 +++
>  arch/arm64/include/asm/kvm_host.h   |  2 +
>  arch/arm64/kvm/Makefile |  2 +-
>  arch/arm64/kvm/hypercalls.c |  6 ++
>  arch/arm64/kvm/trng.c   | 85 +
>  drivers/firmware/smccc/smccc.c  |  5 ++
>  include/linux/arm-smccc.h   | 31 +++
>  9 files changed, 277 insertions(+), 11 deletions(-)
>  create mode 100644 arch/arm/include/asm/archrandom.h
>  create mode 100644 arch/arm64/kvm/trng.c
>
> --
> 2.17.1
>


Re: [PATCH v2 3/5] ARM: implement support for SMCCC TRNG entropy source

2020-11-06 Thread Ard Biesheuvel
On Fri, 6 Nov 2020 at 16:30, Marc Zyngier  wrote:
>
> On 2020-11-05 12:56, Andre Przywara wrote:
> > From: Ard Biesheuvel 
> >
> > Implement arch_get_random_seed_*() for ARM based on the firmware
> > or hypervisor provided entropy source described in ARM DEN0098.
> >
> > This will make the kernel's random number generator consume entropy
> > provided by this interface, at early boot, and periodically at
> > runtime when reseeding.
> >
> > Cc: Linus Walleij 
> > Cc: Russell King 
> > Signed-off-by: Ard Biesheuvel 
> > [Andre: rework to be initialised by the SMCCC firmware driver]
> > Signed-off-by: Andre Przywara 
> > ---
> >  arch/arm/Kconfig  |  4 ++
> >  arch/arm/include/asm/archrandom.h | 64 +++
> >  2 files changed, 68 insertions(+)
> >
> > diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
> > index fe2f17eb2b50..06fda4f954fd 100644
> > --- a/arch/arm/Kconfig
> > +++ b/arch/arm/Kconfig
> > @@ -1667,6 +1667,10 @@ config STACKPROTECTOR_PER_TASK
> > Enable this option to switch to a different method that uses a
> > different canary value for each task.
> >
> > +config ARCH_RANDOM
> > + def_bool y
> > + depends on HAVE_ARM_SMCCC
> > +
> >  endmenu
> >
> >  menu "Boot options"
> > diff --git a/arch/arm/include/asm/archrandom.h
> > b/arch/arm/include/asm/archrandom.h
> > index a8e84ca5c2ee..f3e96a5b65f8 100644
> > --- a/arch/arm/include/asm/archrandom.h
> > +++ b/arch/arm/include/asm/archrandom.h
> > @@ -2,9 +2,73 @@
> >  #ifndef _ASM_ARCHRANDOM_H
> >  #define _ASM_ARCHRANDOM_H
> >
> > +#ifdef CONFIG_ARCH_RANDOM
> > +
> > +#include 
> > +#include 
> > +
> > +#define ARM_SMCCC_TRNG_MIN_VERSION 0x1UL
> > +
> > +extern bool smccc_trng_available;
> > +
> > +static inline bool __init smccc_probe_trng(void)
> > +{
> > + struct arm_smccc_res res;
> > +
> > + arm_smccc_1_1_invoke(ARM_SMCCC_TRNG_VERSION, &res);
> > + if ((s32)res.a0 < 0)
> > + return false;
> > + if (res.a0 >= ARM_SMCCC_TRNG_MIN_VERSION) {
> > + /* double check that the 32-bit flavor is available */
> > + arm_smccc_1_1_invoke(ARM_SMCCC_TRNG_FEATURES,
> > +  ARM_SMCCC_TRNG_RND32,
> > +  &res);
> > + if ((s32)res.a0 >= 0)
> > + return true;
> > + }
> > +
> > + return false;
> > +}
> > +
> > +static inline bool __must_check arch_get_random_long(unsigned long *v)
> > +{
> > + return false;
> > +}
> > +
> > +static inline bool __must_check arch_get_random_int(unsigned int *v)
> > +{
> > + return false;
> > +}
> > +
> > +static inline bool __must_check arch_get_random_seed_long(unsigned
> > long *v)
> > +{
> > + struct arm_smccc_res res;
> > +
> > + if (smccc_trng_available) {
> > + arm_smccc_1_1_invoke(ARM_SMCCC_TRNG_RND32, 8 * sizeof(*v), &res);
> > +
> > + if (res.a0 != 0)
> > + return false;
> > +
> > + *v = res.a3;
> > + return true;
> > + }
> > +
> > + return false;
> > +}
> > +
> > +static inline bool __must_check arch_get_random_seed_int(unsigned int
> > *v)
> > +{
> > + return arch_get_random_seed_long((unsigned long *)v);
>
> I don't think this cast is safe. At least not on 64bit.

True, but this is arch/arm
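
(For the record, a width-safe variant would go through a local long --
a minimal sketch, not the code under review:)

static inline bool __must_check arch_get_random_seed_int(unsigned int *v)
{
	unsigned long val;

	/* Seed a full long locally, then truncate, so the store never
	 * writes past the caller's 32-bit object. */
	if (!arch_get_random_seed_long(&val))
		return false;
	*v = val;
	return true;
}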


Re: [PATCH v2 5/5] KVM: arm64: implement the TRNG hypervisor call

2020-11-05 Thread Ard Biesheuvel
On Thu, 5 Nov 2020 at 15:13, Marc Zyngier  wrote:
>
> On 2020-11-05 12:56, Andre Przywara wrote:
> > From: Ard Biesheuvel 
> >
> > Provide a hypervisor implementation of the ARM architected TRNG
> > firmware
> > interface described in ARM spec DEN0098. All function IDs are
> > implemented,
> > including both 32-bit and 64-bit versions of the TRNG_RND service,
> > which
> > is the centerpiece of the API.
> >
> > The API is backed by arch_get_random_seed_long(), which is
> > implemented
> > in terms of RNDRRS currently, and will alternatively be backed by an
> > SMC call to the secure firmware using the same interface after a future patch.
> > If neither is available, the kernel's entropy pool is used instead.
> >
> > Signed-off-by: Ard Biesheuvel 
> > Signed-off-by: Andre Przywara 
> > ---
> >  arch/arm64/include/asm/kvm_host.h |  2 +
> >  arch/arm64/kvm/Makefile   |  2 +-
> >  arch/arm64/kvm/hypercalls.c   |  6 ++
> >  arch/arm64/kvm/trng.c | 91 +++
> >  4 files changed, 100 insertions(+), 1 deletion(-)
> >  create mode 100644 arch/arm64/kvm/trng.c
> >
> > diff --git a/arch/arm64/include/asm/kvm_host.h
> > b/arch/arm64/include/asm/kvm_host.h
> > index 781d029b8aa8..615932bacf76 100644
> > --- a/arch/arm64/include/asm/kvm_host.h
> > +++ b/arch/arm64/include/asm/kvm_host.h
> > @@ -652,4 +652,6 @@ bool kvm_arm_vcpu_is_finalized(struct kvm_vcpu
> > *vcpu);
> >  #define kvm_arm_vcpu_sve_finalized(vcpu) \
> >   ((vcpu)->arch.flags & KVM_ARM64_VCPU_SVE_FINALIZED)
> >
> > +int kvm_trng_call(struct kvm_vcpu *vcpu);
> > +
> >  #endif /* __ARM64_KVM_HOST_H__ */
> > diff --git a/arch/arm64/kvm/Makefile b/arch/arm64/kvm/Makefile
> > index 1504c81fbf5d..a510037e3270 100644
> > --- a/arch/arm64/kvm/Makefile
> > +++ b/arch/arm64/kvm/Makefile
> > @@ -16,7 +16,7 @@ kvm-y := $(KVM)/kvm_main.o $(KVM)/coalesced_mmio.o
> > $(KVM)/eventfd.o \
> >inject_fault.o regmap.o va_layout.o handle_exit.o \
> >guest.o debug.o reset.o sys_regs.o \
> >vgic-sys-reg-v3.o fpsimd.o pmu.o \
> > -  aarch32.o arch_timer.o \
> > +  aarch32.o arch_timer.o trng.o \
> >vgic/vgic.o vgic/vgic-init.o \
> >vgic/vgic-irqfd.o vgic/vgic-v2.o \
> >vgic/vgic-v3.o vgic/vgic-v4.o \
> > diff --git a/arch/arm64/kvm/hypercalls.c b/arch/arm64/kvm/hypercalls.c
> > index 25ea4ecb6449..ead21b98b620 100644
> > --- a/arch/arm64/kvm/hypercalls.c
> > +++ b/arch/arm64/kvm/hypercalls.c
> > @@ -71,6 +71,12 @@ int kvm_hvc_call_handler(struct kvm_vcpu *vcpu)
> >   if (gpa != GPA_INVALID)
> >   val = gpa;
> >   break;
> > + case ARM_SMCCC_TRNG_VERSION:
> > + case ARM_SMCCC_TRNG_FEATURES:
> > + case ARM_SMCCC_TRNG_GET_UUID:
> > + case ARM_SMCCC_TRNG_RND32:
> > + case ARM_SMCCC_TRNG_RND64:
> > + return kvm_trng_call(vcpu);
> >   default:
> >   return kvm_psci_call(vcpu);
> >   }
> > diff --git a/arch/arm64/kvm/trng.c b/arch/arm64/kvm/trng.c
> > new file mode 100644
> > index ..5a27b2d99977
> > --- /dev/null
> > +++ b/arch/arm64/kvm/trng.c
> > @@ -0,0 +1,91 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +// Copyright (C) 2020 Arm Ltd.
> > +
> > +#include 
> > +#include 
> > +
> > +#include 
> > +
> > +#include 
> > +
> > +#define ARM_SMCCC_TRNG_VERSION_1_0   0x1UL
> > +
> > +#define TRNG_SUCCESS 0UL
>
> SMCCC_RET_SUCCESS
>
> > +#define TRNG_NOT_SUPPORTED   ((unsigned long)-1)
>
> SMCCC_RET_NOT_SUPPORTED
>
> > +#define TRNG_INVALID_PARAMETER   ((unsigned long)-2)
>
> *crap*. Why isn't that the same value as SMCCC_RET_INVALID_PARAMETER?
> Is it too late to fix the spec?
>

The SMCCC_RET_ and TRNG_ ID spaces are deliberately disjoint,
so that we can add TRNG result codes that are only relevant in the
context of this ABI without having to modify the SMCCC spec.

> > +#define TRNG_NO_ENTROPY  ((unsigned long)-3)
> > +
> > +#define MAX_BITS32   96
> > +#define MAX_BITS64   192
>
> Nothing seems to be using these definitions.
>

Indeed.

> > +
> > +static const uuid_t arm_smc_trng_uuid __aligned(4) = UUID_INIT(
> > + 0x023534a2, 0xe0bc, 0x45ec, 0x95, 0xdd, 0x33, 0x34, 0xc1, 0xcc, 0x31,
> > 0x89);
>
> I object to the lack of Easter egg

Re: [PATCH v2 4/5] arm64: Add support for SMCCC TRNG entropy source

2020-11-05 Thread Ard Biesheuvel
On Thu, 5 Nov 2020 at 15:30, Mark Rutland  wrote:
>
> On Thu, Nov 05, 2020 at 03:04:57PM +0100, Ard Biesheuvel wrote:
> > On Thu, 5 Nov 2020 at 15:03, Mark Rutland  wrote:
> > > On Thu, Nov 05, 2020 at 01:41:42PM +, Mark Brown wrote:
> > > > On Thu, Nov 05, 2020 at 12:56:55PM +, Andre Przywara wrote:
>
> > > That said, I'm not sure it's great to plumb this under the
> > > arch_get_random*() interfaces, e.g. given this measn that
> > > add_interrupt_randomness() will end up trapping to the host all the time
> > > when it calls arch_get_random_seed_long().
> >
> > As it turns out, add_interrupt_randomness() isn't actually used on ARM.
>
> It's certainly called on arm64, per a warning I just hacked in:
>
> [1.083802] [ cut here ]
> [1.084802] add_interrupt_randomness called
> [1.085685] WARNING: CPU: 1 PID: 0 at drivers/char/random.c:1267 
> add_interrupt_randomness+0x2e8/0x318
> [1.087599] Modules linked in:
> [1.088258] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 5.10.0-rc2-dirty #13
> [1.089672] Hardware name: linux,dummy-virt (DT)
> [1.090659] pstate: 60400085 (nZCv daIf +PAN -UAO -TCO BTYPE=--)
> [1.091910] pc : add_interrupt_randomness+0x2e8/0x318
> [1.092965] lr : add_interrupt_randomness+0x2e8/0x318
> [1.094021] sp : 80001000be80
> [1.094732] x29: 80001000be80 x28: 2d0c80209840
> [1.095859] x27: 137c3e3a x26: 8000100abdd0
> [1.096978] x25: 0035 x24: 67918bda8000
> [1.098100] x23: c57c31923fe8 x22: fffedc14
> [1.099224] x21: 2d0dbef796a0 x20: c57c331d16a0
> [1.100339] x19: c57c33720a48 x18: 0010
> [1.101459] x17:  x16: 0002
> [1.102578] x15: 00e7 x14: 80001000bb20
> [1.103706] x13: ffea x12: c57c337b56e8
> [1.104821] x11: 0003 x10: c57c3379d6a8
> [1.105944] x9 : c57c3379d700 x8 : 00017fe8
> [1.107073] x7 : c000efff x6 : 0001
> [1.108186] x5 : 00057fa8 x4 : 
> [1.109305] x3 :  x2 : c57c337455d0
> [1.110428] x1 : db8dc9c2a1e0f600 x0 : 
> [1.111552] Call trace:
> [1.112083]  add_interrupt_randomness+0x2e8/0x318
> [1.113074]  handle_irq_event_percpu+0x48/0x90
> [1.114016]  handle_irq_event+0x48/0xf8
> [1.114826]  handle_fasteoi_irq+0xa4/0x130
> [1.115689]  generic_handle_irq+0x30/0x48
> [1.116528]  __handle_domain_irq+0x64/0xc0
> [1.117392]  gic_handle_irq+0xc0/0x138
> [1.118194]  el1_irq+0xbc/0x180
> [1.118870]  arch_cpu_idle+0x20/0x30
> [1.119630]  default_idle_call+0x8c/0x350
> [1.120479]  do_idle+0x224/0x298
> [1.121163]  cpu_startup_entry+0x28/0x70
> [1.121994]  secondary_start_kernel+0x184/0x198
>
> ... and I couldn't immediately spot why 32-bit arm  would be different.
>

Hmm, I actually meant both arm64 and ARM.

Marc looked into this at my request a while ago, and I had a look
myself as well at the time, and IIRC, we both concluded that we don't
hit that code path. Darn.

In any case, the way add_interrupt_randomness() calls
arch_get_random_seed_long() is absolutely insane, so we should try to
fix that regardless.
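
(One possible shape for such a fix -- purely a sketch with a made-up
helper name, not necessarily what should land upstream:)

static DEFINE_PER_CPU(unsigned long, last_seed_jiffies);

static bool seed_from_firmware_ratelimited(unsigned long *v)
{
	/* Trap to the firmware/host at most once per second per CPU,
	 * instead of on every interrupt. */
	if (time_before(jiffies, this_cpu_read(last_seed_jiffies) + HZ))
		return false;

	this_cpu_write(last_seed_jiffies, jiffies);
	return arch_get_random_seed_long(v);
}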


Re: [PATCH v2 4/5] arm64: Add support for SMCCC TRNG entropy source

2020-11-05 Thread Ard Biesheuvel
On Thu, 5 Nov 2020 at 15:03, Mark Rutland  wrote:
>
> On Thu, Nov 05, 2020 at 01:41:42PM +, Mark Brown wrote:
> > On Thu, Nov 05, 2020 at 12:56:55PM +, Andre Przywara wrote:
> >
> > >  static inline bool __must_check arch_get_random_seed_int(unsigned int *v)
> > >  {
> > > +   struct arm_smccc_res res;
> > > unsigned long val;
> > > -   bool ok = arch_get_random_seed_long(&val);
> > >
> > > -   *v = val;
> > > -   return ok;
> > > +   if (cpus_have_const_cap(ARM64_HAS_RNG)) {
> > > +   if (arch_get_random_seed_long(&val)) {
> > > +   *v = val;
> > > +   return true;
> > > +   }
> > > +   return false;
> > > +   }
> >
> > It isn't obvious to me why we don't fall through to trying the SMCCC
> > TRNG here if for some reason the v8.5-RNG didn't give us something.
> > Definitely an obscure possibility but still...
>
> I think it's better to assume that if we have a HW RNG and it's not
> giving us entropy, it's not worthwhile trapping to the host, which might
> encounter the exact same issue.
>
> I'd rather we have one RNG source that we trust works, and use that
> exclusively.
>
> That said, I'm not sure it's great to plumb this under the
> arch_get_random*() interfaces, e.g. given this means that
> add_interrupt_randomness() will end up trapping to the host all the time
> when it calls arch_get_random_seed_long().
>

As it turns out, add_interrupt_randomness() isn't actually used on ARM.

> Is there an existing interface for "slow" runtime entropy that we can
> plumb this into instead?
>
> Thanks,
> Mark.


Re: [PATCH] KVM: arm64: implement the TRNG hypervisor call

2020-10-03 Thread Ard Biesheuvel
On Sat, 3 Oct 2020 at 12:30, Andrew Jones  wrote:
>
> Hi Ard,
>

Hi Drew,

Thanks for taking a look.

> On Sat, Oct 03, 2020 at 10:56:04AM +0200, Ard Biesheuvel wrote:
> > Provide a hypervisor implementation of the ARM architected TRNG firmware
> > interface described in ARM spec DEN0098. All function IDs are implemented,
> > including both 32-bit and 64-bit versions of the TRNG_RND service, which
> > is the centerpiece of the API.
> >
> > The API is backed by arch_get_random_seed_long(), which is implemented
> > in terms of RNDR currently, and will alternatively be backed by an SMC
> > call to the secure firmware using the same interface after a future patch.
> > If neither is available, the kernel's entropy pool is used instead.
> >
> > Cc: Marc Zyngier 
> > Cc: James Morse 
> > Cc: Julien Thierry 
> > Cc: Suzuki K Poulose 
> > Cc: Catalin Marinas 
> > Cc: Will Deacon 
> > Cc: Mark Rutland 
> > Cc: Lorenzo Pieralisi 
> > Cc: Sudeep Holla 
> > Cc: Sami Mujawar 
> > Cc: Andre Przywara 
> > Cc: Alexandru Elisei 
> > Cc: Laszlo Ersek 
> > Cc: Leif Lindholm 
> > Signed-off-by: Ard Biesheuvel 
> > ---
> >  arch/arm64/include/asm/kvm_host.h |  2 +
> >  arch/arm64/kvm/Makefile   |  2 +-
> >  arch/arm64/kvm/hypercalls.c   |  6 ++
> >  arch/arm64/kvm/trng.c | 91 
> >  include/linux/arm-smccc.h | 31 +++
> >  5 files changed, 131 insertions(+), 1 deletion(-)
> >
> > diff --git a/arch/arm64/include/asm/kvm_host.h 
> > b/arch/arm64/include/asm/kvm_host.h
> > index 65568b23868a..f76164d390ea 100644
> > --- a/arch/arm64/include/asm/kvm_host.h
> > +++ b/arch/arm64/include/asm/kvm_host.h
> > @@ -688,4 +688,6 @@ bool kvm_arm_vcpu_is_finalized(struct kvm_vcpu *vcpu);
> >  #define kvm_arm_vcpu_sve_finalized(vcpu) \
> >   ((vcpu)->arch.flags & KVM_ARM64_VCPU_SVE_FINALIZED)
> >
> > +int kvm_trng_call(struct kvm_vcpu *vcpu);
> > +
> >  #endif /* __ARM64_KVM_HOST_H__ */
> > diff --git a/arch/arm64/kvm/Makefile b/arch/arm64/kvm/Makefile
> > index 99977c1972cc..e117d7086500 100644
> > --- a/arch/arm64/kvm/Makefile
> > +++ b/arch/arm64/kvm/Makefile
> > @@ -16,7 +16,7 @@ kvm-y := $(KVM)/kvm_main.o $(KVM)/coalesced_mmio.o 
> > $(KVM)/eventfd.o \
> >inject_fault.o regmap.o va_layout.o hyp.o handle_exit.o \
> >guest.o debug.o reset.o sys_regs.o \
> >vgic-sys-reg-v3.o fpsimd.o pmu.o \
> > -  aarch32.o arch_timer.o \
> > +  aarch32.o arch_timer.o trng.o \
> >vgic/vgic.o vgic/vgic-init.o \
> >vgic/vgic-irqfd.o vgic/vgic-v2.o \
> >vgic/vgic-v3.o vgic/vgic-v4.o \
> > diff --git a/arch/arm64/kvm/hypercalls.c b/arch/arm64/kvm/hypercalls.c
> > index 550dfa3e53cd..70c5e815907d 100644
> > --- a/arch/arm64/kvm/hypercalls.c
> > +++ b/arch/arm64/kvm/hypercalls.c
> > @@ -62,6 +62,12 @@ int kvm_hvc_call_handler(struct kvm_vcpu *vcpu)
> >   if (gpa != GPA_INVALID)
> >   val = gpa;
> >   break;
> > + case ARM_SMCCC_TRNG_VERSION:
> > + case ARM_SMCCC_TRNG_FEATURES:
> > + case ARM_SMCCC_TRNG_GET_UUID:
> > + case ARM_SMCCC_TRNG_RND:
> > + case ARM_SMCCC_TRNG_RND64:
> > + return kvm_trng_call(vcpu);
> >   default:
> >   return kvm_psci_call(vcpu);
> >   }
> > diff --git a/arch/arm64/kvm/trng.c b/arch/arm64/kvm/trng.c
> > new file mode 100644
> > index ..71f704075e4a
> > --- /dev/null
> > +++ b/arch/arm64/kvm/trng.c
> > @@ -0,0 +1,91 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +// Copyright (C) 2020 Arm Ltd.
> > +
> > +#include 
> > +#include 
> > +
> > +#include 
> > +
> > +#include 
> > +
> > +#define ARM_SMCCC_TRNG_VERSION_1_0   0x1UL
> > +
> > +#define TRNG_SUCCESS 0UL
> > +#define TRNG_NOT_SUPPORTED   ((unsigned long)-1)
> > +#define TRNG_INVALID_PARAMETER   ((unsigned long)-2)
> > +#define TRNG_NO_ENTROPY  ((unsigned long)-3)
> > +
> > +#define MAX_BITS32   96
> > +#define MAX_BITS64   192
> > +
> > +static const uuid_t arm_smc_trng_uuid __aligned(4) = UUID_INIT(
> > + 0x023534a2, 0xe0bc, 0x45ec, 0x95, 0xdd, 0x33, 0x34, 0xc1, 0xcc, 0x31, 
> > 0x89);
>
> Where does this UUID come from?
>

uuidgen

The only requirement for the UUID is that we can distinguish
implementations, in order to be 

[PATCH] KVM: arm64: implement the TRNG hypervisor call

2020-10-03 Thread Ard Biesheuvel
Provide a hypervisor implementation of the ARM architected TRNG firmware
interface described in ARM spec DEN0098. All function IDs are implemented,
including both 32-bit and 64-bit versions of the TRNG_RND service, which
is the centerpiece of the API.

The API is backed by arch_get_random_seed_long(), which is implemented
in terms of RNDR currently, and will alternatively be backed by an SMC
call to the secure firmware using the same interface after a future patch.
If neither is available, the kernel's entropy pool is used instead.

Cc: Marc Zyngier 
Cc: James Morse 
Cc: Julien Thierry 
Cc: Suzuki K Poulose 
Cc: Catalin Marinas 
Cc: Will Deacon 
Cc: Mark Rutland 
Cc: Lorenzo Pieralisi 
Cc: Sudeep Holla 
Cc: Sami Mujawar 
Cc: Andre Przywara 
Cc: Alexandru Elisei 
Cc: Laszlo Ersek 
Cc: Leif Lindholm 
Signed-off-by: Ard Biesheuvel 
---
 arch/arm64/include/asm/kvm_host.h |  2 +
 arch/arm64/kvm/Makefile   |  2 +-
 arch/arm64/kvm/hypercalls.c   |  6 ++
 arch/arm64/kvm/trng.c | 91 
 include/linux/arm-smccc.h | 31 +++
 5 files changed, 131 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/include/asm/kvm_host.h 
b/arch/arm64/include/asm/kvm_host.h
index 65568b23868a..f76164d390ea 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -688,4 +688,6 @@ bool kvm_arm_vcpu_is_finalized(struct kvm_vcpu *vcpu);
 #define kvm_arm_vcpu_sve_finalized(vcpu) \
((vcpu)->arch.flags & KVM_ARM64_VCPU_SVE_FINALIZED)
 
+int kvm_trng_call(struct kvm_vcpu *vcpu);
+
 #endif /* __ARM64_KVM_HOST_H__ */
diff --git a/arch/arm64/kvm/Makefile b/arch/arm64/kvm/Makefile
index 99977c1972cc..e117d7086500 100644
--- a/arch/arm64/kvm/Makefile
+++ b/arch/arm64/kvm/Makefile
@@ -16,7 +16,7 @@ kvm-y := $(KVM)/kvm_main.o $(KVM)/coalesced_mmio.o 
$(KVM)/eventfd.o \
 inject_fault.o regmap.o va_layout.o hyp.o handle_exit.o \
 guest.o debug.o reset.o sys_regs.o \
 vgic-sys-reg-v3.o fpsimd.o pmu.o \
-aarch32.o arch_timer.o \
+aarch32.o arch_timer.o trng.o \
 vgic/vgic.o vgic/vgic-init.o \
 vgic/vgic-irqfd.o vgic/vgic-v2.o \
 vgic/vgic-v3.o vgic/vgic-v4.o \
diff --git a/arch/arm64/kvm/hypercalls.c b/arch/arm64/kvm/hypercalls.c
index 550dfa3e53cd..70c5e815907d 100644
--- a/arch/arm64/kvm/hypercalls.c
+++ b/arch/arm64/kvm/hypercalls.c
@@ -62,6 +62,12 @@ int kvm_hvc_call_handler(struct kvm_vcpu *vcpu)
if (gpa != GPA_INVALID)
val = gpa;
break;
+   case ARM_SMCCC_TRNG_VERSION:
+   case ARM_SMCCC_TRNG_FEATURES:
+   case ARM_SMCCC_TRNG_GET_UUID:
+   case ARM_SMCCC_TRNG_RND:
+   case ARM_SMCCC_TRNG_RND64:
+   return kvm_trng_call(vcpu);
default:
return kvm_psci_call(vcpu);
}
diff --git a/arch/arm64/kvm/trng.c b/arch/arm64/kvm/trng.c
new file mode 100644
index ..71f704075e4a
--- /dev/null
+++ b/arch/arm64/kvm/trng.c
@@ -0,0 +1,91 @@
+// SPDX-License-Identifier: GPL-2.0
+// Copyright (C) 2020 Arm Ltd.
+
+#include 
+#include 
+
+#include 
+
+#include 
+
+#define ARM_SMCCC_TRNG_VERSION_1_0 0x1UL
+
+#define TRNG_SUCCESS   0UL
+#define TRNG_NOT_SUPPORTED ((unsigned long)-1)
+#define TRNG_INVALID_PARAMETER ((unsigned long)-2)
+#define TRNG_NO_ENTROPY((unsigned long)-3)
+
+#define MAX_BITS32 96
+#define MAX_BITS64 192
+
+static const uuid_t arm_smc_trng_uuid __aligned(4) = UUID_INIT(
+   0x023534a2, 0xe0bc, 0x45ec, 0x95, 0xdd, 0x33, 0x34, 0xc1, 0xcc, 0x31, 
0x89);
+
+static int kvm_trng_do_rnd(struct kvm_vcpu *vcpu, int size)
+{
+   u32 num_bits = smccc_get_arg1(vcpu);
+   u64 bits[3];
+   int i;
+
+   if (num_bits > 3 * size) {
+   smccc_set_retval(vcpu, TRNG_NOT_SUPPORTED, 0, 0, 0);
+   return 1;
+   }
+
+   /* get as many bits as we need to fulfil the request */
+   for (i = 0; i < DIV_ROUND_UP(num_bits, 64); i++)
+   /* use the arm64 specific backend directly if one exists */
+   if (!arch_get_random_seed_long((unsigned long *)&bits[i]))
+   bits[i] = get_random_long();
+
+   if (num_bits % 64)
+   bits[i - 1] &= U64_MAX >> (64 - (num_bits % 64));
+
+   while (i < ARRAY_SIZE(bits))
+   bits[i++] = 0;
+
+   if (size == 32)
+   smccc_set_retval(vcpu, TRNG_SUCCESS, lower_32_bits(bits[1]),
+upper_32_bits(bits[0]), 
lower_32_bits(bits[0]));
+   else
+   smccc_set_retval(vcpu, TRNG_SUCCESS, bits[2], bits[1], bits[0]);
+
+   memzero_explicit(bits, sizeof(bits));
+   return 1;
+}
+
+int kvm_trng_call(struct kvm_vcpu *vcpu)
+{
+   const __be32 *u = (__be32 *)arm_smc_trng_uuid.b;
+   u32 func_id = smccc_get_function(vcpu);
+   unsigned lon

Re: [RFC PATCH v1 0/3] put arm64 kvm_config on a diet

2020-08-04 Thread Ard Biesheuvel
On Tue, 4 Aug 2020 at 14:45, Alex Bennée  wrote:
>
> Hi,
>
> When building guest kernels for virtualisation we were bringing in a
> bunch of stuff from physical hardware which we don't need for our
> idealised fixable virtual PCI devices. This series makes some Kconfig
> changes to allow the ThunderX and XGene PCI drivers to be compiled
> out. It also drops PCI_QUIRKS from the KVM guest build as a virtual
> PCI device should be quirk free.
>

What about PCI passthrough?

> This is my first time hacking around Kconfig so I hope I've got the
> balance between depends and selects right but please let be know if it
> could be specified in a cleaner way.
>
> Alex Bennée (3):
>   arm64: allow de-selection of ThunderX PCI controllers
>   arm64: gate the whole of pci-xgene on CONFIG_PCI_XGENE
>   kernel/configs: don't include PCI_QUIRKS in KVM guest configs
>
>  arch/arm64/Kconfig.platforms| 2 ++
>  arch/arm64/configs/defconfig| 1 +
>  drivers/pci/controller/Kconfig  | 7 +++
>  drivers/pci/controller/Makefile | 8 +++-
>  kernel/configs/kvm_guest.config | 1 +
>  5 files changed, 14 insertions(+), 5 deletions(-)
>
> --
> 2.20.1
>
>


Re: [PATCH v3 08/15] arm64: kvm: Split hyp/switch.c to VHE/nVHE

2020-06-26 Thread Ard Biesheuvel
On Thu, 25 Jun 2020 at 10:16, Marc Zyngier  wrote:
>
> On 2020-06-25 06:03, kernel test robot wrote:
> > Hi David,
> >
> > Thank you for the patch! Perhaps something to improve:
> >
> > [auto build test WARNING on linus/master]
> > [also build test WARNING on v5.8-rc2 next-20200624]
> > [cannot apply to kvmarm/next arm64/for-next/core
> > arm-perf/for-next/perf]
> > [If your patch is applied to the wrong git tree, kindly drop us a note.
> > And when submitting patch, we suggest to use  as documented in
> > https://git-scm.com/docs/git-format-patch]
> >
> > url:
> > https://github.com/0day-ci/linux/commits/David-Brazdil/Split-off-nVHE-hyp-code/20200618-203230
> > base:
> > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
> > 1b5044021070efa3259f3e9548dc35d1eb6aa844
> > config: arm64-randconfig-r021-20200624 (attached as .config)
> > compiler: clang version 11.0.0 (https://github.com/llvm/llvm-project
> > 8911a35180c6777188fefe0954a2451a2b91deaf)
> > reproduce (this is a W=1 build):
> > wget
> > https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross
> > -O ~/bin/make.cross
> > chmod +x ~/bin/make.cross
> > # install arm64 cross compiling tool for clang build
> > # apt-get install binutils-aarch64-linux-gnu
> > # save the attached .config to linux build tree
> > COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross
> > ARCH=arm64
> >
> > If you fix the issue, kindly add following tag as appropriate
> > Reported-by: kernel test robot 
> >
> > All warnings (new ones prefixed by >>):
> >
> >>> arch/arm64/kvm/hyp/nvhe/switch.c:244:28: warning: no previous
> >>> prototype for function 'hyp_panic' [-Wmissing-prototypes]
> >void __hyp_text __noreturn hyp_panic(struct kvm_cpu_context
> > *host_ctxt)
>
> I really wish we could turn these warnings off. They don't add much.
> Or is there an annotation we could stick on the function (something
> like __called_from_asm_please_leave_me_alone springs to mind...)?
>

We should define 'asmlinkage' to whatever makes this warning go away.
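
(The warning only wants a visible declaration; a minimal sketch of the
conventional fix, assuming a suitable hyp header to put it in:)

/* Entered from the hyp vectors only, so no C caller ever needs a
 * prototype; declaring it is enough to silence -Wmissing-prototypes. */
asmlinkage void __noreturn hyp_panic(struct kvm_cpu_context *host_ctxt);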


Re: [PATCH kvmtool v4 0/5] Add CFI flash emulation

2020-04-26 Thread Ard Biesheuvel
On Fri, 24 Apr 2020 at 19:03, Will Deacon  wrote:
>
> On Fri, Apr 24, 2020 at 09:40:51AM +0100, Will Deacon wrote:
> > On Thu, Apr 23, 2020 at 06:38:39PM +0100, Andre Przywara wrote:
> > > an update for the CFI flash emulation, addressing Alex' comments and
> > > adding direct mapping support.
> > > The actual code changes to the flash emulation are minimal; mostly this
> > > is about renaming and cleanups.
> > > This version now adds some patches. 1/5 is a required fix, the last
> > > three patches add mapping support as an extension. See below.
> >
> > Cheers, this mostly looks good to me. I've left a couple of minor comments,
> > and I'll give Alexandru a chance to have another look, but hopefully we can
> > merge it soon.
>
> Ok, I pushed this out along with the follow-up patch.
>

Cheers for that, this is useful stuff.

For the record, I did a quick benchmark on RPi4 booting Debian in a
VM, and I get the following delays (with GRUB and EFI timeouts both
set to 0)

17:04:58.487065
17:04:58.563700 UEFI firmware (version  built at 22:13:20 on Apr 23 2020)
17:04:58.853653 Welcome to GRUB!
17:04:58.924606 Booting `Debian GNU/Linux'
17:04:58.927835 Loading Linux 5.5.0-2-arm64 ...
17:04:59.063490 Loading initial ramdisk ...
17:05:01.303560 /dev/vda2: recovering journal
17:05:01.408861 /dev/vda2: clean, 37882/500960 files, 457154/2001920 blocks
17:05:09.646023 Debian GNU/Linux bullseye/sid rpi4vm64 ttyS0

So it takes less than 400 ms from starting kvmtool to entering GRUB
when the boot path is set normally. Any other delays you are observing
may be due to the fact that no boot path has been configured yet,
which is why it attempts PXE boot or other things.

Also, note that you can pass the --rng option to kvmtool to get the
EFI_RNG_PROTOCOL to be exposed to the EFI stub, for KASLR and for
seeding the kernel's RNG.
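
(Illustrative invocation -- the kernel image and disk paths are
placeholders for whatever you normally boot:)

  $ lkvm run -k Image -d rootfs.img --rng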


Re: [PATCH kvmtool v4 0/5] Add CFI flash emulation

2020-04-24 Thread Ard Biesheuvel
On Thu, 23 Apr 2020 at 19:55, Ard Biesheuvel  wrote:
>
> On Thu, 23 Apr 2020 at 19:39, Andre Przywara  wrote:
> >
> > Hi,
> >
> > an update for the CFI flash emulation, addressing Alex' comments and
> > adding direct mapping support.
> > The actual code changes to the flash emulation are minimal; mostly this
> > is about renaming and cleanups.
> > This version now adds some patches. 1/5 is a required fix, the last
> > three patches add mapping support as an extension. See below.
> >
> > In addition to a branch with this series[1], I also put a git branch with
> > all the changes compared to v3[2] as separate patches on the server, please
> > have a look if you want to verify against a previous review.
> >
> > ===
> > The EDK II UEFI firmware implementation requires some storage for the EFI
> > variables, which is typically some flash storage.
> > Since this is already supported on the EDK II side, and looks like a
> > generic standard, this series adds a CFI flash emulation to kvmtool.
> >
> > Patch 2/5 is the actual emulation code, patch 1/5 is a bug-fix for
> > registering MMIO devices, which is needed for this device.
> > Patches 3-5 add support for mapping the flash memory into the guest, should
> > it be in read-array mode. For this to work, patch 3/5 is cherry-picked
> > from Alex' PCIe reassignable BAR series, to support removing a memslot
> > mapping. Patch 4/5 adds support for read-only mappings, while patch 5/5
> > adds or removes the mapping based on the current state.
> > I am happy to squash 5/5 into 2/5, if we agree that patch 3/5 should be
> > merged either separately or the PCIe series is actually merged before
> > this one.
> >
> > This is one missing piece towards a working UEFI boot with kvmtool on
> > ARM guests, the other is to provide writable PCI BARs, which is WIP.
> > This series alone already enables UEFI boot, but only with virtio-mmio.
> >
>
> Excellent! Thanks for taking the time to implement the r/o memslot for
> the flash, it really makes the UEFI firmware much more usable.
>
> I will test this as soon as I get a chance, probably tomorrow.
>

I tested this on a SynQuacer box as a host, using EFI firmware [0]
built from patches provided by Sami.

I booted the Debian buster installer, completed the installation, and
could boot into the system. The only glitch was the fact that the
reboot didn't work, but I suppose we are not preserving the memory the
contains the firmware image, so there is nothing to reboot into. But
just restarting kvmtool with the flash image containing the EFI
variables got me straight into GRUB and the installed OS.

Tested-by: Ard Biesheuvel 

Thanks again for getting this sorted.


[0] https://people.linaro.org/~ard.biesheuvel/KVMTOOL_EFI.fd


Re: [PATCH kvmtool v4 0/5] Add CFI flash emulation

2020-04-24 Thread Ard Biesheuvel
On Thu, 23 Apr 2020 at 23:32, André Przywara  wrote:
>
> On 23/04/2020 21:43, Ard Biesheuvel wrote:
>
> Hi Ard,
>
> > On Thu, 23 Apr 2020 at 19:55, Ard Biesheuvel  wrote:
> >>
> >> On Thu, 23 Apr 2020 at 19:39, Andre Przywara  
> >> wrote:
> >>>
> >>> Hi,
> >>>
> >>> an update for the CFI flash emulation, addressing Alex' comments and
> >>> adding direct mapping support.
> >>> The actual code changes to the flash emulation are minimal; mostly this
> >>> is about renaming and cleanups.
> >>> This version now adds some patches. 1/5 is a required fix, the last
> >>> three patches add mapping support as an extension. See below.
> >>>
> >>> In addition to a branch with this series[1], I also put a git branch with
> >>> all the changes compared to v3[2] as separate patches on the server, 
> >>> please
> >>> have a look if you want to verify against a previous review.
> >>>
> >>> ===
> >>> The EDK II UEFI firmware implementation requires some storage for the EFI
> >>> variables, which is typically some flash storage.
> >>> Since this is already supported on the EDK II side, and looks like a
> >>> generic standard, this series adds a CFI flash emulation to kvmtool.
> >>>
> >>> Patch 2/5 is the actual emulation code, patch 1/5 is a bug-fix for
> >>> registering MMIO devices, which is needed for this device.
> >>> Patches 3-5 add support for mapping the flash memory into the guest, should
> >>> it be in read-array mode. For this to work, patch 3/5 is cherry-picked
> >>> from Alex' PCIe reassignable BAR series, to support removing a memslot
> >>> mapping. Patch 4/5 adds support for read-only mappings, while patch 5/5
> >>> adds or removes the mapping based on the current state.
> >>> I am happy to squash 5/5 into 2/5, if we agree that patch 3/5 should be
> >>> merged either separately or the PCIe series is actually merged before
> >>> this one.
> >>>
> >>> This is one missing piece towards a working UEFI boot with kvmtool on
> >>> ARM guests, the other is to provide writable PCI BARs, which is WIP.
> >>> This series alone already enables UEFI boot, but only with virtio-mmio.
> >>>
> >>
> >> Excellent! Thanks for taking the time to implement the r/o memslot for
> >> the flash, it really makes the UEFI firmware much more usable.
> >>
> >> I will test this as soon as I get a chance, probably tomorrow.
> >>
> >
> > I tested this on a SynQuacer box as a host, using EFI firmware [0]
> > built from patches provided by Sami.
> >
> > I booted the Debian buster installer, completed the installation, and
> > could boot into the system. The only glitch was the fact that the
> > reboot didn't work, but I suppose we are not preserving the memory that
> > contains the firmware image, so there is nothing to reboot into.
>
> It's even worse: kvmtool actually does not support reset at all:
> https://git.kernel.org/pub/scm/linux/kernel/git/will/kvmtool.git/tree/kvm-cpu.c#n220
>
> And yeah, the UEFI firmware is loaded at the beginning of RAM, so most
> of it is long gone by then.
> kvmtool could reload the image and reset the VCPUs, but every device
> emulation would need to be reset, for which there is no code yet.
>

Fair enough. For my use case, it doesn't really matter anyway.

> > But
> > just restarting kvmtool with the flash image containing the EFI
> > variables got me straight into GRUB and the installed OS.
>
> So, yeah, this is the way to do it ;-)
>
> > Tested-by: Ard Biesheuvel 
>
> Many thanks for that!
>
> > Thanks again for getting this sorted.
>
> It was actually easier than I thought (see the last patch).
>
> Just curious: the images Sami gave me this morning did not show any
> issues anymore (no no-syndrome fault, no alignment issues), even without
> the mapping [1]. And even though I saw the 800k read traps, I didn't
> notice any real performance difference (on a Juno). The PXE timeout was
> definitely much more noticeable.
>
> So did you see any performance impact with this series?
>

You normally don't PXE boot. There is an issue with the iSCSI driver
as well, which causes a boot delay for some reason, so I disabled that
in my build.

It definitely *feels* faster :-) But in any case, exposing the array
mode as a r/o memslot is definitely the right way to deal with this.
Even if Sami did find a workaround that masks the error, it is no
guarantee that all accesses go through that library.
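
(Roughly what that looks like at the KVM ABI level -- a sketch with
error handling omitted; the slot number, fds and the mmap'ed flash
pointer are placeholders:)

struct kvm_userspace_memory_region region = {
	.slot            = flash_slot,          /* any unused slot id */
	.flags           = KVM_MEM_READONLY,    /* guest writes exit to userspace */
	.guest_phys_addr = flash_base,
	.memory_size     = flash_size,
	.userspace_addr  = (unsigned long)flash_map, /* host mmap of the image */
};

ioctl(vm_fd, KVM_SET_USER_MEMORY_REGION, &region);
/* Reads are then served natively in array mode; a guest write (e.g. a
 * CFI command) triggers an MMIO exit, so the emulation can switch
 * modes and drop the memslot again. */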


> > [0] https://people.linaro.org/~ard.biesheuvel/KVMTOOL_EFI.fd
>
> Ah, nice, will this stay there for a while? I can't provide binaries,
> but wanted others to be able to easily test this.
>

Sure, I will leave it up until Linaro decides to take down my account.


Re: [PATCH kvmtool v4 0/5] Add CFI flash emulation

2020-04-24 Thread Ard Biesheuvel
On Fri, 24 Apr 2020 at 14:08, André Przywara  wrote:
>
> On 24/04/2020 07:45, Ard Biesheuvel wrote:
>
> Hi,
>
> (adding Leif for EDK-2 discussion)
>
> > On Thu, 23 Apr 2020 at 23:32, André Przywara  wrote:
> >>
> >> On 23/04/2020 21:43, Ard Biesheuvel wrote:
>
> [ ... kvmtool series to add CFI flash emulation allowing EDK-2 to store
> variables. Starting with this version (v4) the flash memory region is
> presented as a read-only memslot to KVM, to allow direct guest accesses
> as opposed to trap-and-emulate even read accesses to the array.]
>
> >>
> >>
> >> Just curious: the images Sami gave me this morning did not show any
> >> issues anymore (no no-syndrome fault, no alignment issues), even without
> >> the mapping [1]. And even though I saw the 800k read traps, I didn't
> >> notice any real performance difference (on a Juno). The PXE timeout was
> >> definitely much more noticeable.
> >>
> >> So did you see any performance impact with this series?
> >>
> >
> > You normally don't PXE boot. There is an issue with the iSCSI driver
> > as well, which causes a boot delay for some reason, so I disabled that
> > in my build.
> >
> > I definitely *feels* faster :-) But in any case, exposing the array
> > mode as a r/o memslot is definitely the right way to deal with this.
> > Even if Sami did find a workaround that masks the error, it is no
> > guarantee that all accesses go through that library.
>
> So I was wondering about this, maybe you can confirm or debunk this:
> - Any memory given to the compiler (through a pointer) is assumed to be
> "normal" memory: the compiler can re-arrange accesses, split them up or
> collate them. Also unaligned accesses should be allowed - although I
> guess most compilers would avoid them.
> - This normally forbids handing a pointer to memory mapped as "device
> memory" to the compiler, since this would violate all of the assumptions
> above.
> - If the device mapped as "device memory" is actually memory (SRAM,
> ROM/flash, framebuffer), then most of the assumptions are met, except
> the alignment requirement, which is bound to the mapping type, not the
> actual device (ARMv8 ARM: Unaligned accesses to device memory always
> trap, regardless of SCTLR.A)
> - To accommodate the latter, GCC knows the option -mstrict-align, to
> avoid unaligned accesses. TF-A and U-Boot use this option, to run
> without the MMU enabled.
>
> Now if EDK-2 lets the compiler deal with the flash memory region
> directly, I think this would still be prone to alignment faults. In fact
> an earlier build I got from Sami faulted on exactly that, when I ran it,
> even with the r/o memslot mapping in place.
>
> So should EDK-2 add -mstrict-align to be safe?

It already uses this in various places where it matters.

> or
> Should EDK-2 add an additional or alternate mapping using a non-device
> memory type (with all the mismatched attributes consequences)?

The memory mapped NOR flash in UEFI is really a special case, since we
need the OS to map it for us at runtime, and we cannot tell it to
switch between normal-NC and device attributes depending on which mode
the firmware is using it in.

Note that this is not any different on bare metal.

> or
> Should EDK-2 only touch the flash region using MMIO accessors, and
> forbid the compiler direct access to that region?
>

It should only touch those regions using abstractions it defines
itself, and which can be backed in different ways. This is already the
case in EDK2: it has its own CopyMem, ZeroMem, etc. string library, and
bans the use of the standard C ones. On top of that, it bans struct
assignment, initializers for automatic variables, and other things that
result in such calls being emitted implicitly.

So in practice, this issue is under control, unless you use a version
of those abstractions that willingly uses unaligned accesses (we have
optimized versions based on the cortex-strings library). So my
suspicion is that this may have caused the crash: on bare metal, we
have to switch to the non-optimized string library for the variable
driver for this reason.

The real solution is to fix EDK2, and make the variable stack work
with NOR flash that is non-memory mapped. This is something that has
come up before, and the other day, Sami and I were just discussing
logging this as a wishlist item for the firmware team.


> So does EDK-2 get away with this because the compiler typically avoids
> unaligned accesses?
>

There are certainly some places in the current code base where it is
the compiler that is emitting reads from the NOR flash region, but
there aren't that many. Moving t

Re: [PATCH kvmtool v4 0/5] Add CFI flash emulation

2020-04-23 Thread Ard Biesheuvel
On Thu, 23 Apr 2020 at 19:39, Andre Przywara  wrote:
>
> Hi,
>
> an update for the CFI flash emulation, addressing Alex' comments and
> adding direct mapping support.
> The actual code changes to the flash emulation are minimal; mostly this
> is about renaming and cleanups.
> This version now adds some patches. 1/5 is a required fix, the last
> three patches add mapping support as an extension. See below.
>
> In addition to a branch with this series[1], I also put a git branch with
> all the changes compared to v3[2] as separate patches on the server, please
> have a look if you want to verify against a previous review.
>
> ===
> The EDK II UEFI firmware implementation requires some storage for the EFI
> variables, which is typically some flash storage.
> Since this is already supported on the EDK II side, and looks like a
> generic standard, this series adds a CFI flash emulation to kvmtool.
>
> Patch 2/5 is the actual emulation code, patch 1/5 is a bug-fix for
> registering MMIO devices, which is needed for this device.
> Patches 3-5 add support for mapping the flash memory into the guest, should
> it be in read-array mode. For this to work, patch 3/5 is cherry-picked
> from Alex' PCIe reassignable BAR series, to support removing a memslot
> mapping. Patch 4/5 adds support for read-only mappings, while patch 5/5
> adds or removes the mapping based on the current state.
> I am happy to squash 5/5 into 2/5, if we agree that patch 3/5 should be
> merged either separately or the PCIe series is actually merged before
> this one.
>
> This is one missing piece towards a working UEFI boot with kvmtool on
> ARM guests, the other is to provide writable PCI BARs, which is WIP.
> This series alone already enables UEFI boot, but only with virtio-mmio.
>

Excellent! Thanks for taking the time to implement the r/o memslot for
the flash, it really makes the UEFI firmware much more usable.

I will test this as soon as I get a chance, probably tomorrow.


>
> [1] http://www.linux-arm.org/git?p=kvmtool.git;a=log;h=refs/heads/cfi-flash/v4
> [2] http://www.linux-arm.org/git?p=kvmtool.git;a=log;h=refs/heads/cfi-flash/v3
> git://linux-arm.org/kvmtool.git (branches cfi-flash/v3 and cfi-flash/v4)
>
> Changelog v3 .. v4:
> - Rename file to cfi-flash.c (dash instead of underscore).
> - Unify macro names for states, modes and commands.
> - Enforce one or two chips only.
> - Comment on pow2_size() function.
> - Use more consistent identifier spellings.
> - Assign symbols to status register values.
> - Drop RCR register emulation.
> - Use numerical offsets instead of names for query offsets to match spec.
> - Cleanup error path and reword info message in create_flash_device_file().
> - Add fix to allow non-virtio MMIO device emulations.
> - Support tearing down and adding read-only memslots.
> - Add read-only memslot mapping when in read mode.
>
> Changelog v2 .. v3:
> - Breaking MMIO handling into three separate functions.
> - Assign the flash base address in the memory map, but stay at 32 MB for now.
>   The MMIO area has been moved up to 48 MB, to never overlap with the
>   flash.
> - Impose a limit of 16 MB for the flash size, mostly to fit into the
>   (for now) fixed memory map.
> - Trim flash size down to nearest power-of-2, to match hardware.
> - Announce forced flash size trimming.
> - Rework the CFI query table slightly, to add the addresses as array
>   indices.
> - Fix error handling when creating the flash device.
> - Fix pow2_size implementation for 0 and 1 as input values.
> - Fix write buffer size handling.
> - Improve some comments.
>
> Changelog v1 .. v2:
> - Add locking for MMIO handling.
> - Fold flash read into handler.
> - Move pow2_size() into generic header.
> - Spell out flash base address.


Re: [PATCH kvmtool v3] Add emulation for CFI compatible flash memory

2020-04-15 Thread Ard Biesheuvel
On Wed, 15 Apr 2020 at 17:43, Ard Biesheuvel  wrote:
>
> On Tue, 7 Apr 2020 at 17:15, Alexandru Elisei  
> wrote:
> >
> > Hi,
> >
> > I've tested this patch by running badblocks and fio on a flash device 
> > inside a
> > guest, everything worked as expected.
> >
> > I've also looked at the flowcharts for device operation from Intel 
> > Application
> > Note 646, pages 12-21, and they seem implemented correctly.
> >
> > A few minor issues below.
> >
> > On 2/21/20 4:55 PM, Andre Przywara wrote:
> > > From: Raphael Gault 
> > >
> > > The EDK II UEFI firmware implementation requires some storage for the EFI
> > > variables, which is typically some flash storage.
> > > Since this is already supported on the EDK II side, we add a CFI flash
> > > emulation to kvmtool.
> > > This is backed by a file, specified via the --flash or -F command line
> > > option. Any flash writes done by the guest will immediately be reflected
> > > into this file (kvmtool mmap's the file).
> > > The flash will be limited to the nearest power-of-2 size, so only the
> > > first 2 MB of a 3 MB file will be used.
> > >
> > > This implements a CFI flash using the "Intel/Sharp extended command
> > > set", as specified in:
> > > - JEDEC JESD68.01
> > > - JEDEC JEP137B
> > > - Intel Application Note 646
> > > Some gaps in those specs have been filled by looking at real devices and
> > > other implementations (QEMU, Linux kernel driver).
> > >
> > > At the moment this relies on DT to advertise the base address of the
> > > flash memory (mapped into the MMIO address space) and is only enabled
> > > for ARM/ARM64. The emulation itself is architecture agnostic, though.
> > >
> > > This is one missing piece toward a working UEFI boot with kvmtool on
> > > ARM guests, the other is to provide writable PCI BARs, which is WIP.
> > >
>
> I have given this a spin with UEFI built for kvmtool, and it appears
> to be working correctly. However, I noticed that it is intolerably
> slow, which seems to be caused by the fact that both array mode and
> command mode (or whatever it is called in the CFI spec) are fully
> emulated, whereas in the QEMU implementation (for instance), the
> region is actually exposed to the guest using a read-only KVM memslot
> in array mode, and so the read accesses are made natively.
>
> It is also causing problems in the UEFI implementation, as we can no
> longer use unaligned accesses to read from the region, which is
> something the code currently relies on (and which works fine on actual
> hardware as long as you use normal non-cacheable mappings)
>

Actually, the issue is not alignment. The issue is with instructions
with multiple outputs, which means you cannot do an ordinary memcpy()
from the NOR region using ldp instructions, aligned or not.


Re: [PATCH kvmtool v3] Add emulation for CFI compatible flash memory

2020-04-15 Thread Ard Biesheuvel
On Tue, 7 Apr 2020 at 17:15, Alexandru Elisei  wrote:
>
> Hi,
>
> I've tested this patch by running badblocks and fio on a flash device inside a
> guest, everything worked as expected.
>
> I've also looked at the flowcharts for device operation from Intel Application
> Note 646, pages 12-21, and they seem implemented correctly.
>
> A few minor issues below.
>
> On 2/21/20 4:55 PM, Andre Przywara wrote:
> > From: Raphael Gault 
> >
> > The EDK II UEFI firmware implementation requires some storage for the EFI
> > variables, which is typically some flash storage.
> > Since this is already supported on the EDK II side, we add a CFI flash
> > emulation to kvmtool.
> > This is backed by a file, specified via the --flash or -F command line
> > option. Any flash writes done by the guest will immediately be reflected
> > into this file (kvmtool mmap's the file).
> > The flash will be limited to the nearest power-of-2 size, so only the
> > first 2 MB of a 3 MB file will be used.
> >
> > This implements a CFI flash using the "Intel/Sharp extended command
> > set", as specified in:
> > - JEDEC JESD68.01
> > - JEDEC JEP137B
> > - Intel Application Note 646
> > Some gaps in those specs have been filled by looking at real devices and
> > other implementations (QEMU, Linux kernel driver).
> >
> > At the moment this relies on DT to advertise the base address of the
> > flash memory (mapped into the MMIO address space) and is only enabled
> > for ARM/ARM64. The emulation itself is architecture agnostic, though.
> >
> > This is one missing piece toward a working UEFI boot with kvmtool on
> > ARM guests, the other is to provide writable PCI BARs, which is WIP.
> >

I have given this a spin with UEFI built for kvmtool, and it appears
to be working correctly. However, I noticed that it is intolerably
slow, which seems to be caused by the fact that both array mode and
command mode (or whatever it is called in the CFI spec) are fully
emulated, whereas in the QEMU implementation (for instance), the
region is actually exposed to the guest using a read-only KVM memslot
in array mode, and so the read accesses are made natively.

It is also causing problems in the UEFI implementation, as we can no
longer use unaligned accesses to read from the region, which is
something the code currently relies on (and which works fine on actual
hardware as long as you use normal non-cacheable mappings)

Are there any plans to implement this as well? I am aware that this is
a big ask, but for the general utility of this feature, I think it is
rather important.

-- 
Ard.


> > Signed-off-by: Raphael Gault 
> > [Andre: rewriting and fixing]
> > Signed-off-by: Andre Przywara 
> > ---
> > Hi,
> >
> > an update fixing Alexandru's review comments (many thanks for those!)
> > The biggest change code-wise is the split of the MMIO handler into three
> > different functions. Another significant change is the rounding *down* of
> > the present flash file size to the nearest power-of-two, to match flash
> > hardware chips and Linux' expectations.
> >
> > Cheers,
> > Andre
> >
> > Changelog v2 .. v3:
> > - Breaking MMIO handling into three separate functions.
> > - Assign the flash base address in the memory map, but stay at 32 MB for 
> > now.
> >   The MMIO area has been moved up to 48 MB, to never overlap with the
> >   flash.
> > - Impose a limit of 16 MB for the flash size, mostly to fit into the
> >   (for now) fixed memory map.
> > - Trim flash size down to nearest power-of-2, to match hardware.
> > - Announce forced flash size trimming.
> > - Rework the CFI query table slightly, to add the addresses as array
> >   indices.
> > - Fix error handling when creating the flash device.
> > - Fix pow2_size implementation for 0 and 1 as input values.
> > - Fix write buffer size handling.
> > - Improve some comments.
> >
> > Changelog v1 .. v2:
> > - Add locking for MMIO handling.
> > - Fold flash read into handler.
> > - Move pow2_size() into generic header.
> > - Spell out flash base address.
> >
> >  Makefile  |   6 +
> >  arm/include/arm-common/kvm-arch.h |   8 +-
> >  builtin-run.c |   2 +
> >  hw/cfi_flash.c| 576 ++
> >  include/kvm/kvm-config.h  |   1 +
> >  include/kvm/util.h|   8 +
> >  6 files changed, 599 insertions(+), 2 deletions(-)
> >  create mode 100644 hw/cfi_flash.c
> >
> > diff --git a/Makefile b/Makefile
> > index 3862112c..7ed6fb5e 100644
> > --- a/Makefile
> > +++ b/Makefile
> > @@ -170,6 +170,7 @@ ifeq ($(ARCH), arm)
> >   CFLAGS  += -march=armv7-a
> >
> >   ARCH_WANT_LIBFDT := y
> > + ARCH_HAS_FLASH_MEM := y
> >  endif
> >
> >  # ARM64
> > @@ -182,6 +183,7 @@ ifeq ($(ARCH), arm64)
> >   ARCH_INCLUDE+= -Iarm/aarch64/include
> >
> >   ARCH_WANT_LIBFDT := y
> > + ARCH_HAS_FLASH_MEM := y
> >  endif
> >
> >  ifeq ($(ARCH),mips)
> > @@ -261,6 +263,10 @@ ifeq (y,$(ARCH_HAS_FRAMEBUFFER))
> >   

Re: [PATCH kvmtool v3] Add emulation for CFI compatible flash memory

2020-04-15 Thread Ard Biesheuvel
On Wed, 15 Apr 2020 at 18:11, André Przywara  wrote:
>
> On 15/04/2020 16:55, Ard Biesheuvel wrote:
> > On Wed, 15 Apr 2020 at 17:43, Ard Biesheuvel  wrote:
> >>
> >> On Tue, 7 Apr 2020 at 17:15, Alexandru Elisei  
> >> wrote:
> >>>
> >>> Hi,
> >>>
> >>> I've tested this patch by running badblocks and fio on a flash device 
> >>> inside a
> >>> guest, everything worked as expected.
> >>>
> >>> I've also looked at the flowcharts for device operation from Intel 
> >>> Application
> >>> Note 646, pages 12-21, and they seem implemented correctly.
> >>>
> >>> A few minor issues below.
> >>>
> >>> On 2/21/20 4:55 PM, Andre Przywara wrote:
> >>>> From: Raphael Gault 
> >>>>
> >>>> The EDK II UEFI firmware implementation requires some storage for the EFI
> >>>> variables, which is typically some flash storage.
> >>>> Since this is already supported on the EDK II side, we add a CFI flash
> >>>> emulation to kvmtool.
> >>>> This is backed by a file, specified via the --flash or -F command line
> >>>> option. Any flash writes done by the guest will immediately be reflected
> >>>> into this file (kvmtool mmap's the file).
> >>>> The flash will be limited to the nearest power-of-2 size, so only the
> >>>> first 2 MB of a 3 MB file will be used.
> >>>>
> >>>> This implements a CFI flash using the "Intel/Sharp extended command
> >>>> set", as specified in:
> >>>> - JEDEC JESD68.01
> >>>> - JEDEC JEP137B
> >>>> - Intel Application Note 646
> >>>> Some gaps in those specs have been filled by looking at real devices and
> >>>> other implementations (QEMU, Linux kernel driver).
> >>>>
> >>>> At the moment this relies on DT to advertise the base address of the
> >>>> flash memory (mapped into the MMIO address space) and is only enabled
> >>>> for ARM/ARM64. The emulation itself is architecture agnostic, though.
> >>>>
> >>>> This is one missing piece toward a working UEFI boot with kvmtool on
> >>>> ARM guests, the other is to provide writable PCI BARs, which is WIP.
> >>>>
> >>
> >> I have given this a spin with UEFI built for kvmtool, and it appears
> >> to be working correctly. However, I noticed that it is intolerably
> >> slow, which seems to be caused by the fact that both array mode and
> >> command mode (or whatever it is called in the CFI spec) are fully
> >> emulated, whereas in the QEMU implementation (for instance), the
> >> region is actually exposed to the guest using a read-only KVM memslot
> >> in array mode, and so the read accesses are made natively.
> >>
> >> It is also causing problems in the UEFI implementation, as we can no
> >> longer use unaligned accesses to read from the region, which is
> >> something the code currently relies on (and which works fine on actual
> >> hardware as long as you use normal non-cacheable mappings)
> >>
> >
> > Actually, the issue is not alignment. The issue is with instructions
> > with multiple outputs, which means you cannot do an ordinary memcpy()
> > from the NOR region using ldp instructions, aligned or not.
>
> Yes, we traced that down to an "ldrb with post-inc", in the memcpy code.
> My suggestion was to provide a version of memcpy_{from,to}_io(), as
> Linux does, which only uses MMIO accessors to avoid "fancy" instructions.
>

That is possible, and the impact on the code is manageable, given the
modular nature of EDK2.
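
(A minimal sketch of what such a helper could look like -- Linux-style
accessors assumed, not the actual EDK2 routine:)

static void memcpy_fromio_nor(void *dst, const volatile void __iomem *src,
			      size_t len)
{
	u8 *d = dst;
	const volatile u8 __iomem *s = src;

	/* Byte-sized, single-output loads only: no ldp, and no ldrb with
	 * base writeback, either of which leaves a trapped access without
	 * a syndrome that the hypervisor can emulate from. */
	while (len--)
		*d++ = readb(s++);
}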

> Back at this point I was challenging the idea of accessing a flash
> device with a normal memory mapping, because it can fail when the
> device is in some query mode. Do you know of any best practices for flash mappings?
> Are two mappings common?
>

In the QEMU port of EDK2, we use normal non-cacheable for the first
flash device, which contains the executable image, and is not
updatable by the guest. The second flash bank is used for the variable
store, and is actually mapped as a device all the time.

Another thing I just realized is that you cannot fetch instructions
from an emulated flash device either, so to execute from NOR flash,
you will need a true memory mapping as well.

So in summary, I think the mode switch is needed to be generally
useful, even if the current approach is sufficient for (slow)
read/write using special memory accessors.


Re: [PATCH kvmtool v3] Add emulation for CFI compatible flash memory

2020-04-15 Thread Ard Biesheuvel
On Wed, 15 Apr 2020 at 18:36, André Przywara  wrote:
>
> On 15/04/2020 17:20, Ard Biesheuvel wrote:
> > On Wed, 15 Apr 2020 at 18:11, André Przywara  wrote:
> >>
> >> On 15/04/2020 16:55, Ard Biesheuvel wrote:
> >>> On Wed, 15 Apr 2020 at 17:43, Ard Biesheuvel  wrote:
> >>>>
> >>>> On Tue, 7 Apr 2020 at 17:15, Alexandru Elisei  
> >>>> wrote:
> >>>>>
> >>>>> Hi,
> >>>>>
> >>>>> I've tested this patch by running badblocks and fio on a flash device 
> >>>>> inside a
> >>>>> guest, everything worked as expected.
> >>>>>
> >>>>> I've also looked at the flowcharts for device operation from Intel 
> >>>>> Application
> >>>>> Note 646, pages 12-21, and they seem implemented correctly.
> >>>>>
> >>>>> A few minor issues below.
> >>>>>
> >>>>> On 2/21/20 4:55 PM, Andre Przywara wrote:
> >>>>>> From: Raphael Gault 
> >>>>>>
> >>>>>> The EDK II UEFI firmware implementation requires some storage for the 
> >>>>>> EFI
> >>>>>> variables, which is typically some flash storage.
> >>>>>> Since this is already supported on the EDK II side, we add a CFI flash
> >>>>>> emulation to kvmtool.
> >>>>>> This is backed by a file, specified via the --flash or -F command line
> >>>>>> option. Any flash writes done by the guest will immediately be 
> >>>>>> reflected
> >>>>>> into this file (kvmtool mmap's the file).
> >>>>>> The flash will be limited to the nearest power-of-2 size, so only the
> >>>>>> first 2 MB of a 3 MB file will be used.
> >>>>>>
> >>>>>> This implements a CFI flash using the "Intel/Sharp extended command
> >>>>>> set", as specified in:
> >>>>>> - JEDEC JESD68.01
> >>>>>> - JEDEC JEP137B
> >>>>>> - Intel Application Note 646
> >>>>>> Some gaps in those specs have been filled by looking at real devices 
> >>>>>> and
> >>>>>> other implementations (QEMU, Linux kernel driver).
> >>>>>>
> >>>>>> At the moment this relies on DT to advertise the base address of the
> >>>>>> flash memory (mapped into the MMIO address space) and is only enabled
> >>>>>> for ARM/ARM64. The emulation itself is architecture agnostic, though.
> >>>>>>
> >>>>>> This is one missing piece toward a working UEFI boot with kvmtool on
> >>>>>> ARM guests, the other is to provide writable PCI BARs, which is WIP.
> >>>>>>
> >>>>
> >>>> I have given this a spin with UEFI built for kvmtool, and it appears
> >>>> to be working correctly. However, I noticed that it is intolerably
> >>>> slow, which seems to be caused by the fact that both array mode and
> >>>> command mode (or whatever it is called in the CFI spec) are fully
> >>>> emulated, whereas in the QEMU implementation (for instance), the
> >>>> region is actually exposed to the guest using a read-only KVM memslot
> >>>> in array mode, and so the read accesses are made natively.
> >>>>
> >>>> It is also causing problems in the UEFI implementation, as we can no
> >>>> longer use unaligned accesses to read from the region, which is
> >>>> something the code currently relies on (and which works fine on actual
> >>>> hardware as long as you use normal non-cacheable mappings)
> >>>>
> >>>
> >>> Actually, the issue is not alignment. The issue is with instructions
> >>> with multiple outputs, which means you cannot do an ordinary memcpy()
> >>> from the NOR region using ldp instructions, aligned or not.
> >>
> >> Yes, we traced that down to an "ldrb with post-inc" in the memcpy code.
> >> My suggestion was to provide a version of memcpy_{from,to}_io(), as
> >> Linux does, which only uses MMIO accessors to avoid "fancy" instructions.
> >>
> >
> > That is possible, and the impact on the code is manageable, given the
> > modular nature of EDK2.
> >
> >> Back at this point I was chall

Re: [PATCH v7 7/7] ARM: Enable KASan for ARM

2020-04-10 Thread Ard Biesheuvel
(+ Linus)

On Fri, 10 Apr 2020 at 12:45, Ard Biesheuvel  wrote:
>
> On Fri, 17 Jan 2020 at 23:52, Florian Fainelli  wrote:
> >
> > From: Andrey Ryabinin 
> >
> > This patch enables the kernel address sanitizer for ARM. XIP_KERNEL has
> > not been tested and is therefore not allowed.
> >
> > Acked-by: Dmitry Vyukov 
> > Tested-by: Linus Walleij 
> > Signed-off-by: Abbott Liu 
> > Signed-off-by: Florian Fainelli 
> > ---
> >  Documentation/dev-tools/kasan.rst | 4 ++--
> >  arch/arm/Kconfig  | 9 +
> >  arch/arm/boot/compressed/Makefile | 1 +
> >  drivers/firmware/efi/libstub/Makefile | 3 ++-
> >  4 files changed, 14 insertions(+), 3 deletions(-)
> >
> > diff --git a/Documentation/dev-tools/kasan.rst 
> > b/Documentation/dev-tools/kasan.rst
> > index e4d66e7c50de..6acd949989c3 100644
> > --- a/Documentation/dev-tools/kasan.rst
> > +++ b/Documentation/dev-tools/kasan.rst
> > @@ -21,8 +21,8 @@ global variables yet.
> >
> >  Tag-based KASAN is only supported in Clang and requires version 7.0.0 or 
> > later.
> >
> > -Currently generic KASAN is supported for the x86_64, arm64, xtensa and s390
> > -architectures, and tag-based KASAN is supported only for arm64.
> > +Currently generic KASAN is supported for the x86_64, arm, arm64, xtensa and
> > +s390 architectures, and tag-based KASAN is supported only for arm64.
> >
> >  Usage
> >  -
> > diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
> > index 96dab76da3b3..70a7eb50984e 100644
> > --- a/arch/arm/Kconfig
> > +++ b/arch/arm/Kconfig
> > @@ -65,6 +65,7 @@ config ARM
> > select HAVE_ARCH_BITREVERSE if (CPU_32v7M || CPU_32v7) && !CPU_32v6
> > select HAVE_ARCH_JUMP_LABEL if !XIP_KERNEL && !CPU_ENDIAN_BE32 && 
> > MMU
> > select HAVE_ARCH_KGDB if !CPU_ENDIAN_BE32 && MMU
> > +   select HAVE_ARCH_KASAN if MMU && !XIP_KERNEL
> > select HAVE_ARCH_MMAP_RND_BITS if MMU
> > select HAVE_ARCH_SECCOMP_FILTER if AEABI && !OABI_COMPAT
> > select HAVE_ARCH_THREAD_STRUCT_WHITELIST
> > @@ -212,6 +213,14 @@ config ARCH_MAY_HAVE_PC_FDC
> >  config ZONE_DMA
> > bool
> >
> > +config KASAN_SHADOW_OFFSET
> > +   hex
> > +   depends on KASAN
> > +   default 0x1f000000 if PAGE_OFFSET=0x40000000
> > +   default 0x5f000000 if PAGE_OFFSET=0x80000000
> > +   default 0x9f000000 if PAGE_OFFSET=0xC0000000
> > +   default 0xffffffff
> > +
> >  config ARCH_SUPPORTS_UPROBES
> > def_bool y
> >
> > diff --git a/arch/arm/boot/compressed/Makefile 
> > b/arch/arm/boot/compressed/Makefile
> > index 83991a0447fa..efda24b00a44 100644
> > --- a/arch/arm/boot/compressed/Makefile
> > +++ b/arch/arm/boot/compressed/Makefile
> > @@ -25,6 +25,7 @@ endif
> >
> >  GCOV_PROFILE   := n
> >  KASAN_SANITIZE := n
> > +CFLAGS_KERNEL  += -D__SANITIZE_ADDRESS__
> >
> >  # Prevents link failures: __sanitizer_cov_trace_pc() is not linked in.
> >  KCOV_INSTRUMENT := n
> > diff --git a/drivers/firmware/efi/libstub/Makefile 
> > b/drivers/firmware/efi/libstub/Makefile
> > index c35f893897e1..c8b36824189b 100644
> > --- a/drivers/firmware/efi/libstub/Makefile
> > +++ b/drivers/firmware/efi/libstub/Makefile
> > @@ -20,7 +20,8 @@ cflags-$(CONFIG_ARM64):= $(subst 
> > $(CC_FLAGS_FTRACE),,$(KBUILD_CFLAGS)) \
> >-fpie $(DISABLE_STACKLEAK_PLUGIN)
> >  cflags-$(CONFIG_ARM)   := $(subst 
> > $(CC_FLAGS_FTRACE),,$(KBUILD_CFLAGS)) \
> >-fno-builtin -fpic \
> > -  $(call cc-option,-mno-single-pic-base)
> > +  $(call cc-option,-mno-single-pic-base) \
> > +  -D__SANITIZE_ADDRESS__
> >
>
> I am not too crazy about this need to unconditionally 'enable' KASAN
> on the command line like this, in order to be able to disable it again
> when CONFIG_KASAN=y.
>
> Could we instead add something like this at the top of
> arch/arm/boot/compressed/string.c?
>
> #ifdef CONFIG_KASAN
> #undef memcpy
> #undef memmove
> #undef memset
> void *__memcpy(void *__dest, __const void *__src, size_t __n) __alias(memcpy);
> void *__memmove(void *__dest, __const void *__src, size_t count)
> __alias(memmove);
> void *__memset(void *s, int c, size_t count) __alias(memset);
> #endif
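
For context, my (hedged) understanding of why the aliases are needed:

/*
 * With CONFIG_KASAN=y, the arch string.h redirects memcpy()/memmove()/
 * memset() in uninstrumented files to the raw __memcpy()/__memmove()/
 * __memset() variants. The decompressor only defines the plain
 * functions, so the aliases above satisfy the __-prefixed references
 * at link time without pulling in any shadow-memory code.
 */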


Re: [PATCH 12/18] arm64: kernel: Convert to modern annotations for assembly functions

2020-02-28 Thread Ard Biesheuvel
Hi Mark,

On Tue, 18 Feb 2020 at 21:02, Mark Brown  wrote:
>
> In an effort to clarify and simplify the annotation of assembly functions
> in the kernel, new macros have been introduced. These replace ENTRY and
> ENDPROC and also add a new annotation for static functions which previously
> had no ENTRY equivalent. Update the annotations in the core kernel code to
> the new macros.
>
> Signed-off-by: Mark Brown 
> ---
>  arch/arm64/kernel/cpu-reset.S |  4 +-
>  arch/arm64/kernel/efi-entry.S |  4 +-
>  arch/arm64/kernel/efi-rt-wrapper.S|  4 +-
>  arch/arm64/kernel/entry-fpsimd.S  | 20 -
>  arch/arm64/kernel/hibernate-asm.S | 16 +++
>  arch/arm64/kernel/hyp-stub.S  | 20 -
>  arch/arm64/kernel/probes/kprobes_trampoline.S |  4 +-
>  arch/arm64/kernel/reloc_test_syms.S   | 44 +--
>  arch/arm64/kernel/relocate_kernel.S   |  4 +-
>  arch/arm64/kernel/sleep.S | 12 ++---
>  arch/arm64/kernel/smccc-call.S|  8 ++--
>  11 files changed, 70 insertions(+), 70 deletions(-)
>
...
> diff --git a/arch/arm64/kernel/efi-entry.S b/arch/arm64/kernel/efi-entry.S
> index 304d5b02ca67..de6ced92950e 100644
> --- a/arch/arm64/kernel/efi-entry.S
> +++ b/arch/arm64/kernel/efi-entry.S
> @@ -25,7 +25,7 @@
>  * we want to be. The kernel image wants to be placed at TEXT_OFFSET
>  * from start of RAM.
>  */
> -ENTRY(entry)
> +SYM_CODE_START(entry)
> /*
>  * Create a stack frame to save FP/LR with extra space
>  * for image_addr variable passed to efi_entry().
> @@ -117,4 +117,4 @@ efi_load_fail:
> ret
>
>  entry_end:
> -ENDPROC(entry)
> +SYM_CODE_END(entry)

This hunk is going to conflict badly with the EFI tree. I will
incorporate this change for v5.7, so could you please just drop it
from this patch?


Re: [PATCH 3/3] arm64: Ask the compiler to __always_inline functions used by KVM at HYP

2020-02-21 Thread Ard Biesheuvel
On Fri, 21 Feb 2020 at 14:13, Will Deacon  wrote:
>
> On Thu, Feb 20, 2020 at 04:58:39PM +, James Morse wrote:
> > KVM uses some of the static-inline helpers like icache_is_vipt() from
> > its HYP code. This assumes the function is inlined so that the code is
> > mapped to EL2. The compiler may decide not to inline these, and the
> > out-of-line version may not be in the __hyp_text section.
> >
> > Add the additional __always_ hint to these static-inlines that are used
> > by KVM.
> >
> > Signed-off-by: James Morse 
> > ---
> >  arch/arm64/include/asm/cache.h  | 2 +-
> >  arch/arm64/include/asm/cacheflush.h | 2 +-
> >  arch/arm64/include/asm/cpufeature.h | 8 
> >  arch/arm64/include/asm/io.h | 4 ++--
> >  4 files changed, 8 insertions(+), 8 deletions(-)
>
> Acked-by: Will Deacon 
>
> It's the right thing to do, but if this stuff keeps trickling in then
> we should make CONFIG_OPTIMIZE_INLINING depend on !ARM64 because seeing
> "__always_inline" tells you nothing about /why/ it needs to be there and
> it's hard to know if/when you can remove those annotations in future.
>

We might need to follow the same approach as we took for the EFI stub,
and create a special __kvm_hyp symbol namespace so that we can
carefully control which routines from the kernel proper it has access
to.
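
To make the failure mode concrete, here is a sketch of the pattern at
issue (illustrative, not the kernel's actual helper):

/*
 * A plain "static inline" may be emitted out of line, and the
 * out-of-line copy lands outside the __hyp_text section, i.e. in
 * memory that is not mapped at EL2. __always_inline forces the code
 * into the caller, so it ends up in whatever section the caller
 * lives in.
 */
static __always_inline unsigned long read_ctr_el0(void)
{
        unsigned long ctr;

        asm volatile("mrs %0, ctr_el0" : "=r" (ctr));
        return ctr;
}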


Re: [PATCH 0/3] KVM: arm64: Ask the compiler to __always_inline functions used by KVM at HYP

2020-02-20 Thread Ard Biesheuvel
On Thu, 20 Feb 2020 at 18:33, James Morse  wrote:
>
> Hi Ard,
>
> On 20/02/2020 17:04, Ard Biesheuvel wrote:
> > On Thu, 20 Feb 2020 at 17:58, James Morse  wrote:
> >> It turns out KVM relies on the inline hint being honoured by the compiler
> >> in quite a few more places than expected. Something about the Shadow Call
> >> Stack support[0] causes the compiler to avoid inline-ing and to place
> >> these functions outside the __hyp_text. This ruins KVM's day.
> >>
> >> Add the simon-says __always_inline annotation to all the static
> >> inlines that KVM calls from HYP code.
>
> > This isn't quite as yuck as I expected, fortunately, but it does beg
> > the question whether we shouldn't simply map the entire kernel at EL2
> > instead?
>
> If the kernel is big enough to need internal veneers (the 128M range?), these
> would certainly go horribly wrong, because it's running somewhere other than
> the relocation-time address. We would need a way of telling the linker to
> keep the bits of KVM close together...
>

Ah, of course, there is that as well ...


Re: [PATCH 0/3] KVM: arm64: Ask the compiler to __always_inline functions used by KVM at HYP

2020-02-20 Thread Ard Biesheuvel
On Thu, 20 Feb 2020 at 17:58, James Morse  wrote:
>
> Hello!
>
> It turns out KVM relies on the inline hint being honoured by the compiler
> in quite a few more places than expected. Something about the Shadow Call
> Stack support[0] causes the compiler to avoid inline-ing and to place
> these functions outside the __hyp_text. This ruins KVM's day.
>
> Add the simon-says __always_inline annotation to all the static
> inlines that KVM calls from HYP code.
>
> This series based on v5.6-rc2.
>

This isn't quite as yuck as I expected, fortunately, but it does beg
the question whether we shouldn't simply map the entire kernel at EL2
instead?


Re: Memory regions and VMAs across architectures

2019-11-08 Thread Ard Biesheuvel
On 11/8/19 12:19 PM, Christoffer Dall wrote:
> Hi,
>
> I had a look at our relatively complicated logic in
> kvm_arch_prepare_memory_region(), and was wondering if there was room to
> unify some of this handling between architectures.
>
> (If you haven't seen our implementation, you can find it in
> virt/kvm/arm/mmu.c, and it has lovely ASCII art!)
>
> I then had a look at the x86 code, but that doesn't actually do anything
> when creating memory regions, which makes me wonder why the architectures
> differ in this aspect.
>
> The reason we added the logic that we have for arm/arm64 is that we
> don't really want to take faults for I/O accesses.  I'm not actually
> sure if this is a correctness thing, or an optimization effort, and the
> original commit message doesn't really explain.  Ard, you wrote that
> code, do you recall the details?
>

I have a vague recollection of implementing execution from read-only
guest memory in order to support execute-in-place from emulated NOR
flash in UEFI, and going down a rabbit hole debugging random, seemingly
unrelated crashes in the host which turned out to be caused by the zero
page getting corrupted because it was mapped read-write in the guest to
back uninitialized regions of the NOR flash.

That doesn't quite answer your question, though - I think it was just an
optimization ...

> In any case, what we do is to check for each VMA backing a memslot, we
> check if the memslot flags and vma flags are a reasonable match, and we
> try to detect I/O mappings by looking for the VM_PFNMAP flag on the VMA
> and pre-populate stage 2 page tables (our equivalent of EPT/NPT/...).
> However, there are some things which are not clear to me:
>
> First, what prevents user space from messing around with the VMAs after
> kvm_arch_prepare_memory_region() completes?  If nothing, then what is
> the value of the checks we perform wrt. the VMAs?
>
> Second, why would arm/arm64 need special handling for I/O mappings
> compared to other architectures, and how is this dealt with for
> x86/s390/power/... ?
>
>
> Thanks,
>
>  Christoffer
>



[PATCH v2 1/2] arm64: kvm: expose sanitised cache type register to guest

2019-01-31 Thread Ard Biesheuvel
We currently permit CPUs in the same system to deviate in the exact
topology of the caches, and we subsequently hide this fact from user
space by exposing a sanitised value of the cache type register CTR_EL0.

However, guests running under KVM see the bare value of CTR_EL0, which
could potentially result in issues with, e.g., JITs or other pieces of
code that are sensitive to misreported cache line sizes.

So let's start trapping cache ID instructions if there is a mismatch,
and expose the sanitised version of CTR_EL0 to guests. Note that CTR_EL0
is treated as an invariant to KVM user space, so update that part as well.

Acked-by: Christoffer Dall 
Signed-off-by: Ard Biesheuvel 
---
 arch/arm64/include/asm/kvm_emulate.h |  3 +
 arch/arm64/include/asm/sysreg.h  |  1 +
 arch/arm64/kvm/sys_regs.c| 59 +++-
 3 files changed, 61 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_emulate.h 
b/arch/arm64/include/asm/kvm_emulate.h
index 506386a3edde..87f9c1b6387e 100644
--- a/arch/arm64/include/asm/kvm_emulate.h
+++ b/arch/arm64/include/asm/kvm_emulate.h
@@ -77,6 +77,9 @@ static inline void vcpu_reset_hcr(struct kvm_vcpu *vcpu)
 */
if (!vcpu_el1_is_32bit(vcpu))
vcpu->arch.hcr_el2 |= HCR_TID3;
+
+   if (cpus_have_const_cap(ARM64_MISMATCHED_CACHE_TYPE))
+   vcpu->arch.hcr_el2 |= HCR_TID2;
 }
 
 static inline unsigned long *vcpu_hcr(struct kvm_vcpu *vcpu)
diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h
index 72dc4c011014..6726a4c98ce7 100644
--- a/arch/arm64/include/asm/sysreg.h
+++ b/arch/arm64/include/asm/sysreg.h
@@ -361,6 +361,7 @@
 
 #define SYS_CNTKCTL_EL1        sys_reg(3, 0, 14, 1, 0)
 
+#define SYS_CCSIDR_EL1 sys_reg(3, 1, 0, 0, 0)
 #define SYS_CLIDR_EL1  sys_reg(3, 1, 0, 0, 1)
 #define SYS_AIDR_EL1   sys_reg(3, 1, 0, 0, 7)
 
diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index e3e37228ae4e..1312aebf74e6 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -1148,6 +1148,49 @@ static int set_raz_id_reg(struct kvm_vcpu *vcpu, const 
struct sys_reg_desc *rd,
return __set_id_reg(rd, uaddr, true);
 }
 
+static bool access_ctr(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
+  const struct sys_reg_desc *r)
+{
+   if (p->is_write)
+   return write_to_read_only(vcpu, p, r);
+
+   p->regval = read_sanitised_ftr_reg(SYS_CTR_EL0);
+   return true;
+}
+
+static bool access_clidr(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
+const struct sys_reg_desc *r)
+{
+   if (p->is_write)
+   return write_to_read_only(vcpu, p, r);
+
+   p->regval = read_sysreg(clidr_el1);
+   return true;
+}
+
+static bool access_csselr(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
+ const struct sys_reg_desc *r)
+{
+   if (p->is_write)
+   vcpu_write_sys_reg(vcpu, p->regval, r->reg);
+   else
+   p->regval = vcpu_read_sys_reg(vcpu, r->reg);
+   return true;
+}
+
+static bool access_ccsidr(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
+ const struct sys_reg_desc *r)
+{
+   u32 csselr;
+
+   if (p->is_write)
+   return write_to_read_only(vcpu, p, r);
+
+   csselr = vcpu_read_sys_reg(vcpu, CSSELR_EL1);
+   p->regval = get_ccsidr(csselr);
+   return true;
+}
+
 /* sys_reg_desc initialiser for known cpufeature ID registers */
 #define ID_SANITISED(name) {   \
SYS_DESC(SYS_##name),   \
@@ -1365,7 +1408,10 @@ static const struct sys_reg_desc sys_reg_descs[] = {
 
{ SYS_DESC(SYS_CNTKCTL_EL1), NULL, reset_val, CNTKCTL_EL1, 0},
 
-   { SYS_DESC(SYS_CSSELR_EL1), NULL, reset_unknown, CSSELR_EL1 },
+   { SYS_DESC(SYS_CCSIDR_EL1), access_ccsidr },
+   { SYS_DESC(SYS_CLIDR_EL1), access_clidr },
+   { SYS_DESC(SYS_CSSELR_EL1), access_csselr, reset_unknown, CSSELR_EL1 },
+   { SYS_DESC(SYS_CTR_EL0), access_ctr },
 
{ SYS_DESC(SYS_PMCR_EL0), access_pmcr, reset_pmcr, },
{ SYS_DESC(SYS_PMCNTENSET_EL0), access_pmcnten, reset_unknown, 
PMCNTENSET_EL0 },
@@ -1665,6 +1711,7 @@ static const struct sys_reg_desc cp14_64_regs[] = {
  * register).
  */
 static const struct sys_reg_desc cp15_regs[] = {
+   { Op1( 0), CRn( 0), CRm( 0), Op2( 1), access_ctr },
{ Op1( 0), CRn( 1), CRm( 0), Op2( 0), access_vm_reg, NULL, c1_SCTLR },
{ Op1( 0), CRn( 2), CRm( 0), Op2( 0), access_vm_reg, NULL, c2_TTBR0 },
{ Op1( 0), CRn( 2), CRm( 0), Op2( 1), access_vm_reg, NULL, c2_TTBR1 },
@@ -1782,6 +1829,10 @@ static const struct sys_reg_desc cp15_regs[] = {
PMU_PMEVTYPER(30),
/* PMCCFILTR */
{ Op1(0), CRn(14), CRm(15), Op2(7), access_pmu_evtyper },
+
+   { Op1(1), 

[PATCH v2 2/2] arm64: kvm: describe data or unified caches as having 1 set and 1 way

2019-01-31 Thread Ard Biesheuvel
On SMP ARM systems, cache maintenance by set/way should only ever be
done in the context of onlining or offlining CPUs, which is typically
done by bare metal firmware and never in a virtual machine. For this
reason, we trap set/way cache maintenance operations and replace them
with conditional flushing of the entire guest address space.

Due to this trapping, the set/way arguments passed into the set/way
ops are completely ignored, and thus irrelevant. This also means that
the set/way geometry is equally irrelevant, and we can simply report
it as 1 set and 1 way, so that legacy 32-bit ARM system software (i.e.,
the kind that only receives odd fixes) doesn't take a performance hit
due to the trapping when iterating over the cachelines.

Acked-by: Christoffer Dall 
Signed-off-by: Ard Biesheuvel 
---
 arch/arm64/include/asm/kvm_emulate.h |  3 ++-
 arch/arm64/kvm/sys_regs.c| 15 +++
 2 files changed, 17 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/include/asm/kvm_emulate.h 
b/arch/arm64/include/asm/kvm_emulate.h
index 87f9c1b6387e..c450b15511b7 100644
--- a/arch/arm64/include/asm/kvm_emulate.h
+++ b/arch/arm64/include/asm/kvm_emulate.h
@@ -78,7 +78,8 @@ static inline void vcpu_reset_hcr(struct kvm_vcpu *vcpu)
if (!vcpu_el1_is_32bit(vcpu))
vcpu->arch.hcr_el2 |= HCR_TID3;
 
-   if (cpus_have_const_cap(ARM64_MISMATCHED_CACHE_TYPE))
+   if (cpus_have_const_cap(ARM64_MISMATCHED_CACHE_TYPE) ||
+   vcpu_el1_is_32bit(vcpu))
vcpu->arch.hcr_el2 |= HCR_TID2;
 }
 
diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index 1312aebf74e6..5882e3410acc 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -1188,6 +1188,21 @@ static bool access_ccsidr(struct kvm_vcpu *vcpu, struct 
sys_reg_params *p,
 
csselr = vcpu_read_sys_reg(vcpu, CSSELR_EL1);
p->regval = get_ccsidr(csselr);
+
+   /*
+* Guests should not be doing cache operations by set/way at all, and
+* for this reason, we trap them and attempt to infer the intent, so
+* that we can flush the entire guest's address space at the appropriate
+* time.
+* To prevent this trapping from causing performance problems, let's
+* expose the geometry of all data and unified caches (which are
+* guaranteed to be PIPT and thus non-aliasing) as 1 set and 1 way.
+* [If guests should attempt to infer aliasing properties from the
+* geometry (which is not permitted by the architecture), they would
+* only do so for virtually indexed caches.]
+*/
+   if (!(csselr & 1)) // data or unified cache
+   p->regval &= ~GENMASK(27, 3);
return true;
 }
 
-- 
2.20.1



[PATCH v2 0/2] arm64: kvm: cache ID register trapping

2019-01-31 Thread Ard Biesheuvel
While looking into whether we could report the cache geometry as 1 set
and 1 way so that the ARM kernel doesn't stall for 13 seconds at boot,
I noticed that we don't expose the sanitised version of CTR_EL0 to guests,
so I fixed that first (#1)

Since that gives us most of the groundwork for overriding the cache
geometry, it is a fairly trivial change (#2) to clear the set/way
fields in the CCSIDR register so that it describes 1 set and 1 way.
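
For reference, assuming the 32-bit CCSIDR_EL1 layout (pre-ARMv8.3), the
fields that #2 clears are:

/*
 * CCSIDR_EL1 (illustrative):
 *   [27:13] NumSets       - 0 means 1 set
 *   [12:3]  Associativity - 0 means 1 way
 *   [2:0]   LineSize      - left untouched
 *
 * so clearing GENMASK(27, 3) makes the register describe exactly
 * 1 set and 1 way.
 */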

Changes since v1:
- fix incorrect mask value (#2)
- make trapping conditional on whether either of the issues is being
  worked around, and disable it otherwise
- add Christoffer's ack

Cc: suzuki.poul...@arm.com
Cc: marc.zyng...@arm.com
Cc: christoffer.d...@arm.com

Ard Biesheuvel (2):
  arm64: kvm: expose sanitised cache type register to guest
  arm64: kvm: describe data or unified caches as having 1 set and 1 way

 arch/arm64/include/asm/kvm_emulate.h |  4 ++
 arch/arm64/include/asm/sysreg.h  |  1 +
 arch/arm64/kvm/sys_regs.c| 74 +++-
 3 files changed, 77 insertions(+), 2 deletions(-)

-- 
2.20.1



Re: kvm-unit-tests gicv2 cases fail

2018-10-18 Thread Ard Biesheuvel
On 19 October 2018 at 11:25, Andrew Jones  wrote:
> On Thu, Oct 18, 2018 at 05:17:57PM +0800, Li Zhang wrote:
>> Hi,
>>
>> I ran kvm-unit-tests on an ARM server (QDF2400), but the gicv2-ipi and
>> gicv2-active cases fail.
>> By debugging the kvm-unit-tests source code, I found that the interrupt
>> is not handled: do_handle_exception is not called.
>>
>> Looking into the KVM source code, it supports GICv2 emulation on GICv3.
>> I've tried a lot of kernel versions, from v4.12 to v4.19 in mainline,
>> and these cases always fail.
>> Is it possible that the hardware/software disables this?
>>
>> It seems that GICv2 is a little old for an ARM server, and I am not sure
>> when we would use it.
>> Do we need to care about these cases?
>
> We care, but I haven't seen these issues, and if this only reproduces in
> a GICv2-emulation-on-GICv3 configuration, then I don't have any hardware
> where I can reproduce/debug it.
>

Do you have access to a Socionext SynQuacer board? I use it regularly
with QEMU/KVM running both in GICv2 and GICv3 modes, and I haven't
noticed any issues.


Re: [PATCH] KVM: arm/arm64: drop resource size check for GICV window

2018-06-09 Thread Ard Biesheuvel


> On 9 Jun 2018, at 12:06, Christoffer Dall  wrote:
> 
>> On Fri, Jun 01, 2018 at 05:06:28PM +0200, Ard Biesheuvel wrote:
>> When booting a 64 KB pages kernel on an ACPI GICv3 system that
>> implements support for v2 emulation, the following warning is
>> produced
>> 
>>  GICV size 0x2000 not a multiple of page size 0x10000
>> 
>> and support for v2 emulation is disabled, preventing GICv2 VMs
>> from being able to run on such hosts.
>> 
>> The reason is that vgic_v3_probe() performs a sanity check on the
>> size of the window (it should be a multiple of the page size),
>> while the ACPI MADT parsing code hardcodes the size of the window
>> to 8 KB. This makes sense, considering that ACPI does not bother
>> to describe the size in the first place, under the assumption that
>> platforms implementing ACPI will follow the architecture and not
>> put anything else in the same 64 KB window.
> 
> Does the architecture actually say that anywhere?
> 
>> 
>> So let's just drop the sanity check altogether, and assume that
>> the window is at least 64 KB in size.
> 
> This could obviously be dangerous if broken systems actually exist.
> Marc may know more about that than me.  An alternative would be to
> modify the ACPI code to assume max(8 KB, page size) instead, and/or a
> command line parameter to override this check.
> 
> That said, I'm not directly opposed to this patch, but I'll let Marc
> have a look as well.
> 

This approach was actually Marc’s idea, and he already applied the patch to the 
queue branch afaik.


> 
>> 
>> Fixes: 909777324588 ("KVM: arm/arm64: vgic-new: vgic_init: implement 
>> kvm_vgic_hyp_init")
>> Signed-off-by: Ard Biesheuvel 
>> ---
>> virt/kvm/arm/vgic/vgic-v3.c | 5 -
>> 1 file changed, 5 deletions(-)
>> 
>> diff --git a/virt/kvm/arm/vgic/vgic-v3.c b/virt/kvm/arm/vgic/vgic-v3.c
>> index bdcf8e7a6161..72fc688c3e9d 100644
>> --- a/virt/kvm/arm/vgic/vgic-v3.c
>> +++ b/virt/kvm/arm/vgic/vgic-v3.c
>> @@ -552,11 +552,6 @@ int vgic_v3_probe(const struct gic_kvm_info *info)
>>pr_warn("GICV physical address 0x%llx not page aligned\n",
>>(unsigned long long)info->vcpu.start);
>>kvm_vgic_global_state.vcpu_base = 0;
>> -} else if (!PAGE_ALIGNED(resource_size(>vcpu))) {
>> -pr_warn("GICV size 0x%llx not a multiple of page size 0x%lx\n",
>> -(unsigned long long)resource_size(>vcpu),
>> -PAGE_SIZE);
>> -kvm_vgic_global_state.vcpu_base = 0;
>>} else {
>>kvm_vgic_global_state.vcpu_base = info->vcpu.start;
>>kvm_vgic_global_state.can_emulate_gicv2 = true;
>> -- 
>> 2.17.0
>> 


[PATCH] KVM: arm/arm64: drop resource size check for GICV window

2018-06-01 Thread Ard Biesheuvel
When booting a 64 KB pages kernel on an ACPI GICv3 system that
implements support for v2 emulation, the following warning is
produced

  GICV size 0x2000 not a multiple of page size 0x10000

and support for v2 emulation is disabled, preventing GICv2 VMs
from being able to run on such hosts.

The reason is that vgic_v3_probe() performs a sanity check on the
size of the window (it should be a multiple of the page size),
while the ACPI MADT parsing code hardcodes the size of the window
to 8 KB. This makes sense, considering that ACPI does not bother
to describe the size in the first place, under the assumption that
platforms implementing ACPI will follow the architecture and not
put anything else in the same 64 KB window.

So let's just drop the sanity check altogether, and assume that
the window is at least 64 KB in size.

Fixes: 909777324588 ("KVM: arm/arm64: vgic-new: vgic_init: implement 
kvm_vgic_hyp_init")
Signed-off-by: Ard Biesheuvel 
---
 virt/kvm/arm/vgic/vgic-v3.c | 5 -
 1 file changed, 5 deletions(-)

diff --git a/virt/kvm/arm/vgic/vgic-v3.c b/virt/kvm/arm/vgic/vgic-v3.c
index bdcf8e7a6161..72fc688c3e9d 100644
--- a/virt/kvm/arm/vgic/vgic-v3.c
+++ b/virt/kvm/arm/vgic/vgic-v3.c
@@ -552,11 +552,6 @@ int vgic_v3_probe(const struct gic_kvm_info *info)
pr_warn("GICV physical address 0x%llx not page aligned\n",
(unsigned long long)info->vcpu.start);
kvm_vgic_global_state.vcpu_base = 0;
-   } else if (!PAGE_ALIGNED(resource_size(>vcpu))) {
-   pr_warn("GICV size 0x%llx not a multiple of page size 0x%lx\n",
-   (unsigned long long)resource_size(>vcpu),
-   PAGE_SIZE);
-   kvm_vgic_global_state.vcpu_base = 0;
} else {
kvm_vgic_global_state.vcpu_base = info->vcpu.start;
kvm_vgic_global_state.can_emulate_gicv2 = true;
-- 
2.17.0



Re: Clang arm64 build is broken

2018-04-20 Thread Ard Biesheuvel
On 20 April 2018 at 17:38, Mark Rutland  wrote:
> Hi Andrey,
>
> On Fri, Apr 20, 2018 at 04:59:35PM +0200, Andrey Konovalov wrote:
>> On Fri, Apr 20, 2018 at 10:13 AM, Marc Zyngier  wrote:
>> >> The issue is that
>> >> clang doesn't know about the "S" asm constraint. I reported this to
>> >> clang [2], and hopefully this will get fixed. In the meantime, would
>> >> it possible to work around using the "S" constraint in the kernel?
>> >
>> > I have no idea, I've never used clang to build the kernel. Clang isn't
>> > really supported to build the arm64 kernel anyway (as you mention
>> > below), and working around clang deficiencies would mean that we leave
>> > with the workaround forever. I'd rather enable clang once it is at
>> > feature parity with GCC.
>>
>> The fact that there are some existing issues with building arm64
>> kernel with clang doesn't sound like a good justification for adding
>> new issues :)
>
> I appreciate this is somewhat frustrating, but every feature where clang
> is not at parity with GCC is effectively a functional regression for us.
>
> Recently, the code that clang hasn't liked happens to be security
> critical, and it is somewhat difficult to justify making that code more
> complex to cater for a compiler that we know has outstanding issues with
> features we rely upon.
>
> Which is to say, I'm not sure that there's much justification either
> way. We really need clang to be at feature parity with GCC for it to be
> considered supported.
>
> It would be great if some effort could be focussed on bringing clang to
> feature parity with GCC before implementing new clang-specific features.
>
>> However in this case I do believe that this is more of a bug in clang
>> that should be fixed.
>
> Just to check: does clang implement the rest of the AArch64 machine
> constraints [1]?
>
> We're liable to use more of them in future, and we should aim for parity
> now so that we don't fall into the same trap in future.
>
>> >> While we're here, regarding the other issue with kvm [3], I didn't
>> >> receive any comments as to whether it makes sense to send the fix that
>> >> adds -fno-jump-tables flag when building kvm with clang.
>> >
>> > Is that the only thing missing? Are you sure that there is no other way
>> > for clang to generate absolute addresses that will then lead to a crash?
>> > Again, I'd rather make sure we have the full picture.
>>
>> Well, I have tried applying that patch and running kvm tests that I
>> could find [1], and they passed (actually I think there was an issue
>> with one of them, but I saw the same thing when I tried running them
>> on a kernel built with GCC).
>
> I think what Marc wants is a statement as to whether -fno-jump-tables is
> sufficient to ensure that clang will not try to use absolute addressing,
> based on an understanding of the AArch64 LLVM backend rather than test
> cases.
>
> For example, could any other pass result in the use of an absolute
> address? Or are jump tables the *only* reason that clang would try to
> use an absolute address.
>
> Are there other options that we might need to pass?
>
> Are there any options that we can pass to forbid absolute addressing?
>

One thing to note here is that it is not generally possible to inhibit
all absolute references, given that statically initialized pointer
variables will result in R_AARCH64_ABS64 relocations in any case. So
what we are after (and we should align this with the GCC team as well)
is a code model option or some other flag that prevents the compiler
from *generating code* that relies on absolute addressing, given that
we have no control over this (whereas it is feasible to avoid
statically initialized pointer variables in code that is expected to
execute at a memory offset that is different from the kernel's runtime
memory offset)
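
To illustrate the point about statically initialized pointers (trivial
example, not taken from the kernel):

/*
 * Even with -fpic/-fpie, this initializer is emitted as a data word
 * carrying an R_AARCH64_ABS64 relocation; no code model option can
 * make it PC-relative, it has to be fixed up at load time.
 */
static int counter;
static int *counter_ptr = &counter;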


Re: arm64 kvm built with clang doesn't boot

2018-03-17 Thread Ard Biesheuvel
(+ Thomas)

On 16 March 2018 at 17:13, Mark Rutland  wrote:
> On Fri, Mar 16, 2018 at 04:52:08PM +, Nick Desaulniers wrote:
>> + Sami (Google), Takahiro (Linaro)
>>
>> Just so I fully understand the problem enough to articulate it, we'd be
>> looking for the compiler to keep the jump tables for speed (I would guess
>> -fno-jump-tables would emit an if-else chain) but only emit relative jumps
>> (not absolute jumps)?
>
> Our main concern is that there is no absolute addressing. If that rules
> out using a relative jump table, that's ok.
>
> We want to avoid the fragility of collecting -f-no-* options as future
> compiler transformations end up introducing absolute addressing.
>

This all comes back to the assumptions made by the compiler when
building PIC/PIE code, i.e., that symbols should be preemptible and
thus all references should be indirected via GOT entries, and that
text relocations should be avoided.

If we had a way to tell the compiler that these concerns do not apply
for us, we could use -fpic/-fpie in the kernel and be done with it.
-fvisibility=hidden *almost* gives us what we need, but in practice,
only the #pragma variant (#pragma GCC visibility push (hidden)) makes
-fpic behave in a sensible way for freestanding builds, and gets rid
of absolute references where possible (note that statically
initialized pointer variables always involve absolute relocations)


Re: [PATCH v6 06/26] KVM: arm/arm64: Do not use kern_hyp_va() with kvm_vgic_global_state

2018-03-16 Thread Ard Biesheuvel
On 16 March 2018 at 11:35, Andrew Jones  wrote:
> On Fri, Mar 16, 2018 at 09:31:57AM +, Marc Zyngier wrote:
>> On 15/03/18 19:16, James Morse wrote:
>> >
>> > (I had a go at using 'Ush', to let the compiler schedule the adrp, but 
>> > couldn't
>> > get it to work.)
>>
>> I tried that as well at some point, but couldn't see how to use it. The
>> compiler was never happy with my use of the constraints, so I gave up
>> and did it my way...
>>
>
> What's 'Ush'? I tried to search for it, but came up short. I'm wondering
> what things I can try (and fail) to use it on too :-)
>

https://gcc.gnu.org/onlinedocs/gccint/Machine-Constraints.html


Re: [PATCH v3 09/41] KVM: arm64: Defer restoring host VFP state to vcpu_put

2018-02-14 Thread Ard Biesheuvel
On 14 February 2018 at 17:38, Christoffer Dall
 wrote:
> On Wed, Feb 14, 2018 at 02:43:42PM +, Dave Martin wrote:
>> [CC Ard, in case he has a view on how much we care about softirq NEON
>> performance regressions ... and whether my suggestions make sense]
>>
>> On Wed, Feb 14, 2018 at 11:15:54AM +0100, Christoffer Dall wrote:
>> > On Tue, Feb 13, 2018 at 02:08:47PM +, Dave Martin wrote:
>> > > On Tue, Feb 13, 2018 at 09:51:30AM +0100, Christoffer Dall wrote:
>> > > > On Fri, Feb 09, 2018 at 03:59:30PM +, Dave Martin wrote:
>
> [...]
>
>> > >
>> > > kvm_fpsimd_flush_cpu_state() is just an invalidation.  No state is
>> > > actually saved today because we explicitly don't care about preserving
>> > > the SVE state, because the syscall ABI throws the SVE regs away as
>> > > a side effect any syscall including ioctl(KVM_RUN); also (currently) KVM
>> > > ensures that the non-SVE FPSIMD bits _are_ restored by itself.
>> > >
>> > > I think my proposal is that this hook might take on the role of
>> > > actually saving the state too, if we move that out of the KVM host
>> > > context save/restore code.
>> > >
>> > > Perhaps we could even replace
>> > >
>> > >   preempt_disable();
>> > >   kvm_fpsimd_flush_cpu_state();
>> > >   /* ... */
>> > >   preempt_enable();
>> > >
>> > > with
>> > >
>> > >   kernel_neon_begin();
>> > >   /* ... */
>> > >   kernel_neon_end();
>> >
>> > I'm not entirely sure where the begin and end points would be in the
>> > context of KVM?
>>
>> Hmmm, actually there's a bug in your VHE changes now I look more
>> closely in this area:
>>
>> You assume that the only way for the FPSIMD regs to get unexpectedly
>> dirtied is through a context switch, but actually this is not the case:
>> a softirq can use kernel-mode NEON any time that softirqs are enabled.
>>
>> This means that in between kvm_arch_vcpu_load() and _put() (whether via
>> preempt notification or not), the guest's FPSIMD state in the regs may
>> be trashed by a softirq.
>
> ouch.
>
>>
>> The simplest fix is to disable softirqs and preemption for that whole
>> region, but since we can stay in it indefinitely that's obviously not
>> the right approach.  Putting kernel_neon_begin() in _load() and
>> kernel_neon_end() in _put() achieves the same without disabling
>> softirq, but preemption is still disabled throughout, which is bad.
>> This effectively makes the run ioctl nonpreemptible...
>>
>> A better fix would be to set the cpu's kernel_neon_busy flag, which
>> makes softirq code use non-NEON fallback code.
>>
>> We could expose an interface from fpsimd.c to support that.
>>
>> It still comes at a cost though: due to the switching from NEON to
>> fallback code in softirq handlers, we may get a big performance
>> regression in setups that rely heavily on NEON in softirq for
>> performance.
>>
>
> I wasn't aware that softirqs would use fpsimd.
>

It is not common but it is permitted by the API, and there is mac80211
code and IPsec code that does this.

Performance penalties incurred by switching from accelerated h/w
instruction based crypto to scalar code can be as high as 20x, so we
should really avoid this if we can.
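
For reference, the kernel-mode NEON pattern in question looks roughly
like this (sketch; see <asm/neon.h> and <asm/simd.h>):

#include <asm/neon.h>
#include <asm/simd.h>

static void do_neon_work(void)
{
        if (!may_use_simd()) {
                /* NEON not usable in this context: scalar fallback */
                return;
        }

        kernel_neon_begin();
        /* ... FPSIMD/NEON-accelerated work, possibly in softirq ... */
        kernel_neon_end();
}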

>>
>> Alternatively we could do something like the following, but it's a
>> rather gross abstraction violation:
>>
>> diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c
>> index 2e43f9d..6a1ff3a 100644
>> --- a/virt/kvm/arm/arm.c
>> +++ b/virt/kvm/arm/arm.c
>> @@ -746,9 +746,24 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, 
>> struct kvm_run *run)
>>* the effect of taking the interrupt again, in SVC
>>* mode this time.
>>*/
>> + local_bh_disable();
>>   local_irq_enable();
>>
>>   /*
>> +  * If we exited due to one or mode pending interrupts, they
>> +  * have now been handled.  If such an interrupt pended a
>> +  * softirq, we shouldn't prevent that softirq from using
>> +  * kernel-mode NEON indefinitely: instead, give FPSIMD back to
>> +  * the host to manage as it likes.  We'll grab it again on the
>> +  * next FPSIMD trap from the guest (if any).
>> +  */
>> + if (local_softirq_pending() && FPSIMD untrapped for guest) {
>> + /* save vcpu FPSIMD context */
>> + /* enable FPSIMD trap for guest */
>> + }
>> + local_bh_enable();
>> +
>> + /*
>>* We do local_irq_enable() before calling guest_exit() so
>>* that if a timer interrupt hits while running the guest we
>>* account that tick as being spent in the guest.  We enable
>>
>> [...]
>>
>
> I can't see this working, what if an IRQ comes in and a softirq gets
> pending immediately after local_bh_enable() above?
>
> And as you say, it's really not pretty.
>
> This is really making me think that I'll drop this part of the

Re: [PATCH v3 14/18] firmware/psci: Expose SMCCC version through psci_ops

2018-02-01 Thread Ard Biesheuvel
On 1 February 2018 at 11:46, Marc Zyngier  wrote:
> Since PSCI 1.0 allows the SMCCC version to be (indirectly) probed,
> let's do that at boot time, and expose the version of the calling
> convention as part of the psci_ops structure.
>
> Acked-by: Lorenzo Pieralisi 
> Signed-off-by: Marc Zyngier 
> ---
>  drivers/firmware/psci.c | 19 +++
>  include/linux/psci.h|  6 ++
>  2 files changed, 25 insertions(+)
>
> diff --git a/drivers/firmware/psci.c b/drivers/firmware/psci.c
> index e9493da2b111..8631906c414c 100644
> --- a/drivers/firmware/psci.c
> +++ b/drivers/firmware/psci.c
> @@ -61,6 +61,7 @@ bool psci_tos_resident_on(int cpu)
>
>  struct psci_operations psci_ops = {
> .conduit = PSCI_CONDUIT_NONE,
> +   .smccc_version = SMCCC_VERSION_1_0,
>  };
>
>  typedef unsigned long (psci_fn)(unsigned long, unsigned long,
> @@ -511,6 +512,23 @@ static void __init psci_init_migrate(void)
> pr_info("Trusted OS resident on physical CPU 0x%lx\n", cpuid);
>  }
>
> +static void __init psci_init_smccc(u32 ver)
> +{
> +   int feature;
> +
> +   feature = psci_features(ARM_SMCCC_VERSION_FUNC_ID);
> +
> +   if (feature != PSCI_RET_NOT_SUPPORTED) {
> +   ver = invoke_psci_fn(ARM_SMCCC_VERSION_FUNC_ID, 0, 0, 0);
> +   if (ver != ARM_SMCCC_VERSION_1_1)
> +   psci_ops.smccc_version = SMCCC_VERSION_1_0;
> +   else
> +   psci_ops.smccc_version = SMCCC_VERSION_1_1;
> +   }
> +
> +   pr_info("SMC Calling Convention v1.%d\n", psci_ops.smccc_version);

This is a bit nasty: you are returning the numeric value of the enum
as the minor number, and hardcoding the major version number as 1,
while the return value of ARM_SMCCC_VERSION_FUNC_ID gives you the
exact numbers. I assume nobody is expecting SMCCC v2.3 anytime soon,
but it would still be a lot nicer to simply decode the value of 'ver'
(and make it default to ARM_SMCCC_VERSION_1_0 if the PSCI feature call
fails)
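
I.e., something like this (untested sketch, reusing the names from the
patch):

        feature = psci_features(ARM_SMCCC_VERSION_FUNC_ID);
        if (feature != PSCI_RET_NOT_SUPPORTED) {
                unsigned long ver = invoke_psci_fn(ARM_SMCCC_VERSION_FUNC_ID,
                                                   0, 0, 0);

                /* SMCCC version: bits [30:16] major, bits [15:0] minor */
                pr_info("SMC Calling Convention v%lu.%lu\n",
                        (ver >> 16) & 0x7fff, ver & 0xffff);
        }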


> +}
> +
>  static void __init psci_0_2_set_functions(void)
>  {
> pr_info("Using standard PSCI v0.2 function IDs\n");
> @@ -559,6 +577,7 @@ static int __init psci_probe(void)
> psci_init_migrate();
>
> if (PSCI_VERSION_MAJOR(ver) >= 1) {
> +   psci_init_smccc(ver);
> psci_init_cpu_suspend();
> psci_init_system_suspend();
> }
> diff --git a/include/linux/psci.h b/include/linux/psci.h
> index f2679e5faa4f..8b1b3b5935ab 100644
> --- a/include/linux/psci.h
> +++ b/include/linux/psci.h
> @@ -31,6 +31,11 @@ enum psci_conduit {
> PSCI_CONDUIT_HVC,
>  };
>
> +enum smccc_version {
> +   SMCCC_VERSION_1_0,
> +   SMCCC_VERSION_1_1,
> +};
> +
>  struct psci_operations {
> u32 (*get_version)(void);
> int (*cpu_suspend)(u32 state, unsigned long entry_point);
> @@ -41,6 +46,7 @@ struct psci_operations {
> unsigned long lowest_affinity_level);
> int (*migrate_info_type)(void);
> enum psci_conduit conduit;
> +   enum smccc_version smccc_version;
>  };
>
>  extern struct psci_operations psci_ops;
> --
> 2.14.2
>


Re: [PATCH v3 00/18] arm64: Add SMCCC v1.1 support and CVE-2017-5715 (Spectre variant 2) mitigation

2018-02-01 Thread Ard Biesheuvel
On 1 February 2018 at 11:46, Marc Zyngier <marc.zyng...@arm.com> wrote:
> ARM has recently published a SMC Calling Convention (SMCCC)
> specification update[1] that provides an optimised calling convention
> and optional, discoverable support for mitigating CVE-2017-5715. ARM
> Trusted Firmware (ATF) has already gained such an implementation[2].
>
> This series addresses a few things:
>
> - It provides a KVM implementation of PSCI v1.0, which is a
>   prerequisite for being able to discover SMCCC v1.1, together with a
>   new userspace API to control the PSCI revision number that the guest
>   sees.
>
> - It allows KVM to advertise SMCCC v1.1, which is de-facto supported
>   already (it never corrupts any of the guest registers).
>
> - It implements KVM support for the ARCH_WORKAROUND_1 function that is
>   used to mitigate CVE-2017-5715 in a guest (if such mitigation is
>   available on the host).
>
> - It implements SMCCC v1.1 and ARCH_WORKAROUND_1 discovery support in
>   the kernel itself.
>
> - It finally provides firmware callbacks for CVE-2017-5715 for both
>   kernel and KVM and drop the initial PSCI_GET_VERSION based
>   mitigation.
>
> Patch 1 is already merged, and included here for reference. Patches on
> top of arm64/for-next/core. Tested on Seattle and Juno, the latter
> with ATF implementing SMCCC v1.1.
>
> [1]: https://developer.arm.com/support/security-update/downloads/
>
> [2]: https://github.com/ARM-software/arm-trusted-firmware/pull/1240
>
> * From v2:
>   - Fixed SMC handling in KVM
>   - PSCI fixes and tidying up
>   - SMCCC primitive rework for better code generation (both efficiency
>   and correctness)
>   - Remove PSCI_GET_VERSION as a mitigation vector
>
> * From v1:
>   - Fixed 32bit build
>   - Fix function number sign extension (Ard)
>   - Inline SMCCC v1.1 primitives (cpp soup)
>   - Prevent SMCCC spamming on feature probing
>   - Random fixes and tidying up
>
> Marc Zyngier (18):
>   arm64: KVM: Fix SMCCC handling of unimplemented SMC/HVC calls
>   arm: KVM: Fix SMCCC handling of unimplemented SMC/HVC calls
>   arm64: KVM: Increment PC after handling an SMC trap
>   arm/arm64: KVM: Consolidate the PSCI include files
>   arm/arm64: KVM: Add PSCI_VERSION helper
>   arm/arm64: KVM: Add smccc accessors to PSCI code
>   arm/arm64: KVM: Implement PSCI 1.0 support
>   arm/arm64: KVM: Add PSCI version selection API
>   arm/arm64: KVM: Advertise SMCCC v1.1
>   arm/arm64: KVM: Turn kvm_psci_version into a static inline
>   arm64: KVM: Report SMCCC_ARCH_WORKAROUND_1 BP hardening support
>   arm64: KVM: Add SMCCC_ARCH_WORKAROUND_1 fast handling
>   firmware/psci: Expose PSCI conduit
>   firmware/psci: Expose SMCCC version through psci_ops
>   arm/arm64: smccc: Make function identifiers an unsigned quantity
>   arm/arm64: smccc: Implement SMCCC v1.1 inline primitive
>   arm64: Add ARM_SMCCC_ARCH_WORKAROUND_1 BP hardening support
>   arm64: Kill PSCI_GET_VERSION as a variant-2 workaround
>

I have given this a spin on my Overdrive, and everything seems to work
as expected, both in the host and in the guest (I single stepped
through the guest to ensure that it gets the expected answer from the
SMCCC feature info call)

Tested-by: Ard Biesheuvel <ard.biesheu...@linaro.org>


Re: [PATCH v3 15/18] arm/arm64: smccc: Make function identifiers an unsigned quantity

2018-02-01 Thread Ard Biesheuvel
On 1 February 2018 at 12:40, Robin Murphy <robin.mur...@arm.com> wrote:
> On 01/02/18 11:46, Marc Zyngier wrote:
>>
>> Function identifiers are a 32bit, unsigned quantity. But we never
>> tell so to the compiler, resulting in the following:
>>
>>   4ac:   b26187e0        mov x0, #0xffffffff80000001
>>
>> We thus rely on the firmware narrowing it for us, which is not
>> always a reasonable expectation.
>
>
> I think technically it might be OK, since SMCCC states "A Function
> Identifier is passed in register W0.", which implies that a conforming
> implementation should also read w0, not x0, but it's certainly far easier to
> be completely right than to justify being possibly wrong.
>
> Reviewed-by: Robin Murphy <robin.mur...@arm.com>
>

In my case, the function identifier wasn't the issue, but the argument:
for SMCCC_ARCH_FEATURES it is also defined as uint32_t, and it ended up
being interpreted incorrectly by the SMCCC v1.1 implementation that is
now upstream in ARM-TF.



>
>> Cc: sta...@vger.kernel.org
>> Reported-by: Ard Biesheuvel <ard.biesheu...@linaro.org>
>> Acked-by: Ard Biesheuvel <ard.biesheu...@linaro.org>
>> Tested-by: Ard Biesheuvel <ard.biesheu...@linaro.org>
>> Signed-off-by: Marc Zyngier <marc.zyng...@arm.com>
>> ---
>>   include/linux/arm-smccc.h | 6 --
>>   1 file changed, 4 insertions(+), 2 deletions(-)
>>
>> diff --git a/include/linux/arm-smccc.h b/include/linux/arm-smccc.h
>> index e1ef944ef1da..dd44d8458c04 100644
>> --- a/include/linux/arm-smccc.h
>> +++ b/include/linux/arm-smccc.h
>> @@ -14,14 +14,16 @@
>>   #ifndef __LINUX_ARM_SMCCC_H
>>   #define __LINUX_ARM_SMCCC_H
>>   +#include 
>> +
>>   /*
>>* This file provides common defines for ARM SMC Calling Convention as
>>* specified in
>>* http://infocenter.arm.com/help/topic/com.arm.doc.den0028a/index.html
>>*/
>>   -#define ARM_SMCCC_STD_CALL   0
>> -#define ARM_SMCCC_FAST_CALL1
>> +#define ARM_SMCCC_STD_CALL _AC(0,U)
>> +#define ARM_SMCCC_FAST_CALL_AC(1,U)
>>   #define ARM_SMCCC_TYPE_SHIFT  31
>> #define ARM_SMCCC_SMC_32   0
>>
>


Re: [PATCH v2 16/16] arm64: Add ARM_SMCCC_ARCH_WORKAROUND_1 BP hardening support

2018-01-31 Thread Ard Biesheuvel
On 31 January 2018 at 14:35, Ard Biesheuvel <ard.biesheu...@linaro.org> wrote:
> On 31 January 2018 at 14:11, Marc Zyngier <marc.zyng...@arm.com> wrote:
>> On 31/01/18 13:56, Hanjun Guo wrote:
>>> Hi Marc,
>>>
>>> On 2018/1/30 1:45, Marc Zyngier wrote:
>>>>  static int enable_psci_bp_hardening(void *data)
>>>>  {
>>>>  const struct arm64_cpu_capabilities *entry = data;
>>>>
>>>> -if (psci_ops.get_version)
>>>> +if (psci_ops.get_version) {
>>>> +if (check_smccc_arch_workaround_1(entry))
>>>> +return 0;
>>>
>>> If I'm using the new version of SMCCC, the firmware has the choice to
>>> decide whether this machine needs the workaround, even if the CPU is
>>> vulnerable to CVE-2017-5715, but..
>>>
>>>> +
>>>>  install_bp_hardening_cb(entry,
>>>> 
>>>> (bp_hardening_cb_t)psci_ops.get_version,
>>>> __psci_hyp_bp_inval_start,
>>>> __psci_hyp_bp_inval_end);
>>>
>>> ..the code above seems will enable get_psci_version() for CPU and will
>>> trap to trust firmware even the new version of firmware didn't say
>>> we need the workaround, did I understand it correctly?
>>
>> Well, you only get there if we've established that your CPU is affected
>> (it has an entry matching its MIDR with the HARDEN_BRANCH_PREDICTOR
>> capability), and that entry points to enable_psci_bp_hardening. It is
>> not the firmware that decides whether we need hardening, but the kernel.
>> The firmware merely provides a facility to apply the hardening.
>>
>>> I'm asking this because some platforms will not be exposed to users who
>>> could take advantage of CVE-2017-5715, and we can use different firmware
>>> to report whether we need such a workaround or not, then use a single
>>> kernel image for both vulnerable platforms and non-vulnerable ones.
>>
>> You cannot have your cake and eat it. If you don't want to workaround
>> the issue, you can disable the hardening. But asking for the same kernel
>> to do both depending on what the firmware reports doesn't make much
>> sense to me.
>
> The SMCCC v1.1. document does appear to imply that systems that
> implement SMCCC v1.1 but don't implement ARM_SMCCC_ARCH_WORKAROUND_1
> should be assumed to be unaffected.
>
> """
> If the discovery call returns NOT_SUPPORTED:
> • SMCCC_ARCH_WORKAROUND_1 must not be invoked on any PE in the system, and
> • none of the PEs in the system require firmware mitigation for CVE-2017-5715.
> """
>
> How to deal with conflicting information in this regard (quirk table
> vs firmware implementation) is a matter of policy, of course.

... and actually, perhaps it makes sense for the
SMCCC_ARCH_WORKAROUND_1 check to be completely independent of MIDR
based errata matching?

I.e., if SMCCC v1.1 and SMCCC_ARCH_WORKAROUND_1 are both implemented,
we should probably invoke it even if the MIDR is not known to belong
to an affected implementation.
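
I.e., something along these lines (sketch, using the SMCCC v1.1 helpers
introduced elsewhere in this series, and assuming an SMC conduit):

#include <linux/arm-smccc.h>

static bool smccc_has_arch_workaround_1(void)
{
        struct arm_smccc_res res;

        /* the real code would pick SMC or HVC based on psci_ops.conduit */
        arm_smccc_1_1_smc(ARM_SMCCC_ARCH_FEATURES_FUNC_ID,
                          ARM_SMCCC_ARCH_WORKAROUND_1, &res);
        return (int)res.a0 >= 0;
}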


Re: [PATCH v2 16/16] arm64: Add ARM_SMCCC_ARCH_WORKAROUND_1 BP hardening support

2018-01-31 Thread Ard Biesheuvel
On 31 January 2018 at 14:11, Marc Zyngier  wrote:
> On 31/01/18 13:56, Hanjun Guo wrote:
>> Hi Marc,
>>
>> On 2018/1/30 1:45, Marc Zyngier wrote:
>>>  static int enable_psci_bp_hardening(void *data)
>>>  {
>>>  const struct arm64_cpu_capabilities *entry = data;
>>>
>>> -if (psci_ops.get_version)
>>> +if (psci_ops.get_version) {
>>> +if (check_smccc_arch_workaround_1(entry))
>>> +return 0;
>>
>> If I'm using the new version of SMCCC, the firmware has the choice to
>> decide whether this machine needs the workaround, even if the CPU is
>> vulnerable to CVE-2017-5715, but..
>>
>>> +
>>>  install_bp_hardening_cb(entry,
>>> (bp_hardening_cb_t)psci_ops.get_version,
>>> __psci_hyp_bp_inval_start,
>>> __psci_hyp_bp_inval_end);
>>
>> ..the code above seems will enable get_psci_version() for CPU and will
>> trap to trust firmware even the new version of firmware didn't say
>> we need the workaround, did I understand it correctly?
>
> Well, you only get there if we've established that your CPU is affected
> (it has an entry matching its MIDR with the HARDEN_BRANCH_PREDICTOR
> capability), and that entry points to enable_psci_bp_hardening. It is
> not the firmware that decides whether we need hardening, but the kernel.
> The firmware merely provides a facility to apply the hardening.
>
>> I'm asking this because some platforms will not be exposed to users who
>> could take advantage of CVE-2017-5715, and we can use different firmware
>> to report whether we need such a workaround or not, then use a single
>> kernel image for both vulnerable platforms and non-vulnerable ones.
>
> You cannot have your cake and eat it. If you don't want to workaround
> the issue, you can disable the hardening. But asking for the same kernel
> to do both depending on what the firmware reports doesn't make much
> sense to me.

The SMCCC v1.1 document does appear to imply that systems that
implement SMCCC v1.1 but don't implement ARM_SMCCC_ARCH_WORKAROUND_1
should be assumed to be unaffected.

"""
If the discovery call returns NOT_SUPPORTED:
• SMCCC_ARCH_WORKAROUND_1 must not be invoked on any PE in the system, and
• none of the PEs in the system require firmware mitigation for CVE-2017-5715.
"""

How to deal with conflicting information in this regard (quirk table
vs firmware implementation) is a matter of policy, of course.
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH v2 15/16] arm/arm64: smccc: Implement SMCCC v1.1 inline primitive

2018-01-30 Thread Ard Biesheuvel
On 30 January 2018 at 12:27, Marc Zyngier <marc.zyng...@arm.com> wrote:
> On 29/01/18 21:45, Ard Biesheuvel wrote:
>> On 29 January 2018 at 17:45, Marc Zyngier <marc.zyng...@arm.com> wrote:
>>> One of the major improvements of SMCCC v1.1 is that it only clobbers
>>> the first 4 registers, both on 32 and 64bit. This means that it
>>> becomes very easy to provide an inline version of the SMC call
>>> primitive, and avoid performing a function call to stash the
>>> registers that would otherwise be clobbered by SMCCC v1.0.
>>>
>>> Signed-off-by: Marc Zyngier <marc.zyng...@arm.com>
>>> ---
>>>  include/linux/arm-smccc.h | 157 
>>> ++
>>>  1 file changed, 157 insertions(+)
>>>
>>> diff --git a/include/linux/arm-smccc.h b/include/linux/arm-smccc.h
>>> index dd44d8458c04..bc5843728909 100644
>>> --- a/include/linux/arm-smccc.h
>>> +++ b/include/linux/arm-smccc.h
>>> @@ -150,5 +150,162 @@ asmlinkage void __arm_smccc_hvc(unsigned long a0, 
>>> unsigned long a1,
>>>
>>>  #define arm_smccc_hvc_quirk(...) __arm_smccc_hvc(__VA_ARGS__)
>>>
>>> +/* SMCCC v1.1 implementation madness follows */
>>> +#ifdef CONFIG_ARM64
>>> +
>>> +#define SMCCC_SMC_INST "smc    #0"
>>> +#define SMCCC_HVC_INST "hvc    #0"
>>> +
>>> +#define __arm_smccc_1_1_prologue(inst) \
>>> +   inst "\n"   \
>>> +   "cbz%[ptr], 1f\n"   \
>>> +   "stp%x[r0], %x[r1], %[ra0]\n"   \
>>> +   "stp%x[r2], %x[r3], %[ra2]\n"   \
>>> +   "1:\n"  \
>>> +   : [ra0] "=Ump" (*(&___res->a0)),\
>>> + [ra2] "=Ump" (*(&___res->a2)),
>>> +
>>> +#define __arm_smccc_1_1_epilogue   : "memory"
>>> +
>>> +#endif
>>> +
>>> +#ifdef CONFIG_ARM
>>> +#include <asm/opcodes-sec.h>
>>> +#include <asm/opcodes-virt.h>
>>> +
>>> +#define SMCCC_SMC_INST __SMC(0)
>>> +#define SMCCC_HVC_INST __HVC(0)
>>> +
>>> +#define __arm_smccc_1_1_prologue(inst) \
>>> +   inst "\n"   \
>>> +   "cmp%[ptr], #0\n"   \
>>> +   "stmne  %[ptr], {%[r0], %[r1], %[r2], %[r3]}\n" \
>>> +   : "=m" (*___res),
>>> +
>>> +#define __arm_smccc_1_1_epilogue   : "memory", "cc"
>>> +
>>> +#endif
>>> +
>>> +#define __constraint_write_0   \
>>> +   [r0] "+r" (r0), [r1] "=r" (r1), [r2] "=r" (r2), [r3] "=r" (r3)
>>> +#define __constraint_write_1   \
>>> +   [r0] "+r" (r0), [r1] "+r" (r1), [r2] "=r" (r2), [r3] "=r" (r3)
>>> +#define __constraint_write_2   \
>>> +   [r0] "+r" (r0), [r1] "+r" (r1), [r2] "+r" (r2), [r3] "=r" (r3)
>>> +#define __constraint_write_3   \
>>> +   [r0] "+r" (r0), [r1] "+r" (r1), [r2] "+r" (r2), [r3] "+r" (r3)
>>
>> It seems you need +r for all arguments, otherwise the compiler will
>> notice that the value is never used, and may assign the register to
>> 'res' instead, i.e.,
>>
>>  3e4:   320107e0        mov     w0, #0x80000001  // #-2147483647
>>  3e8:   320183e1        mov     w1, #0x80008000  // #-2147450880
>>  3ec:   910123a2        add     x2, x29, #0x48
>>  3f0:   d4000002        hvc     #0x0
>>  3f4:   b4000062        cbz     x2, 400 <enable_psci_bp_hardening+0x88>
>>  3f8:   a90487a0        stp     x0, x1, [x29, #72]
>>  3fc:   a9058fa2        stp     x2, x3, [x29, #88]
>>
>> (for the code generated in the next patch)
>
> Well spotted.
>
> I think this is because of the lack of early-clobber for the unassigned
> registers. The compiler assumes the whole sequence is a single
> instruction, with the output registers being affected at the end. If we
> mark those with '=', we will prevent GCC from emitting this kind of
> horror.
>

Tried that, actually, but it still doesn't help. The compiler simply
notices that r2 (in this case) is not referenced after the asm () [nor
before], so it forgets all about it and reassigns it to something else.

> Note that with Robin's trick of moving the assignment back to C code,
> this is a bit moot as this is really a single instruction (smc/hvc), and
> there is no intermediate register evaluation.
>

Yeah, that does look much better.
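
For the archive, a minimal sketch of that shape (my reconstruction, not
the actual patch; __declare_args()/__constraints() stand in for the
per-arity helper macros quoted above):

/*
 * The asm now contains nothing but the smc/hvc itself; r0-r3 stay
 * tied to the call, and the store to *res is ordinary C, so the
 * compiler can no longer hand one of r0-r3 to a value that is live
 * across the call.
 */
#define __arm_smccc_1_1(inst, ...)					\
	do {								\
		__declare_args(__count_args(__VA_ARGS__), __VA_ARGS__);\
		asm volatile(inst "\n"					\
			     __constraints(__count_args(__VA_ARGS__)));\
		if (___res)						\
			*___res = (typeof(*___res)){r0, r1, r2, r3};	\
	} while (0)

With that, a call such as arm_smccc_1_1_hvc(ARM_SMCCC_VERSION_FUNC_ID,
&res) should compile down to little more than the hvc instruction and
the two stores.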
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH v2 15/16] arm/arm64: smccc: Implement SMCCC v1.1 inline primitive

2018-01-29 Thread Ard Biesheuvel
On 29 January 2018 at 17:45, Marc Zyngier  wrote:
> One of the major improvements of SMCCC v1.1 is that it only clobbers
> the first 4 registers, both on 32 and 64bit. This means that it
> becomes very easy to provide an inline version of the SMC call
> primitive, and avoid performing a function call to stash the
> registers that would otherwise be clobbered by SMCCC v1.0.
>
> Signed-off-by: Marc Zyngier 
> ---
>  include/linux/arm-smccc.h | 157 
> ++
>  1 file changed, 157 insertions(+)
>
> diff --git a/include/linux/arm-smccc.h b/include/linux/arm-smccc.h
> index dd44d8458c04..bc5843728909 100644
> --- a/include/linux/arm-smccc.h
> +++ b/include/linux/arm-smccc.h
> @@ -150,5 +150,162 @@ asmlinkage void __arm_smccc_hvc(unsigned long a0, 
> unsigned long a1,
>
>  #define arm_smccc_hvc_quirk(...) __arm_smccc_hvc(__VA_ARGS__)
>
> +/* SMCCC v1.1 implementation madness follows */
> +#ifdef CONFIG_ARM64
> +
> +#define SMCCC_SMC_INST "smc    #0"
> +#define SMCCC_HVC_INST "hvc    #0"
> +
> +#define __arm_smccc_1_1_prologue(inst) \
> +   inst "\n"   \
> +   "cbz%[ptr], 1f\n"   \
> +   "stp%x[r0], %x[r1], %[ra0]\n"   \
> +   "stp%x[r2], %x[r3], %[ra2]\n"   \
> +   "1:\n"  \
> +   : [ra0] "=Ump" (*(&___res->a0)),\
> + [ra2] "=Ump" (*(&___res->a2)),
> +
> +#define __arm_smccc_1_1_epilogue   : "memory"
> +
> +#endif
> +
> +#ifdef CONFIG_ARM
> +#include <asm/opcodes-sec.h>
> +#include <asm/opcodes-virt.h>
> +
> +#define SMCCC_SMC_INST __SMC(0)
> +#define SMCCC_HVC_INST __HVC(0)
> +
> +#define __arm_smccc_1_1_prologue(inst) \
> +   inst "\n"   \
> +   "cmp%[ptr], #0\n"   \
> +   "stmne  %[ptr], {%[r0], %[r1], %[r2], %[r3]}\n" \
> +   : "=m" (*___res),
> +
> +#define __arm_smccc_1_1_epilogue   : "memory", "cc"
> +
> +#endif
> +
> +#define __constraint_write_0   \
> +   [r0] "+r" (r0), [r1] "=r" (r1), [r2] "=r" (r2), [r3] "=r" (r3)
> +#define __constraint_write_1   \
> +   [r0] "+r" (r0), [r1] "+r" (r1), [r2] "=r" (r2), [r3] "=r" (r3)
> +#define __constraint_write_2   \
> +   [r0] "+r" (r0), [r1] "+r" (r1), [r2] "+r" (r2), [r3] "=r" (r3)
> +#define __constraint_write_3   \
> +   [r0] "+r" (r0), [r1] "+r" (r1), [r2] "+r" (r2), [r3] "+r" (r3)

It seems you need +r for all arguments, otherwise the compiler will
notice that the value is never used, and may assign the register to
'res' instead, i.e.,

 3e4:   320107e0        mov     w0, #0x80000001  // #-2147483647
 3e8:   320183e1        mov     w1, #0x80008000  // #-2147450880
 3ec:   910123a2        add     x2, x29, #0x48
 3f0:   d4000002        hvc     #0x0
 3f4:   b4000062        cbz     x2, 400 <enable_psci_bp_hardening+0x88>
 3f8:   a90487a0        stp     x0, x1, [x29, #72]
 3fc:   a9058fa2        stp     x2, x3, [x29, #88]

(for the code generated in the next patch)
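
I.e., roughly this (sketch of the suggestion only; r1-r3 would then
also need to be initialised in the __declare_arg_*() macros so that
the asm is not handed garbage as input):

/* Tie all four registers as read-write so none of them can be
 * reallocated by the compiler across the smc/hvc: */
#define __constraint_write_0					\
	[r0] "+r" (r0), [r1] "+r" (r1), [r2] "+r" (r2), [r3] "+r" (r3)
#define __constraint_write_1	__constraint_write_0
#define __constraint_write_2	__constraint_write_0
#define __constraint_write_3	__constraint_write_0

Keeping all of r0-r3 live across the asm means the compiler cannot
hand one of them to 'res' as it does above.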

> +#define __constraint_write_4   __constraint_write_3
> +#define __constraint_write_5   __constraint_write_3
> +#define __constraint_write_6   __constraint_write_3
> +#define __constraint_write_7   __constraint_write_3
> +
> +#define __constraint_read_0    : [ptr] "r" (___res)
> +#define __constraint_read_1    __constraint_read_0
> +#define __constraint_read_2    __constraint_read_0
> +#define __constraint_read_3    __constraint_read_0
> +#define __constraint_read_4    __constraint_read_3, "r" (r4)
> +#define __constraint_read_5    __constraint_read_4, "r" (r5)
> +#define __constraint_read_6    __constraint_read_5, "r" (r6)
> +#define __constraint_read_7    __constraint_read_6, "r" (r7)
> +
> +#define ___count_args(_0, _1, _2, _3, _4, _5, _6, _7, _8, x, ...) x
> +
> +#define __count_args(...)  \
> +   ___count_args(__VA_ARGS__, 7, 6, 5, 4, 3, 2, 1, 0)
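
(Aside, for anyone puzzling over the counting trick: ___count_args()
pads the argument list with descending numbers and always selects the
tenth token, so the result is the number of SMCCC arguments beyond a0;
the trailing 'res' pointer takes up one slot. A sketch of the expansion:

	__count_args(a0, res)
	  -> ___count_args(a0, res, 7, 6, 5, 4, 3, 2, 1, 0)         -> 0
	__count_args(a0, a1, a2, res)
	  -> ___count_args(a0, a1, a2, res, 7, 6, 5, 4, 3, 2, 1, 0) -> 2)
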
> +
> +#define __declare_arg_0(a0, res)   \
> +   struct arm_smccc_res   *___res = res;   \
> +   register u32   r0 asm("r0") = a0;   \
> +   register unsigned long r1 asm("r1");\
> +   register unsigned long r2 asm("r2");\
> +   register unsigned long r3 asm("r3")
> +
> +#define __declare_arg_1(a0, a1, res)   \
> +   struct arm_smccc_res   *___res = res;   \
> +   

Re: [PATCH v2 16/16] arm64: Add ARM_SMCCC_ARCH_WORKAROUND_1 BP hardening support

2018-01-29 Thread Ard Biesheuvel
On 29 January 2018 at 17:45, Marc Zyngier  wrote:
> Add the detection and runtime code for ARM_SMCCC_ARCH_WORKAROUND_1.
> It is lovely. Really.
>
> Signed-off-by: Marc Zyngier 
> ---
>  arch/arm64/include/asm/kvm_psci.h | 63 
>  arch/arm64/kernel/bpi.S   | 20 
>  arch/arm64/kernel/cpu_errata.c| 68 
> ++-
>  3 files changed, 150 insertions(+), 1 deletion(-)
>  create mode 100644 arch/arm64/include/asm/kvm_psci.h
>
> diff --git a/arch/arm64/include/asm/kvm_psci.h 
> b/arch/arm64/include/asm/kvm_psci.h

Did you mean to add this file? It is mostly identical to include/kvm/arm_psci.h

> new file mode 100644
> index ..f553e3795a4e
> --- /dev/null
> +++ b/arch/arm64/include/asm/kvm_psci.h
> @@ -0,0 +1,63 @@
> +/*
> + * Copyright (C) 2012,2013 - ARM Ltd
> + * Author: Marc Zyngier 
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License version 2 as
> + * published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program.  If not, see <http://www.gnu.org/licenses/>.
> + */
> +
> +#ifndef __ARM64_KVM_PSCI_H__
> +#define __ARM64_KVM_PSCI_H__
> +
> +#include 
> +
> +#define PSCI_VERSION(x,y)  ((((x) & 0x7fff) << 16) | ((y) & 0xffff))
> +#define KVM_ARM_PSCI_0_1   PSCI_VERSION(0, 1)
> +#define KVM_ARM_PSCI_0_2   PSCI_VERSION(0, 2)
> +#define KVM_ARM_PSCI_1_0   PSCI_VERSION(1, 0)
> +
> +#define KVM_ARM_PSCI_LATEST    KVM_ARM_PSCI_1_0
> +
> +/*
> + * We need the KVM pointer independently from the vcpu as we can call
> + * this from HYP, and need to apply kern_hyp_va on it...
> + */
> +static inline int kvm_psci_version(struct kvm_vcpu *vcpu, struct kvm *kvm)
> +{
> +   /*
> +* Our PSCI implementation stays the same across versions from
> +* v0.2 onward, only adding the few mandatory functions (such
> +* as FEATURES with 1.0) that are required by newer
> +* revisions. It is thus safe to return the latest, unless
> +* userspace has instructed us otherwise.
> +*/
> +   if (test_bit(KVM_ARM_VCPU_PSCI_0_2, vcpu->arch.features)) {
> +   if (kvm->arch.psci_version)
> +   return kvm->arch.psci_version;
> +
> +   return KVM_ARM_PSCI_LATEST;
> +   }
> +
> +   return KVM_ARM_PSCI_0_1;
> +}
> +
> +
> +int kvm_hvc_call_handler(struct kvm_vcpu *vcpu);
> +
> +struct kvm_one_reg;
> +
> +int kvm_arm_get_fw_num_regs(struct kvm_vcpu *vcpu);
> +int kvm_arm_copy_fw_reg_indices(struct kvm_vcpu *vcpu, u64 __user *uindices);
> +int kvm_arm_get_fw_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg);
> +int kvm_arm_set_fw_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg);
> +
> +#endif /* __ARM64_KVM_PSCI_H__ */
> diff --git a/arch/arm64/kernel/bpi.S b/arch/arm64/kernel/bpi.S
> index 76225c2611ea..fdeed629f2c6 100644
> --- a/arch/arm64/kernel/bpi.S
> +++ b/arch/arm64/kernel/bpi.S
> @@ -17,6 +17,7 @@
>   */
>
>  #include <linux/linkage.h>
> +#include <linux/arm-smccc.h>
>
>  .macro ventry target
> .rept 31
> @@ -85,3 +86,22 @@ ENTRY(__qcom_hyp_sanitize_link_stack_start)
> .endr
> ldp x29, x30, [sp], #16
>  ENTRY(__qcom_hyp_sanitize_link_stack_end)
> +
> +.macro smccc_workaround_1 inst
> +   sub sp, sp, #(8 * 4)
> +   stp x2, x3, [sp, #(8 * 0)]
> +   stp x0, x1, [sp, #(8 * 2)]
> +   mov w0, #ARM_SMCCC_ARCH_WORKAROUND_1
> +   \inst   #0
> +   ldp x2, x3, [sp, #(8 * 0)]
> +   ldp x0, x1, [sp, #(8 * 2)]
> +   add sp, sp, #(8 * 4)
> +.endm
> +
> +ENTRY(__smccc_workaround_1_smc_start)
> +   smccc_workaround_1  smc
> +ENTRY(__smccc_workaround_1_smc_end)
> +
> +ENTRY(__smccc_workaround_1_hvc_start)
> +   smccc_workaround_1  hvc
> +ENTRY(__smccc_workaround_1_hvc_end)
> diff --git a/arch/arm64/kernel/cpu_errata.c b/arch/arm64/kernel/cpu_errata.c
> index ed6881882231..36cff870d5d7 100644
> --- a/arch/arm64/kernel/cpu_errata.c
> +++ b/arch/arm64/kernel/cpu_errata.c
> @@ -70,6 +70,10 @@ DEFINE_PER_CPU_READ_MOSTLY(struct bp_hardening_data, 
> bp_hardening_data);
>  extern char __psci_hyp_bp_inval_start[], __psci_hyp_bp_inval_end[];
>  extern char __qcom_hyp_sanitize_link_stack_start[];
>  extern char __qcom_hyp_sanitize_link_stack_end[];
> +extern char __smccc_workaround_1_smc_start[];
> +extern char __smccc_workaround_1_smc_end[];
> +extern char __smccc_workaround_1_hvc_start[];
> +extern char __smccc_workaround_1_hvc_end[];
>
>  static void 

Re: [PATCH 14/14] arm64: Add ARM_SMCCC_ARCH_WORKAROUND_1 BP hardening support

2018-01-29 Thread Ard Biesheuvel
On 29 January 2018 at 10:07, Marc Zyngier <marc.zyng...@arm.com> wrote:
> On 29/01/18 09:42, Ard Biesheuvel wrote:
>> On 29 January 2018 at 09:36, Marc Zyngier <marc.zyng...@arm.com> wrote:
>>> On 28/01/18 23:08, Ard Biesheuvel wrote:
>>>> On 26 January 2018 at 14:28, Marc Zyngier <marc.zyng...@arm.com> wrote:
>>>>> Add the detection and runtime code for ARM_SMCCC_ARCH_WORKAROUND_1.
>>>>> It is lovely. Really.
>>>>>
>>>>> Signed-off-by: Marc Zyngier <marc.zyng...@arm.com>
>>>>> ---
>>>>>  arch/arm64/kernel/bpi.S| 20 
>>>>>  arch/arm64/kernel/cpu_errata.c | 71 
>>>>> +-
>>>>>  2 files changed, 90 insertions(+), 1 deletion(-)
>>>>>
>>>>> diff --git a/arch/arm64/kernel/bpi.S b/arch/arm64/kernel/bpi.S
>>>>> index 76225c2611ea..add7e08a018d 100644
>>>>> --- a/arch/arm64/kernel/bpi.S
>>>>> +++ b/arch/arm64/kernel/bpi.S
>>>>> @@ -17,6 +17,7 @@
>>>>>   */
>>>>>
>>>>>  #include <linux/linkage.h>
>>>>> +#include <linux/arm-smccc.h>
>>>>>
>>>>>  .macro ventry target
>>>>> .rept 31
>>>>> @@ -85,3 +86,22 @@ ENTRY(__qcom_hyp_sanitize_link_stack_start)
>>>>> .endr
>>>>> ldp x29, x30, [sp], #16
>>>>>  ENTRY(__qcom_hyp_sanitize_link_stack_end)
>>>>> +
>>>>> +.macro smccc_workaround_1 inst
>>>>> +   sub sp, sp, #(8 * 4)
>>>>> +   stp x2, x3, [sp, #(16 * 0)]
>>>>> +   stp x0, x1, [sp, #(16 * 1)]
>>>>> +   orr w0, wzr, #ARM_SMCCC_ARCH_WORKAROUND_1
>>>>> +   \inst   #0
>>>>> +   ldp x2, x3, [sp, #(16 * 0)]
>>>>> +   ldp x0, x1, [sp, #(16 * 1)]
>>>>> +   add sp, sp, #(8 * 4)
>>>>> +.endm
>>>>> +
>>>>> +ENTRY(__smccc_workaround_1_smc_start)
>>>>> +   smccc_workaround_1  smc
>>>>> +ENTRY(__smccc_workaround_1_smc_end)
>>>>> +
>>>>> +ENTRY(__smccc_workaround_1_hvc_start)
>>>>> +   smccc_workaround_1  hvc
>>>>> +ENTRY(__smccc_workaround_1_hvc_end)
>>>>> diff --git a/arch/arm64/kernel/cpu_errata.c 
>>>>> b/arch/arm64/kernel/cpu_errata.c
>>>>> index ed6881882231..f1501873f2e4 100644
>>>>> --- a/arch/arm64/kernel/cpu_errata.c
>>>>> +++ b/arch/arm64/kernel/cpu_errata.c
>>>>> @@ -70,6 +70,10 @@ DEFINE_PER_CPU_READ_MOSTLY(struct bp_hardening_data, 
>>>>> bp_hardening_data);
>>>>>  extern char __psci_hyp_bp_inval_start[], __psci_hyp_bp_inval_end[];
>>>>>  extern char __qcom_hyp_sanitize_link_stack_start[];
>>>>>  extern char __qcom_hyp_sanitize_link_stack_end[];
>>>>> +extern char __smccc_workaround_1_smc_start[];
>>>>> +extern char __smccc_workaround_1_smc_end[];
>>>>> +extern char __smccc_workaround_1_hvc_start[];
>>>>> +extern char __smccc_workaround_1_hvc_end[];
>>>>>
>>>>>  static void __copy_hyp_vect_bpi(int slot, const char *hyp_vecs_start,
>>>>> const char *hyp_vecs_end)
>>>>> @@ -116,6 +120,10 @@ static void 
>>>>> __install_bp_hardening_cb(bp_hardening_cb_t fn,
>>>>>  #define __psci_hyp_bp_inval_end    NULL
>>>>>  #define __qcom_hyp_sanitize_link_stack_start   NULL
>>>>>  #define __qcom_hyp_sanitize_link_stack_end NULL
>>>>> +#define __smccc_workaround_1_smc_start NULL
>>>>> +#define __smccc_workaround_1_smc_end   NULL
>>>>> +#define __smccc_workaround_1_hvc_start NULL
>>>>> +#define __smccc_workaround_1_hvc_end   NULL
>>>>>
>>>>>  static void __install_bp_hardening_cb(bp_hardening_cb_t fn,
>>>>>   const char *hyp_vecs_start,
>>>>> @@ -142,17 +150,78 @@ static void  install_bp_hardening_cb(const struct 
>>>>> arm64_cpu_capabilities *entry,
>>>>> __install_bp_hardening_cb(fn, hyp_vecs_start, hyp_vecs_end);
>>>>>  }
>>>>>
>>>>> +#include 
>>>>> +#include 
>>>>>  #include 
>>>>>

Re: [PATCH 12/14] firmware/psci: Expose PSCI conduit

2018-01-29 Thread Ard Biesheuvel
On 26 January 2018 at 14:28, Marc Zyngier  wrote:
> In order to call into the firmware to apply workarounds, it is
> useful to find out whether we're using HVC or SMC. Let's expose
> this through the psci_ops.
>
> Signed-off-by: Marc Zyngier 
> ---
>  drivers/firmware/psci.c | 26 +-
>  include/linux/psci.h|  7 +++
>  2 files changed, 28 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/firmware/psci.c b/drivers/firmware/psci.c
> index 8b25d31e8401..570187e5d084 100644
> --- a/drivers/firmware/psci.c
> +++ b/drivers/firmware/psci.c
> @@ -59,7 +59,9 @@ bool psci_tos_resident_on(int cpu)
> return cpu == resident_cpu;
>  }
>
> -struct psci_operations psci_ops;
> +struct psci_operations psci_ops = {
> +   .conduit = PSCI_CONDUIT_NONE,
> +};
>
>  typedef unsigned long (psci_fn)(unsigned long, unsigned long,
> unsigned long, unsigned long);
> @@ -210,6 +212,20 @@ static unsigned long psci_migrate_info_up_cpu(void)
>   0, 0, 0);
>  }
>
> +static void set_conduit(enum psci_conduit conduit)
> +{
> +   switch (conduit) {
> +   case PSCI_CONDUIT_HVC:
> +   invoke_psci_fn = __invoke_psci_fn_hvc;
> +   break;
> +   case PSCI_CONDUIT_SMC:
> +   invoke_psci_fn = __invoke_psci_fn_smc;
> +   break;

I get a GCC warning here about PSCI_CONDUIT_NONE not being handled.
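
Something like the below on top would keep it quiet (sketch only;
whether to WARN or to silently accept PSCI_CONDUIT_NONE is your call):

	case PSCI_CONDUIT_NONE:
	default:
		/* sketch: no conduit means there is nothing to invoke */
		WARN(1, "unexpected PSCI conduit %d\n", conduit);
		return;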

> +   }
> +
> +   psci_ops.conduit = conduit;
> +}
> +
>  static int get_set_conduit_method(struct device_node *np)
>  {
> const char *method;
> @@ -222,9 +238,9 @@ static int get_set_conduit_method(struct device_node *np)
> }
>
> if (!strcmp("hvc", method)) {
> -   invoke_psci_fn = __invoke_psci_fn_hvc;
> +   set_conduit(PSCI_CONDUIT_HVC);
> } else if (!strcmp("smc", method)) {
> -   invoke_psci_fn = __invoke_psci_fn_smc;
> +   set_conduit(PSCI_CONDUIT_SMC);
> } else {
> pr_warn("invalid \"method\" property: %s\n", method);
> return -EINVAL;
> @@ -654,9 +670,9 @@ int __init psci_acpi_init(void)
> pr_info("probing for conduit method from ACPI.\n");
>
> if (acpi_psci_use_hvc())
> -   invoke_psci_fn = __invoke_psci_fn_hvc;
> +   set_conduit(PSCI_CONDUIT_HVC);
> else
> -   invoke_psci_fn = __invoke_psci_fn_smc;
> +   set_conduit(PSCI_CONDUIT_SMC);
>
> return psci_probe();
>  }
> diff --git a/include/linux/psci.h b/include/linux/psci.h
> index f724fd8c78e8..f2679e5faa4f 100644
> --- a/include/linux/psci.h
> +++ b/include/linux/psci.h
> @@ -25,6 +25,12 @@ bool psci_tos_resident_on(int cpu);
>  int psci_cpu_init_idle(unsigned int cpu);
>  int psci_cpu_suspend_enter(unsigned long index);
>
> +enum psci_conduit {
> +   PSCI_CONDUIT_NONE,
> +   PSCI_CONDUIT_SMC,
> +   PSCI_CONDUIT_HVC,
> +};
> +
>  struct psci_operations {
> u32 (*get_version)(void);
> int (*cpu_suspend)(u32 state, unsigned long entry_point);
> @@ -34,6 +40,7 @@ struct psci_operations {
> int (*affinity_info)(unsigned long target_affinity,
> unsigned long lowest_affinity_level);
> int (*migrate_info_type)(void);
> +   enum psci_conduit conduit;
>  };
>
>  extern struct psci_operations psci_ops;
> --
> 2.14.2
>
>
> ___
> linux-arm-kernel mailing list
> linux-arm-ker...@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH 13/14] firmware/psci: Expose SMCCC version through psci_ops

2018-01-29 Thread Ard Biesheuvel
On 26 January 2018 at 14:28, Marc Zyngier  wrote:
> Since PSCI 1.0 allows the SMCCC version to be (indirectly) probed,
> let's do that at boot time, and expose the version of the calling
> convention as part of the psci_ops structure.
>
> Signed-off-by: Marc Zyngier 
> ---
>  drivers/firmware/psci.c | 22 ++
>  include/linux/psci.h|  6 ++
>  2 files changed, 28 insertions(+)
>
> diff --git a/drivers/firmware/psci.c b/drivers/firmware/psci.c
> index 570187e5d084..b260bbf637a2 100644
> --- a/drivers/firmware/psci.c
> +++ b/drivers/firmware/psci.c
> @@ -509,6 +509,27 @@ static void __init psci_init_migrate(void)
> pr_info("Trusted OS resident on physical CPU 0x%lx\n", cpuid);
>  }
>
> +static void __init psci_init_smccc(u32 ver)
> +{
> +   int feature = PSCI_RET_NOT_SUPPORTED;
> +
> +   if (PSCI_VERSION_MAJOR(ver) > 1 ||
> +   (PSCI_VERSION_MAJOR(ver) == 1 && PSCI_VERSION_MINOR(ver) >= 0))

'PSCI_VERSION_MAJOR(ver) >= 1' should be sufficient here, no?
PSCI_VERSION_MINOR() can never be negative, so the '>= 0' half of the
check is always true.

> +   feature = psci_features(ARM_SMCCC_VERSION_FUNC_ID);
> +
> +   if (feature == PSCI_RET_NOT_SUPPORTED) {
> +   psci_ops.variant = SMCCC_VARIANT_1_0;
> +   } else {
> +   ver = invoke_psci_fn(ARM_SMCCC_VERSION_FUNC_ID, 0, 0, 0);
> +   if (ver != ARM_SMCCC_VERSION_1_1)
> +   psci_ops.variant = SMCCC_VARIANT_1_0;
> +   else
> +   psci_ops.variant = SMCCC_VARIANT_1_1;
> +   }
> +
> +   pr_info("SMC Calling Convention v1.%d\n", psci_ops.variant);
> +}
> +
>  static void __init psci_0_2_set_functions(void)
>  {
> pr_info("Using standard PSCI v0.2 function IDs\n");
> @@ -555,6 +576,7 @@ static int __init psci_probe(void)
> psci_0_2_set_functions();
>
> psci_init_migrate();
> +   psci_init_smccc(ver);
>
> if (PSCI_VERSION_MAJOR(ver) >= 1) {
> psci_init_cpu_suspend();
> diff --git a/include/linux/psci.h b/include/linux/psci.h
> index f2679e5faa4f..83fd16a37be3 100644
> --- a/include/linux/psci.h
> +++ b/include/linux/psci.h
> @@ -31,6 +31,11 @@ enum psci_conduit {
> PSCI_CONDUIT_HVC,
>  };
>
> +enum smccc_variant {
> +   SMCCC_VARIANT_1_0,
> +   SMCCC_VARIANT_1_1,
> +};
> +
>  struct psci_operations {
> u32 (*get_version)(void);
> int (*cpu_suspend)(u32 state, unsigned long entry_point);
> @@ -41,6 +46,7 @@ struct psci_operations {
> unsigned long lowest_affinity_level);
> int (*migrate_info_type)(void);
> enum psci_conduit conduit;
> +   enum smccc_variant variant;
>  };
>
>  extern struct psci_operations psci_ops;
> --
> 2.14.2
>
>
> ___
> linux-arm-kernel mailing list
> linux-arm-ker...@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

