Re: [RFC/RFT PATCH 3/3] arm64: KVM: keep trapping of VM sysreg writes enabled

2015-02-19 Thread Ard Biesheuvel
On 19 February 2015 at 13:40, Marc Zyngier marc.zyng...@arm.com wrote:
 On 19/02/15 10:54, Ard Biesheuvel wrote:
 ---
  arch/arm/kvm/mmu.c   | 2 +-
  arch/arm64/include/asm/kvm_arm.h | 2 +-
  2 files changed, 2 insertions(+), 2 deletions(-)

 diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
 index 136662547ca6..fa8ec55220ea 100644
 --- a/arch/arm/kvm/mmu.c
 +++ b/arch/arm/kvm/mmu.c
 @@ -1530,7 +1530,7 @@ void kvm_toggle_cache(struct kvm_vcpu *vcpu, bool 
 was_enabled)
   stage2_flush_vm(vcpu->kvm);

   /* Caches are now on, stop trapping VM ops (until a S/W op) */
 - if (now_enabled)
 + if (0)//now_enabled)
   vcpu_set_hcr(vcpu, vcpu_get_hcr(vcpu) & ~HCR_TVM);

   trace_kvm_toggle_cache(*vcpu_pc(vcpu), was_enabled, now_enabled);
 diff --git a/arch/arm64/include/asm/kvm_arm.h 
 b/arch/arm64/include/asm/kvm_arm.h
 index 8afb863f5a9e..437e1ec17539 100644
 --- a/arch/arm64/include/asm/kvm_arm.h
 +++ b/arch/arm64/include/asm/kvm_arm.h
 @@ -75,7 +75,7 @@
   * FMO:  Override CPSR.F and enable signaling with VF
   * SWIO: Turn set/way invalidates into set/way clean+invalidate
   */
 -#define HCR_GUEST_FLAGS (HCR_TSC | HCR_TSW | HCR_TWE | HCR_TWI | HCR_VM | \
 +#define HCR_GUEST_FLAGS (HCR_TSC | /* HCR_TSW | */ HCR_TWE | HCR_TWI | HCR_VM | \

 Why do we stop trapping S/W ops here? We can't let the guest issue those
 without doing anything, as this will break anything that expects the
 data to make it to memory. Think of the 32bit kernel decompressor, for
 example.


TBH patch #3 is just a q'n'd hack to ensure that the TVM bit remains
set in HCR. I was assuming that cleaning the entire cache on mmu
enable/disable would be sufficient to quantify the performance impact
and check whether patch #2 works as advertised.

I was wondering: isn't calling stage2_flush_vm() for each set of each
way very costly?


Re: [RFC/RFT PATCH 0/3] arm64: KVM: work around incoherency with uncached guest mappings

2015-02-20 Thread Ard Biesheuvel
On 20 February 2015 at 14:29, Andrew Jones drjo...@redhat.com wrote:
 On Thu, Feb 19, 2015 at 06:57:24PM +0100, Paolo Bonzini wrote:


 On 19/02/2015 18:55, Andrew Jones wrote:
(I don't have an exact number for how many times it went to EL1 
because
 access_mair() doesn't have a trace point.)
(I got the 62873 number by testing a 3rd kernel build that only had 
patch
 3/3 applied to the base, and counting kvm_toggle_cache events.)
(The number 50 is the number of kvm_toggle_cache events *without* 3/3
 applied.)
   
I consider this bad news because, even considering it only goes to 
EL2,
it goes a ton more than it used to. I realize patch 3/3 isn't the 
final
plan for enabling traps though.

 If a full guest boots, can you try timing a kernel compile?


 Guests boot. I used an 8 vcpu, 14G memory guest; compiled the kernel 4
 times inside the guest for each host kernel; base and mair. I dropped
 the time from the first run of each set, and captured the other 3.
 Command line used below. Time is from the
   Elapsed (wall clock) time (h:mm:ss or m:ss):
 output of /usr/bin/time - the host's wall clock.

   /usr/bin/time --verbose ssh $VM 'cd kernel && make -s clean && make -s -j8'

 Results:
 base: 3:06.11 3:07.00 3:10.93
 mair: 3:08.47 3:06.75 3:04.76

 So looks like the 3 orders of magnitude greater number of traps
 (only to el2) don't impact kernel compiles.


OK, good! That was what I was hoping for, obviously.

 Then I thought I'd be able to quick measure the number of cycles
 a trap to el2 takes with this kvm-unit-tests test

 int main(void)
 {
 unsigned long start, end;
 unsigned int sctlr;

  asm volatile(
 "	mrs	%0, sctlr_el1\n"
 "	msr	pmcr_el0, %1\n"
  : "=r" (sctlr) : "r" (5));

  asm volatile(
 "	mrs	%0, pmccntr_el0\n"
 "	msr	sctlr_el1, %2\n"
 "	mrs	%1, pmccntr_el0\n"
  : "=r" (start), "=r" (end) : "r" (sctlr));

  printf("%llx\n", end - start);
 return 0;
 }

 after applying this patch to kvm

 diff --git a/arch/arm64/kvm/hyp.S b/arch/arm64/kvm/hyp.S
 index bb91b6fc63861..5de39d740aa58 100644
 --- a/arch/arm64/kvm/hyp.S
 +++ b/arch/arm64/kvm/hyp.S
 @@ -770,7 +770,7 @@

 mrs x2, mdcr_el2
 and x2, x2, #MDCR_EL2_HPMN_MASK
 -   orr x2, x2, #(MDCR_EL2_TPM | MDCR_EL2_TPMCR)
 +// orr x2, x2, #(MDCR_EL2_TPM | MDCR_EL2_TPMCR)
 orr x2, x2, #(MDCR_EL2_TDRA | MDCR_EL2_TDOSA)

 // Check for KVM_ARM64_DEBUG_DIRTY, and set debug to trap

 But I get zero for the cycle count. Not sure what I'm missing.


No clue tbh. Does the counter work as expected in the host?

-- 
Ard.


Re: [RFC/RFT PATCH 0/3] arm64: KVM: work around incoherency with uncached guest mappings

2015-02-19 Thread Ard Biesheuvel
On 19 February 2015 at 14:50, Alexander Graf ag...@suse.de wrote:


 On 19.02.15 11:54, Ard Biesheuvel wrote:
 This is a 0th order approximation of how we could potentially force the guest
 to avoid uncached mappings, at least from the moment the MMU is on. (Before
 that, all of memory is implicitly classified as Device-nGnRnE)

 The idea (patch #2) is to trap writes to MAIR_EL1, and replace uncached 
 mappings
 with cached ones. This way, there is no need to mangle any guest page tables.

 Would you mind to give a brief explanation on what this does? What
 happens to actually assigned devices that need to be mapped as uncached?
 What happens to DMA from such devices when the guest assumes that it's
 accessing RAM uncached and then triggers DMA?


On ARM, stage 2 mappings that are more strict will supersede stage 1
mappings, so the idea is to use cached mappings exclusively for stage
1 so that the host is fully in control of the actual memory attributes
by setting the attributes at stage 2. This also makes sense because
the host will ultimately know better whether some range that the guest
thinks is a device is actually a device or just emulated (no stage 2
mapping), backed by host memory (such as the NOR flash read case) or
backed by a passthrough device.
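
To illustrate the combining rule (a rough sketch only; the enum and helper
below are made up, not kernel code): the effective attribute is the more
restrictive of the two stages, so the host's stage 2 choice always prevails
over whatever the guest programs at stage 1.

	/* Rough illustration only -- the enum and helper are hypothetical. */
	enum mem_attr { ATTR_DEVICE, ATTR_NORMAL_NC, ATTR_NORMAL_WB };

	/*
	 * Per the ARM ARM combining rules, the effective attribute is the
	 * more restrictive of the stage 1 (guest) and stage 2 (host)
	 * attributes, so the host can always downgrade, but never be
	 * overridden by, the guest.
	 */
	static enum mem_attr combine_s1_s2(enum mem_attr s1, enum mem_attr s2)
	{
		return s1 < s2 ? s1 : s2;	/* lower value == more restrictive */
	}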

-- 
Ard.



 The downside is that, to do this correctly, we need to always trap writes to
 the VM sysreg group, which includes registers that the guest may write to 
 very
 often. To reduce the associated performance hit, patch #1 introduces a fast 
 path
 for EL2 to perform trivial sysreg writes on behalf of the guest, without the
 need for a full world switch to the host and back.
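
 (For illustration, a rough sketch of the fast-path idea in C; the real
 implementation in patch #1 lives in EL2 assembly in hyp.S, and the helper
 names below are made up:)

	/* Conceptual sketch only -- helper names are hypothetical. */
	static bool fast_handle_vm_sysreg_write(struct kvm_vcpu *vcpu, u32 esr)
	{
		if (!is_trivial_vm_sysreg_write(esr))	/* hypothetical decode helper */
			return false;			/* fall back to a full exit */

		emulate_vm_sysreg_write(vcpu, esr);	/* hypothetical: update shadow state */
		*vcpu_pc(vcpu) += 4;			/* skip the trapped instruction */
		return true;				/* resume the guest, no world switch */
	}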

 The main purpose of these patches is to quantify the performance hit, and
 verify whether the MAIR_EL1 handling works correctly.

 Ard Biesheuvel (3):
   arm64: KVM: handle some sysreg writes in EL2
   arm64: KVM: mangle MAIR register to prevent uncached guest mappings
   arm64: KVM: keep trapping of VM sysreg writes enabled

  arch/arm/kvm/mmu.c   |   2 +-
  arch/arm64/include/asm/kvm_arm.h |   2 +-
  arch/arm64/kvm/hyp.S | 101 
 +++
  arch/arm64/kvm/sys_regs.c|  63 
  4 files changed, 156 insertions(+), 12 deletions(-)



Re: [PATCH roundup 1/4] arm64: mm: increase VA range of identity map

2015-03-16 Thread Ard Biesheuvel
On 16 March 2015 at 15:28, Christoffer Dall christoffer.d...@linaro.org wrote:
 On Fri, Mar 06, 2015 at 03:34:39PM +0100, Ard Biesheuvel wrote:
 The page size and the number of translation levels, and hence the supported
 virtual address range, are build-time configurables on arm64 whose optimal
 values are use case dependent. However, in the current implementation, if
 the system's RAM is located at a very high offset, the virtual address range
 needs to reflect that merely because the identity mapping, which is only used
 to enable or disable the MMU, requires the extended virtual range to map the
 physical memory at an equal virtual offset.

 This patch relaxes that requirement, by increasing the number of translation
 levels for the identity mapping only, and only when actually needed, i.e.,
 when system RAM's offset is found to be out of reach at runtime.

 Tested-by: Laura Abbott lau...@codeaurora.org
 Reviewed-by: Catalin Marinas catalin.mari...@arm.com
 Tested-by: Marc Zyngier marc.zyng...@arm.com
 Signed-off-by: Ard Biesheuvel ard.biesheu...@linaro.org
 ---
  arch/arm64/include/asm/mmu_context.h   | 43 
 ++
  arch/arm64/include/asm/page.h  |  6 +++--
  arch/arm64/include/asm/pgtable-hwdef.h |  7 +-
  arch/arm64/kernel/head.S   | 38 ++
  arch/arm64/kernel/smp.c|  1 +
  arch/arm64/mm/mmu.c|  7 +-
  arch/arm64/mm/proc-macros.S| 11 +
  arch/arm64/mm/proc.S   |  3 +++
  8 files changed, 112 insertions(+), 4 deletions(-)

 diff --git a/arch/arm64/include/asm/mmu_context.h 
 b/arch/arm64/include/asm/mmu_context.h
 index a9eee33dfa62..ecf2d060036b 100644
 --- a/arch/arm64/include/asm/mmu_context.h
 +++ b/arch/arm64/include/asm/mmu_context.h
 @@ -64,6 +64,49 @@ static inline void cpu_set_reserved_ttbr0(void)
   : "r" (ttbr));
  }

 +/*
 + * TCR.T0SZ value to use when the ID map is active. Usually equals
 + * TCR_T0SZ(VA_BITS), unless system RAM is positioned very high in
 + * physical memory, in which case it will be smaller.
 + */
 +extern u64 idmap_t0sz;
 +
 +static inline bool __cpu_uses_extended_idmap(void)
 +{
 + return (!IS_ENABLED(CONFIG_ARM64_VA_BITS_48) &&
 + unlikely(idmap_t0sz != TCR_T0SZ(VA_BITS)));
 +}
 +
 +static inline void __cpu_set_tcr_t0sz(u64 t0sz)
 +{
 + unsigned long tcr;
 +
 + if (__cpu_uses_extended_idmap())
 + asm volatile (
 +	"	mrs	%0, tcr_el1	;"
 +	"	bfi	%0, %1, %2, %3	;"
 +	"	msr	tcr_el1, %0	;"
 +	"	isb"
 +	: "=r" (tcr)
 +	: "r"(t0sz), "I"(TCR_T0SZ_OFFSET), "I"(TCR_TxSZ_WIDTH));
 +}
 +
 +/*
 + * Set TCR.T0SZ to the value appropriate for activating the identity map.
 + */
 +static inline void cpu_set_idmap_tcr_t0sz(void)
 +{
 + __cpu_set_tcr_t0sz(idmap_t0sz);
 +}
 +
 +/*
 + * Set TCR.T0SZ to its default value (based on VA_BITS)
 + */
 +static inline void cpu_set_default_tcr_t0sz(void)
 +{
 + __cpu_set_tcr_t0sz(TCR_T0SZ(VA_BITS));
 +}
 +
  static inline void switch_new_context(struct mm_struct *mm)
  {
   unsigned long flags;
 diff --git a/arch/arm64/include/asm/page.h b/arch/arm64/include/asm/page.h
 index 22b16232bd60..3d02b1869eb8 100644
 --- a/arch/arm64/include/asm/page.h
 +++ b/arch/arm64/include/asm/page.h
 @@ -33,7 +33,9 @@
   * image. Both require pgd, pud (4 levels only) and pmd tables to (section)
   * map the kernel. With the 64K page configuration, swapper and idmap need 
 to
   * map to pte level. The swapper also maps the FDT (see __create_page_tables
 - * for more information).
 + * for more information). Note that the number of ID map translation levels
 + * could be increased on the fly if system RAM is out of reach for the 
 default
 + * VA range, so 3 pages are reserved in all cases.
   */
  #ifdef CONFIG_ARM64_64K_PAGES
  #define SWAPPER_PGTABLE_LEVELS   (CONFIG_ARM64_PGTABLE_LEVELS)
 @@ -42,7 +44,7 @@
  #endif

  #define SWAPPER_DIR_SIZE (SWAPPER_PGTABLE_LEVELS * PAGE_SIZE)
 -#define IDMAP_DIR_SIZE   (SWAPPER_DIR_SIZE)
 +#define IDMAP_DIR_SIZE   (3 * PAGE_SIZE)

  #ifndef __ASSEMBLY__

 diff --git a/arch/arm64/include/asm/pgtable-hwdef.h 
 b/arch/arm64/include/asm/pgtable-hwdef.h
 index 5f930cc9ea83..847e864202cc 100644
 --- a/arch/arm64/include/asm/pgtable-hwdef.h
 +++ b/arch/arm64/include/asm/pgtable-hwdef.h
 @@ -143,7 +143,12 @@
  /*
   * TCR flags.
   */
 -#define TCR_TxSZ(x)	(((UL(64) - (x)) << 16) | ((UL(64) - (x)) << 0))
 +#define TCR_T0SZ_OFFSET	0
 +#define TCR_T1SZ_OFFSET	16
 +#define TCR_T0SZ(x)	((UL(64) - (x)) << TCR_T0SZ_OFFSET)
 +#define TCR_T1SZ(x)	((UL(64) - (x)) << TCR_T1SZ_OFFSET)
 +#define TCR_TxSZ(x)	(TCR_T0SZ(x) | TCR_T1SZ(x))
 +#define TCR_TxSZ_WIDTH	6
  #define TCR_IRGN_NC	((UL(0) << 8) | (UL(0) << 24))
  #define TCR_IRGN_WBWA

[PATCH roundup 3/4] ARM, arm64: kvm: get rid of the bounce page

2015-03-06 Thread Ard Biesheuvel
The HYP init bounce page is a runtime construct that ensures that the
HYP init code does not cross a page boundary. However, this is something
we can do perfectly well at build time, by aligning the code appropriately.

For arm64, we just align to 4 KB, and enforce that the code size is less
than 4 KB, regardless of the chosen page size.

For ARM, the whole code is less than 256 bytes, so we tweak the linker
script to align to a power of 2 upper bound of the code size.

Note that this also fixes a benign off-by-one error in the original bounce
page code, where a bounce page would be allocated unnecessarily if the code
was exactly 1 page in size.
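
For illustration (made-up numbers, not part of the patch): with a
page-aligned start of 0x1000 and an *exclusive* end of 0x2000, i.e.
exactly one page of code, the old test was non-zero and a bounce page
was allocated even though no page boundary is actually crossed.

	/* true: (0x1000 ^ 0x2000) & PAGE_MASK == 0x3000 */
	bool old_test = (0x1000UL ^ 0x2000UL) & PAGE_MASK;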

Tested-by: Marc Zyngier marc.zyng...@arm.com
Reviewed-by: Marc Zyngier marc.zyng...@arm.com
Signed-off-by: Ard Biesheuvel ard.biesheu...@linaro.org
---
 arch/arm/kernel/vmlinux.lds.S   | 26 ++---
 arch/arm/kvm/init.S |  3 +++
 arch/arm/kvm/mmu.c  | 42 +
 arch/arm64/kernel/vmlinux.lds.S | 18 --
 4 files changed, 43 insertions(+), 46 deletions(-)

diff --git a/arch/arm/kernel/vmlinux.lds.S b/arch/arm/kernel/vmlinux.lds.S
index 2787eb8d3616..85db1669bfe3 100644
--- a/arch/arm/kernel/vmlinux.lds.S
+++ b/arch/arm/kernel/vmlinux.lds.S
@@ -26,12 +26,28 @@
 
 #define IDMAP_RODATA   \
.rodata : { \
-   . = ALIGN(32);  \
+   . = ALIGN(HYP_IDMAP_ALIGN); \
VMLINUX_SYMBOL(__hyp_idmap_text_start) = .; \
*(.hyp.idmap.text)  \
VMLINUX_SYMBOL(__hyp_idmap_text_end) = .;   \
}
 
+/*
+ * If the HYP idmap .text section is populated, it needs to be positioned
+ * such that it will not cross a page boundary in the final output image.
+ * So align it to the section size rounded up to the next power of 2.
+ * If __hyp_idmap_size is undefined, the section will be empty so define
+ * it as 0 in that case.
+ */
+PROVIDE(__hyp_idmap_size = 0);
+
+#define HYP_IDMAP_ALIGN						\
+	__hyp_idmap_size == 0 ? 0 :				\
+	__hyp_idmap_size <= 0x100 ? 0x100 :			\
+	__hyp_idmap_size <= 0x200 ? 0x200 :			\
+	__hyp_idmap_size <= 0x400 ? 0x400 :			\
+	__hyp_idmap_size <= 0x800 ? 0x800 : 0x1000
+
 #ifdef CONFIG_HOTPLUG_CPU
 #define ARM_CPU_DISCARD(x)
 #define ARM_CPU_KEEP(x)x
@@ -351,8 +367,12 @@ SECTIONS
  */
 ASSERT((__proc_info_end - __proc_info_begin), "missing CPU support")
 ASSERT((__arch_info_end - __arch_info_begin), "no machine record defined")
+
 /*
- * The HYP init code can't be more than a page long.
+ * The HYP init code can't be more than a page long,
+ * and should not cross a page boundary.
  * The above comment applies as well.
  */
-ASSERT(((__hyp_idmap_text_end - __hyp_idmap_text_start) <= PAGE_SIZE), "HYP init code too big")
+ASSERT(((__hyp_idmap_text_end - 1) & PAGE_MASK) -
+	(__hyp_idmap_text_start & PAGE_MASK) == 0,
+	"HYP init code too big or unaligned")
diff --git a/arch/arm/kvm/init.S b/arch/arm/kvm/init.S
index 3988e72d16ff..11fb1d56f449 100644
--- a/arch/arm/kvm/init.S
+++ b/arch/arm/kvm/init.S
@@ -157,3 +157,6 @@ target: @ We're now in the trampoline code, switch page 
tables
 __kvm_hyp_init_end:
 
.popsection
+
+   .global __hyp_idmap_size
+   .set__hyp_idmap_size, __kvm_hyp_init_end - __kvm_hyp_init
diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
index 3e6859bc3e11..42a24d6b003b 100644
--- a/arch/arm/kvm/mmu.c
+++ b/arch/arm/kvm/mmu.c
@@ -37,7 +37,6 @@ static pgd_t *boot_hyp_pgd;
 static pgd_t *hyp_pgd;
 static DEFINE_MUTEX(kvm_hyp_pgd_mutex);
 
-static void *init_bounce_page;
 static unsigned long hyp_idmap_start;
 static unsigned long hyp_idmap_end;
 static phys_addr_t hyp_idmap_vector;
@@ -405,9 +404,6 @@ void free_boot_hyp_pgd(void)
if (hyp_pgd)
unmap_range(NULL, hyp_pgd, TRAMPOLINE_VA, PAGE_SIZE);
 
-   free_page((unsigned long)init_bounce_page);
-   init_bounce_page = NULL;
-
	mutex_unlock(&kvm_hyp_pgd_mutex);
 }
 
@@ -1498,39 +1494,11 @@ int kvm_mmu_init(void)
hyp_idmap_end = kvm_virt_to_phys(__hyp_idmap_text_end);
hyp_idmap_vector = kvm_virt_to_phys(__kvm_hyp_init);
 
-   if ((hyp_idmap_start ^ hyp_idmap_end) & PAGE_MASK) {
-   /*
-* Our init code is crossing a page boundary. Allocate
-* a bounce page, copy the code over and use that.
-*/
-   size_t len = __hyp_idmap_text_end - __hyp_idmap_text_start;
-   phys_addr_t phys_base;
-
-   init_bounce_page = (void *)__get_free_page

[PATCH roundup 2/4] ARM: KVM: avoid HYP init code too big error

2015-03-06 Thread Ard Biesheuvel
From: Arnd Bergmann a...@arndb.de

When building large kernels, the linker will emit lots of veneers
into the .hyp.idmap.text section, which causes it to grow beyond
one page, and that triggers the build error.

This moves the section into .rodata instead, which avoids the
veneers and is safe because the code is not executed directly
but remapped by the hypervisor into its own executable address
space.

Signed-off-by: Arnd Bergmann a...@arndb.de
[ardb: move the ALIGN() to .rodata as well, update log s/copied/remapped/]
Tested-by: Marc Zyngier marc.zyng...@arm.com
Reviewed-by: Marc Zyngier marc.zyng...@arm.com
Signed-off-by: Ard Biesheuvel ard.biesheu...@linaro.org
---
 arch/arm/kernel/vmlinux.lds.S | 9 +++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/arch/arm/kernel/vmlinux.lds.S b/arch/arm/kernel/vmlinux.lds.S
index b31aa73e8076..2787eb8d3616 100644
--- a/arch/arm/kernel/vmlinux.lds.S
+++ b/arch/arm/kernel/vmlinux.lds.S
@@ -22,11 +22,15 @@
ALIGN_FUNCTION();   \
VMLINUX_SYMBOL(__idmap_text_start) = .; \
*(.idmap.text)  \
-   VMLINUX_SYMBOL(__idmap_text_end) = .;   \
+   VMLINUX_SYMBOL(__idmap_text_end) = .;
+
+#define IDMAP_RODATA   \
+   .rodata : { \
. = ALIGN(32);  \
VMLINUX_SYMBOL(__hyp_idmap_text_start) = .; \
*(.hyp.idmap.text)  \
-   VMLINUX_SYMBOL(__hyp_idmap_text_end) = .;
+   VMLINUX_SYMBOL(__hyp_idmap_text_end) = .;   \
+   }
 
 #ifdef CONFIG_HOTPLUG_CPU
 #define ARM_CPU_DISCARD(x)
@@ -124,6 +128,7 @@ SECTIONS
	. = ALIGN(1 << SECTION_SHIFT);
 #endif
RO_DATA(PAGE_SIZE)
+   IDMAP_RODATA
 
. = ALIGN(4);
__ex_table : AT(ADDR(__ex_table) - LOAD_OFFSET) {
-- 
1.8.3.2



[PATCH roundup 4/4] arm64: KVM: use ID map with increased VA range if required

2015-03-06 Thread Ard Biesheuvel
This patch modifies the HYP init code so it can deal with system
RAM residing at an offset which exceeds the reach of VA_BITS.

Like for EL1, this involves configuring an additional level of
translation for the ID map. However, in case of EL2, this implies
that all translations use the extra level, as we cannot seamlessly
switch between translation tables with different numbers of
translation levels.

So add an extra translation table at the root level. Since the
ID map and the runtime HYP map are guaranteed not to overlap, they
can share this root level, and we can essentially merge these two
tables into one.

Tested-by: Marc Zyngier marc.zyng...@arm.com
Reviewed-by: Marc Zyngier marc.zyng...@arm.com
Signed-off-by: Ard Biesheuvel ard.biesheu...@linaro.org
---
 arch/arm/include/asm/kvm_mmu.h   | 10 ++
 arch/arm/kvm/mmu.c   | 27 +--
 arch/arm64/include/asm/kvm_mmu.h | 33 +
 arch/arm64/kvm/hyp-init.S| 26 ++
 4 files changed, 94 insertions(+), 2 deletions(-)

diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h
index 37ca2a4c6f09..617a30d00c1d 100644
--- a/arch/arm/include/asm/kvm_mmu.h
+++ b/arch/arm/include/asm/kvm_mmu.h
@@ -270,6 +270,16 @@ static inline void __kvm_flush_dcache_pud(pud_t pud)
 void kvm_set_way_flush(struct kvm_vcpu *vcpu);
 void kvm_toggle_cache(struct kvm_vcpu *vcpu, bool was_enabled);
 
+static inline bool __kvm_cpu_uses_extended_idmap(void)
+{
+   return false;
+}
+
+static inline void __kvm_extend_hypmap(pgd_t *boot_hyp_pgd,
+  pgd_t *hyp_pgd,
+  pgd_t *merged_hyp_pgd,
+  unsigned long hyp_idmap_start) { }
+
 #endif /* !__ASSEMBLY__ */
 
 #endif /* __ARM_KVM_MMU_H__ */
diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
index 42a24d6b003b..69c2b4ce6160 100644
--- a/arch/arm/kvm/mmu.c
+++ b/arch/arm/kvm/mmu.c
@@ -35,6 +35,7 @@ extern char  __hyp_idmap_text_start[], __hyp_idmap_text_end[];
 
 static pgd_t *boot_hyp_pgd;
 static pgd_t *hyp_pgd;
+static pgd_t *merged_hyp_pgd;
 static DEFINE_MUTEX(kvm_hyp_pgd_mutex);
 
 static unsigned long hyp_idmap_start;
@@ -434,6 +435,11 @@ void free_hyp_pgds(void)
free_pages((unsigned long)hyp_pgd, hyp_pgd_order);
hyp_pgd = NULL;
}
+   if (merged_hyp_pgd) {
+   clear_page(merged_hyp_pgd);
+   free_page((unsigned long)merged_hyp_pgd);
+   merged_hyp_pgd = NULL;
+   }
 
	mutex_unlock(&kvm_hyp_pgd_mutex);
 }
@@ -1473,12 +1479,18 @@ void kvm_mmu_free_memory_caches(struct kvm_vcpu *vcpu)
 
 phys_addr_t kvm_mmu_get_httbr(void)
 {
-   return virt_to_phys(hyp_pgd);
+   if (__kvm_cpu_uses_extended_idmap())
+   return virt_to_phys(merged_hyp_pgd);
+   else
+   return virt_to_phys(hyp_pgd);
 }
 
 phys_addr_t kvm_mmu_get_boot_httbr(void)
 {
-   return virt_to_phys(boot_hyp_pgd);
+   if (__kvm_cpu_uses_extended_idmap())
+   return virt_to_phys(merged_hyp_pgd);
+   else
+   return virt_to_phys(boot_hyp_pgd);
 }
 
 phys_addr_t kvm_get_idmap_vector(void)
@@ -1521,6 +1533,17 @@ int kvm_mmu_init(void)
goto out;
}
 
+   if (__kvm_cpu_uses_extended_idmap()) {
+   merged_hyp_pgd = (pgd_t *)__get_free_page(GFP_KERNEL | 
__GFP_ZERO);
+   if (!merged_hyp_pgd) {
	kvm_err("Failed to allocate extra HYP pgd\n");
+   goto out;
+   }
+   __kvm_extend_hypmap(boot_hyp_pgd, hyp_pgd, merged_hyp_pgd,
+   hyp_idmap_start);
+   return 0;
+   }
+
/* Map the very same page at the trampoline VA */
err =   __create_hyp_mappings(boot_hyp_pgd,
  TRAMPOLINE_VA, TRAMPOLINE_VA + PAGE_SIZE,
diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
index 6458b5373142..edfe6864bc28 100644
--- a/arch/arm64/include/asm/kvm_mmu.h
+++ b/arch/arm64/include/asm/kvm_mmu.h
@@ -68,6 +68,8 @@
 #include <asm/pgalloc.h>
 #include <asm/cachetype.h>
 #include <asm/cacheflush.h>
+#include <asm/mmu_context.h>
+#include <asm/pgtable.h>
 
 #define KERN_TO_HYP(kva)   ((unsigned long)kva - PAGE_OFFSET + 
HYP_PAGE_OFFSET)
 
@@ -305,5 +307,36 @@ static inline void __kvm_flush_dcache_pud(pud_t pud)
 void kvm_set_way_flush(struct kvm_vcpu *vcpu);
 void kvm_toggle_cache(struct kvm_vcpu *vcpu, bool was_enabled);
 
+static inline bool __kvm_cpu_uses_extended_idmap(void)
+{
+   return __cpu_uses_extended_idmap();
+}
+
+static inline void __kvm_extend_hypmap(pgd_t *boot_hyp_pgd,
+  pgd_t *hyp_pgd,
+  pgd_t *merged_hyp_pgd,
+  unsigned long hyp_idmap_start)
+{
+   int idmap_idx

[PATCH roundup 0/4] extend VA range of ID map for core kernel and KVM

2015-03-06 Thread Ard Biesheuvel
These are the VA range patches presented as a coherent set. The bounce page
removal and the 'HYP init code too big' fix are probably not prerequisites
anymore now that I switched to merging the HYP runtime map with the HYP ID
map rather than with the kernel ID map, but I would strongly prefer to keep
them as a single series.

Ard Biesheuvel (3):
  arm64: mm: increase VA range of identity map
  ARM, arm64: kvm: get rid of the bounce page
  arm64: KVM: use ID map with increased VA range if required

Arnd Bergmann (1):
  ARM: KVM: avoid HYP init code too big error

 arch/arm/include/asm/kvm_mmu.h | 10 +
 arch/arm/kernel/vmlinux.lds.S  | 35 ++---
 arch/arm/kvm/init.S|  3 ++
 arch/arm/kvm/mmu.c | 69 +++---
 arch/arm64/include/asm/kvm_mmu.h   | 33 
 arch/arm64/include/asm/mmu_context.h   | 43 +
 arch/arm64/include/asm/page.h  |  6 ++-
 arch/arm64/include/asm/pgtable-hwdef.h |  7 +++-
 arch/arm64/kernel/head.S   | 38 +++
 arch/arm64/kernel/smp.c|  1 +
 arch/arm64/kernel/vmlinux.lds.S| 18 ++---
 arch/arm64/kvm/hyp-init.S  | 26 +
 arch/arm64/mm/mmu.c|  7 +++-
 arch/arm64/mm/proc-macros.S| 11 ++
 arch/arm64/mm/proc.S   |  3 ++
 15 files changed, 256 insertions(+), 54 deletions(-)

-- 
1.8.3.2



[PATCH roundup 1/4] arm64: mm: increase VA range of identity map

2015-03-06 Thread Ard Biesheuvel
The page size and the number of translation levels, and hence the supported
virtual address range, are build-time configurables on arm64 whose optimal
values are use case dependent. However, in the current implementation, if
the system's RAM is located at a very high offset, the virtual address range
needs to reflect that merely because the identity mapping, which is only used
to enable or disable the MMU, requires the extended virtual range to map the
physical memory at an equal virtual offset.

This patch relaxes that requirement, by increasing the number of translation
levels for the identity mapping only, and only when actually needed, i.e.,
when system RAM's offset is found to be out of reach at runtime.

Tested-by: Laura Abbott lau...@codeaurora.org
Reviewed-by: Catalin Marinas catalin.mari...@arm.com
Tested-by: Marc Zyngier marc.zyng...@arm.com
Signed-off-by: Ard Biesheuvel ard.biesheu...@linaro.org
---
 arch/arm64/include/asm/mmu_context.h   | 43 ++
 arch/arm64/include/asm/page.h  |  6 +++--
 arch/arm64/include/asm/pgtable-hwdef.h |  7 +-
 arch/arm64/kernel/head.S   | 38 ++
 arch/arm64/kernel/smp.c|  1 +
 arch/arm64/mm/mmu.c|  7 +-
 arch/arm64/mm/proc-macros.S| 11 +
 arch/arm64/mm/proc.S   |  3 +++
 8 files changed, 112 insertions(+), 4 deletions(-)

diff --git a/arch/arm64/include/asm/mmu_context.h 
b/arch/arm64/include/asm/mmu_context.h
index a9eee33dfa62..ecf2d060036b 100644
--- a/arch/arm64/include/asm/mmu_context.h
+++ b/arch/arm64/include/asm/mmu_context.h
@@ -64,6 +64,49 @@ static inline void cpu_set_reserved_ttbr0(void)
	: "r" (ttbr));
 }
 
+/*
+ * TCR.T0SZ value to use when the ID map is active. Usually equals
+ * TCR_T0SZ(VA_BITS), unless system RAM is positioned very high in
+ * physical memory, in which case it will be smaller.
+ */
+extern u64 idmap_t0sz;
+
+static inline bool __cpu_uses_extended_idmap(void)
+{
+   return (!IS_ENABLED(CONFIG_ARM64_VA_BITS_48) &&
+   unlikely(idmap_t0sz != TCR_T0SZ(VA_BITS)));
+}
+
+static inline void __cpu_set_tcr_t0sz(u64 t0sz)
+{
+   unsigned long tcr;
+
+   if (__cpu_uses_extended_idmap())
+   asm volatile (
+   "	mrs	%0, tcr_el1	;"
+   "	bfi	%0, %1, %2, %3	;"
+   "	msr	tcr_el1, %0	;"
+   "	isb"
+   : "=r" (tcr)
+   : "r"(t0sz), "I"(TCR_T0SZ_OFFSET), "I"(TCR_TxSZ_WIDTH));
+}
+
+/*
+ * Set TCR.T0SZ to the value appropriate for activating the identity map.
+ */
+static inline void cpu_set_idmap_tcr_t0sz(void)
+{
+   __cpu_set_tcr_t0sz(idmap_t0sz);
+}
+
+/*
+ * Set TCR.T0SZ to its default value (based on VA_BITS)
+ */
+static inline void cpu_set_default_tcr_t0sz(void)
+{
+   __cpu_set_tcr_t0sz(TCR_T0SZ(VA_BITS));
+}
+
 static inline void switch_new_context(struct mm_struct *mm)
 {
unsigned long flags;
diff --git a/arch/arm64/include/asm/page.h b/arch/arm64/include/asm/page.h
index 22b16232bd60..3d02b1869eb8 100644
--- a/arch/arm64/include/asm/page.h
+++ b/arch/arm64/include/asm/page.h
@@ -33,7 +33,9 @@
  * image. Both require pgd, pud (4 levels only) and pmd tables to (section)
  * map the kernel. With the 64K page configuration, swapper and idmap need to
  * map to pte level. The swapper also maps the FDT (see __create_page_tables
- * for more information).
+ * for more information). Note that the number of ID map translation levels
+ * could be increased on the fly if system RAM is out of reach for the default
+ * VA range, so 3 pages are reserved in all cases.
  */
 #ifdef CONFIG_ARM64_64K_PAGES
 #define SWAPPER_PGTABLE_LEVELS (CONFIG_ARM64_PGTABLE_LEVELS)
@@ -42,7 +44,7 @@
 #endif
 
 #define SWAPPER_DIR_SIZE   (SWAPPER_PGTABLE_LEVELS * PAGE_SIZE)
-#define IDMAP_DIR_SIZE (SWAPPER_DIR_SIZE)
+#define IDMAP_DIR_SIZE (3 * PAGE_SIZE)
 
 #ifndef __ASSEMBLY__
 
diff --git a/arch/arm64/include/asm/pgtable-hwdef.h 
b/arch/arm64/include/asm/pgtable-hwdef.h
index 5f930cc9ea83..847e864202cc 100644
--- a/arch/arm64/include/asm/pgtable-hwdef.h
+++ b/arch/arm64/include/asm/pgtable-hwdef.h
@@ -143,7 +143,12 @@
 /*
  * TCR flags.
  */
-#define TCR_TxSZ(x)	(((UL(64) - (x)) << 16) | ((UL(64) - (x)) << 0))
+#define TCR_T0SZ_OFFSET	0
+#define TCR_T1SZ_OFFSET	16
+#define TCR_T0SZ(x)	((UL(64) - (x)) << TCR_T0SZ_OFFSET)
+#define TCR_T1SZ(x)	((UL(64) - (x)) << TCR_T1SZ_OFFSET)
+#define TCR_TxSZ(x)	(TCR_T0SZ(x) | TCR_T1SZ(x))
+#define TCR_TxSZ_WIDTH	6
 #define TCR_IRGN_NC	((UL(0) << 8) | (UL(0) << 24))
 #define TCR_IRGN_WBWA	((UL(1) << 8) | (UL(1) << 24))
 #define TCR_IRGN_WT	((UL(2) << 8) | (UL(2) << 24))
diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
index 8ce88e08c030..a3612eadab3c 100644
--- a/arch/arm64/kernel/head.S
+++ b/arch

Re: ARM: KVM/XEN: how should we support virt-what?

2015-03-26 Thread Ard Biesheuvel
On 26 March 2015 at 19:45, Stefano Stabellini
stefano.stabell...@eu.citrix.com wrote:
 On Thu, 26 Mar 2015, Andrew Jones wrote:
 On Wed, Mar 25, 2015 at 10:44:42AM +0100, Andrew Jones wrote:
  Hello ARM virt maintainers,
 
  I'd like to start a discussion about supporting virt-what[1]. virt-what
  allows userspace to determine if the system it's running on is running
  in a guest, and of what type (KVM, Xen, etc.). Despite it being a best
  effort tool, see the Caveat emptor in [1], it has become quite a useful
  tool, and is showing up in different places, such as OpenStack. If you
  look at the code[2], specifically [3], then you'll see how it works on
  x86, which is to use the dedicated hypervisor cpuid leaves. I'm
  wondering what equivalent we have, or can develop, for arm.
  Here are some thoughts;
  0) there's already something we can use, and I just need to be told
 about it.
  1) be as similar as possible to x86 by dedicating some currently
 undefined sysreg bits. This would take buy-in from lots of parties,
 so is not likely the way to go.
  2) create a specific DT node that will get exposed through sysfs, or
 somewhere.
  3) same as (2), but just use the nodes currently in mach-virt's DT
 as the indication we're a guest. This would just be a heuristic,
  i.e. have virtio mmio && psci.method == hvc, or something,
 and we'd still need a way to know if we're kvm vs. xen vs. ??.
 
  Thanks,
  drew
 
  [1] http://people.redhat.com/~rjones/virt-what/
  [2] http://git.annexia.org/?p=virt-what.git;a=summary
  [3] 
  http://git.annexia.org/?p=virt-what.git;a=blob_plain;f=virt-what-cpuid-helper.c;hb=HEAD

 Thanks everyone for their responses. So, the current summary seems to
 be;
 1) Xen has both a DT node and an ACPI table, virt-what can learn how
to probe those.
 2) We don't have anything yet for KVM, and we're reluctant to create a
specific DT node. Anyway, we'd still need to address ACPI booted
guests some other way.

 For a short-term, DT-only, approach we could go with a heuristic, one
 that includes Marc's "if hypervisor node exists, then xen, else kvm"
 condition.

 How about SMBIOS for a long-term solution that works for both DT and
 ACPI? We're not populating SMBIOS for arm guests yet in qemu, but now
 that AAVMF has fw_cfg, we should be able to. On x86 we already have
 smbios populated from qemu, although not in a way that allows us to
 determine kvm vs. xen vs. tcg.

 I don't think that SMBIOS works with DT.


SMBIOS works fine with DT


Re: [Linaro-uefi] UEFI on KVM fails to start on juno on cortex-a57 cluster

2015-03-26 Thread Ard Biesheuvel
On 26 March 2015 at 09:09, Riku Voipio riku.voi...@linaro.org wrote:
 On 25 March 2015 at 21:32, Ard Biesheuvel ard.biesheu...@linaro.org wrote:
 On 25 March 2015 at 17:14, Ard Biesheuvel ard.biesheu...@linaro.org wrote:
 On 25 March 2015 at 17:14, Ard Biesheuvel ard.biesheu...@linaro.org wrote:
 On 25 March 2015 at 07:59, Riku Voipio riku.voi...@linaro.org wrote:
 Hi,

 It appears on juno, I can start kvm with UEFI only on cortex-a53 cores:


 taskset -c 0 qemu-system-aarch64 -m 1024 -cpu host -M virt -bios
 QEMU_EFI.fd -enable-kvm -nographic
 - works:
 UEFI Interactive Shell v2.0
 taskset -c 1 qemu-system-aarch64 -m 1024 -cpu host -M virt -bios
 QEMU_EFI.fd -enable-kvm -nographic
 - hangs at cpu spinning 100%
 ...


 I can reproduce the hang, both with your UEFI binary and my own release 
 build.
 The debug build works fine, unfortunately...


 Tianocore built from master as of today, that is.


 OK, it appears that we were missing some cache maintenance. It is not
 obvious how that should affect A57 only, but with these patches, I can
 now reliably run the release version on my Seattle A57

 https://git.linaro.org/people/ard.biesheuvel/uefi-next.git/shortlog/refs/heads/qemu-xen-cache-maintenance

 Thanks. Do you know when there would be a new build on releases or snapshots?


Let me check with Leif. We have another candidate patch now that he
could perhaps apply and kick off a build?


 Qemu and kernel are latest mainline, kvm from last month's Linaro
 release. According to cpuinfo cores 0 and 3-5 are a53 and 1-2 are a57.
 Details:

 # wget 
 http://releases.linaro.org/15.01/components/kernel/uefi-linaro/release/qemu64-intelbds/QEMU_EFI.fd
 uname -a
 # Linux linaro-nano 4.0.0-rc5-linaro-juno #1 SMP PREEMPT Tue Mar 24
 10:46:53 UTC 2015 aarch64 aarch64 aarch64 GNU/Linux
 # qemu-system-aarch64 --version
 QEMU emulator version 2.2.91 (Debian 2.3.0~rc1-185linaro+utopic),
 Copyright (c) 2003-2008 Fabrice Bellard
 # grep -E '(processor|part)' /proc/cpuinfo
 processor   : 0
 CPU part: 0xd03
 processor   : 1
 CPU part: 0xd07
 processor   : 2
 CPU part: 0xd07
 processor   : 3
 CPU part: 0xd03
 processor   : 4
 CPU part: 0xd03
 processor   : 5
 CPU part: 0xd03



Re: [Linaro-uefi] UEFI on KVM fails to start on juno on cortex-a57 cluster

2015-04-14 Thread Ard Biesheuvel
On 14 April 2015 at 13:07, Christoffer Dall christoffer.d...@linaro.org wrote:
 On Mon, Apr 13, 2015 at 11:04:00AM +0200, Ard Biesheuvel wrote:
 On 27 March 2015 at 01:02, Ard Biesheuvel ard.biesheu...@linaro.org wrote:
  On 26 March 2015 at 09:09, Riku Voipio riku.voi...@linaro.org wrote:
  On 25 March 2015 at 21:32, Ard Biesheuvel ard.biesheu...@linaro.org 
  wrote:
  On 25 March 2015 at 17:14, Ard Biesheuvel ard.biesheu...@linaro.org 
  wrote:
  On 25 March 2015 at 17:14, Ard Biesheuvel ard.biesheu...@linaro.org 
  wrote:
  On 25 March 2015 at 07:59, Riku Voipio riku.voi...@linaro.org wrote:
  Hi,
 
  It appears on juno, I can start kvm with UEFI only on cortex-a53 
  cores:
 
 
  taskset -c 0 qemu-system-aarch64 -m 1024 -cpu host -M virt -bios
  QEMU_EFI.fd -enable-kvm -nographic
  - works:
  UEFI Interactive Shell v2.0
  taskset -c 1 qemu-system-aarch64 -m 1024 -cpu host -M virt -bios
  QEMU_EFI.fd -enable-kvm -nographic
  - hangs at cpu spinning 100%
  ...
 
 
  I can reproduce the hang, both with your UEFI binary and my own 
  release build.
  The debug build works fine, unfortunately...
 
 
  Tianocore built from master as of today, that is.
 
 
  OK, it appears that we were missing some cache maintenance. It is not
  obvious how that should affect A57 only, but with these patches, I can
  now reliably run the release version on my Seattle A57
 
  https://git.linaro.org/people/ard.biesheuvel/uefi-next.git/shortlog/refs/heads/qemu-xen-cache-maintenance
 
  Thanks. Do you know when there would be a new build on releases or 
  snapshots?
 
 
  Let me check with Leif. We have another candidate patch now that he
  could perhaps apply and kick off a build?
 

 I now have independent confirmation (from Laszlo Ersek) that the cache
 maintenance patches I am proposing fix the issue on Seattle.

 Which patches are those?  For UEFI?


Yes, for Tianocore.
http://thread.gmane.org/gmane.comp.bios.tianocore.devel/13665

 Hopefully this means Juno is fixed as well.

 I am trying to get a snapshot out asap, today or tomorrow perhaps?

 Thanks,
 -Christoffer


Re: [PATCH 1/2] ARM: kvm: fix a bad BSYM() usage

2015-05-11 Thread Ard Biesheuvel
On 11 May 2015 at 11:05, Christoffer Dall christoffer.d...@linaro.org wrote:
 On Sat, May 09, 2015 at 10:10:56PM +0200, Ard Biesheuvel wrote:
 On 9 May 2015 at 22:07, Christoffer Dall christoffer.d...@linaro.org wrote:
  On Fri, May 08, 2015 at 05:08:42PM +0100, Russell King wrote:
  BSYM() should only be used when referring to local symbols in the same
  assembly file which are resolved by the assembler, and not for
  linker-fixed up symbols.  The use of BSYM() with panic is incorrect as
  the linker is involved in fixing up this relocation, and it knows
  whether panic() is ARM or Thumb.
 
  Signed-off-by: Russell King rmk+ker...@arm.linux.org.uk
  ---
   arch/arm/kvm/interrupts.S | 2 +-
   1 file changed, 1 insertion(+), 1 deletion(-)
 
  diff --git a/arch/arm/kvm/interrupts.S b/arch/arm/kvm/interrupts.S
  index 79caf79b304a..87847d2c5f99 100644
  --- a/arch/arm/kvm/interrupts.S
  +++ b/arch/arm/kvm/interrupts.S
  @@ -309,7 +309,7 @@ ENTRY(kvm_call_hyp)
   THUMB(   orr r2, r2, #PSR_T_BIT  )
msr spsr_cxsf, r2
mrs r1, ELR_hyp
  - ldr r2, =BSYM(panic)
  + ldr r2, =panic
msr ELR_hyp, r2
ldr r0, =\panic_str
clrex   @ Clear exclusive monitor
  --
  1.8.3.1
 
  Indeed, the linker figures it out as it should.  It does seem like the
  right result is produced with the BSYM() macro as well so not sure what
  the harm is.
 

 BSYM() is defined as 'sym + 1' not 'sym | 1', so if the symbol has the
 thumb bit set already, the result is incorrect.

 yeah, but the linker will look at the result of 'sym + 1', so on my
 system it ends up with 'sym + 1' after the linker has done its thing
 (verified by looking at the disassembly of vmlinux);

Hmm, I thought I had done the same when this was under discussion a
couple of weeks ago, and had arrived at the opposite conclusion, but
now I cannot reproduce it anymore, so apparently not.
Sorry for the noise.

 I assume the
 linker logic is that it's branching to a thumb function but the target
 is already the +1 so no action necessary, as opposed to just blindly
 adding 1.

 -Christoffer


Re: [PATCH 1/2] ARM: kvm: fix a bad BSYM() usage

2015-05-09 Thread Ard Biesheuvel
On 9 May 2015 at 22:07, Christoffer Dall christoffer.d...@linaro.org wrote:
 On Fri, May 08, 2015 at 05:08:42PM +0100, Russell King wrote:
 BSYM() should only be used when referring to local symbols in the same
 assembly file which are resolved by the assembler, and not for
 linker-fixed up symbols.  The use of BSYM() with panic is incorrect as
 the linker is involved in fixing up this relocation, and it knows
 whether panic() is ARM or Thumb.

 Signed-off-by: Russell King rmk+ker...@arm.linux.org.uk
 ---
  arch/arm/kvm/interrupts.S | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

 diff --git a/arch/arm/kvm/interrupts.S b/arch/arm/kvm/interrupts.S
 index 79caf79b304a..87847d2c5f99 100644
 --- a/arch/arm/kvm/interrupts.S
 +++ b/arch/arm/kvm/interrupts.S
 @@ -309,7 +309,7 @@ ENTRY(kvm_call_hyp)
  THUMB(   orr r2, r2, #PSR_T_BIT  )
   msr spsr_cxsf, r2
   mrs r1, ELR_hyp
 - ldr r2, =BSYM(panic)
 + ldr r2, =panic
   msr ELR_hyp, r2
   ldr r0, =\panic_str
   clrex   @ Clear exclusive monitor
 --
 1.8.3.1

 Indeed, the linker figures it out as it should.  It does seem like the
 right result is produced with the BSYM() macro as well so not sure what
 the harm is.


BSYM() is defined as 'sym + 1' not 'sym | 1', so if the symbol has the
thumb bit set already, the result is incorrect.
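
A quick illustration (the address is made up) of why '+ 1' and '| 1'
differ once the Thumb bit is already set:

	/* Made-up address, purely to illustrate the difference above. */
	unsigned long sym      = 0x80001001UL;	/* symbol with Thumb bit already set */
	unsigned long bsym_add = sym + 1;	/* 0x80001002 -- Thumb bit lost, target off by one */
	unsigned long bsym_or  = sym | 1;	/* 0x80001001 -- still the correct target */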

 Anyway, I've queued this to merge via the KVM tree.

 Thanks,
 -Christoffer



Re: [Qemu-devel] [RFC/RFT PATCH v2 0/3] KVM: Introduce KVM_MEM_UNCACHED

2015-05-15 Thread Ard Biesheuvel
On 14 May 2015 at 16:41, Michael S. Tsirkin m...@redhat.com wrote:
 On Thu, May 14, 2015 at 04:19:23PM +0200, Laszlo Ersek wrote:
 On 05/14/15 15:48, Michael S. Tsirkin wrote:
  On Thu, May 14, 2015 at 03:32:10PM +0200, Laszlo Ersek wrote:
  On 05/14/15 15:00, Andrew Jones wrote:
  On Thu, May 14, 2015 at 01:38:11PM +0100, Peter Maydell wrote:
  On 14 May 2015 at 13:28, Paolo Bonzini pbonz...@redhat.com wrote:
  Well, PCI BARs are generally MMIO resources, and hence should not be 
  cached.
 
  As an optimization, OS drivers can mark them as cacheable or
  write-combining or something like that, but in general it's a safe
  default to leave them uncached---one would think.
 
  Isn't this handled by the OS mapping them in the 'prefetchable'
  MMIO window rather than the 'non-prefetchable' one? (QEMU's
  generic-PCIe device doesn't yet support the prefetchable window.)
 
  I was thinking (with my limited PCI knowledge) the same thing, and
  was planning on experimenting with that.
 
  This could be supported in UEFI as well, with the following steps:
  - the DTB that QEMU provides UEFI with should advertise such a
prefetchable window.
  - The driver in UEFI that parses the DTB should understand that DTB
node (well, record type), and store the appropriate base  size into
some new dynamic PCDs (= basically, firmware wide global variables;
PCD = platform configuration database)
  - The entry point of the host bridge driver would call
gDS->AddMemorySpace() twice, separately for the two different windows,
with their appropriate caching attributes.
  - The host bridge driver needs to be extended so that TypePMem32
requests are not rejected (like now); they should be handled
similarly to TypeMem32. Except, the gDS->AllocateMemorySpace() call
should allocate from the prefetchable range (determined by the new
PCDs above).
  - QEMU's emulated devices should then expose their BARs as prefetchable
(so that the above branch would be taken in the host bridge driver).
 
  (Of course, if QEMU intends to emulate PCI devices somewhat
  realistically, then QEMU should claim non-prefetchable for BARs that
  would not be prefetchable on physical hardware either, and then the
  hypervisor should accommodate the firmware's UC mapping and say hey I
  know better, we're virtual in fact, and override the attribute (- use
  WB instead of UC). With which we'd be back to square one...)
 
  Thanks
  Laszlo
 
   Prefetchable is unrelated to BAR caching or drivers, it's a way to tell
  host bridges they can do limited tweaks to downstream transactions in a
  specific range.
 
   Really non-prefetchable BARs are mostly those where read has
  side-effects, which is best avoided. this does not mean it's ok to
  reorder transactions or cache them.

 I believe I understood that (although certainly not in the depth that
 you do), because when the idea had come up first (ie. equating cacheable
 with prefetchable, or at least repurposing the latter for the former)
 I had tried to read up on prefetchable (just on the web; no time for
 reading the PCI spec. ... I peeked now, it also mentions write merging
 for bridges.)

 Read up on what it is if you like, it is much weaker than WC not to
 mention cacheable.

 The way I perceived it, the idea was to give the guest a
 hint about caching with the prefetchable bit / DTB entry. Sorry if I was
 mistaken.

 Thanks
 Laszlo

 And what I am saying is that prefetchable bit would be a PV solution -
 on real devices it is not a hint about caching and can't be used as
 such.


On a general note, may I point out that while this discussion now
focuses heavily on PCI and its metadata that could potentially
describe the cached/uncached nature of a region, there are other
emulated devices that are affected as well. Most notably, there is the
emulated NOR flash which is backed by a read-only memslot while in
array mode, but treated as a device by the guest and hence mapped
uncached. Since the NOR flash contains the executable image of the
firmware (in case of UEFI), it must be backed by actual host RAM or
the CPU won't be able to fetch instructions from it (since instruction
fetches cannot be emulated like ordinary loads and stores). On the
other hand, since the guest treats it as a ROM, it is totally
oblivious of any caching concerns that may exist.
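
For reference, a rough sketch (slot number made up, error handling
omitted) of how a VMM backs such a flash region in array mode with host
RAM through a read-only memslot, so the guest can fetch instructions
from it while writes still exit to userspace:

	/* Rough sketch only: slot number is made up, error handling omitted. */
	#include <sys/ioctl.h>
	#include <linux/kvm.h>

	static int map_flash_array_mode(int vm_fd, void *host_buf,
					__u64 gpa, __u64 size)
	{
		struct kvm_userspace_memory_region region = {
			.slot            = 1,			/* hypothetical slot */
			.flags           = KVM_MEM_READONLY,	/* reads/fetches hit RAM, writes exit */
			.guest_phys_addr = gpa,
			.memory_size     = size,
			.userspace_addr  = (unsigned long)host_buf,
		};

		return ioctl(vm_fd, KVM_SET_USER_MEMORY_REGION, &region);
	}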


Re: [Linaro-uefi] UEFI on KVM fails to start on juno on cortex-a57 cluster

2015-04-13 Thread Ard Biesheuvel
On 27 March 2015 at 01:02, Ard Biesheuvel ard.biesheu...@linaro.org wrote:
 On 26 March 2015 at 09:09, Riku Voipio riku.voi...@linaro.org wrote:
 On 25 March 2015 at 21:32, Ard Biesheuvel ard.biesheu...@linaro.org wrote:
 On 25 March 2015 at 17:14, Ard Biesheuvel ard.biesheu...@linaro.org wrote:
 On 25 March 2015 at 17:14, Ard Biesheuvel ard.biesheu...@linaro.org 
 wrote:
 On 25 March 2015 at 07:59, Riku Voipio riku.voi...@linaro.org wrote:
 Hi,

 It appears on juno, I can start kvm with UEFI only on cortex-a53 cores:


 taskset -c 0 qemu-system-aarch64 -m 1024 -cpu host -M virt -bios
 QEMU_EFI.fd -enable-kvm -nographic
 - works:
 UEFI Interactive Shell v2.0
 taskset -c 1 qemu-system-aarch64 -m 1024 -cpu host -M virt -bios
 QEMU_EFI.fd -enable-kvm -nographic
 - hangs at cpu spinning 100%
 ...


 I can reproduce the hang, both with your UEFI binary and my own release 
 build.
 The debug build works fine, unfortunately...


 Tianocore built from master as of today, that is.


 OK, it appears that we were missing some cache maintenance. It is not
 obvious how that should affect A57 only, but with these patches, I can
 now reliably run the release version on my Seattle A57

 https://git.linaro.org/people/ard.biesheuvel/uefi-next.git/shortlog/refs/heads/qemu-xen-cache-maintenance

 Thanks. Do you know when there would be a new build on releases or snapshots?


 Let me check with Leif. We have another candidate patch now that he
 could perhaps apply and kick off a build?


I now have independent confirmation (from Laszlo Ersek) that the cache
maintenance patches I am proposing fix the issue on Seattle.
Hopefully this means Juno is fixed as well.

I am trying to get a snapshot out asap, today or tomorrow perhaps?


[PATCH 2/2] arm/arm64: KVM: fix two build failures under STRICT_MM_TYPECHECKS

2015-06-30 Thread Ard Biesheuvel
This fixes two instances where a pgprot_t is used as the operand
of a bitwise & operation. In order to comply with STRICT_MM_TYPECHECKS,
bitwise arithmetic on a pgprot_t should go via the pgprot_val()
accessor.
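
For illustration (not part of the patch itself), the rule boils down to
something like:

	/*
	 * Illustration only: under STRICT_MM_TYPECHECKS pgprot_t is a struct
	 * wrapper, so the raw attribute bits have to be extracted with
	 * pgprot_val() before any bitwise arithmetic.
	 */
	static bool pte_has_all_prot_bits(pte_t pte, pgprot_t prot)
	{
		/* (pte_val(pte) & prot) would not compile: integer vs. struct */
		return (pte_val(pte) & pgprot_val(prot)) == pgprot_val(prot);
	}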

Cc: kvmarm@lists.cs.columbia.edu
Signed-off-by: Ard Biesheuvel ard.biesheu...@linaro.org
---
 arch/arm/kvm/mmu.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
index 1d5accbd3dcf..a255c7dd534b 100644
--- a/arch/arm/kvm/mmu.c
+++ b/arch/arm/kvm/mmu.c
@@ -213,7 +213,8 @@ static void unmap_ptes(struct kvm *kvm, pmd_t *pmd,
kvm_tlb_flush_vmid_ipa(kvm, addr);
 
/* No need to invalidate the cache for device mappings 
*/
-		if ((pte_val(old_pte) & PAGE_S2_DEVICE) != PAGE_S2_DEVICE)
+		if ((pte_val(old_pte) & pgprot_val(PAGE_S2_DEVICE)) !=
+				pgprot_val(PAGE_S2_DEVICE))
kvm_flush_dcache_pte(old_pte);
 
put_page(virt_to_page(pte));
@@ -306,7 +307,8 @@ static void stage2_flush_ptes(struct kvm *kvm, pmd_t *pmd,
pte = pte_offset_kernel(pmd, addr);
do {
 		if (!pte_none(*pte) &&
-		    (pte_val(*pte) & PAGE_S2_DEVICE) != PAGE_S2_DEVICE)
+		    (pte_val(*pte) & pgprot_val(PAGE_S2_DEVICE)) !=
+				pgprot_val(PAGE_S2_DEVICE))
kvm_flush_dcache_pte(*pte);
} while (pte++, addr += PAGE_SIZE, addr != end);
 }
-- 
1.9.1



Re: [PATCH] ARM/arm64: KVM: test properly for a PTE's uncachedness

2015-11-09 Thread Ard Biesheuvel
On 9 November 2015 at 17:21, Christoffer Dall
<christoffer.d...@linaro.org> wrote:
> On Fri, Nov 06, 2015 at 12:43:08PM +0100, Ard Biesheuvel wrote:
>> The open coded tests for checking whether a PTE maps a page as
>> uncached use a flawed 'pte_val(xxx) & CONST != CONST' pattern,
>> which is not guaranteed to work since the type of a mapping is an
>> index into the MAIR table, not a set of mutually exclusive bits.
>>
>> Considering that, on arm64, the S2 type definitions use the following
>> MAIR indexes
>>
>> #define MT_S2_NORMAL		0xf
>> #define MT_S2_DEVICE_nGnRE	0x1
>>
>> we have been getting lucky merely because the S2 device mappings also
>> have the PTE_UXN bit set, which means that a device PTE still does not
>> equal a normal PTE after masking with the former type.
>>
>> Instead, implement proper checking against the MAIR indexes that are
>> known to define uncached memory attributes.
>>
>> Signed-off-by: Ard Biesheuvel <ard.biesheu...@linaro.org>
>> ---
>>  arch/arm/include/asm/kvm_mmu.h   | 11 +++
>>  arch/arm/kvm/mmu.c   |  5 ++---
>>  arch/arm64/include/asm/kvm_mmu.h | 12 
>>  3 files changed, 25 insertions(+), 3 deletions(-)
>>
>> diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h
>> index 405aa1883307..422973835d41 100644
>> --- a/arch/arm/include/asm/kvm_mmu.h
>> +++ b/arch/arm/include/asm/kvm_mmu.h
>> @@ -279,6 +279,17 @@ static inline void __kvm_extend_hypmap(pgd_t 
>> *boot_hyp_pgd,
>>  pgd_t *merged_hyp_pgd,
>>  unsigned long hyp_idmap_start) { }
>>
>> +static inline bool __kvm_pte_is_uncached(pte_t pte)
>> +{
>> + switch (pte_val(pte) & L_PTE_MT_MASK) {
>> + case L_PTE_MT_UNCACHED:
>> + case L_PTE_MT_BUFFERABLE:
>> + case L_PTE_MT_DEV_SHARED:
>> + return true;
>> + }
>
> so PTEs created by setting PAGE_S2_DEVICE will end up hitting in one of
> these because L_PTE_S2_MT_DEV_SHARED is the same as L_PTE_MT_BUFFERABLE
> for stage-2 mappings and PAGE_HYP_DEVICE end up using
> L_PTE_MT_DEV_SHARED.
>
> Totally obvious.
>

Hmm, perhaps not. Would you prefer all aliases of the L_PTE_MT_xx
constants that map to device permissions to be listed here?

>> + return false;
>> +}
>> +
>>  #endif   /* !__ASSEMBLY__ */
>>
>>  #endif /* __ARM_KVM_MMU_H__ */
>> diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
>> index 6984342da13d..eb9a06e3dbee 100644
>> --- a/arch/arm/kvm/mmu.c
>> +++ b/arch/arm/kvm/mmu.c
>> @@ -213,7 +213,7 @@ static void unmap_ptes(struct kvm *kvm, pmd_t *pmd,
>>   kvm_tlb_flush_vmid_ipa(kvm, addr);
>>
>>   /* No need to invalidate the cache for device mappings 
>> */
>> - if ((pte_val(old_pte) & PAGE_S2_DEVICE) != 
>> PAGE_S2_DEVICE)
>> + if (!__kvm_pte_is_uncached(old_pte))
>>   kvm_flush_dcache_pte(old_pte);
>>
>>   put_page(virt_to_page(pte));
>> @@ -305,8 +305,7 @@ static void stage2_flush_ptes(struct kvm *kvm, pmd_t 
>> *pmd,
>>
>>   pte = pte_offset_kernel(pmd, addr);
>>   do {
>> - if (!pte_none(*pte) &&
>> - (pte_val(*pte) & PAGE_S2_DEVICE) != PAGE_S2_DEVICE)
>> + if (!pte_none(*pte) && !__kvm_pte_is_uncached(*pte))
>>   kvm_flush_dcache_pte(*pte);
>>   } while (pte++, addr += PAGE_SIZE, addr != end);
>>  }
>> diff --git a/arch/arm64/include/asm/kvm_mmu.h 
>> b/arch/arm64/include/asm/kvm_mmu.h
>> index 61505676d085..5806f412a47a 100644
>> --- a/arch/arm64/include/asm/kvm_mmu.h
>> +++ b/arch/arm64/include/asm/kvm_mmu.h
>> @@ -302,5 +302,17 @@ static inline void __kvm_extend_hypmap(pgd_t 
>> *boot_hyp_pgd,
>>   merged_hyp_pgd[idmap_idx] = __pgd(__pa(boot_hyp_pgd) | PMD_TYPE_TABLE);
>>  }
>>
>> +static inline bool __kvm_pte_is_uncached(pte_t pte)
>> +{
>> + switch (pte_val(pte) & PTE_ATTRINDX_MASK) {
>> + case PTE_ATTRINDX(MT_DEVICE_nGnRnE):
>> + case PTE_ATTRINDX(MT_DEVICE_nGnRE):
>> + case PTE_ATTRINDX(MT_DEVICE_GRE):
>> + case PTE_ATTRINDX(MT_NORMAL_NC):
>> + return true;
>> + }
>> + return false;
>> +}
>> +
>>  #endif /* __ASSEMBLY__ */
>>  #endif /* __ARM64_KVM_MMU_H__ */
>> --
>> 1.9.1
>>
>
> Thanks for this patch, I'll queue it.
>
> -Christoffer


Re: [PATCH] ARM/arm64: KVM: test properly for a PTE's uncachedness

2015-11-09 Thread Ard Biesheuvel
On 9 November 2015 at 17:35, Christoffer Dall
<christoffer.d...@linaro.org> wrote:
> On Mon, Nov 09, 2015 at 05:27:40PM +0100, Ard Biesheuvel wrote:
>> On 9 November 2015 at 17:21, Christoffer Dall
>> <christoffer.d...@linaro.org> wrote:
>> > On Fri, Nov 06, 2015 at 12:43:08PM +0100, Ard Biesheuvel wrote:
>> >> The open coded tests for checking whether a PTE maps a page as
>> >> uncached use a flawed 'pte_val(xxx) & CONST != CONST' pattern,
>> >> which is not guaranteed to work since the type of a mapping is an
>> >> index into the MAIR table, not a set of mutually exclusive bits.
>> >>
>> >> Considering that, on arm64, the S2 type definitions use the following
>> >> MAIR indexes
>> >>
>> >> #define MT_S2_NORMAL		0xf
>> >> #define MT_S2_DEVICE_nGnRE	0x1
>> >>
>> >> we have been getting lucky merely because the S2 device mappings also
>> >> have the PTE_UXN bit set, which means that a device PTE still does not
>> >> equal a normal PTE after masking with the former type.
>> >>
>> >> Instead, implement proper checking against the MAIR indexes that are
>> >> known to define uncached memory attributes.
>> >>
>> >> Signed-off-by: Ard Biesheuvel <ard.biesheu...@linaro.org>
>> >> ---
>> >>  arch/arm/include/asm/kvm_mmu.h   | 11 +++
>> >>  arch/arm/kvm/mmu.c   |  5 ++---
>> >>  arch/arm64/include/asm/kvm_mmu.h | 12 
>> >>  3 files changed, 25 insertions(+), 3 deletions(-)
>> >>
>> >> diff --git a/arch/arm/include/asm/kvm_mmu.h 
>> >> b/arch/arm/include/asm/kvm_mmu.h
>> >> index 405aa1883307..422973835d41 100644
>> >> --- a/arch/arm/include/asm/kvm_mmu.h
>> >> +++ b/arch/arm/include/asm/kvm_mmu.h
>> >> @@ -279,6 +279,17 @@ static inline void __kvm_extend_hypmap(pgd_t 
>> >> *boot_hyp_pgd,
>> >>  pgd_t *merged_hyp_pgd,
>> >>  unsigned long hyp_idmap_start) { }
>> >>
>> >> +static inline bool __kvm_pte_is_uncached(pte_t pte)
>> >> +{
>> >> + switch (pte_val(pte) & L_PTE_MT_MASK) {
>> >> + case L_PTE_MT_UNCACHED:
>> >> + case L_PTE_MT_BUFFERABLE:
>> >> + case L_PTE_MT_DEV_SHARED:
>> >> + return true;
>> >> + }
>> >
>> > so PTEs created by setting PAGE_S2_DEVICE will end up hitting in one of
>> > these because L_PTE_S2_MT_DEV_SHARED is the same as L_PTE_MT_BUFFERABLE
>> > for stage-2 mappings and PAGE_HYP_DEVICE end up using
>> > L_PTE_MT_DEV_SHARED.
>> >
>> > Totally obvious.
>> >
>>
>> Hmm, perhaps not. Would you prefer all aliases of the L_PTE_MT_xx
>> constants that map to device permissions to be listed here?
>>
>
> Meh, there's no great solution and this code is all the kind of code
> that you just need to take the time to understand.  We could add a
> comment I suppose, if I got the above correct, I can throw something in?
>

Actually, I think the patch is wrong, and so is the commit message.

I got confused between HYP mappings and stage 2 mappings. HYP mappings
use an index into the MAIR (which HYP inherits from the kernel) but
the stage 2 mappings have a bit field describing the type.

So for one, I think that means that __kvm_pte_is_uncached() cannot be
used for both HYP and stage-2 PTE's, or we'd need to add a parameter
to distinguish between them.

For HYP mappings, we need to compare the MAIR index to values that are
known to refer to device or uncached mappings (as the patch does)
For S2 mappings, we need to mask the MemAttr[5:2] field, and interpret
it according to the description in the ARM ARM, i.e., MemAttr[3:2] ==
0b00 indicates device, MemAttr[3:0] == 0b0101 is uncached memory,
anything else requires cache maintenance.
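
Something along these lines for the stage-2 case (illustrative only, not
a tested patch; the macro and helper names are made up):

	/* Illustrative only -- macro and helper names are made up. */
	#define S2_PTE_MEMATTR(pte)	((pte_val(pte) >> 2) & 0xf)	/* MemAttr[3:0], descriptor bits [5:2] */

	static bool s2_pte_needs_dcache_maintenance(pte_t pte)
	{
		unsigned int attr = S2_PTE_MEMATTR(pte);

		if ((attr & 0xc) == 0)	/* MemAttr[3:2] == 0b00: device memory */
			return false;
		if (attr == 0x5)	/* MemAttr[3:0] == 0b0101: Normal Non-cacheable */
			return false;
		return true;		/* anything else may be cached: needs maintenance */
	}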

-- 
Ard.


Re: [PATCH v2] ARM/arm64: KVM: test properly for a PTE's uncachedness

2015-11-10 Thread Ard Biesheuvel
(adding lists)

On 10 November 2015 at 10:45, Ard Biesheuvel <ard.biesheu...@linaro.org> wrote:
> Hi all,
>
> I wonder if this is a better way to address the problem. It looks at
> the nature of the memory rather than the nature of the mapping, which
> is probably a more reliable indicator of whether cache maintenance is
> required when performing the unmap.
>
>
> ---8<
> The open coded tests for checking whether a PTE maps a page as
> uncached use a flawed 'pte_val(xxx) & CONST != CONST' pattern,
> which is not guaranteed to work since the type of a mapping is
> not a set of mutually exclusive bits
>
> For HYP mappings, the type is an index into the MAIR table (i.e, the
> index itself does not contain any information whatsoever about the
> type of the mapping), and for stage-2 mappings it is a bit field where
> normal memory and device types are defined as follows:
>
> #define MT_S2_NORMAL		0xf
> #define MT_S2_DEVICE_nGnRE	0x1
>
> I.e., masking *and* comparing with the latter matches on the former,
> and we have been getting lucky merely because the S2 device mappings
> also have the PTE_UXN bit set, or we would misidentify memory mappings
> as device mappings.
>
> Since the unmap_range() code path (which contains one instance of the
> flawed test) is used both for HYP mappings and stage-2 mappings, and
> considering the difference between the two, it is non-trivial to fix
> this by rewriting the tests in place, as it would involve passing
> down the type of mapping through all the functions.
>
> However, since HYP mappings and stage-2 mappings both deal with host
> physical addresses, we can simply check whether the mapping is backed
> by memory that is managed by the host kernel, and only perform the
> D-cache maintenance if this is the case.
>
> Signed-off-by: Ard Biesheuvel <ard.biesheu...@linaro.org>
> ---
>  arch/arm/kvm/mmu.c | 15 +++
>  1 file changed, 7 insertions(+), 8 deletions(-)
>
> diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
> index 6984342da13d..7dace909d5cf 100644
> --- a/arch/arm/kvm/mmu.c
> +++ b/arch/arm/kvm/mmu.c
> @@ -98,6 +98,11 @@ static void kvm_flush_dcache_pud(pud_t pud)
> __kvm_flush_dcache_pud(pud);
>  }
>
> +static bool kvm_is_device_pfn(unsigned long pfn)
> +{
> +   return !pfn_valid(pfn);
> +}
> +
>  /**
>   * stage2_dissolve_pmd() - clear and flush huge PMD entry
>   * @kvm:   pointer to kvm structure.
> @@ -213,7 +218,7 @@ static void unmap_ptes(struct kvm *kvm, pmd_t *pmd,
> kvm_tlb_flush_vmid_ipa(kvm, addr);
>
> /* No need to invalidate the cache for device 
> mappings */
> -   if ((pte_val(old_pte) & PAGE_S2_DEVICE) != 
> PAGE_S2_DEVICE)
> +   if (!kvm_is_device_pfn(__phys_to_pfn(addr)))
> kvm_flush_dcache_pte(old_pte);
>
> put_page(virt_to_page(pte));
> @@ -305,8 +310,7 @@ static void stage2_flush_ptes(struct kvm *kvm, pmd_t *pmd,
>
> pte = pte_offset_kernel(pmd, addr);
> do {
> -   if (!pte_none(*pte) &&
> -   (pte_val(*pte) & PAGE_S2_DEVICE) != PAGE_S2_DEVICE)
> +   if (!pte_none(*pte) && 
> !kvm_is_device_pfn(__phys_to_pfn(addr)))
> kvm_flush_dcache_pte(*pte);
> } while (pte++, addr += PAGE_SIZE, addr != end);
>  }
> @@ -1037,11 +1041,6 @@ static bool kvm_is_write_fault(struct kvm_vcpu *vcpu)
> return kvm_vcpu_dabt_iswrite(vcpu);
>  }
>
> -static bool kvm_is_device_pfn(unsigned long pfn)
> -{
> -   return !pfn_valid(pfn);
> -}
> -
>  /**
>   * stage2_wp_ptes - write protect PMD range
>   * @pmd:   pointer to pmd entry
> --
> 1.9.1
>
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH 15/21] arm64: KVM: Add panic handling

2015-11-16 Thread Ard Biesheuvel
On 16 November 2015 at 14:11, Marc Zyngier  wrote:
> Add the panic handler, together with the small bits of assembly
> code to call the kernel's panic implementation.
>
> Signed-off-by: Marc Zyngier 
> ---
>  arch/arm64/kvm/hyp/hyp-entry.S | 11 ++-
>  arch/arm64/kvm/hyp/hyp.h   |  1 +
>  arch/arm64/kvm/hyp/switch.c| 35 +++
>  3 files changed, 46 insertions(+), 1 deletion(-)
>
> diff --git a/arch/arm64/kvm/hyp/hyp-entry.S b/arch/arm64/kvm/hyp/hyp-entry.S
> index e11a129..7218eed 100644
> --- a/arch/arm64/kvm/hyp/hyp-entry.S
> +++ b/arch/arm64/kvm/hyp/hyp-entry.S
> @@ -141,7 +141,16 @@ el1_irq:
> mov x1, #ARM_EXCEPTION_IRQ
> b   __guest_exit
>
> -.macro invalid_vector  label, target = __kvm_hyp_panic
> +ENTRY(__hyp_do_panic)
> +   mov lr, #(PSR_F_BIT | PSR_I_BIT | PSR_A_BIT | PSR_D_BIT |\
> + PSR_MODE_EL1h)
> +   msr spsr_el2, lr
> +   ldr lr, =panic
> +   msr elr_el2, lr
> +   eret
> +ENDPROC(__hyp_do_panic)
> +
> +.macro invalid_vector  label, target = __hyp_panic
> .align  2
>  \label:
> b \target
> diff --git a/arch/arm64/kvm/hyp/hyp.h b/arch/arm64/kvm/hyp/hyp.h
> index 240fb79..d5d500d 100644
> --- a/arch/arm64/kvm/hyp/hyp.h
> +++ b/arch/arm64/kvm/hyp/hyp.h
> @@ -74,6 +74,7 @@ void __fpsimd_save_state(struct user_fpsimd_state *fp_regs);
>  void __fpsimd_restore_state(struct user_fpsimd_state *fp_regs);
>
>  u64 __guest_enter(struct kvm_vcpu *vcpu, struct kvm_cpu_context *host_ctxt);
> +void __noreturn __hyp_do_panic(unsigned long, ...);
>
>  #endif /* __ARM64_KVM_HYP_H__ */
>
> diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
> index 06d3e20..cdc2a96 100644
> --- a/arch/arm64/kvm/hyp/switch.c
> +++ b/arch/arm64/kvm/hyp/switch.c
> @@ -140,3 +140,38 @@ int __hyp_text __guest_run(struct kvm_vcpu *vcpu)
>
> return exit_code;
>  }
> +
> +static const char *__hyp_panic_string = "HYP panic:\nPS:%08x PC:%p 
> ESR:%p\nFAR:%p HPFAR:%p PAR:%p\nVCPU:%p\n";
> +

Re separating the HYP text from the kernel proper: this is exactly the
thing that is likely to cause trouble when you execute the kernel text
from HYP.

__hyp_panic_string is a non-const char pointer containing the absolute
address of the string in the initializer, as seen from the high kernel
virtual mapping.
Better use 'static const char __hyp_panic_string[]' instead.

(If it currently works fine, it is only because the compiler optimizes
the entire variable away, and performs a relative access in the place
where the variable is referenced.)
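
To spell out the suggested form (illustrative only; the format string is the
one quoted above):

/* The symbol names the string data itself, so the compiler can emit a
 * PC-relative reference that stays valid when this code runs from the
 * HYP mapping, instead of loading an absolute kernel virtual address. */
static const char __hyp_panic_string[] =
	"HYP panic:\nPS:%08x PC:%p ESR:%p\nFAR:%p HPFAR:%p PAR:%p\nVCPU:%p\n";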


> +void __hyp_text __noreturn __hyp_panic(void)
> +{
> +   u64 spsr = read_sysreg(spsr_el2);
> +   u64 elr = read_sysreg(elr_el2);
> +   u64 par = read_sysreg(par_el1);
> +
> +   if (read_sysreg(vttbr_el2)) {
> +   struct kvm_vcpu *vcpu;
> +   struct kvm_cpu_context *host_ctxt;
> +
> +   vcpu = (struct kvm_vcpu *)read_sysreg(tpidr_el2);
> +   host_ctxt = kern_hyp_va(vcpu->arch.host_cpu_context);
> +   __deactivate_traps(vcpu);
> +   __deactivate_vm(vcpu);
> +   __sysreg_restore_state(host_ctxt);
> +
> +   write_sysreg(host_ctxt->gp_regs.sp_el1, sp_el1);
> +   }
> +
> +   /* Call panic for real */
> +   while (1) {
> +   unsigned long str_va = (unsigned long)__hyp_panic_string;
> +
> +   str_va -= HYP_PAGE_OFFSET;
> +   str_va += PAGE_OFFSET;
> +   __hyp_do_panic(str_va,
> +  spsr,  elr,
> +  read_sysreg(esr_el2),   read_sysreg(far_el2),
> +  read_sysreg(hpfar_el2), par,
> +  read_sysreg(tpidr_el2));
> +   }
> +}
> --
> 2.1.4
>
>
> ___
> linux-arm-kernel mailing list
> linux-arm-ker...@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH v2] ARM/arm64: KVM: test properly for a PTE's uncachedness

2015-11-10 Thread Ard Biesheuvel
On 10 November 2015 at 13:22, Christoffer Dall
<christoffer.d...@linaro.org> wrote:
> On Tue, Nov 10, 2015 at 10:45:37AM +0100, Ard Biesheuvel wrote:
>> Hi all,
>>
>> I wonder if this is a better way to address the problem. It looks at
>> the nature of the memory rather than the nature of the mapping, which
>> is probably a more reliable indicator of whether cache maintenance is
>> required when performing the unmap.
>>
>>
>> ---8<
>> The open coded tests for checking whether a PTE maps a page as
>> uncached use a flawed 'pte_val(xxx) & CONST != CONST' pattern,
>> which is not guaranteed to work since the type of a mapping is
>> not a set of mutually exclusive bits
>>
>> For HYP mappings, the type is an index into the MAIR table (i.e, the
>> index itself does not contain any information whatsoever about the
>> type of the mapping), and for stage-2 mappings it is a bit field where
>> normal memory and device types are defined as follows:
>>
>> #define MT_S2_NORMAL        0xf
>> #define MT_S2_DEVICE_nGnRE  0x1
>>
>> I.e., masking *and* comparing with the latter matches on the former,
>> and we have been getting lucky merely because the S2 device mappings
>> also have the PTE_UXN bit set, or we would misidentify memory mappings
>> as device mappings.
>>
>> Since the unmap_range() code path (which contains one instance of the
>> flawed test) is used both for HYP mappings and stage-2 mappings, and
>> considering the difference between the two, it is non-trivial to fix
>> this by rewriting the tests in place, as it would involve passing
>> down the type of mapping through all the functions.
>>
>> However, since HYP mappings and stage-2 mappings both deal with host
>> physical addresses, we can simply check whether the mapping is backed
>> by memory that is managed by the host kernel, and only perform the
>> D-cache maintenance if this is the case.
>>
>> Signed-off-by: Ard Biesheuvel <ard.biesheu...@linaro.org>
>> ---
>>  arch/arm/kvm/mmu.c | 15 +++
>>  1 file changed, 7 insertions(+), 8 deletions(-)
>>
>> diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
>> index 6984342da13d..7dace909d5cf 100644
>> --- a/arch/arm/kvm/mmu.c
>> +++ b/arch/arm/kvm/mmu.c
>> @@ -98,6 +98,11 @@ static void kvm_flush_dcache_pud(pud_t pud)
>>   __kvm_flush_dcache_pud(pud);
>>  }
>>
>> +static bool kvm_is_device_pfn(unsigned long pfn)
>> +{
>> + return !pfn_valid(pfn);
>> +}
>> +
>>  /**
>>   * stage2_dissolve_pmd() - clear and flush huge PMD entry
>>   * @kvm: pointer to kvm structure.
>> @@ -213,7 +218,7 @@ static void unmap_ptes(struct kvm *kvm, pmd_t *pmd,
>>   kvm_tlb_flush_vmid_ipa(kvm, addr);
>>
>>   /* No need to invalidate the cache for device mappings 
>> */
>> - if ((pte_val(old_pte) & PAGE_S2_DEVICE) != 
>> PAGE_S2_DEVICE)
>> + if (!kvm_is_device_pfn(__phys_to_pfn(addr)))
>>   kvm_flush_dcache_pte(old_pte);
>>
>>   put_page(virt_to_page(pte));
>> @@ -305,8 +310,7 @@ static void stage2_flush_ptes(struct kvm *kvm, pmd_t 
>> *pmd,
>>
>>   pte = pte_offset_kernel(pmd, addr);
>>   do {
>> - if (!pte_none(*pte) &&
>> - (pte_val(*pte) & PAGE_S2_DEVICE) != PAGE_S2_DEVICE)
>> + if (!pte_none(*pte) && !kvm_is_device_pfn(__phys_to_pfn(addr)))
>>   kvm_flush_dcache_pte(*pte);
>>   } while (pte++, addr += PAGE_SIZE, addr != end);
>>  }
>> @@ -1037,11 +1041,6 @@ static bool kvm_is_write_fault(struct kvm_vcpu *vcpu)
>>   return kvm_vcpu_dabt_iswrite(vcpu);
>>  }
>>
>> -static bool kvm_is_device_pfn(unsigned long pfn)
>> -{
>> - return !pfn_valid(pfn);
>> -}
>> -
>>  /**
>>   * stage2_wp_ptes - write protect PMD range
>>   * @pmd: pointer to pmd entry
>> --
>> 1.9.1
>>
>
> So PAGE_HYP_DEVICE is used only to map the vgic-v2 regions and
> PAGE_S2_DEVICE is used to map the vgic VCPU interface and for all memory
> regions where the vma has (vm_flags & VM_PFNMAP).
>
> Will these, and only these, cases be covered by the pfn_valid check?
>

The pfn_valid() check will ensure that cache maintenance is only
performed on regions that are known to the host as memory, are managed
by the host (i.e., there is a struct page associated with them

Re: [PATCH v2] ARM/arm64: KVM: test properly for a PTE's uncachedness

2015-11-10 Thread Ard Biesheuvel
On 10 November 2015 at 14:40, Christoffer Dall
<christoffer.d...@linaro.org> wrote:
> On Tue, Nov 10, 2015 at 02:15:45PM +0100, Ard Biesheuvel wrote:
>> On 10 November 2015 at 13:22, Christoffer Dall
>> <christoffer.d...@linaro.org> wrote:
>> > On Tue, Nov 10, 2015 at 10:45:37AM +0100, Ard Biesheuvel wrote:
>> >> Hi all,
>> >>
>> >> I wonder if this is a better way to address the problem. It looks at
>> >> the nature of the memory rather than the nature of the mapping, which
>> >> is probably a more reliable indicator of whether cache maintenance is
>> >> required when performing the unmap.
>> >>
>> >>
>> >> ---8<
>> >> The open coded tests for checking whether a PTE maps a page as
>> >> uncached use a flawed 'pte_val(xxx) & CONST != CONST' pattern,
>> >> which is not guaranteed to work since the type of a mapping is
>> >> not a set of mutually exclusive bits
>> >>
>> >> For HYP mappings, the type is an index into the MAIR table (i.e, the
>> >> index itself does not contain any information whatsoever about the
>> >> type of the mapping), and for stage-2 mappings it is a bit field where
>> >> normal memory and device types are defined as follows:
>> >>
>> >> #define MT_S2_NORMAL        0xf
>> >> #define MT_S2_DEVICE_nGnRE  0x1
>> >>
>> >> I.e., masking *and* comparing with the latter matches on the former,
>> >> and we have been getting lucky merely because the S2 device mappings
>> >> also have the PTE_UXN bit set, or we would misidentify memory mappings
>> >> as device mappings.
>> >>
>> >> Since the unmap_range() code path (which contains one instance of the
>> >> flawed test) is used both for HYP mappings and stage-2 mappings, and
>> >> considering the difference between the two, it is non-trivial to fix
>> >> this by rewriting the tests in place, as it would involve passing
>> >> down the type of mapping through all the functions.
>> >>
>> >> However, since HYP mappings and stage-2 mappings both deal with host
>> >> physical addresses, we can simply check whether the mapping is backed
>> >> by memory that is managed by the host kernel, and only perform the
>> >> D-cache maintenance if this is the case.
>> >>
>> >> Signed-off-by: Ard Biesheuvel <ard.biesheu...@linaro.org>
>> >> ---
>> >>  arch/arm/kvm/mmu.c | 15 +++
>> >>  1 file changed, 7 insertions(+), 8 deletions(-)
>> >>
>> >> diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
>> >> index 6984342da13d..7dace909d5cf 100644
>> >> --- a/arch/arm/kvm/mmu.c
>> >> +++ b/arch/arm/kvm/mmu.c
>> >> @@ -98,6 +98,11 @@ static void kvm_flush_dcache_pud(pud_t pud)
>> >>   __kvm_flush_dcache_pud(pud);
>> >>  }
>> >>
>> >> +static bool kvm_is_device_pfn(unsigned long pfn)
>> >> +{
>> >> + return !pfn_valid(pfn);
>> >> +}
>> >> +
>> >>  /**
>> >>   * stage2_dissolve_pmd() - clear and flush huge PMD entry
>> >>   * @kvm: pointer to kvm structure.
>> >> @@ -213,7 +218,7 @@ static void unmap_ptes(struct kvm *kvm, pmd_t *pmd,
>> >>   kvm_tlb_flush_vmid_ipa(kvm, addr);
>> >>
>> >>   /* No need to invalidate the cache for device 
>> >> mappings */
>> >> - if ((pte_val(old_pte) & PAGE_S2_DEVICE) != 
>> >> PAGE_S2_DEVICE)
>> >> + if (!kvm_is_device_pfn(__phys_to_pfn(addr)))
>> >>   kvm_flush_dcache_pte(old_pte);
>> >>
>> >>   put_page(virt_to_page(pte));
>> >> @@ -305,8 +310,7 @@ static void stage2_flush_ptes(struct kvm *kvm, pmd_t 
>> >> *pmd,
>> >>
>> >>   pte = pte_offset_kernel(pmd, addr);
>> >>   do {
>> >> - if (!pte_none(*pte) &&
>> >> - (pte_val(*pte) & PAGE_S2_DEVICE) != PAGE_S2_DEVICE)
>> >> + if (!pte_none(*pte) && 
>> >> !kvm_is_device_pfn(__phys_to_pfn(addr)))
>> >>   kvm_flush_dcache_pte(*pte);
>> >>   } while (pte++, addr += PAGE_SIZE, addr != end);
>>

Re: [PATCH 9/9] arm/arm64: KVM: arch timer: Reset CNTV_CTL to 0

2015-08-31 Thread Ard Biesheuvel
On 30 August 2015 at 15:54, Christoffer Dall
<christoffer.d...@linaro.org> wrote:
> Provide a better quality of implementation and be architecture compliant
> on ARMv7 for the architected timer by resetting the CNTV_CTL to 0 on
> reset of the timer, and call kvm_timer_update_state(vcpu) at the same
> time, ensuring the timer output is not asserted after, for example, a
> PSCI system reset.
>
> This change alone fixes the UEFI reset issue reported by Laszlo back in
> February.
>

Do you have a link to that report? I can't quite remember the details ...

> Cc: Laszlo Ersek <ler...@redhat.com>
> Cc: Ard Biesheuvel <ard.biesheu...@linaro.org>
> Cc: Drew Jones <drjo...@redhat.com>
> Cc: Wei Huang <w...@redhat.com>
> Cc: Peter Maydell <peter.mayd...@linaro.org>
> Signed-off-by: Christoffer Dall <christoffer.d...@linaro.org>
> ---
>  virt/kvm/arm/arch_timer.c | 9 +
>  1 file changed, 9 insertions(+)
>
> diff --git a/virt/kvm/arm/arch_timer.c b/virt/kvm/arm/arch_timer.c
> index 747302f..8a0fdfc 100644
> --- a/virt/kvm/arm/arch_timer.c
> +++ b/virt/kvm/arm/arch_timer.c
> @@ -255,6 +255,15 @@ int kvm_timer_vcpu_reset(struct kvm_vcpu *vcpu,
> timer->irq.irq = irq->irq;
>
> /*
> +* The bits in CNTV_CTL are architecturally reset to UNKNOWN for ARMv8
> +* and to 0 for ARMv7.  We provide an implementation that always
> +* resets the timer to be disabled and unmasked and is compliant with
> +* the ARMv7 architecture.
> +*/
> +   timer->cntv_ctl = 0;
> +   kvm_timer_update_state(vcpu);
> +
> +   /*
>  * Tell the VGIC that the virtual interrupt is tied to a
>  * physical interrupt. We do that once per VCPU.
>  */
> --
> 2.1.2.330.g565301e.dirty
>
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH 00/14] arm64: 16K translation granule support

2015-09-02 Thread Ard Biesheuvel
On 13 August 2015 at 13:33, Suzuki K. Poulose <suzuki.poul...@arm.com> wrote:
> From: "Suzuki K. Poulose" <suzuki.poul...@arm.com>
>
> This series enables the 16K page size support on Linux for arm64.
> This series adds support for 48bit VA(4 level), 47bit VA(3 level) and
> 36bit VA(2 level) with 16K. 16K was a late addition to the architecture
> and is not implemented by all CPUs. Added a check to ensure the
> selected granule size is supported by the CPU, failing which the CPU
> won't proceed with booting.
>
> KVM bits have been tested on a fast model with GICv3 using Andre's kvmtool
> with gicv3 support[1].
>
> Patches 1-7 cleans up the kernel page size handling code.
> Patches 8-11 Fixes some issues with the KVM bits, mainly the fake PGD
>  handling code.
> Patch 12Adds a check to ensure the CPU supports the selected granule size.
> Patch 13-14 Adds the 16k page size support bits.
>
> This series applies on top of for-next/core branch of the aarch64 tree and is
> also available here:
>
> git://linux-arm.org/linux-skp.git  16k/v1
>


Hi Suzuki,

I have given this a spin on the FVP Base model to check UEFI booting,
and everything seems to work fine. (I tested 2-level and 3-level)
I didn't test the KVM changes, so for all patches except those:

Reviewed-by: Ard Biesheuvel <ard.biesheu...@linaro.org>
Tested-by: Ard Biesheuvel <ard.biesheu...@linaro.org>

Regards,
Ard.
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH 12/14] arm64: Check for selected granule support

2015-09-02 Thread Ard Biesheuvel
On 13 August 2015 at 19:29, Catalin Marinas  wrote:
> On Thu, Aug 13, 2015 at 03:45:07PM +0100, Suzuki K. Poulose wrote:
>> On 13/08/15 13:28, Steve Capper wrote:
>> >On 13 August 2015 at 12:34, Suzuki K. Poulose  
>> >wrote:
>> >>  __enable_mmu:
>> >>+   mrs x1, ID_AA64MMFR0_EL1
>> >>+   ubfx x2, x1, #ID_AA64MMFR0_TGran_SHIFT, 4
>> >>+   cmp x2, #ID_AA64MMFR0_TGran_ENABLED
>> >>+   b.ne __no_granule_support
>> >> ldr x5, =vectors
>> >> msr vbar_el1, x5
>> >> msr ttbr0_el1, x25  // load TTBR0
>> >>@@ -626,3 +643,8 @@ __enable_mmu:
>> >> isb
>> >> br  x27
>> >>  ENDPROC(__enable_mmu)
>> >>+
>> >>+__no_granule_support:
>> >>+   wfe
>> >>+   b __no_granule_support
>> >>+ENDPROC(__no_granule_support)
>> >>--
>> >>1.7.9.5
>> >>
>> >
>> >Is it possible to tell the user that the kernel has failed to boot due
>> >to the kernel granule being unsupported?
>>
>> We don't have anything up at this time. The "looping address" is actually a 
>> clue
>> to the (expert) user. Not sure we can do something, until we get something 
>> like DEBUG_LL(?)
>
> No.
>
>> Or we should let it continue and end in a panic(?). The current situation 
>> can boot a
>> multi-cluster system with boot cluster having the Tgran support(which 
>> doesn't make a
>> strong use case though). I will try out some options and get back to you.
>
> If the boot CPU does not support 16KB pages, in general there isn't much
> we can do since the console printing is done after we enabled the MMU.
> Even mapping the UART address requires fixmap support and the PAGE_SIZE
> is hard-coded in the kernel image. The DT is also mapped at run-time.
>
> While in theory it's possible to fall back to a 4KB page size just
> enough to load the DT and figure out the early console, I suggest we
> just live with the "looping address" clue.
>

Couldn't we allocate some flag bits in the Image header to communicate
the page size to the bootloader?
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH 12/14] arm64: Check for selected granule support

2015-09-02 Thread Ard Biesheuvel
On 2 September 2015 at 11:48, Ard Biesheuvel <ard.biesheu...@linaro.org> wrote:
> On 13 August 2015 at 19:29, Catalin Marinas <catalin.mari...@arm.com> wrote:
>> On Thu, Aug 13, 2015 at 03:45:07PM +0100, Suzuki K. Poulose wrote:
>>> On 13/08/15 13:28, Steve Capper wrote:
>>> >On 13 August 2015 at 12:34, Suzuki K. Poulose <suzuki.poul...@arm.com> 
>>> >wrote:
>>> >>  __enable_mmu:
>>> >>+   mrs x1, ID_AA64MMFR0_EL1
>>> >>+   ubfx x2, x1, #ID_AA64MMFR0_TGran_SHIFT, 4
>>> >>+   cmp x2, #ID_AA64MMFR0_TGran_ENABLED
>>> >>+   b.ne __no_granule_support
>>> >> ldr x5, =vectors
>>> >> msr vbar_el1, x5
>>> >> msr ttbr0_el1, x25  // load TTBR0
>>> >>@@ -626,3 +643,8 @@ __enable_mmu:
>>> >> isb
>>> >> br  x27
>>> >>  ENDPROC(__enable_mmu)
>>> >>+
>>> >>+__no_granule_support:
>>> >>+   wfe
>>> >>+   b __no_granule_support
>>> >>+ENDPROC(__no_granule_support)
>>> >>--
>>> >>1.7.9.5
>>> >>
>>> >
>>> >Is it possible to tell the user that the kernel has failed to boot due
>>> >to the kernel granule being unsupported?
>>>
>>> We don't have anything up at this time. The "looping address" is actually a 
>>> clue
>>> to the (expert) user. Not sure we can do something, until we get something 
>>> like DEBUG_LL(?)
>>
>> No.
>>
>>> Or we should let it continue and end in a panic(?). The current situation 
>>> can boot a
>>> multi-cluster system with boot cluster having the Tgran support(which 
>>> doesn't make a
>>> strong use case though). I will try out some options and get back to you.
>>
>> If the boot CPU does not support 16KB pages, in general there isn't much
>> we can do since the console printing is done after we enabled the MMU.
>> Even mapping the UART address requires fixmap support and the PAGE_SIZE
>> is hard-coded in the kernel image. The DT is also mapped at run-time.
>>
>> While in theory it's possible to fall back to a 4KB page size just
>> enough to load the DT and figure out the early console, I suggest we
>> just live with the "looping address" clue.
>>
>
> Couldn't we allocate some flag bits in the Image header to communicate
> the page size to the bootloader?

Something like this perhaps?

8<---
diff --git a/Documentation/arm64/booting.txt b/Documentation/arm64/booting.txt
index 7d9d3c2286b2..13a8aaa9a6e9 100644
--- a/Documentation/arm64/booting.txt
+++ b/Documentation/arm64/booting.txt
@@ -104,7 +104,8 @@ Header notes:
 - The flags field (introduced in v3.17) is a little-endian 64-bit field
   composed as follows:
   Bit 0:   Kernel endianness.  1 if BE, 0 if LE.
-  Bits 1-63:   Reserved.
+  Bits 1-2:   Kernel page size.   0=unspecified, 1=4K, 2=16K, 3=64K
+  Bits 3-63:   Reserved.

 - When image_size is zero, a bootloader should attempt to keep as much
   memory as possible free for use by the kernel immediately after the
diff --git a/arch/arm64/kernel/image.h b/arch/arm64/kernel/image.h
index 8fae0756e175..5def289bda84 100644
--- a/arch/arm64/kernel/image.h
+++ b/arch/arm64/kernel/image.h
@@ -47,7 +47,9 @@
 #define __HEAD_FLAG_BE 0
 #endif

-#define __HEAD_FLAGS   (__HEAD_FLAG_BE << 0)
+#define __HEAD_FLAG_PAGE_SIZE ((PAGE_SHIFT - 10) / 2)
+
+#define __HEAD_FLAGS   (__HEAD_FLAG_BE << 0) | (__HEAD_FLAG_PAGE_SIZE << 1)

 /*
  * These will output as part of the Image header, which should be little-endian
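
For what it's worth, a bootloader-side sketch of consuming these bits could
look like the function below. This is entirely my own illustration; the field
layout is the one proposed above, read from the 64-bit little-endian 'flags'
field at offset 24 of the Image header.

#include <stdint.h>

/* Decode the proposed page-size field from the arm64 Image header flags:
 * bits 1-2, 0=unspecified, 1=4K, 2=16K, 3=64K, i.e. the encoded value is
 * (PAGE_SHIFT - 10) / 2. */
static unsigned long image_page_size(uint64_t flags)
{
	unsigned int field = (flags >> 1) & 0x3;

	return field ? 1UL << (2 * field + 10) : 0;
}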
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH 09/15] arm64: Add page size to the kernel image header

2015-10-05 Thread Ard Biesheuvel
On 5 October 2015 at 14:02, Suzuki K. Poulose <suzuki.poul...@arm.com> wrote:
> On 02/10/15 16:49, Catalin Marinas wrote:
>>
>> On Tue, Sep 15, 2015 at 04:41:18PM +0100, Suzuki K. Poulose wrote:
>>>
>>> From: Ard Biesheuvel <ard.biesheu...@linaro.org>
>>>
>>> This patch adds the page size to the arm64 kernel image header
>>> so that one can infer the PAGESIZE used by the kernel. This will
>>> be helpful to diagnose failures to boot the kernel with page size
>>> not supported by the CPU.
>>>
>>> Signed-off-by: Ard Biesheuvel <ard.biesheu...@linaro.org>
>>
>>
>> This patch needs you signed-off-by as well since you are posting it. And
>> IIRC I acked it as well, I'll check.
>>
>
> Yes, you did mention that you were OK with the patch. But I thought there
> was
> no  'Acked-by' tag added. Hence didn't pick that up.
>

In my version of this patch (which I sent separately, before noticing
that Suzuki had already folded it into his series), I took Catalin's
reply to the email in which I suggested this change as an implicit
Acked-by.

-- 
Ard.
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH] KVM: arm/arm64: Revert to old way of checking for device mapping in stage2_flush_ptes().

2015-12-02 Thread Ard Biesheuvel
Hi Pavel,

Thanks for getting to the bottom of this.

On 1 December 2015 at 14:03, Pavel Fedin  wrote:
> This function takes stage-II physical addresses (A.K.A. IPA), on input, not
> real physical addresses. This causes kvm_is_device_pfn() to return wrong
> values, depending on how much guest and host memory maps match. This
> results in completely broken KVM on some boards. The problem has been
> caught on Samsung proprietary hardware.
>
> Cc: sta...@vger.kernel.org
> Fixes: e6fab5442345 ("ARM/arm64: KVM: test properly for a PTE's uncachedness")
>

That commit is not in a release yet, so no need for cc stable

> Signed-off-by: Pavel Fedin 
> ---
>  arch/arm/kvm/mmu.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
> index 7dace90..51ad98f 100644
> --- a/arch/arm/kvm/mmu.c
> +++ b/arch/arm/kvm/mmu.c
> @@ -310,7 +310,8 @@ static void stage2_flush_ptes(struct kvm *kvm, pmd_t *pmd,
>
> pte = pte_offset_kernel(pmd, addr);
> do {
> -   if (!pte_none(*pte) && 
> !kvm_is_device_pfn(__phys_to_pfn(addr)))
> +   if (!pte_none(*pte) &&
> +   (pte_val(*pte) & PAGE_S2_DEVICE) != PAGE_S2_DEVICE)

I think your analysis is correct, but does that not apply to both instances?
And instead of reverting, could we fix this properly instead?

> kvm_flush_dcache_pte(*pte);
> } while (pte++, addr += PAGE_SIZE, addr != end);
>  }
> --
> 2.4.4
>
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH] KVM: arm/arm64: Revert to old way of checking for device mapping in stage2_flush_ptes().

2015-12-02 Thread Ard Biesheuvel
On 2 December 2015 at 19:50, Christoffer Dall
 wrote:
> On Tue, Dec 01, 2015 at 04:03:52PM +0300, Pavel Fedin wrote:
>> This function takes stage-II physical addresses (A.K.A. IPA), on input, not
>> real physical addresses. This causes kvm_is_device_pfn() to return wrong
>> values, depending on how much guest and host memory maps match. This
>> results in completely broken KVM on some boards. The problem has been
>> caught on Samsung proprietary hardware.
>>
>> Cc: sta...@vger.kernel.org
>
> cc'ing stable doesn't make sense here as the bug was introduced in
> v4.4-rc3 and we didn't release v4.4 yet...
>
>> Fixes: e6fab5442345 ("ARM/arm64: KVM: test properly for a PTE's 
>> uncachedness")
>>
>> Signed-off-by: Pavel Fedin 
>> ---
>>  arch/arm/kvm/mmu.c | 3 ++-
>>  1 file changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
>> index 7dace90..51ad98f 100644
>> --- a/arch/arm/kvm/mmu.c
>> +++ b/arch/arm/kvm/mmu.c
>> @@ -310,7 +310,8 @@ static void stage2_flush_ptes(struct kvm *kvm, pmd_t 
>> *pmd,
>>
>>   pte = pte_offset_kernel(pmd, addr);
>>   do {
>> - if (!pte_none(*pte) && !kvm_is_device_pfn(__phys_to_pfn(addr)))
>> + if (!pte_none(*pte) &&
>> + (pte_val(*pte) & PAGE_S2_DEVICE) != PAGE_S2_DEVICE)
>>   kvm_flush_dcache_pte(*pte);
>>   } while (pte++, addr += PAGE_SIZE, addr != end);
>>  }
>
> You are right that there was a bug in the fix, but your fix is not the
> right one.
>
> Either we have to apply an actual mask and the compare against the value
> (yes, I know, because of the UXN bit we get lucky so far, but that's too
> brittle), or we should do a translation fo the gfn to a pfn.  Is there
> anything preventing us to do the following?
>
> if (!pte_none(*pte) && !kvm_is_device_pfn(pte_pfn(*pte)))
>

Yes, that looks better. I got confused by addr being a 'phys_addr_t',
but obviously the address inside the PTE is the one we need to test
for device-ness, so I think we should replace both instances with this.

-- 
Ard.
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH] KVM: arm/arm64: Revert to old way of checking for device mapping in stage2_flush_ptes().

2015-12-04 Thread Ard Biesheuvel
On 4 December 2015 at 02:58, Ben Hutchings <b...@decadent.org.uk> wrote:
> On Wed, 2015-12-02 at 18:41 +0100, Ard Biesheuvel wrote:
>> Hi Pavel,
>>
>> Thanks for getting to the bottom of this.
>>
>> On 1 December 2015 at 14:03, Pavel Fedin <p.fe...@samsung.com> wrote:
>> > This function takes stage-II physical addresses (A.K.A. IPA), on input, not
>> > real physical addresses. This causes kvm_is_device_pfn() to return wrong
>> > values, depending on how much guest and host memory maps match. This
>> > results in completely broken KVM on some boards. The problem has been
>> > caught on Samsung proprietary hardware.
>> >
>> > Cc: sta...@vger.kernel.org
>> > Fixes: e6fab5442345 ("ARM/arm64: KVM: test properly for a PTE's 
>> > uncachedness")
>> >
>>
>> That commit is not in a release yet, so no need for cc stable
> [...]
>
> But it is cc'd to stable, so unless it is going to be nacked at review
> stage, any subsequent fixes should also be cc'd.
>

Ah yes, thanks for pointing that out.

But please, don't cc your proposed patches straight to
sta...@vger.kernel.org. I usually leave it up to the maintainer that
merges the patch to add the Cc: line to the commit log.
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


[PATCH v2] ARM/arm64: KVM: correct PTE uncachedness check

2015-12-03 Thread Ard Biesheuvel
Commit e6fab5442345 ("ARM/arm64: KVM: test properly for a PTE's
uncachedness") modified the logic to test whether a HYP or stage-2
mapping needs flushing, from [incorrectly] interpreting the page table
attributes to [incorrectly] checking whether the PFN that backs the
mapping is covered by host system RAM. The PFN number is part of the
output of the translation, not the input, so we have to use pte_pfn()
on the contents of the PTE, not __phys_to_pfn() on the HYP virtual
address or stage-2 intermediate physical address.

Fixes: e6fab5442345 ("ARM/arm64: KVM: test properly for a PTE's uncachedness")
Signed-off-by: Ard Biesheuvel <ard.biesheu...@linaro.org>
---
 arch/arm/kvm/mmu.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
index 7dace909d5cf..61d96a645ff3 100644
--- a/arch/arm/kvm/mmu.c
+++ b/arch/arm/kvm/mmu.c
@@ -218,7 +218,7 @@ static void unmap_ptes(struct kvm *kvm, pmd_t *pmd,
kvm_tlb_flush_vmid_ipa(kvm, addr);
 
/* No need to invalidate the cache for device mappings 
*/
-   if (!kvm_is_device_pfn(__phys_to_pfn(addr)))
+   if (!kvm_is_device_pfn(pte_pfn(old_pte)))
kvm_flush_dcache_pte(old_pte);
 
put_page(virt_to_page(pte));
@@ -310,7 +310,7 @@ static void stage2_flush_ptes(struct kvm *kvm, pmd_t *pmd,
 
pte = pte_offset_kernel(pmd, addr);
do {
-   if (!pte_none(*pte) && !kvm_is_device_pfn(__phys_to_pfn(addr)))
+   if (!pte_none(*pte) && !kvm_is_device_pfn(pte_pfn(*pte)))
kvm_flush_dcache_pte(*pte);
} while (pte++, addr += PAGE_SIZE, addr != end);
 }
-- 
1.9.1

___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH] KVM: arm/arm64: Revert to old way of checking for device mapping in stage2_flush_ptes().

2015-12-03 Thread Ard Biesheuvel
On 3 December 2015 at 08:14, Pavel Fedin  wrote:
>  Hello!
>
>> > diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
>> > index 7dace90..51ad98f 100644
>> > --- a/arch/arm/kvm/mmu.c
>> > +++ b/arch/arm/kvm/mmu.c
>> > @@ -310,7 +310,8 @@ static void stage2_flush_ptes(struct kvm *kvm, pmd_t 
>> > *pmd,
>> >
>> > pte = pte_offset_kernel(pmd, addr);
>> > do {
>> > -   if (!pte_none(*pte) && 
>> > !kvm_is_device_pfn(__phys_to_pfn(addr)))
>> > +   if (!pte_none(*pte) &&
>> > +   (pte_val(*pte) & PAGE_S2_DEVICE) != PAGE_S2_DEVICE)
>>
>> I think your analysis is correct, but does that not apply to both instances?
>
>  No no, another one is correct, since it operates on real PFN (at least looks 
> like so). I have verified my fix against the original problem (crash on 
> Exynos5410 without generic timer), and it still works fine there.
>

I don't think so. Regardless of whether you are manipulating HYP
mappings or stage-2 mappings, the physical address is always the
output, not the input, of the translation, so addr is always either a
virtual address or an intermediate physical address, whereas
pfn_valid() operates on host physical addresses.
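
To make that concrete, here is a minimal sketch (mine, not from the thread)
of the shape the check takes in the v2 patch: the PFN is taken from the
contents of the PTE, i.e. the output of the translation, never from the
address being walked.

/* Sketch: flush a stage-2 PTE only when it maps host RAM.  The PFN has to
 * come out of the PTE itself; __phys_to_pfn(addr) would hand a guest IPA
 * (or a HYP VA) to pfn_valid(), which is meaningless to the host. */
static void flush_pte_if_host_ram(pte_t pte)
{
	if (!pte_none(pte) && !kvm_is_device_pfn(pte_pfn(pte)))
		kvm_flush_dcache_pte(pte);
}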

>> And instead of reverting, could we fix this properly instead?
>
>  Of course, i'm not against alternate approaches, feel free to. I've just 
> suggested what i could, to fix things quickly. I'm indeed no expert in KVM 
> memory management yet. After all, this is what mailing lists are for.
>

OK. I will follow up with a patch, as Christoffer requested. I'd
appreciate it if you could test to see if it also fixes the current
issue, and the original arch timer issue.

Thanks,
Ard.
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


[PATCH] ARM/arm64: KVM: correct PTE uncachedness check

2015-12-03 Thread Ard Biesheuvel
Commit e6fab5442345 ("ARM/arm64: KVM: test properly for a PTE's
uncachedness") modified the logic to test whether a HYP or stage-2
mapping needs flushing, from [incorrectly] interpreting the page table
attributes to [incorrectly] checking whether the PFN that backs the
mapping is covered by host system RAM. The PFN number is part of the
output of the translation, not the input, so we have to use pte_pfn()
on the contents of the PTE, not __phys_to_pfn() on the HYP virtual
address or stage-2 intermediate physical address.

Fixes: e6fab5442345 ("ARM/arm64: KVM: test properly for a PTE's uncachedness")
Signed-off-by: Ard Biesheuvel <ard.biesheu...@linaro.org>
---
 arch/arm/kvm/mmu.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
index 7dace909d5cf..9708c342795f 100644
--- a/arch/arm/kvm/mmu.c
+++ b/arch/arm/kvm/mmu.c
@@ -218,7 +218,7 @@ static void unmap_ptes(struct kvm *kvm, pmd_t *pmd,
kvm_tlb_flush_vmid_ipa(kvm, addr);
 
/* No need to invalidate the cache for device mappings 
*/
-   if (!kvm_is_device_pfn(__phys_to_pfn(addr)))
+   if (!kvm_is_device_pfn(pte_pfn(addr)))
kvm_flush_dcache_pte(old_pte);
 
put_page(virt_to_page(pte));
@@ -310,7 +310,7 @@ static void stage2_flush_ptes(struct kvm *kvm, pmd_t *pmd,
 
pte = pte_offset_kernel(pmd, addr);
do {
-   if (!pte_none(*pte) && !kvm_is_device_pfn(__phys_to_pfn(addr)))
+   if (!pte_none(*pte) && !kvm_is_device_pfn(pte_pfn(addr)))
kvm_flush_dcache_pte(*pte);
} while (pte++, addr += PAGE_SIZE, addr != end);
 }
-- 
1.9.1

___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH] ARM/arm64: KVM: correct PTE uncachedness check

2015-12-03 Thread Ard Biesheuvel
PLEASE disregard, this patch is wrong. v2 coming up


On 3 December 2015 at 09:20, Ard Biesheuvel <ard.biesheu...@linaro.org> wrote:
> Commit e6fab5442345 ("ARM/arm64: KVM: test properly for a PTE's
> uncachedness") modified the logic to test whether a HYP or stage-2
> mapping needs flushing, from [incorrectly] interpreting the page table
> attributes to [incorrectly] checking whether the PFN that backs the
> mapping is covered by host system RAM. The PFN number is part of the
> output of the translation, not the input, so we have to use pte_pfn()
> on the contents of the PTE, not __phys_to_pfn() on the HYP virtual
> address or stage-2 intermediate physical address.
>
> Fixes: e6fab5442345 ("ARM/arm64: KVM: test properly for a PTE's uncachedness")
> Signed-off-by: Ard Biesheuvel <ard.biesheu...@linaro.org>
> ---
>  arch/arm/kvm/mmu.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
> index 7dace909d5cf..9708c342795f 100644
> --- a/arch/arm/kvm/mmu.c
> +++ b/arch/arm/kvm/mmu.c
> @@ -218,7 +218,7 @@ static void unmap_ptes(struct kvm *kvm, pmd_t *pmd,
> kvm_tlb_flush_vmid_ipa(kvm, addr);
>
> /* No need to invalidate the cache for device 
> mappings */
> -   if (!kvm_is_device_pfn(__phys_to_pfn(addr)))
> +   if (!kvm_is_device_pfn(pte_pfn(addr)))
> kvm_flush_dcache_pte(old_pte);
>
> put_page(virt_to_page(pte));
> @@ -310,7 +310,7 @@ static void stage2_flush_ptes(struct kvm *kvm, pmd_t *pmd,
>
> pte = pte_offset_kernel(pmd, addr);
> do {
> -   if (!pte_none(*pte) && 
> !kvm_is_device_pfn(__phys_to_pfn(addr)))
> +   if (!pte_none(*pte) && !kvm_is_device_pfn(pte_pfn(addr)))
> kvm_flush_dcache_pte(*pte);
> } while (pte++, addr += PAGE_SIZE, addr != end);
>  }
> --
> 1.9.1
>
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: issues with emulated PCI MMIO backed by host memory under KVM

2016-06-24 Thread Ard Biesheuvel
On 24 June 2016 at 16:04, Ard Biesheuvel <ard.biesheu...@linaro.org> wrote:
[...]
> Note that this issue not only affects framebuffers on PCI cards, it
> also affects emulated USB host controllers (perhaps Alex can remind us
> which one exactly?)

Actually, looking at the QEMU source code, I am not able to spot the
USB hcd emulation code that backs a PCI MMIO BAR using host memory,
and in fact, the only instance I *can* find is vga-pci.c

@Alex: could you please explain which exact issue with USB emulation
is suspected to be caused by this?

@team-RH: are there any other examples beyond VGA PCI where this is a problem?

Thanks,
Ard.
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH] help guest boot up on AArch64 host with GICv2

2016-01-28 Thread Ard Biesheuvel
On 28 January 2016 at 21:12, Chris Metcalf  wrote:
> On 01/27/2016 04:12 AM, Marc Zyngier wrote:
>>
>> On 26/01/16 20:43, Chris Metcalf wrote:
>>>
>>> On 01/18/2016 04:28 AM, Marc Zyngier wrote:

 Hi Chris,

 On 15/01/16 20:02, Chris Metcalf wrote:
>
> We are using GICv2 compatibility mode in the Fast Models/Foundation
> Models simulations we are running because the boot code (ATF/UEFI)
> doesn't support GICv3 in our system at the moment.
>
> However, starting with kernel 4.2, the guest couldn't boot up because
> it
> wasn't getting timer interrupts.  I tracked this down to a kernel
> commit
> that switched to using the "alternatives" mechanism -- rather than
> seeing either a GICv2 or GICv3 and configuring appropriately, the KVM
> code just configured the code that saves/restores the vgic state based
> on the presence of the system register interface to the GIC CPU
> interface.  See the attached patch for a fix that manages this
> differently and allows me to boot up the guest in this configuration.
>
> However, even assuming this patch can be taken into an upstream tree, I
> still have a couple of additional problems:
>
> - I can boot up with the Foundation Models using this change, but not
> with the Fast Models (again, using a v3 GIC but in v2 compatibility
> mode
> in the device tree).  The Fast Models dts looks like it has the same
> configuration for the GIC and the timers so I'm not sure what's going
> on
> here.  Any suggestions appreciated.
>
> - Without this change, I could only boot kernels up to 4.1.  With the
> change, I can boot kernels up to 4.3.  But 4.4 won't boot for me
> either;
> I haven't bisected it down yet.  So any suggestions on what might be
> going wrong here would also be appreciated.
>
> We are planning to eventually use GICv3 mode in our software stack but
> for the time being I assume it is interesting to resolve issues with
> GIC
> v2 compatibility mode on GIC v3.
>
 I'm afraid that this is the wrong approach. Whilst 4.2 was a bit too
 eager to use GICv3 (only checking the CPU capability and ignoring the
 actual state of the EL2/EL3 SRE bits), the fact that 4.4 doesn't boot is
 probably the sign of a broken firmware that enables the system register
 interface at EL3, letting the rest of the software stack to use GICv3 in
 native mode, and yet providing a GICv2 DT.

 This combination is unpredictable, and is likely to  cause issues on
 some HW implementations.

 Could you please point me to the firmware you're using?

 Also, please check the following patches:

 6d32ab2 arm64: Update booting requirements for GICv3 in GICv2 mode
 76e52dd irqchip/gic: Warn if GICv3 system registers are enabled
 963fcd4 arm64: cpufeatures: Check ICC_EL1_SRE.SRE before enabling
 ARM64_HAS_SYSREG_GIC_CPUIF
 7cabd00 irqchip/gic-v3: Make gic_enable_sre an inline function
 d271976 arm64: el2_setup: Make sure ICC_SRE_EL2.SRE sticks before using
 GICv3 sysregs

 Can you point me to the one that prevents you from booting?
>>>
>>> The problematic commit is 963fcd4, because it calls gic_enable_sre()
>>> in the host kernel even with a GICv2 DT specified, and this seems to
>>> put things in a state such that we don't receive virtual timer
>>> interrupts in the guest when we boot it up.  (I'm not that familiar with
>>> the QEMU DT but it is providing a GIC v2 to the guest.)
>>>
>>> With a v4.5-rc1 host, if I "return false" before the code in
>>> gic_enable_sre()
>>> that tries to actually enable the SRE, and then hardcode the
>>> __vgic_v2_XXX_state() save/restore calls into the __vgic_XXX_state()
>>> routines, then my guest boots up OK.
>>
>> What if you just do the "return false"? I bet that it will work as well...
>
>
> Yes, that also works for my case.
>
>>> We are using a modified ARM version of EDK v3.0-rc0, and a modified
>>> ARM Trusted Firmware based on commit 963fcd4 (between v1.1 and 1.2).
>>


What does 'EDK v3.0-rc0' mean? We don't do any versioned releases, afaik.

I recently fixed a GIC issue in the FVP EDK2 code, which prevented it
from running the GICv3 in native mode rather than in GICv2
compatibility mode.

33ed33f ArmPkg/ArmGic: fix bug in GICv3 distributor configuration


>>> We certainly haven't touched any of the GIC code in either one.
>>>
>>> I tried to modify the host DT to enable GICv3, but then the host itself
>>> hangs on boot, so clearly more is needed.  (To be fair I've only tested
>>> v4.4 in that configuration, not v4.5-rc1.)  The firmware isn't yet using
>>> GICv3 so perhaps that is part of the problem.
>>
>> That's indeed part of the problem. The firmware running at EL3 insists
>> on using GICv2, but still let EL2 (and EL1) use GICv3 system registers.
>> Could you please dump the content of 

Re: [PATCH v2 06/21] arm64: KVM: VHE: Patch out use of HVC

2016-02-01 Thread Ard Biesheuvel
On 1 February 2016 at 17:20, Marc Zyngier  wrote:
> On 01/02/16 15:36, Catalin Marinas wrote:
>> On Mon, Feb 01, 2016 at 01:34:16PM +, Marc Zyngier wrote:
>>> On 01/02/16 13:16, Christoffer Dall wrote:
 On Mon, Jan 25, 2016 at 03:53:40PM +, Marc Zyngier wrote:
> diff --git a/arch/arm64/kvm/hyp/hyp-entry.S 
> b/arch/arm64/kvm/hyp/hyp-entry.S
> index 93e8d983..9e0683f 100644
> --- a/arch/arm64/kvm/hyp/hyp-entry.S
> +++ b/arch/arm64/kvm/hyp/hyp-entry.S
> @@ -38,6 +38,32 @@
>ldp x0, x1, [sp], #16
>  .endm
>
> +.macro do_el2_call
> +  /*
> +   * Shuffle the parameters before calling the function
> +   * pointed to in x0. Assumes parameters in x[1,2,3].
> +   */
> +  stp lr, xzr, [sp, #-16]!

 remind me why this pair isn't just doing "str" instead of "stp" with the
 xzr ?
>>>
>>> Because SP has to be aligned on a 16 bytes boundary at all times.
>>
>> You could do something like:
>>
>>   sub sp, sp, #16
>>   str lr, [sp]
>>
>
> Ah, fair enough. I'll fold that in.
>

Since we're micro-reviewing: what's wrong with

str lr, [sp, #-16]!

?
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH] arm64: KVM: Turn kvm_ksym_ref into a NOP on VHE

2016-03-21 Thread Ard Biesheuvel
On 18 March 2016 at 18:25, Marc Zyngier <marc.zyng...@arm.com> wrote:
> When running with VHE, there is no need to translate kernel pointers
> to the EL2 memory space, since we're already there (and we have a much
> saner memory map to start with).
>
> Unfortunately, kvm_ksym_ref is getting in the way, and the first
> call into the "hypervisor" section is going to end up in fireworks,
> since we're now branching into nowhereland. Meh.
>
> A potential solution is to test if VHE is engaged or not, and only
> perform the translation in the negative case. With this in place,
> VHE is able to run again.
>
> Signed-off-by: Marc Zyngier <marc.zyng...@arm.com>

I think you need the & when initializing val; otherwise it will
silently refer to the value rather than the address of a void* symbol,
if we ever end up using this macro on one.

That was the whole point of the opaque struct type in the original
patch that introduced this macro: to disallow references lacking the
&. Unfortunately, that was incompatible with the other VHE changes.
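
In other words, a corrected version would look roughly like this (my sketch
of the fix being asked for, not the final patch):

/* Sketch: with the '&', val holds the address of the symbol in both
 * branches; without it, a symbol of type 'void *' would silently be
 * read by value instead of having its address taken. */
#define kvm_ksym_ref(sym)						\
	({								\
		void *val = &sym;					\
		if (!is_kernel_in_hyp_mode())				\
			val = phys_to_virt((u64)&sym - kimage_voffset);	\
		val;							\
	})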

With that fixed

Acked-by: Ard Biesheuvel <ard.biesheu...@linaro.org>

> ---
>  arch/arm64/include/asm/kvm_asm.h | 8 +++-
>  1 file changed, 7 insertions(+), 1 deletion(-)
>
> diff --git a/arch/arm64/include/asm/kvm_asm.h 
> b/arch/arm64/include/asm/kvm_asm.h
> index 226f49d..282f907 100644
> --- a/arch/arm64/include/asm/kvm_asm.h
> +++ b/arch/arm64/include/asm/kvm_asm.h
> @@ -26,7 +26,13 @@
>  #define KVM_ARM64_DEBUG_DIRTY_SHIFT0
>  #define KVM_ARM64_DEBUG_DIRTY  (1 << KVM_ARM64_DEBUG_DIRTY_SHIFT)
>
> -#define kvm_ksym_ref(sym)  phys_to_virt((u64)&sym - kimage_voffset)
> +#define kvm_ksym_ref(sym)  \
> +   ({  \
> +   void *val = sym;\
> +   if (!is_kernel_in_hyp_mode())   \
> +   val = phys_to_virt((u64)&sym - kimage_voffset); \
> +   val;\
> +})
>
>  #ifndef __ASSEMBLY__
>  struct kvm;
> --
> 2.1.4
>
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH] arm64: KVM: Move kvm_call_hyp back to its original localtion

2016-03-01 Thread Ard Biesheuvel
On 1 March 2016 at 14:12, Marc Zyngier <marc.zyng...@arm.com> wrote:
> In order to reduce the risk of a bad merge, let's move the new
> kvm_call_hyp back to its original location in the file. This has
> zero impact from a code point of view.
>
> Signed-off-by: Marc Zyngier <marc.zyng...@arm.com>

Acked-by: Ard Biesheuvel <ard.biesheu...@linaro.org>

> ---
> Catalin,
>
> Would you mind merging this on top of arm64/for-next/core? I just
> verified that this prevents yet another mismerge between the arm64 and
> KVM trees.
>
> Thanks,
>
> M.
>
>  arch/arm64/include/asm/kvm_host.h | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/arch/arm64/include/asm/kvm_host.h 
> b/arch/arm64/include/asm/kvm_host.h
> index e3d67ff..e465b6d 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -308,6 +308,8 @@ struct kvm_vcpu *kvm_arm_get_running_vcpu(void);
>  struct kvm_vcpu * __percpu *kvm_get_running_vcpus(void);
>
>  u64 __kvm_call_hyp(void *hypfn, ...);
> +#define kvm_call_hyp(f, ...) __kvm_call_hyp(kvm_ksym_ref(f), ##__VA_ARGS__)
> +
>  void force_vm_exit(const cpumask_t *mask);
>  void kvm_mmu_wp_memory_region(struct kvm *kvm, int slot);
>
> @@ -343,6 +345,4 @@ void kvm_arm_setup_debug(struct kvm_vcpu *vcpu);
>  void kvm_arm_clear_debug(struct kvm_vcpu *vcpu);
>  void kvm_arm_reset_debug_ptr(struct kvm_vcpu *vcpu);
>
> -#define kvm_call_hyp(f, ...) __kvm_call_hyp(kvm_ksym_ref(f), ##__VA_ARGS__)
> -
>  #endif /* __ARM64_KVM_HOST_H__ */
> --
> 2.1.4
>
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: usb keyboard and mouse can't work on QEMU ARM64 with KVM

2016-07-26 Thread Ard Biesheuvel
On 26 July 2016 at 09:34, Shannon Zhao  wrote:
> Hi,
>
> Recently I'm trying to use usb keyboard and mouse with QEMU on ARM64. Below 
> is my QEMU command line,
> host and guest kernel both are 4.7.0-rc7+, and I ran it on Hikey board.
>
> qemu-system-aarch64 \
> -smp 1 -cpu host -enable-kvm \
> -m 256 -M virt \
> -k en-us \
> -nographic \
> -device usb-ehci -device usb-kbd -device usb-mouse -usb\
> -kernel Image \
> -initrd guestfs.cpio.gz \
> -append "rdinit=/sbin/init console=ttyAMA0 root=/dev/ram 
> earlycon=pl011,0x900 rw"
>
> The following guest log shows that usb controller can be probed but the 
> keyboard and mouse can't be
> found.
>
> [1.597433] ehci_hcd: USB 2.0 'Enhanced' Host Controller (EHCI) Driver
> [1.599562] ehci-pci: EHCI PCI platform driver
> [1.608082] ehci-pci :00:03.0: EHCI Host Controller
> [1.609485] ehci-pci :00:03.0: new USB bus registered, assigned bus 
> number 1
> [1.611833] ehci-pci :00:03.0: irq 49, io mem 0x10041000
> [1.623599] ehci-pci :00:03.0: USB 2.0 started, EHCI 1.00
> [1.625867] hub 1-0:1.0: USB hub found
> [1.626906] hub 1-0:1.0: 6 ports detected
> [1.628685] ehci-platform: EHCI generic platform driver
> [1.630263] ehci-msm: Qualcomm On-Chip EHCI Host Controller
> [1.631947] ohci_hcd: USB 1.1 'Open' Host Controller (OHCI) Driver
> [1.633547] ohci-pci: OHCI PCI platform driver
> [1.634807] ohci-platform: OHCI generic platform driver
> [...]
> [1.939001] usb 1-1: new high-speed USB device number 2 using ehci-pci
> [   17.467040] usb 1-1: device not accepting address 2, error -110
> [   17.579165] usb 1-1: new high-speed USB device number 3 using ehci-pci
> [   32.287242] random: dd urandom read with 7 bits of entropy available
> [   33.110970] usb 1-1: device not accepting address 3, error -110
> [   33.223030] usb 1-1: new high-speed USB device number 4 using ehci-pci
> [   43.635185] usb 1-1: device not accepting address 4, error -110
> [   43.747033] usb 1-1: new high-speed USB device number 5 using ehci-pci
> [   54.159043] usb 1-1: device not accepting address 5, error -110
> [   54.160752] usb usb1-port1: unable to enumerate USB device
> [   54.307290] usb 1-2: new high-speed USB device number 6 using ehci-pci
> [   69.839052] usb 1-2: device not accepting address 6, error -110
> [   69.951249] usb 1-2: new high-speed USB device number 7 using ehci-pci
> [   85.483171] usb 1-2: device not accepting address 7, error -110
> [   85.595035] usb 1-2: new high-speed USB device number 8 using ehci-pci
> [   90.619247] usb 1-2: device descriptor read/8, error -110
> [   95.743482] usb 1-2: device descriptor read/8, error -110
> [   95.959165] usb 1-2: new high-speed USB device number 9 using ehci-pci
> [  106.371177] usb 1-2: device not accepting address 9, error -110
> [  106.372894] usb usb1-port2: unable to enumerate USB device
>
> lsusb shows:
> root@genericarmv8:~# lsusb
> Bus 001 Device 001: ID 1d6b:0002
>
> Besides, I have also tried QEMU TCG without KVM. The guest can successfully 
> probe usb controller,
> keyboard and mouse.
> lsusb shows:
> root@genericarmv8:~# lsusb
> Bus 001 Device 002: ID 0627:0001
> Bus 001 Device 003: ID 0627:0001
> Bus 001 Device 001: ID 1d6b:0002
>
> So it looks like that usb keyboard and mouse don't work with KVM on QEMU 
> ARM64 while they can work
> with TCG. IIUC, all the usb devices are emulated by QEMU, it has nothing with 
> KVM. So it really
> confused me and I'm not familiar with usb devices. Also I have seen someone 
> else reports this issue
> before[1].
>
> [1]https://lists.gnu.org/archive/html/qemu-arm/2016-06/msg00110.html
>
> Any comments and help are welcome. Thanks in advance.
>

Does your QEMU have this patch?
http://git.qemu.org/?p=qemu.git;a=commitdiff;h=5d636e21c44ecf982a22a7bc4ca89186079ac283

-- 
Ard.
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: issues with emulated PCI MMIO backed by host memory under KVM

2016-06-27 Thread Ard Biesheuvel
On 27 June 2016 at 15:35, Christoffer Dall <christoffer.d...@linaro.org> wrote:
> On Mon, Jun 27, 2016 at 02:30:46PM +0200, Ard Biesheuvel wrote:
>> On 27 June 2016 at 12:34, Christoffer Dall <christoffer.d...@linaro.org> 
>> wrote:
>> > On Mon, Jun 27, 2016 at 11:47:18AM +0200, Ard Biesheuvel wrote:
>> >> On 27 June 2016 at 11:16, Christoffer Dall <christoffer.d...@linaro.org> 
>> >> wrote:
>> >> > Hi,
>> >> >
>> >> > I'm going to ask some stupid questions here...
>> >> >
>> >> > On Fri, Jun 24, 2016 at 04:04:45PM +0200, Ard Biesheuvel wrote:
>> >> >> Hi all,
>> >> >>
>> >> >> This old subject came up again in a discussion related to PCIe support
>> >> >> for QEMU/KVM under Tianocore. The fact that we need to map PCI MMIO
>> >> >> regions as cacheable is preventing us from reusing a significant slice
>> >> >> of the PCIe support infrastructure, and so I'd like to bring this up
>> >> >> again, perhaps just to reiterate why we're simply out of luck.
>> >> >>
>> >> >> To refresh your memories, the issue is that on ARM, PCI MMIO regions
>> >> >> for emulated devices may be backed by memory that is mapped cacheable
>> >> >> by the host. Note that this has nothing to do with the device being
>> >> >> DMA coherent or not: in this case, we are dealing with regions that
>> >> >> are not memory from the POV of the guest, and it is reasonable for the
>> >> >> guest to assume that accesses to such a region are not visible to the
>> >> >> device before they hit the actual PCI MMIO window and are translated
>> >> >> into cycles on the PCI bus.
>> >> >
>> >> > For the sake of completeness, why is this reasonable?
>> >> >
>> >>
>> >> Because the whole point of accessing these regions is to communicate
>> >> with the device. It is common to use write combining mappings for
>> >> things like framebuffers to group writes before they hit the PCI bus,
>> >> but any caching just makes it more difficult for the driver state and
>> >> device state to remain synchronized.
>> >>
>> >> > Is this how any real ARM system implementing PCI would actually work?
>> >> >
>> >>
>> >> Yes.
>> >>
>> >> >> That means that mapping such a region
>> >> >> cacheable is a strange thing to do, in fact, and it is unlikely that
>> >> >> patches implementing this against the generic PCI stack in Tianocore
>> >> >> will be accepted by the maintainers.
>> >> >>
>> >> >> Note that this issue not only affects framebuffers on PCI cards, it
>> >> >> also affects emulated USB host controllers (perhaps Alex can remind us
>> >> >> which one exactly?) and likely other emulated generic PCI devices as
>> >> >> well.
>> >> >>
>> >> >> Since the issue exists only for emulated PCI devices whose MMIO
>> >> >> regions are backed by host memory, is there any way we can already
>> >> >> distinguish such memslots from ordinary ones? If we can, is there
>> >> >> anything we could do to treat these specially? Perhaps something like
>> >> >> using read-only memslots so we can at least trap guest writes instead
>> >> >> of having main memory going out of sync with the caches unnoticed? I
>> >> >> am just brainstorming here ...
>> >> >
>> >> > I think the only sensible solution is to make sure that the guest and
>> >> > emulation mappings use the same memory type, either cached or
>> >> > non-cached, and we 'simply' have to find the best way to implement this.
>> >> >
>> >> > As Drew suggested, forcing some S2 mappings to be non-cacheable is the
>> >> > one way.
>> >> >
>> >> > The other way is to use something like what you once wrote that rewrites
>> >> > stage-1 mappings to be cacheable, does that apply here ?
>> >> >
>> >> > Do we have a clear picture of why we'd prefer one way over the other?
>> >> >
>> >>
>> >> So first of all, let me reiterate that I could only find a single
>> >> instance in QEMU where a PCI MMIO region is ba

Re: issues with emulated PCI MMIO backed by host memory under KVM

2016-06-27 Thread Ard Biesheuvel
On 27 June 2016 at 12:34, Christoffer Dall <christoffer.d...@linaro.org> wrote:
> On Mon, Jun 27, 2016 at 11:47:18AM +0200, Ard Biesheuvel wrote:
>> On 27 June 2016 at 11:16, Christoffer Dall <christoffer.d...@linaro.org> 
>> wrote:
>> > Hi,
>> >
>> > I'm going to ask some stupid questions here...
>> >
>> > On Fri, Jun 24, 2016 at 04:04:45PM +0200, Ard Biesheuvel wrote:
>> >> Hi all,
>> >>
>> >> This old subject came up again in a discussion related to PCIe support
>> >> for QEMU/KVM under Tianocore. The fact that we need to map PCI MMIO
>> >> regions as cacheable is preventing us from reusing a significant slice
>> >> of the PCIe support infrastructure, and so I'd like to bring this up
>> >> again, perhaps just to reiterate why we're simply out of luck.
>> >>
>> >> To refresh your memories, the issue is that on ARM, PCI MMIO regions
>> >> for emulated devices may be backed by memory that is mapped cacheable
>> >> by the host. Note that this has nothing to do with the device being
>> >> DMA coherent or not: in this case, we are dealing with regions that
>> >> are not memory from the POV of the guest, and it is reasonable for the
>> >> guest to assume that accesses to such a region are not visible to the
>> >> device before they hit the actual PCI MMIO window and are translated
>> >> into cycles on the PCI bus.
>> >
>> > For the sake of completeness, why is this reasonable?
>> >
>>
>> Because the whole point of accessing these regions is to communicate
>> with the device. It is common to use write combining mappings for
>> things like framebuffers to group writes before they hit the PCI bus,
>> but any caching just makes it more difficult for the driver state and
>> device state to remain synchronized.
>>
>> > Is this how any real ARM system implementing PCI would actually work?
>> >
>>
>> Yes.
>>
>> >> That means that mapping such a region
>> >> cacheable is a strange thing to do, in fact, and it is unlikely that
>> >> patches implementing this against the generic PCI stack in Tianocore
>> >> will be accepted by the maintainers.
>> >>
>> >> Note that this issue not only affects framebuffers on PCI cards, it
>> >> also affects emulated USB host controllers (perhaps Alex can remind us
>> >> which one exactly?) and likely other emulated generic PCI devices as
>> >> well.
>> >>
>> >> Since the issue exists only for emulated PCI devices whose MMIO
>> >> regions are backed by host memory, is there any way we can already
>> >> distinguish such memslots from ordinary ones? If we can, is there
>> >> anything we could do to treat these specially? Perhaps something like
>> >> using read-only memslots so we can at least trap guest writes instead
>> >> of having main memory going out of sync with the caches unnoticed? I
>> >> am just brainstorming here ...
>> >
>> > I think the only sensible solution is to make sure that the guest and
>> > emulation mappings use the same memory type, either cached or
>> > non-cached, and we 'simply' have to find the best way to implement this.
>> >
>> > As Drew suggested, forcing some S2 mappings to be non-cacheable is the
>> > one way.
>> >
>> > The other way is to use something like what you once wrote that rewrites
>> > stage-1 mappings to be cacheable, does that apply here ?
>> >
>> > Do we have a clear picture of why we'd prefer one way over the other?
>> >
>>
>> So first of all, let me reiterate that I could only find a single
>> instance in QEMU where a PCI MMIO region is backed by host memory,
>> which is vga-pci.c. I wonder if there are any other occurrences, but
>> if there aren't any, it makes much more sense to prohibit PCI BARs
>> backed by host memory rather than spend a lot of effort working around
>> it.
>
> Right, ok.  So Marc's point during his KVM Forum talk was basically,
> don't use the legacy VGA adapter on ARM and use virtio graphics, right?
>

Yes. But nothing is preventing you currently from using that, and I
think we should prefer crappy performance but correct operation over
the current situation. So in general, we should either disallow PCI
BARs backed by host memory, or emulate them, but never back them by a
RAM memslot when running under ARM/KVM.
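
As a strawman for the 'disallow' option: a check along the lines of the
(untested) sketch below could be performed when a BAR's memory region is
registered, e.g. from pci_register_bar(). memory_region_is_ram(),
kvm_enabled() and error_report() are existing QEMU interfaces; where
exactly the check would live, and how it would be restricted to ARM
hosts, is left open here.

#include "qemu/osdep.h"
#include "qemu/error-report.h"
#include "exec/memory.h"
#include "sysemu/kvm.h"

/*
 * Sketch only, not actual QEMU code: refuse to expose a RAM-backed memory
 * region as a PCI BAR when running under KVM, so the guest never ends up
 * with an MMIO window that has a cacheable alias it doesn't know about.
 * A real version would also need to limit this to ARM hosts.
 */
static void reject_ram_backed_bar(MemoryRegion *mr)
{
    if (kvm_enabled() && memory_region_is_ram(mr)) {
        error_report("RAM-backed PCI BARs are not supported under KVM/ARM");
        exit(1);
    }
}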

> What is the proposed solution for someone shipping an ARM server

Re: issues with emulated PCI MMIO backed by host memory under KVM

2016-06-27 Thread Ard Biesheuvel
On 27 June 2016 at 11:16, Christoffer Dall <christoffer.d...@linaro.org> wrote:
> Hi,
>
> I'm going to ask some stupid questions here...
>
> On Fri, Jun 24, 2016 at 04:04:45PM +0200, Ard Biesheuvel wrote:
>> Hi all,
>>
>> This old subject came up again in a discussion related to PCIe support
>> for QEMU/KVM under Tianocore. The fact that we need to map PCI MMIO
>> regions as cacheable is preventing us from reusing a significant slice
>> of the PCIe support infrastructure, and so I'd like to bring this up
>> again, perhaps just to reiterate why we're simply out of luck.
>>
>> To refresh your memories, the issue is that on ARM, PCI MMIO regions
>> for emulated devices may be backed by memory that is mapped cacheable
>> by the host. Note that this has nothing to do with the device being
>> DMA coherent or not: in this case, we are dealing with regions that
>> are not memory from the POV of the guest, and it is reasonable for the
>> guest to assume that accesses to such a region are not visible to the
>> device before they hit the actual PCI MMIO window and are translated
>> into cycles on the PCI bus.
>
> For the sake of completeness, why is this reasonable?
>

Because the whole point of accessing these regions is to communicate
with the device. It is common to use write combining mappings for
things like framebuffers to group writes before they hit the PCI bus,
but any caching just makes it more difficult for the driver state and
device state to remain synchronized.
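
To make that concrete: a framebuffer driver typically maps its BAR with a
write-combining attribute, never a cacheable one. A minimal sketch, not
taken from any particular driver, assuming BAR 0 holds the framebuffer
and using only existing kernel helpers:

#include <linux/pci.h>
#include <linux/io.h>

/*
 * Illustrative only: map a framebuffer BAR write-combining, the way
 * fbdev/drm drivers usually do. BAR 0 is an assumption of this example.
 */
static void __iomem *map_fb_bar(struct pci_dev *pdev)
{
        resource_size_t base = pci_resource_start(pdev, 0);
        resource_size_t len = pci_resource_len(pdev, 0);

        return ioremap_wc(base, len);   /* write-combining, never cacheable */
}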

> Is this how any real ARM system implementing PCI would actually work?
>

Yes.

>> That means that mapping such a region
>> cacheable is a strange thing to do, in fact, and it is unlikely that
>> patches implementing this against the generic PCI stack in Tianocore
>> will be accepted by the maintainers.
>>
>> Note that this issue not only affects framebuffers on PCI cards, it
>> also affects emulated USB host controllers (perhaps Alex can remind us
>> which one exactly?) and likely other emulated generic PCI devices as
>> well.
>>
>> Since the issue exists only for emulated PCI devices whose MMIO
>> regions are backed by host memory, is there any way we can already
>> distinguish such memslots from ordinary ones? If we can, is there
>> anything we could do to treat these specially? Perhaps something like
>> using read-only memslots so we can at least trap guest writes instead
>> of having main memory going out of sync with the caches unnoticed? I
>> am just brainstorming here ...
>
> I think the only sensible solution is to make sure that the guest and
> emulation mappings use the same memory type, either cached or
> non-cached, and we 'simply' have to find the best way to implement this.
>
> As Drew suggested, forcing some S2 mappings to be non-cacheable is the
> one way.
>
> The other way is to use something like what you once wrote that rewrites
> stage-1 mappings to be cacheable, does that apply here ?
>
> Do we have a clear picture of why we'd prefer one way over the other?
>

So first of all, let me reiterate that I could only find a single
instance in QEMU where a PCI MMIO region is backed by host memory,
which is vga-pci.c. I wonder if there are any other occurrences, but
if there aren't any, it makes much more sense to prohibit PCI BARs
backed by host memory rather than spend a lot of effort working around
it.

If we do decide to fix this, the best way would be to use uncached
attributes for the QEMU userland mapping, and force it uncached in the
guest via a stage 2 override (as Drew suggests). The only problem I
see here is that the host's kernel direct mapping has a cached alias
that we need to get rid of. The MAIR hack is just that, a hack, since
there are corner cases that cannot be handled (but please refer to the
old thread for the details).
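
To illustrate the stage 2 part of that (sketch only, not a patch): the
KVM/ARM MMU code already switches to PAGE_S2_DEVICE for device pfns in
user_mem_abort(), so the missing piece is mainly a way for userspace to
tag the affected memslot. The KVM_MEM_UNCACHED flag below is made up for
illustration, it is not an existing UAPI bit.

/*
 * Sketch only: pick the stage 2 memory type for a guest page.
 * kvm_is_device_pfn() and PAGE_S2_DEVICE exist in the KVM/ARM MMU code;
 * the KVM_MEM_UNCACHED memslot flag is hypothetical.
 */
static pgprot_t stage2_mem_type(struct kvm_memory_slot *memslot, kvm_pfn_t pfn)
{
        if (kvm_is_device_pfn(pfn) ||
            (memslot->flags & KVM_MEM_UNCACHED))        /* hypothetical flag */
                return PAGE_S2_DEVICE;

        return PAGE_S2;
}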

As for the USB case, I can't really figure out what is going on here,
but I am fairly certain it is a different issue. If this is related to
DMA, I wonder if adding the 'dma-coherent' property to the PCIe root
complex node fixes anything.

-- 
Ard.
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: issues with emulated PCI MMIO backed by host memory under KVM

2016-06-28 Thread Ard Biesheuvel
On 28 June 2016 at 15:10, Catalin Marinas <catalin.mari...@arm.com> wrote:
> On Tue, Jun 28, 2016 at 02:20:43PM +0200, Christoffer Dall wrote:
>> On Tue, Jun 28, 2016 at 01:06:36PM +0200, Laszlo Ersek wrote:
>> > On 06/28/16 12:04, Christoffer Dall wrote:
>> > > On Mon, Jun 27, 2016 at 03:57:28PM +0200, Ard Biesheuvel wrote:
>> > >> So if vga-pci.c is the only problematic device, for which a reasonable
>> > >> alternative exists (virtio-gpu), I think the only feasible solution is
>> > >> to educate QEMU not to allow RAM memslots being exposed via PCI BARs
>> > >> when running under KVM/ARM.
>> > >
>> > > It would be good if we could support vga-pci under KVM/ARM, but if
>> > > there's no other way than rewriting the arm64 kernel's memory mappings
>> > > completely, then probably we're stuck there, unfortunately.
>
> Just to be clear, the behaviour of mismatched memory attributes is
> defined in the ARM ARM and so far Linux worked fine with such cacheable
> vs non-cacheable (as long as only one of them is accessed *or* cache
> maintenance is performed accordingly). I don't think the arm64 kernel
> memory map needs to be rewritten.
>

That would suggest that having an uncached userland mapping in QEMU
and an uncached kernel mapping in the guest would be ok as long as we
don't access the host kernel's cacheable alias?
In that case, Drew's approach would be feasible, and the
pci_register_bar() function in QEMU could be modified to force the
userland mapping and the stage2 mapping to 'device' [when running
under KVM/ARM] if it refers to a memslot that is backed by host
memory.

-- 
Ard.
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH 05/15] arm64: KVM: Refactor kern_hyp_va/hyp_kern_va to deal with multiple offsets

2016-06-30 Thread Ard Biesheuvel
On 30 June 2016 at 12:16, Marc Zyngier  wrote:
> On 30/06/16 10:22, Marc Zyngier wrote:
>> On 28/06/16 13:42, Christoffer Dall wrote:
>>> On Tue, Jun 07, 2016 at 11:58:25AM +0100, Marc Zyngier wrote:
 As we move towards a selectable HYP VA range, it is obvious that
 we don't want to test a variable to find out if we need to use
 the bottom VA range, the top VA range, or use the address as is
 (for VHE).

 Instead, we can expand our current helpers to generate the right
 mask or nop with code patching. We default to using the top VA
 space, with alternatives to switch to the bottom one or to nop
 out the instructions.

 Signed-off-by: Marc Zyngier 
 ---
  arch/arm64/include/asm/kvm_hyp.h | 27 --
  arch/arm64/include/asm/kvm_mmu.h | 42 
 +---
  2 files changed, 51 insertions(+), 18 deletions(-)

 diff --git a/arch/arm64/include/asm/kvm_hyp.h 
 b/arch/arm64/include/asm/kvm_hyp.h
 index 61d01a9..dd4904b 100644
 --- a/arch/arm64/include/asm/kvm_hyp.h
 +++ b/arch/arm64/include/asm/kvm_hyp.h
 @@ -25,24 +25,21 @@

  #define __hyp_text __section(.hyp.text) notrace

 -static inline unsigned long __kern_hyp_va(unsigned long v)
 -{
 -   asm volatile(ALTERNATIVE("and %0, %0, %1",
 -"nop",
 -ARM64_HAS_VIRT_HOST_EXTN)
 -: "+r" (v) : "i" (HYP_PAGE_OFFSET_MASK));
 -   return v;
 -}
 -
 -#define kern_hyp_va(v) (typeof(v))(__kern_hyp_va((unsigned long)(v)))
 -
  static inline unsigned long __hyp_kern_va(unsigned long v)
  {
 -   asm volatile(ALTERNATIVE("orr %0, %0, %1",
 -"nop",
 +   u64 mask;
 +
 +   asm volatile(ALTERNATIVE("mov %0, %1",
 +"mov %0, %2",
 +ARM64_HYP_OFFSET_LOW)
 +: "=r" (mask)
 +: "i" (~HYP_PAGE_OFFSET_HIGH_MASK),
 +  "i" (~HYP_PAGE_OFFSET_LOW_MASK));
 +   asm volatile(ALTERNATIVE("nop",
 +"mov %0, xzr",
  ARM64_HAS_VIRT_HOST_EXTN)
 -: "+r" (v) : "i" (~HYP_PAGE_OFFSET_MASK));
 -   return v;
 +: "+r" (mask));
 +   return v | mask;
>>>
>>> If mask is ~HYP_PAGE_OFFSET_LOW_MASK how can you be sure that setting
>>> bit (VA_BITS - 1) is always the right thing to do to generate a kernel
>>> address?
>>
>> It has taken me a while, but I think I finally see what you mean. We
>> have no idea whether that bit was set or not.
>>
>>> This is kind of what I asked before only now there's an extra bit not
>>> guaranteed by the architecture to be set for the kernel range, I
>>> think.
>>
>> Yeah, I finally connected the couple of neurons left up there (that's
>> what remains after the whole brexit braindamage). This doesn't work (or
>> rather it only works sometimes). The good news is that I also realized we
>> don't need any of that crap.
>>
>> The only case we currently use a HVA->KVA transformation is to pass the
>> panic string down to panic(), and we can perfectly prevent
>> __kvm_hyp_teardown from ever being evaluated as a HVA with a bit of
>> asm-foo. This allows us to get rid of this whole function.
>
> Here's what I meant by this:
>
> diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
> index 437cfad..c19754d 100644
> --- a/arch/arm64/kvm/hyp/switch.c
> +++ b/arch/arm64/kvm/hyp/switch.c
> @@ -299,9 +299,16 @@ static const char __hyp_panic_string[] = "HYP 
> panic:\nPS:%08llx PC:%016llx ESR:%
>
>  static void __hyp_text __hyp_call_panic_nvhe(u64 spsr, u64 elr, u64 par)
>  {
> -   unsigned long str_va = (unsigned long)__hyp_panic_string;
> +   unsigned long str_va;
>
> -   __hyp_do_panic(hyp_kern_va(str_va),
> +   /*
> +* Force the panic string to be loaded from the literal pool,
> +* making sure it is a kernel address and not a PC-relative
> +* reference.
> +*/
> +   asm volatile("ldr %0, =__hyp_panic_string" : "=r" (str_va));
> +

Wouldn't it suffice to make  __hyp_panic_string a non-static pointer
to const char? That way, it will be statically initialized with a
kernel VA, and the external linkage forces the compiler to evaluate
its value at runtime.


> +   __hyp_do_panic(str_va,
>spsr,  elr,
>read_sysreg(esr_el2),   read_sysreg_el2(far),
>read_sysreg(hpfar_el2), par,
>
> With that in place, we can entirely get rid of hyp_kern_va().
>
> M.
> --
> Jazz is not dead. It just smells funny...
>
> ___
> linux-arm-kernel mailing list
> linux-arm-ker...@lists.infradead.org
> 

Re: [PATCH 05/15] arm64: KVM: Refactor kern_hyp_va/hyp_kern_va to deal with multiple offsets

2016-06-30 Thread Ard Biesheuvel
On 30 June 2016 at 13:02, Marc Zyngier <marc.zyng...@arm.com> wrote:
> On 30/06/16 11:42, Ard Biesheuvel wrote:
>> On 30 June 2016 at 12:16, Marc Zyngier <marc.zyng...@arm.com> wrote:
>>> On 30/06/16 10:22, Marc Zyngier wrote:
>>>> On 28/06/16 13:42, Christoffer Dall wrote:
>>>>> On Tue, Jun 07, 2016 at 11:58:25AM +0100, Marc Zyngier wrote:
>>>>>> As we move towards a selectable HYP VA range, it is obvious that
>>>>>> we don't want to test a variable to find out if we need to use
>>>>>> the bottom VA range, the top VA range, or use the address as is
>>>>>> (for VHE).
>>>>>>
>>>>>> Instead, we can expand our current helpers to generate the right
>>>>>> mask or nop with code patching. We default to using the top VA
>>>>>> space, with alternatives to switch to the bottom one or to nop
>>>>>> out the instructions.
>>>>>>
>>>>>> Signed-off-by: Marc Zyngier <marc.zyng...@arm.com>
>>>>>> ---
>>>>>>  arch/arm64/include/asm/kvm_hyp.h | 27 --
>>>>>>  arch/arm64/include/asm/kvm_mmu.h | 42 
>>>>>> +---
>>>>>>  2 files changed, 51 insertions(+), 18 deletions(-)
>>>>>>
>>>>>> diff --git a/arch/arm64/include/asm/kvm_hyp.h 
>>>>>> b/arch/arm64/include/asm/kvm_hyp.h
>>>>>> index 61d01a9..dd4904b 100644
>>>>>> --- a/arch/arm64/include/asm/kvm_hyp.h
>>>>>> +++ b/arch/arm64/include/asm/kvm_hyp.h
>>>>>> @@ -25,24 +25,21 @@
>>>>>>
>>>>>>  #define __hyp_text __section(.hyp.text) notrace
>>>>>>
>>>>>> -static inline unsigned long __kern_hyp_va(unsigned long v)
>>>>>> -{
>>>>>> -   asm volatile(ALTERNATIVE("and %0, %0, %1",
>>>>>> -"nop",
>>>>>> -ARM64_HAS_VIRT_HOST_EXTN)
>>>>>> -: "+r" (v) : "i" (HYP_PAGE_OFFSET_MASK));
>>>>>> -   return v;
>>>>>> -}
>>>>>> -
>>>>>> -#define kern_hyp_va(v) (typeof(v))(__kern_hyp_va((unsigned long)(v)))
>>>>>> -
>>>>>>  static inline unsigned long __hyp_kern_va(unsigned long v)
>>>>>>  {
>>>>>> -   asm volatile(ALTERNATIVE("orr %0, %0, %1",
>>>>>> -"nop",
>>>>>> +   u64 mask;
>>>>>> +
>>>>>> +   asm volatile(ALTERNATIVE("mov %0, %1",
>>>>>> +"mov %0, %2",
>>>>>> +ARM64_HYP_OFFSET_LOW)
>>>>>> +: "=r" (mask)
>>>>>> +: "i" (~HYP_PAGE_OFFSET_HIGH_MASK),
>>>>>> +  "i" (~HYP_PAGE_OFFSET_LOW_MASK));
>>>>>> +   asm volatile(ALTERNATIVE("nop",
>>>>>> +"mov %0, xzr",
>>>>>>  ARM64_HAS_VIRT_HOST_EXTN)
>>>>>> -: "+r" (v) : "i" (~HYP_PAGE_OFFSET_MASK));
>>>>>> -   return v;
>>>>>> +: "+r" (mask));
>>>>>> +   return v | mask;
>>>>>
>>>>> If mask is ~HYP_PAGE_OFFSET_LOW_MASK how can you be sure that setting
>>>>> bit (VA_BITS - 1) is always the right thing to do to generate a kernel
>>>>> address?
>>>>
>>>> It has taken me a while, but I think I finally see what you mean. We
>>>> have no idea whether that bit was set or not.
>>>>
>>>>> This is kind of what I asked before only now there's an extra bit not
>>>>> guaranteed by the architecture to be set for the kernel range, I
>>>>> think.
>>>>
>>>> Yeah, I finally connected the couple of neurons left up there (that's
>>>> what remains after the whole brexit braindamage). This doesn't work (or
>>>> rather it only works sometimes). The good news is that I also realized we
>>>> don't need any of that crap.
>>>>
>>>> The only case we currently use a HVA->KVA transformation is to p

Re: issues with emulated PCI MMIO backed by host memory under KVM

2016-06-28 Thread Ard Biesheuvel
On 28 June 2016 at 12:55, Laszlo Ersek <ler...@redhat.com> wrote:
> On 06/27/16 12:34, Christoffer Dall wrote:
>> On Mon, Jun 27, 2016 at 11:47:18AM +0200, Ard Biesheuvel wrote:
>
>>> So first of all, let me reiterate that I could only find a single
>>> instance in QEMU where a PCI MMIO region is backed by host memory,
>>> which is vga-pci.c. I wonder if there are any other occurrences, but
>>> if there aren't any, it makes much more sense to prohibit PCI BARs
>>> backed by host memory rather than spend a lot of effort working around
>>> it.
>>
>> Right, ok.  So Marc's point during his KVM Forum talk was basically,
>> don't use the legacy VGA adapter on ARM and use virtio graphics, right?
>
> The EFI GOP (Graphics Output Protocol) abstraction provides two ways for
> UEFI applications to access the display, and one way for a runtime OS to
> inherit the display hardware from the firmware (without OS native drivers).
>
> (a) For UEFI apps:
> - direct framebuffer access
> - Blt() (block transfer) member function
>
> (b) For runtime OS:
> - direct framebuffer access ("efifb" in Linux)
>
> Virtio-gpu lacks a linear framebuffer by design. Therefore the above
> methods are reduced to the following:
>
> (c) UEFI apps can access virtio-gpu with:
> - GOP.Blt() member function only
>
> (d) The runtime guest OS can access the virtio-gpu device as-inherited
> from the firmware (i.e., without native drivers) with:
> - n/a.
>
> Given that we expect all aarch64 OSes to include native virtio-gpu
> drivers on their install media, (d) is actually not a problem. Whenever
> the OS kernel runs, we except to have no need for "efifb", ever. So
> that's good.
>
> The problem is (c). UEFI boot loaders would have to be taught to call
> GOP.Blt() manually, whenever they need to display something. I'm not
> sure about grub2's current status, but it is free software, so in theory
> it should be doable. However, UEFI windows boot loaders are proprietary
> *and* they require direct framebuffer access (on x86 at least); they
> don't work with Blt()-only. (I found some Microsoft presentations about
> this earlier.)
>
> So, virtio-gpu is an almost universal solution for the problem, but not
> entirely. For any given GOP, offering Blt() *only* (i.e., not exposing a
> linear framebuffer) conforms to the UEFI spec, but some boot loaders are
> known to present further requirements (on x86 anyway).
>

Even if virtio-gpu were to expose a linear framebuffer, it would likely
expose it as a PCI BAR, and we would be in the exact same situation.

The only way we can work around this is to emulate a DMA coherent
device that uses a framebuffer in system RAM. I looked at the PL111,
which is already supported both in EDK2 and the Linux kernel, and
would only require minor changes to support DMA coherent devices.
Unfortunately, we would not be able to advertise its presence when
running under ACPI, since it is not a PCI device.

In any case, reconciling software that requires a framebuffer with a
GPU emulation that does not expose one by design is going to be
problematic even without this issue. How is this supposed to work on
x86?
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH 13/20] ARM: KVM: Implement HVC_RESET_VECTORS stub hypercall

2017-02-19 Thread Ard Biesheuvel
On 17 February 2017 at 15:44, Marc Zyngier  wrote:
> In order to restore HYP mode to its original condition, KVM currently
> implements __kvm_hyp_reset(). As we're moving towards a hyp-stub
> defined API, it becomes necessary to implement HVC_RESET_VECTORS.
>
> This patch adds the HVC_RESET_VECTORS hypercall to the KVM init
> code, which so far lacked any form of hypercall support.
>
> Signed-off-by: Marc Zyngier 
> ---
>  arch/arm/kernel/hyp-stub.S |  1 +
>  arch/arm/kvm/init.S| 37 +++--
>  2 files changed, 32 insertions(+), 6 deletions(-)
>
> diff --git a/arch/arm/kernel/hyp-stub.S b/arch/arm/kernel/hyp-stub.S
> index cf6d801f89e8..171a09cdf6b3 100644
> --- a/arch/arm/kernel/hyp-stub.S
> +++ b/arch/arm/kernel/hyp-stub.S
> @@ -280,6 +280,7 @@ ENDPROC(__hyp_reset_vectors)
>
>  .align 5
>  __hyp_stub_vectors:
> +.global __hyp_stub_vectors
>  __hyp_stub_reset:  W(b).
>  __hyp_stub_und:W(b).
>  __hyp_stub_svc:W(b).
> diff --git a/arch/arm/kvm/init.S b/arch/arm/kvm/init.S
> index bf89c919efc1..b0138118fac4 100644
> --- a/arch/arm/kvm/init.S
> +++ b/arch/arm/kvm/init.S
> @@ -23,6 +23,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>
>  /
>   * Hypervisor initialization
> @@ -39,6 +40,10 @@
>   * - Setup the page tables
>   * - Enable the MMU
>   * - Profit! (or eret, if you only care about the code).
> + *
> + * Another possibility is to get a HYP stub hypercall.
> + * We discriminate between the two by checking if r0 contains a value
> + * that is less than HVC_STUB_HCALL_NR.
>   */
>
> .text
> @@ -58,6 +63,10 @@ __kvm_hyp_init:
> W(b).
>
>  __do_hyp_init:
> +   @ Check for a stub hypercall
> +   cmp r0, #HVC_STUB_HCALL_NR
> +   blo __kvm_handle_stub_hvc
> +
> @ Set stack pointer
> mov sp, r0
>
> @@ -112,22 +121,38 @@ __do_hyp_init:
>
> eret
>
> -   @ r0 : stub vectors address
> +ENTRY(__kvm_handle_stub_hvc)
> +   cmp r0, #HVC_RESET_VECTORS
> +   bne 1f
>  ENTRY(__kvm_hyp_reset)
> /* We're now in idmap, disable MMU */
> mrc p15, 4, r1, c1, c0, 0   @ HSCTLR
> -   ldr r2, =(HSCTLR_M | HSCTLR_A | HSCTLR_C | HSCTLR_I)
> -   bic r1, r1, r2
> +   ldr r0, =(HSCTLR_M | HSCTLR_A | HSCTLR_C | HSCTLR_I)
> +   bic r1, r1, r0
> mcr p15, 4, r1, c1, c0, 0   @ HSCTLR
>
> -   /* Install stub vectors */
> -   mcr p15, 4, r0, c12, c0, 0  @ HVBAR
> -   isb
> +   /*
> +* Install stub vectors. We cannot use 'adr' to get to the
> +* stub vectors, hence having to play the VA->PA game.
> +*/
> +   adr r0, .L__va2pa   @ PA
> +   ldr r1, [r0]@ VA
> +   sub r0, r0, r1  @ PA - VA
> +   ldr r1, =__hyp_stub_vectors

Since we're guaranteed to be on v7, how about something like
0: adr r0, 0b
  movw r1, #:lower16:__hyp_stub_vectors - 0b
  movt r1, #:upper16:__hyp_stub_vectors - 0b

> +   add r1, r1, r0
> +   mcr p15, 4, r1, c12, c0, 0  @ HVBAR
> +   b   exit
> +
> +1: mov r0, #-1
>
> +exit:
> eret
> +ENDPROC(__kvm_handle_stub_hvc)
>  ENDPROC(__kvm_hyp_reset)
>
> .ltorg
> +.L__va2pa:
> +   .word   .
>
> .globl __kvm_hyp_init_end
>  __kvm_hyp_init_end:
> --
> 2.11.0
>
>
> ___
> linux-arm-kernel mailing list
> linux-arm-ker...@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH 13/20] ARM: KVM: Implement HVC_RESET_VECTORS stub hypercall

2017-02-19 Thread Ard Biesheuvel
On 17 February 2017 at 15:44, Marc Zyngier  wrote:
> In order to restore HYP mode to its original condition, KVM currently
> implements __kvm_hyp_reset(). As we're moving towards a hyp-stub
> defined API, it becomes necessary to implement HVC_RESET_VECTORS.
>
> This patch adds the HVC_RESET_VECTORS hypercall to the KVM init
> code, which so far lacked any form of hypercall support.
>
> Signed-off-by: Marc Zyngier 
> ---
>  arch/arm/kernel/hyp-stub.S |  1 +
>  arch/arm/kvm/init.S| 37 +++--
>  2 files changed, 32 insertions(+), 6 deletions(-)
>
> diff --git a/arch/arm/kernel/hyp-stub.S b/arch/arm/kernel/hyp-stub.S
> index cf6d801f89e8..171a09cdf6b3 100644
> --- a/arch/arm/kernel/hyp-stub.S
> +++ b/arch/arm/kernel/hyp-stub.S
> @@ -280,6 +280,7 @@ ENDPROC(__hyp_reset_vectors)
>
>  .align 5
>  __hyp_stub_vectors:
> +.global __hyp_stub_vectors

Oh, and (nit:) perhaps use

ENTRY(__hyp_stub_vectors)

here?

>  __hyp_stub_reset:  W(b).
>  __hyp_stub_und:W(b).
>  __hyp_stub_svc:W(b).
> diff --git a/arch/arm/kvm/init.S b/arch/arm/kvm/init.S
> index bf89c919efc1..b0138118fac4 100644
> --- a/arch/arm/kvm/init.S
> +++ b/arch/arm/kvm/init.S
> @@ -23,6 +23,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>
>  /
>   * Hypervisor initialization
> @@ -39,6 +40,10 @@
>   * - Setup the page tables
>   * - Enable the MMU
>   * - Profit! (or eret, if you only care about the code).
> + *
> + * Another possibility is to get a HYP stub hypercall.
> + * We discriminate between the two by checking if r0 contains a value
> + * that is less than HVC_STUB_HCALL_NR.
>   */
>
> .text
> @@ -58,6 +63,10 @@ __kvm_hyp_init:
> W(b).
>
>  __do_hyp_init:
> +   @ Check for a stub hypercall
> +   cmp r0, #HVC_STUB_HCALL_NR
> +   blo __kvm_handle_stub_hvc
> +
> @ Set stack pointer
> mov sp, r0
>
> @@ -112,22 +121,38 @@ __do_hyp_init:
>
> eret
>
> -   @ r0 : stub vectors address
> +ENTRY(__kvm_handle_stub_hvc)
> +   cmp r0, #HVC_RESET_VECTORS
> +   bne 1f
>  ENTRY(__kvm_hyp_reset)
> /* We're now in idmap, disable MMU */
> mrc p15, 4, r1, c1, c0, 0   @ HSCTLR
> -   ldr r2, =(HSCTLR_M | HSCTLR_A | HSCTLR_C | HSCTLR_I)
> -   bic r1, r1, r2
> +   ldr r0, =(HSCTLR_M | HSCTLR_A | HSCTLR_C | HSCTLR_I)
> +   bic r1, r1, r0
> mcr p15, 4, r1, c1, c0, 0   @ HSCTLR
>
> -   /* Install stub vectors */
> -   mcr p15, 4, r0, c12, c0, 0  @ HVBAR
> -   isb
> +   /*
> +* Install stub vectors. We cannot use 'adr' to get to the
> +* stub vectors, hence having to play the VA->PA game.
> +*/
> +   adr r0, .L__va2pa   @ PA
> +   ldr r1, [r0]@ VA
> +   sub r0, r0, r1  @ PA - VA
> +   ldr r1, =__hyp_stub_vectors
> +   add r1, r1, r0
> +   mcr p15, 4, r1, c12, c0, 0  @ HVBAR
> +   b   exit
> +
> +1: mov r0, #-1
>
> +exit:
> eret
> +ENDPROC(__kvm_handle_stub_hvc)
>  ENDPROC(__kvm_hyp_reset)
>
> .ltorg
> +.L__va2pa:
> +   .word   .
>
> .globl __kvm_hyp_init_end
>  __kvm_hyp_init_end:
> --
> 2.11.0
>
>
> ___
> linux-arm-kernel mailing list
> linux-arm-ker...@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH 10/20] ARM: hyp-stub: Use r1 for the soft-restart address

2017-02-19 Thread Ard Biesheuvel
On 17 February 2017 at 15:44, Marc Zyngier  wrote:
> It is not really obvious why the restart address should be in r3
> when communicated to the hyp-stub. r1 should be perfectly adequate,
> and consistent with the rest of the code.
>
> Signed-off-by: Marc Zyngier 
> ---
>  arch/arm/kernel/hyp-stub.S | 4 +---
>  1 file changed, 1 insertion(+), 3 deletions(-)
>
> diff --git a/arch/arm/kernel/hyp-stub.S b/arch/arm/kernel/hyp-stub.S
> index 8301db963d83..8fa521bd63d2 100644
> --- a/arch/arm/kernel/hyp-stub.S
> +++ b/arch/arm/kernel/hyp-stub.S
> @@ -214,7 +214,7 @@ __hyp_stub_do_trap:
>
>  1: teq r0, #HVC_SOFT_RESTART
> bne 1f
> -   bx  r3
> +   bx  r1
>
>  1: mov r0, #-1
>
> @@ -258,10 +258,8 @@ ENTRY(__hyp_set_vectors)
>  ENDPROC(__hyp_set_vectors)
>
>  ENTRY(__hyp_soft_restart)
> -   mov r3, r0
> mov r0, #HVC_SOFT_RESTART
> __HVC(0)
> -   mov r0, r3
> ret lr
>  ENDPROC(__hyp_soft_restart)
>

/me confused. How does the address end up in r1?
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


[PATCH v2 5/5] arm64: mmu: apply strict permissions to .init.text and .init.data

2017-02-11 Thread Ard Biesheuvel
To avoid having mappings that are writable and executable at the same
time, split the init region into a .init.text region that is mapped
read-only, and a .init.data region that is mapped non-executable.

This is possible now that the alternative patching occurs via the linear
mapping, and the linear alias of the init region is always mapped writable
(but never executable).

Since the alternatives descriptions themselves are read-only data, move
those into the .init.text region.

Reviewed-by: Laura Abbott <labb...@redhat.com>
Signed-off-by: Ard Biesheuvel <ard.biesheu...@linaro.org>
---
 arch/arm64/include/asm/sections.h |  3 ++-
 arch/arm64/kernel/vmlinux.lds.S   | 25 +---
 arch/arm64/mm/mmu.c   | 12 ++
 3 files changed, 26 insertions(+), 14 deletions(-)

diff --git a/arch/arm64/include/asm/sections.h 
b/arch/arm64/include/asm/sections.h
index 4e7e7067afdb..22582819b2e5 100644
--- a/arch/arm64/include/asm/sections.h
+++ b/arch/arm64/include/asm/sections.h
@@ -24,7 +24,8 @@ extern char __hibernate_exit_text_start[], 
__hibernate_exit_text_end[];
 extern char __hyp_idmap_text_start[], __hyp_idmap_text_end[];
 extern char __hyp_text_start[], __hyp_text_end[];
 extern char __idmap_text_start[], __idmap_text_end[];
+extern char __initdata_begin[], __initdata_end[];
+extern char __inittext_begin[], __inittext_end[];
 extern char __irqentry_text_start[], __irqentry_text_end[];
 extern char __mmuoff_data_start[], __mmuoff_data_end[];
-
 #endif /* __ASM_SECTIONS_H */
diff --git a/arch/arm64/kernel/vmlinux.lds.S b/arch/arm64/kernel/vmlinux.lds.S
index b8deffa9e1bf..2c93d259046c 100644
--- a/arch/arm64/kernel/vmlinux.lds.S
+++ b/arch/arm64/kernel/vmlinux.lds.S
@@ -143,12 +143,27 @@ SECTIONS
 
. = ALIGN(SEGMENT_ALIGN);
__init_begin = .;
+   __inittext_begin = .;
 
INIT_TEXT_SECTION(8)
.exit.text : {
ARM_EXIT_KEEP(EXIT_TEXT)
}
 
+   . = ALIGN(4);
+   .altinstructions : {
+   __alt_instructions = .;
+   *(.altinstructions)
+   __alt_instructions_end = .;
+   }
+   .altinstr_replacement : {
+   *(.altinstr_replacement)
+   }
+
+   . = ALIGN(PAGE_SIZE);
+   __inittext_end = .;
+   __initdata_begin = .;
+
.init.data : {
INIT_DATA
INIT_SETUP(16)
@@ -164,15 +179,6 @@ SECTIONS
 
PERCPU_SECTION(L1_CACHE_BYTES)
 
-   . = ALIGN(4);
-   .altinstructions : {
-   __alt_instructions = .;
-   *(.altinstructions)
-   __alt_instructions_end = .;
-   }
-   .altinstr_replacement : {
-   *(.altinstr_replacement)
-   }
.rela : ALIGN(8) {
*(.rela .rela*)
}
@@ -181,6 +187,7 @@ SECTIONS
__rela_size = SIZEOF(.rela);
 
. = ALIGN(SEGMENT_ALIGN);
+   __initdata_end = .;
__init_end = .;
 
_data = .;
diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index e97f1ce967ec..c53c43b4ed3f 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -479,12 +479,16 @@ static void __init map_kernel_segment(pgd_t *pgd, void 
*va_start, void *va_end,
  */
 static void __init map_kernel(pgd_t *pgd)
 {
-   static struct vm_struct vmlinux_text, vmlinux_rodata, vmlinux_init, 
vmlinux_data;
+   static struct vm_struct vmlinux_text, vmlinux_rodata, vmlinux_inittext,
+   vmlinux_initdata, vmlinux_data;
 
map_kernel_segment(pgd, _text, _etext, PAGE_KERNEL_ROX, _text);
-   map_kernel_segment(pgd, __start_rodata, __init_begin, PAGE_KERNEL, 
_rodata);
-   map_kernel_segment(pgd, __init_begin, __init_end, PAGE_KERNEL_EXEC,
-  _init);
+   map_kernel_segment(pgd, __start_rodata, __inittext_begin, PAGE_KERNEL,
+  _rodata);
+   map_kernel_segment(pgd, __inittext_begin, __inittext_end, 
PAGE_KERNEL_ROX,
+  _inittext);
+   map_kernel_segment(pgd, __initdata_begin, __initdata_end, PAGE_KERNEL,
+  _initdata);
map_kernel_segment(pgd, _data, _end, PAGE_KERNEL, _data);
 
if (!pgd_val(*pgd_offset_raw(pgd, FIXADDR_START))) {
-- 
2.7.4

___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


[PATCH v2 0/5] arm64: mmu: avoid writeable-executable mappings

2017-02-11 Thread Ard Biesheuvel
Having memory that is writable and executable at the same time is a
security hazard, and so we tend to avoid those when we can. However,
at boot time, we keep .text mapped writable during the entire init
phase, and the init region itself is mapped rwx as well.

Let's improve the situation by:
- making the alternatives patching use the linear mapping
- splitting the init region into separate text and data regions

This removes all RWX mappings except the really early one created
in head.S (which we could perhaps fix in the future as well)

Changes since v1:
- add patch to move TLB maintenance into create_mapping_late() and remove it
  from its callers (#2)
- use the true address not the linear alias when patching branch instructions,
  spotted by Suzuki (#3)
- mark mark_linear_text_alias_ro() __init (#3)
- move the .rela section back into __initdata: as it turns out, leaving a hole
  between the segments results in a peculiar situation where other unrelated
  allocations end up right in the middle of the kernel Image, which is
  probably a bad idea (#5). See below for an example.
- add acks

Ard Biesheuvel (5):
  arm: kvm: move kvm_vgic_global_state out of .text section
  arm64: mmu: move TLB maintenance from callers to create_mapping_late()
  arm64: alternatives: apply boot time fixups via the linear mapping
  arm64: mmu: map .text as read-only from the outset
  arm64: mmu: apply strict permissions to .init.text and .init.data

 arch/arm64/include/asm/mmu.h  |  1 +
 arch/arm64/include/asm/sections.h |  3 +-
 arch/arm64/kernel/alternative.c   |  2 +-
 arch/arm64/kernel/smp.c   |  1 +
 arch/arm64/kernel/vmlinux.lds.S   | 25 +++
 arch/arm64/mm/mmu.c   | 45 +---
 virt/kvm/arm/vgic/vgic.c  |  4 +-
 7 files changed, 53 insertions(+), 28 deletions(-)

-- 
2.7.4

The various kernel segments are vmapped from paging_init() [after inlining]

0xff800808-0xff80088b 8585216 paging_init+0x84/0x584 
phys=4008 vmap
0xff80088b-0xff8008cb 4194304 paging_init+0xa4/0x584 
phys=408b vmap
0xff8008cb-0xff8008d27000  487424 paging_init+0xc4/0x584 
phys=40cb vmap
0xff8008d27000-0xff8008da3000  507904 paging_init+0xe8/0x584 
phys=40d27000 vmap
0xff8008dd1000-0xff8008dd30008192 devm_ioremap_nocache+0x54/0xa8 
phys=a003000 ioremap
0xff8008dd3000-0xff8008dd50008192 devm_ioremap_nocache+0x54/0xa8 
phys=a003000 ioremap
0xff8008dde000-0xff8008de8192 pl031_probe+0x80/0x1e8 
phys=901 ioremap
0xff8008e4c000-0xff8008e5   16384 n_tty_open+0x1c/0xd0 pages=3 
vmalloc
0xff8008e54000-0xff8008e58000   16384 n_tty_open+0x1c/0xd0 pages=3 
vmalloc
0xff8008e8-0xff8008e84000   16384 n_tty_open+0x1c/0xd0 pages=3 
vmalloc
0xff8008e84000-0xff8008e88000   16384 n_tty_open+0x1c/0xd0 pages=3 
vmalloc
0xff8008ea-0xff8008ea20008192 bpf_prog_alloc+0x3c/0xb8 pages=1 
vmalloc
0xff8008ef2000-0xff8008ef6000   16384 n_tty_open+0x1c/0xd0 pages=3 
vmalloc
0xff8008ef6000-0xff8008efa000   16384 n_tty_open+0x1c/0xd0 pages=3 
vmalloc
0xff800901-0xff800914b000 1290240 paging_init+0x10c/0x584 
phys=4101 vmap
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


[PATCH v2 1/5] arm: kvm: move kvm_vgic_global_state out of .text section

2017-02-11 Thread Ard Biesheuvel
The kvm_vgic_global_state struct contains a static key which is
written to by jump_label_init() at boot time. So in preparation of
making .text regions truly (well, almost truly) read-only, mark
kvm_vgic_global_state __ro_after_init so it moves to the .rodata
section instead.

Acked-by: Marc Zyngier <marc.zyng...@arm.com>
Reviewed-by: Laura Abbott <labb...@redhat.com>
Signed-off-by: Ard Biesheuvel <ard.biesheu...@linaro.org>
---
 virt/kvm/arm/vgic/vgic.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/virt/kvm/arm/vgic/vgic.c b/virt/kvm/arm/vgic/vgic.c
index 6440b56ec90e..2f373455ed4e 100644
--- a/virt/kvm/arm/vgic/vgic.c
+++ b/virt/kvm/arm/vgic/vgic.c
@@ -29,7 +29,9 @@
 #define DEBUG_SPINLOCK_BUG_ON(p)
 #endif
 
-struct vgic_global __section(.hyp.text) kvm_vgic_global_state = {.gicv3_cpuif 
= STATIC_KEY_FALSE_INIT,};
+struct vgic_global kvm_vgic_global_state __ro_after_init = {
+   .gicv3_cpuif = STATIC_KEY_FALSE_INIT,
+};
 
 /*
  * Locking order is always:
-- 
2.7.4

___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


[PATCH v2 2/5] arm64: mmu: move TLB maintenance from callers to create_mapping_late()

2017-02-11 Thread Ard Biesheuvel
In preparation of changing the way we invoke create_mapping_late() (which
is currently invoked twice from the same function), move the TLB flushing
it performs from the caller into create_mapping_late() itself, and change
it to a TLB maintenance by VA rather than a full flush, which is more
appropriate here.

Signed-off-by: Ard Biesheuvel <ard.biesheu...@linaro.org>
---
 arch/arm64/mm/mmu.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index 2131521ddc24..9e0ec1a8cd3b 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -356,6 +356,9 @@ static void create_mapping_late(phys_addr_t phys, unsigned 
long virt,
 
__create_pgd_mapping(init_mm.pgd, phys, virt, size, prot,
 NULL, debug_pagealloc_enabled());
+
+   /* flush the TLBs after updating live kernel mappings */
+   flush_tlb_kernel_range(virt, virt + size);
 }
 
 static void __init __map_memblock(pgd_t *pgd, phys_addr_t start, phys_addr_t 
end)
@@ -438,9 +441,6 @@ void mark_rodata_ro(void)
create_mapping_late(__pa_symbol(__start_rodata), (unsigned 
long)__start_rodata,
section_size, PAGE_KERNEL_RO);
 
-   /* flush the TLBs after updating live kernel mappings */
-   flush_tlb_all();
-
debug_checkwx();
 }
 
-- 
2.7.4

___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


[PATCH v2 3/5] arm64: alternatives: apply boot time fixups via the linear mapping

2017-02-11 Thread Ard Biesheuvel
One important rule of thumb when designing a secure software system is
that memory should never be writable and executable at the same time.
We mostly adhere to this rule in the kernel, except at boot time, when
regions may be mapped RWX until after we are done applying alternatives
or making other one-off changes.

For the alternative patching, we can improve the situation by applying
the fixups via the linear mapping, which is never mapped with executable
permissions. So map the linear alias of .text with RW- permissions
initially, and remove the write permissions as soon as alternative
patching has completed.

Reviewed-by: Laura Abbott <labb...@redhat.com>
Signed-off-by: Ard Biesheuvel <ard.biesheu...@linaro.org>
---
 arch/arm64/include/asm/mmu.h|  1 +
 arch/arm64/kernel/alternative.c |  2 +-
 arch/arm64/kernel/smp.c |  1 +
 arch/arm64/mm/mmu.c | 22 +++-
 4 files changed, 20 insertions(+), 6 deletions(-)

diff --git a/arch/arm64/include/asm/mmu.h b/arch/arm64/include/asm/mmu.h
index 47619411f0ff..5468c834b072 100644
--- a/arch/arm64/include/asm/mmu.h
+++ b/arch/arm64/include/asm/mmu.h
@@ -37,5 +37,6 @@ extern void create_pgd_mapping(struct mm_struct *mm, 
phys_addr_t phys,
   unsigned long virt, phys_addr_t size,
   pgprot_t prot, bool page_mappings_only);
 extern void *fixmap_remap_fdt(phys_addr_t dt_phys);
+extern void mark_linear_text_alias_ro(void);
 
 #endif
diff --git a/arch/arm64/kernel/alternative.c b/arch/arm64/kernel/alternative.c
index 06d650f61da7..8cee29d9bc07 100644
--- a/arch/arm64/kernel/alternative.c
+++ b/arch/arm64/kernel/alternative.c
@@ -128,7 +128,7 @@ static void __apply_alternatives(void *alt_region)
 
for (i = 0; i < nr_inst; i++) {
insn = get_alt_insn(alt, origptr + i, replptr + i);
-   *(origptr + i) = cpu_to_le32(insn);
+   ((u32 *)lm_alias(origptr))[i] = cpu_to_le32(insn);
}
 
flush_icache_range((uintptr_t)origptr,
diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
index a8ec5da530af..d6307e311a10 100644
--- a/arch/arm64/kernel/smp.c
+++ b/arch/arm64/kernel/smp.c
@@ -432,6 +432,7 @@ void __init smp_cpus_done(unsigned int max_cpus)
setup_cpu_features();
hyp_mode_check();
apply_alternatives_all();
+   mark_linear_text_alias_ro();
 }
 
 void __init smp_prepare_boot_cpu(void)
diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index 9e0ec1a8cd3b..7ed981c7f4c0 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -398,16 +398,28 @@ static void __init __map_memblock(pgd_t *pgd, phys_addr_t 
start, phys_addr_t end
 debug_pagealloc_enabled());
 
/*
-* Map the linear alias of the [_text, __init_begin) interval as
-* read-only/non-executable. This makes the contents of the
-* region accessible to subsystems such as hibernate, but
-* protects it from inadvertent modification or execution.
+* Map the linear alias of the [_text, __init_begin) interval
+* as non-executable now, and remove the write permission in
+* mark_linear_text_alias_ro() below (which will be called after
+* alternative patching has completed). This makes the contents
+* of the region accessible to subsystems such as hibernate,
+* but protects it from inadvertent modification or execution.
 */
__create_pgd_mapping(pgd, kernel_start, __phys_to_virt(kernel_start),
-kernel_end - kernel_start, PAGE_KERNEL_RO,
+kernel_end - kernel_start, PAGE_KERNEL,
 early_pgtable_alloc, debug_pagealloc_enabled());
 }
 
+void __init mark_linear_text_alias_ro(void)
+{
+   /*
+* Remove the write permissions from the linear alias of .text/.rodata
+*/
+   create_mapping_late(__pa_symbol(_text), (unsigned long)lm_alias(_text),
+   (unsigned long)__init_begin - (unsigned long)_text,
+   PAGE_KERNEL_RO);
+}
+
 static void __init map_mem(pgd_t *pgd)
 {
struct memblock_region *reg;
-- 
2.7.4

___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


[PATCH v2 4/5] arm64: mmu: map .text as read-only from the outset

2017-02-11 Thread Ard Biesheuvel
Now that alternatives patching code no longer relies on the primary
mapping of .text being writable, we can remove the code that removes
the writable permissions post-init time, and map it read-only from
the outset.

Reviewed-by: Laura Abbott <labb...@redhat.com>
Reviewed-by: Kees Cook <keesc...@chromium.org>
Signed-off-by: Ard Biesheuvel <ard.biesheu...@linaro.org>
---
 arch/arm64/mm/mmu.c | 5 +
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index 7ed981c7f4c0..e97f1ce967ec 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -442,9 +442,6 @@ void mark_rodata_ro(void)
 {
unsigned long section_size;
 
-   section_size = (unsigned long)_etext - (unsigned long)_text;
-   create_mapping_late(__pa_symbol(_text), (unsigned long)_text,
-   section_size, PAGE_KERNEL_ROX);
/*
 * mark .rodata as read only. Use __init_begin rather than __end_rodata
 * to cover NOTES and EXCEPTION_TABLE.
@@ -484,7 +481,7 @@ static void __init map_kernel(pgd_t *pgd)
 {
static struct vm_struct vmlinux_text, vmlinux_rodata, vmlinux_init, 
vmlinux_data;
 
-   map_kernel_segment(pgd, _text, _etext, PAGE_KERNEL_EXEC, _text);
+   map_kernel_segment(pgd, _text, _etext, PAGE_KERNEL_ROX, _text);
map_kernel_segment(pgd, __start_rodata, __init_begin, PAGE_KERNEL, 
_rodata);
map_kernel_segment(pgd, __init_begin, __init_end, PAGE_KERNEL_EXEC,
   _init);
-- 
2.7.4

___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH 2/4] arm64: alternatives: apply boot time fixups via the linear mapping

2017-02-10 Thread Ard Biesheuvel

> On 10 Feb 2017, at 18:49, Suzuki K Poulose <suzuki.poul...@arm.com> wrote:
> 
>> On 10/02/17 17:16, Ard Biesheuvel wrote:
>> One important rule of thumb when designing a secure software system is
>> that memory should never be writable and executable at the same time.
>> We mostly adhere to this rule in the kernel, except at boot time, when
>> regions may be mapped RWX until after we are done applying alternatives
>> or making other one-off changes.
>> 
>> For the alternative patching, we can improve the situation by applying
>> the fixups via the linear mapping, which is never mapped with executable
>> permissions. So map the linear alias of .text with RW- permissions
>> initially, and remove the write permissions as soon as alternative
>> patching has completed.
>> 
>> Signed-off-by: Ard Biesheuvel <ard.biesheu...@linaro.org>
>> ---
>> arch/arm64/include/asm/mmu.h|  1 +
>> arch/arm64/kernel/alternative.c |  6 ++---
>> arch/arm64/kernel/smp.c |  1 +
>> arch/arm64/mm/mmu.c | 25 
>> 4 files changed, 25 insertions(+), 8 deletions(-)
>> 
>> diff --git a/arch/arm64/include/asm/mmu.h b/arch/arm64/include/asm/mmu.h
>> index 47619411f0ff..5468c834b072 100644
>> --- a/arch/arm64/include/asm/mmu.h
>> +++ b/arch/arm64/include/asm/mmu.h
>> @@ -37,5 +37,6 @@ extern void create_pgd_mapping(struct mm_struct *mm, 
>> phys_addr_t phys,
>>   unsigned long virt, phys_addr_t size,
>>   pgprot_t prot, bool page_mappings_only);
>> extern void *fixmap_remap_fdt(phys_addr_t dt_phys);
>> +extern void mark_linear_text_alias_ro(void);
>> 
>> #endif
>> diff --git a/arch/arm64/kernel/alternative.c 
>> b/arch/arm64/kernel/alternative.c
>> index 06d650f61da7..eacdbcc45630 100644
>> --- a/arch/arm64/kernel/alternative.c
>> +++ b/arch/arm64/kernel/alternative.c
>> @@ -122,7 +122,7 @@ static void __apply_alternatives(void *alt_region)
>> 
>>pr_info_once("patching kernel code\n");
>> 
>> -origptr = ALT_ORIG_PTR(alt);
>> +origptr = lm_alias(ALT_ORIG_PTR(alt));
>>replptr = ALT_REPL_PTR(alt);
>>nr_inst = alt->alt_len / sizeof(insn);
> 
> Correct me if I am wrong, I think this would make "get_alt_insn" generate 
> branch
> instructions based on the  aliased linear mapped address, which could branch 
> to linear
> address of the branch target which doesn't have Execute permissions set.
> I think we should use ALT_ORIG_PTR(alt), instead of origptr for the calls to
> get_alt_insn().
> 

Good point, you are probably right.

Will fix
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


[PATCH 0/4] arm64: mmu: avoid writeable-executable mappings

2017-02-10 Thread Ard Biesheuvel
Having memory that is writable and executable at the same time is a
security hazard, and so we tend to avoid those when we can. However,
at boot time, we keep .text mapped writable during the entire init
phase, and the init region itself is mapped rwx as well.

Let's improve the situation by:
- making the alternatives patching use the linear mapping
- splitting the init region into separate text and data regions

This removes all RWX mappings except the really early one created
in head.S (which we could perhaps fix in the future as well)

Ard Biesheuvel (4):
  arm: kvm: move kvm_vgic_global_state out of .text section
  arm64: alternatives: apply boot time fixups via the linear mapping
  arm64: mmu: map .text as read-only from the outset
  arm64: mmu: apply strict permissions to .init.text and .init.data

 arch/arm64/include/asm/mmu.h  |  1 +
 arch/arm64/include/asm/sections.h |  3 +-
 arch/arm64/kernel/alternative.c   |  6 +--
 arch/arm64/kernel/smp.c   |  1 +
 arch/arm64/kernel/vmlinux.lds.S   | 32 ++-
 arch/arm64/mm/init.c  |  3 +-
 arch/arm64/mm/mmu.c   | 42 ++--
 virt/kvm/arm/vgic/vgic.c  |  4 +-
 8 files changed, 64 insertions(+), 28 deletions(-)

-- 
2.7.4

___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


[PATCH 2/4] arm64: alternatives: apply boot time fixups via the linear mapping

2017-02-10 Thread Ard Biesheuvel
One important rule of thumb when designing a secure software system is
that memory should never be writable and executable at the same time.
We mostly adhere to this rule in the kernel, except at boot time, when
regions may be mapped RWX until after we are done applying alternatives
or making other one-off changes.

For the alternative patching, we can improve the situation by applying
the fixups via the linear mapping, which is never mapped with executable
permissions. So map the linear alias of .text with RW- permissions
initially, and remove the write permissions as soon as alternative
patching has completed.

Signed-off-by: Ard Biesheuvel <ard.biesheu...@linaro.org>
---
 arch/arm64/include/asm/mmu.h|  1 +
 arch/arm64/kernel/alternative.c |  6 ++---
 arch/arm64/kernel/smp.c |  1 +
 arch/arm64/mm/mmu.c | 25 
 4 files changed, 25 insertions(+), 8 deletions(-)

diff --git a/arch/arm64/include/asm/mmu.h b/arch/arm64/include/asm/mmu.h
index 47619411f0ff..5468c834b072 100644
--- a/arch/arm64/include/asm/mmu.h
+++ b/arch/arm64/include/asm/mmu.h
@@ -37,5 +37,6 @@ extern void create_pgd_mapping(struct mm_struct *mm, 
phys_addr_t phys,
   unsigned long virt, phys_addr_t size,
   pgprot_t prot, bool page_mappings_only);
 extern void *fixmap_remap_fdt(phys_addr_t dt_phys);
+extern void mark_linear_text_alias_ro(void);
 
 #endif
diff --git a/arch/arm64/kernel/alternative.c b/arch/arm64/kernel/alternative.c
index 06d650f61da7..eacdbcc45630 100644
--- a/arch/arm64/kernel/alternative.c
+++ b/arch/arm64/kernel/alternative.c
@@ -122,7 +122,7 @@ static void __apply_alternatives(void *alt_region)
 
pr_info_once("patching kernel code\n");
 
-   origptr = ALT_ORIG_PTR(alt);
+   origptr = lm_alias(ALT_ORIG_PTR(alt));
replptr = ALT_REPL_PTR(alt);
nr_inst = alt->alt_len / sizeof(insn);
 
@@ -131,8 +131,8 @@ static void __apply_alternatives(void *alt_region)
*(origptr + i) = cpu_to_le32(insn);
}
 
-   flush_icache_range((uintptr_t)origptr,
-  (uintptr_t)(origptr + nr_inst));
+   flush_icache_range((uintptr_t)ALT_ORIG_PTR(alt),
+  (uintptr_t)(ALT_ORIG_PTR(alt) + nr_inst));
}
 }
 
diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
index a8ec5da530af..d6307e311a10 100644
--- a/arch/arm64/kernel/smp.c
+++ b/arch/arm64/kernel/smp.c
@@ -432,6 +432,7 @@ void __init smp_cpus_done(unsigned int max_cpus)
setup_cpu_features();
hyp_mode_check();
apply_alternatives_all();
+   mark_linear_text_alias_ro();
 }
 
 void __init smp_prepare_boot_cpu(void)
diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index 2131521ddc24..f4b045d1cc53 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -395,16 +395,31 @@ static void __init __map_memblock(pgd_t *pgd, phys_addr_t 
start, phys_addr_t end
 debug_pagealloc_enabled());
 
/*
-* Map the linear alias of the [_text, __init_begin) interval as
-* read-only/non-executable. This makes the contents of the
-* region accessible to subsystems such as hibernate, but
-* protects it from inadvertent modification or execution.
+* Map the linear alias of the [_text, __init_begin) interval
+* as non-executable now, and remove the write permission in
+* mark_linear_text_alias_ro() below (which will be called after
+* alternative patching has completed). This makes the contents
+* of the region accessible to subsystems such as hibernate,
+* but protects it from inadvertent modification or execution.
 */
__create_pgd_mapping(pgd, kernel_start, __phys_to_virt(kernel_start),
-kernel_end - kernel_start, PAGE_KERNEL_RO,
+kernel_end - kernel_start, PAGE_KERNEL,
 early_pgtable_alloc, debug_pagealloc_enabled());
 }
 
+void mark_linear_text_alias_ro(void)
+{
+   /*
+* Remove the write permissions from the linear alias of .text/.rodata
+*/
+   create_mapping_late(__pa_symbol(_text), (unsigned long)lm_alias(_text),
+   (unsigned long)__init_begin - (unsigned long)_text,
+   PAGE_KERNEL_RO);
+
+   /* flush the TLBs after updating live kernel mappings */
+   flush_tlb_all();
+}
+
 static void __init map_mem(pgd_t *pgd)
 {
struct memblock_region *reg;
-- 
2.7.4

___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


[PATCH 3/4] arm64: mmu: map .text as read-only from the outset

2017-02-10 Thread Ard Biesheuvel
Now that alternatives patching code no longer relies on the primary
mapping of .text being writable, we can remove the code that removes
the writable permissions post-init time, and map it read-only from
the outset.

Signed-off-by: Ard Biesheuvel <ard.biesheu...@linaro.org>
---
 arch/arm64/mm/mmu.c | 5 +
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index f4b045d1cc53..5b0dbb9156ce 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -442,9 +442,6 @@ void mark_rodata_ro(void)
 {
unsigned long section_size;
 
-   section_size = (unsigned long)_etext - (unsigned long)_text;
-   create_mapping_late(__pa_symbol(_text), (unsigned long)_text,
-   section_size, PAGE_KERNEL_ROX);
/*
 * mark .rodata as read only. Use __init_begin rather than __end_rodata
 * to cover NOTES and EXCEPTION_TABLE.
@@ -487,7 +484,7 @@ static void __init map_kernel(pgd_t *pgd)
 {
static struct vm_struct vmlinux_text, vmlinux_rodata, vmlinux_init, 
vmlinux_data;
 
-   map_kernel_segment(pgd, _text, _etext, PAGE_KERNEL_EXEC, _text);
+   map_kernel_segment(pgd, _text, _etext, PAGE_KERNEL_ROX, _text);
map_kernel_segment(pgd, __start_rodata, __init_begin, PAGE_KERNEL, 
_rodata);
map_kernel_segment(pgd, __init_begin, __init_end, PAGE_KERNEL_EXEC,
   _init);
-- 
2.7.4

___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


[PATCH 4/4] arm64: mmu: apply strict permissions to .init.text and .init.data

2017-02-10 Thread Ard Biesheuvel
To avoid having mappings that are writable and executable at the same
time, split the init region into a .init.text region that is mapped
read-only, and a .init.data region that is mapped non-executable.

This is possible now that the alternative patching occurs via the linear
mapping, and the linear alias of the init region is always mapped writable
(but never executable).

Since the alternatives descriptions themselves are read-only data, move
those into the .init.text region. The .rela section does not have to be
mapped at all after applying the relocations, so drop that from the init
mapping entirely.

Signed-off-by: Ard Biesheuvel <ard.biesheu...@linaro.org>
---
 arch/arm64/include/asm/sections.h |  3 +-
 arch/arm64/kernel/vmlinux.lds.S   | 32 ++--
 arch/arm64/mm/init.c  |  3 +-
 arch/arm64/mm/mmu.c   | 12 +---
 4 files changed, 35 insertions(+), 15 deletions(-)

diff --git a/arch/arm64/include/asm/sections.h 
b/arch/arm64/include/asm/sections.h
index 4e7e7067afdb..22582819b2e5 100644
--- a/arch/arm64/include/asm/sections.h
+++ b/arch/arm64/include/asm/sections.h
@@ -24,7 +24,8 @@ extern char __hibernate_exit_text_start[], 
__hibernate_exit_text_end[];
 extern char __hyp_idmap_text_start[], __hyp_idmap_text_end[];
 extern char __hyp_text_start[], __hyp_text_end[];
 extern char __idmap_text_start[], __idmap_text_end[];
+extern char __initdata_begin[], __initdata_end[];
+extern char __inittext_begin[], __inittext_end[];
 extern char __irqentry_text_start[], __irqentry_text_end[];
 extern char __mmuoff_data_start[], __mmuoff_data_end[];
-
 #endif /* __ASM_SECTIONS_H */
diff --git a/arch/arm64/kernel/vmlinux.lds.S b/arch/arm64/kernel/vmlinux.lds.S
index b8deffa9e1bf..fa144d16bc91 100644
--- a/arch/arm64/kernel/vmlinux.lds.S
+++ b/arch/arm64/kernel/vmlinux.lds.S
@@ -143,12 +143,27 @@ SECTIONS
 
. = ALIGN(SEGMENT_ALIGN);
__init_begin = .;
+   __inittext_begin = .;
 
INIT_TEXT_SECTION(8)
.exit.text : {
ARM_EXIT_KEEP(EXIT_TEXT)
}
 
+   . = ALIGN(4);
+   .altinstructions : {
+   __alt_instructions = .;
+   *(.altinstructions)
+   __alt_instructions_end = .;
+   }
+   .altinstr_replacement : {
+   *(.altinstr_replacement)
+   }
+
+   . = ALIGN(PAGE_SIZE);
+   __inittext_end = .;
+   __initdata_begin = .;
+
.init.data : {
INIT_DATA
INIT_SETUP(16)
@@ -164,15 +179,14 @@ SECTIONS
 
PERCPU_SECTION(L1_CACHE_BYTES)
 
-   . = ALIGN(4);
-   .altinstructions : {
-   __alt_instructions = .;
-   *(.altinstructions)
-   __alt_instructions_end = .;
-   }
-   .altinstr_replacement : {
-   *(.altinstr_replacement)
-   }
+   . = ALIGN(PAGE_SIZE);
+   __initdata_end = .;
+
+   /*
+* The .rela section is not covered by __inittext or __initdata since
+* there is no reason to keep it mapped when we switch to the permanent
+* swapper page tables.
+*/
.rela : ALIGN(8) {
*(.rela .rela*)
}
diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index 8a2713018f2f..6a55feaf46c8 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -493,7 +493,8 @@ void free_initmem(void)
 * prevents the region from being reused for kernel modules, which
 * is not supported by kallsyms.
 */
-   unmap_kernel_range((u64)__init_begin, (u64)(__init_end - __init_begin));
+   unmap_kernel_range((u64)__inittext_begin,
+  (u64)(__initdata_end - __inittext_begin));
 }
 
 #ifdef CONFIG_BLK_DEV_INITRD
diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index 5b0dbb9156ce..e6a4bf2acd59 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -482,12 +482,16 @@ static void __init map_kernel_segment(pgd_t *pgd, void 
*va_start, void *va_end,
  */
 static void __init map_kernel(pgd_t *pgd)
 {
-   static struct vm_struct vmlinux_text, vmlinux_rodata, vmlinux_init, 
vmlinux_data;
+   static struct vm_struct vmlinux_text, vmlinux_rodata, vmlinux_inittext,
+   vmlinux_initdata, vmlinux_data;
 
map_kernel_segment(pgd, _text, _etext, PAGE_KERNEL_ROX, &vmlinux_text);
-   map_kernel_segment(pgd, __start_rodata, __init_begin, PAGE_KERNEL, &vmlinux_rodata);
-   map_kernel_segment(pgd, __init_begin, __init_end, PAGE_KERNEL_EXEC,
-  &vmlinux_init);
+   map_kernel_segment(pgd, __start_rodata, __inittext_begin, PAGE_KERNEL,
+  &vmlinux_rodata);
+   map_kernel_segment(pgd, __inittext_begin, __inittext_end, PAGE_KERNEL_ROX,
+  &vmlinux_inittext);
+   map_kernel_segment(pgd, __initdata_begin, __initdata_end, PAGE_KERNEL,
+  &vmlinux_initdata);
map_kernel_segment(pgd, _data, _end, PAGE_KERNEL, &vmlinux_data);
 
if (!p

Re: [PATCH v2 4/5] arm64: mmu: map .text as read-only from the outset

2017-02-14 Thread Ard Biesheuvel

> On 14 Feb 2017, at 15:57, Mark Rutland <mark.rutl...@arm.com> wrote:
> 
>> On Sat, Feb 11, 2017 at 08:23:05PM +0000, Ard Biesheuvel wrote:
>> Now that alternatives patching code no longer relies on the primary
>> mapping of .text being writable, we can remove the code that removes
>> the writable permissions post-init time, and map it read-only from
>> the outset.
>> 
>> Reviewed-by: Laura Abbott <labb...@redhat.com>
>> Reviewed-by: Kees Cook <keesc...@chromium.org>
>> Signed-off-by: Ard Biesheuvel <ard.biesheu...@linaro.org>
> 
> This generally looks good.
> 
> One effect of this is that even with rodata=off, external debuggers
> can't install SW breakpoints via the executable mapping.
> 

Interesting. For the sake of my education, could you elaborate on how that 
works under the hood?

> We might want to allow that to be overridden. e.g. make rodata= an
> early param, and switch the permissions based on that in map_kernel(),
> e.g. have:
> 
>pgprot_t text_prot = rodata_enabled ? PAGE_KERNEL_ROX
>: PAGE_KERNEL_EXEC);
> 
> ... and use that for .text and .init.text by default.
> 
> 

Is there any way we could restrict this privilege to external debuggers? Having 
trivial 'off' switches for security features makes me feel uneasy (although 
this is orthogonal to this patch)
>> ---
>> arch/arm64/mm/mmu.c | 5 +
>> 1 file changed, 1 insertion(+), 4 deletions(-)
>> 
>> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
>> index 7ed981c7f4c0..e97f1ce967ec 100644
>> --- a/arch/arm64/mm/mmu.c
>> +++ b/arch/arm64/mm/mmu.c
>> @@ -442,9 +442,6 @@ void mark_rodata_ro(void)
>> {
>>unsigned long section_size;
>> 
>> -section_size = (unsigned long)_etext - (unsigned long)_text;
>> -create_mapping_late(__pa_symbol(_text), (unsigned long)_text,
>> -section_size, PAGE_KERNEL_ROX);
>>/*
>> * mark .rodata as read only. Use __init_begin rather than __end_rodata
>> * to cover NOTES and EXCEPTION_TABLE.
>> @@ -484,7 +481,7 @@ static void __init map_kernel(pgd_t *pgd)
>> {
>>static struct vm_struct vmlinux_text, vmlinux_rodata, vmlinux_init, 
>> vmlinux_data;
>> 
>> -map_kernel_segment(pgd, _text, _etext, PAGE_KERNEL_EXEC, &vmlinux_text);
>> +map_kernel_segment(pgd, _text, _etext, PAGE_KERNEL_ROX, &vmlinux_text);
>>map_kernel_segment(pgd, __start_rodata, __init_begin, PAGE_KERNEL, &vmlinux_rodata);
>>map_kernel_segment(pgd, __init_begin, __init_end, PAGE_KERNEL_EXEC,
>>   &vmlinux_init);
>> -- 
>> 2.7.4
>> 


Re: [PATCH v2 4/5] arm64: mmu: map .text as read-only from the outset

2017-02-14 Thread Ard Biesheuvel

> On 14 Feb 2017, at 17:40, Mark Rutland <mark.rutl...@arm.com> wrote:
> 
>> On Tue, Feb 14, 2017 at 04:15:11PM +0000, Ard Biesheuvel wrote:
>> 
>>>> On 14 Feb 2017, at 15:57, Mark Rutland <mark.rutl...@arm.com> wrote:
>>>> 
>>>> On Sat, Feb 11, 2017 at 08:23:05PM +, Ard Biesheuvel wrote:
>>>> Now that alternatives patching code no longer relies on the primary
>>>> mapping of .text being writable, we can remove the code that removes
>>>> the writable permissions post-init time, and map it read-only from
>>>> the outset.
>>>> 
>>>> Reviewed-by: Laura Abbott <labb...@redhat.com>
>>>> Reviewed-by: Kees Cook <keesc...@chromium.org>
>>>> Signed-off-by: Ard Biesheuvel <ard.biesheu...@linaro.org>
>>> 
>>> This generally looks good.
>>> 
>>> One effect of this is that even with rodata=off, external debuggers
>>> can't install SW breakpoints via the executable mapping.
>> 
>> Interesting. For the sake of my education, could you elaborate on how
>> that works under the hood?
> 
> There are details in ARM DDI 0487A.k_iss10775, Chapter H1, "About
> External Debug", page H1-4839 onwards. Otherwise, executive summary
> below.
> 
> An external debugger can place a CPU into debug state. This is
> orthogonal to execution state and exception level, which are unchanged.
> While in this state, the CPU (only) executes instructions fed to it by
> the debugger through a special register.
> 
> To install a SW breakpoint, the debugger makes the CPU enter debug
> state, then issues regular stores, barriers, and cache maintenance.
> These operate in the current execution state at the current EL, using
> the current translation regime.
> 
> The external debugger can also trap exceptions (e.g. those caused by the
> SW breakpoint). The CPU enters debug state when these are trapped.
> 

OK, thanks for the explanation

>>> We might want to allow that to be overridden. e.g. make rodata= an
>>> early param, and switch the permissions based on that in map_kernel(),
>>> e.g. have:
>>> 
>>>   pgprot_t text_prot = rodata_enabled ? PAGE_KERNEL_ROX
>>>   : PAGE_KERNEL_EXEC);
>>> 
>>> ... and use that for .text and .init.text by default.
>>> 
>>> 
>> 
>> Is there any way we could restrict this privilege to external
>> debuggers?
> 
> My understanding is that we cannot.
> 
>> Having trivial 'off' switches for security features makes me feel
>> uneasy (although this is orthogonal to this patch)
> 
> From my PoV, external debuggers are the sole reason to allow rodata=off
> for arm64, and we already allow rodata=off.
> 
> 

Indeed. If that is how it works currently, we shouldn't interfere with it. If 
we ever get anywhere with the lockdown patches, we should blacklist this 
parameter (or rather, not whitelist it, since blacklisting kernel params to 
enforce security is infeasible imo)


[PATCH v3 2/5] arm64: mmu: move TLB maintenance from callers to create_mapping_late()

2017-02-14 Thread Ard Biesheuvel
In preparation of refactoring the kernel mapping logic so that text regions
are never mapped writable, which would require adding explicit TLB
maintenance to new call sites of create_mapping_late() (which is currently
invoked twice from the same function), move the TLB maintenance from the
call site into create_mapping_late() itself, and change it from a full
TLB flush into a flush by VA, which is more appropriate here.

Also, given that create_mapping_late() has evolved into a routine that only
updates protection bits on existing mappings, rename it to
update_mapping_prot().
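
Concretely, the scope of the maintenance changes roughly like this (a sketch; the real change is in the hunk below):

/* before: both callers did a global flush after updating permissions */
flush_tlb_all();

/* after: update_mapping_prot() itself only invalidates the VAs it touched */
flush_tlb_kernel_range(virt, virt + size);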

Reviewed-by: Mark Rutland <mark.rutl...@arm.com>
Tested-by: Mark Rutland <mark.rutl...@arm.com>
Signed-off-by: Ard Biesheuvel <ard.biesheu...@linaro.org>
---
 arch/arm64/mm/mmu.c | 16 
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index 2131521ddc24..a98419b72a09 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -345,17 +345,20 @@ void __init create_pgd_mapping(struct mm_struct *mm, 
phys_addr_t phys,
 pgd_pgtable_alloc, page_mappings_only);
 }
 
-static void create_mapping_late(phys_addr_t phys, unsigned long virt,
- phys_addr_t size, pgprot_t prot)
+static void update_mapping_prot(phys_addr_t phys, unsigned long virt,
+   phys_addr_t size, pgprot_t prot)
 {
if (virt < VMALLOC_START) {
-   pr_warn("BUG: not creating mapping for %pa at 0x%016lx - 
outside kernel range\n",
+   pr_warn("BUG: not updating mapping for %pa at 0x%016lx - 
outside kernel range\n",
, virt);
return;
}
 
__create_pgd_mapping(init_mm.pgd, phys, virt, size, prot,
 NULL, debug_pagealloc_enabled());
+
+   /* flush the TLBs after updating live kernel mappings */
+   flush_tlb_kernel_range(virt, virt + size);
 }
 
 static void __init __map_memblock(pgd_t *pgd, phys_addr_t start, phys_addr_t 
end)
@@ -428,19 +431,16 @@ void mark_rodata_ro(void)
unsigned long section_size;
 
section_size = (unsigned long)_etext - (unsigned long)_text;
-   create_mapping_late(__pa_symbol(_text), (unsigned long)_text,
+   update_mapping_prot(__pa_symbol(_text), (unsigned long)_text,
section_size, PAGE_KERNEL_ROX);
/*
 * mark .rodata as read only. Use __init_begin rather than __end_rodata
 * to cover NOTES and EXCEPTION_TABLE.
 */
section_size = (unsigned long)__init_begin - (unsigned 
long)__start_rodata;
-   create_mapping_late(__pa_symbol(__start_rodata), (unsigned 
long)__start_rodata,
+   update_mapping_prot(__pa_symbol(__start_rodata), (unsigned 
long)__start_rodata,
section_size, PAGE_KERNEL_RO);
 
-   /* flush the TLBs after updating live kernel mappings */
-   flush_tlb_all();
-
debug_checkwx();
 }
 
-- 
2.7.4



[PATCH v3 3/5] arm64: alternatives: apply boot time fixups via the linear mapping

2017-02-14 Thread Ard Biesheuvel
One important rule of thumb when designing a secure software system is
that memory should never be writable and executable at the same time.
We mostly adhere to this rule in the kernel, except at boot time, when
regions may be mapped RWX until after we are done applying alternatives
or making other one-off changes.

For the alternative patching, we can improve the situation by applying
the fixups via the linear mapping, which is never mapped with executable
permissions. So map the linear alias of .text with RW- permissions
initially, and remove the write permissions as soon as alternative
patching has completed.
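
Reduced to a minimal sketch (illustrative only; the helper name is made up, and the usual kernel headers -- <linux/mm.h> for lm_alias(), <asm/cacheflush.h> for flush_icache_range() -- are assumed; the real change is in the __apply_alternatives() hunk below):

/* Patch an instruction that lives in the read-only .text mapping by
 * writing through its linear-map alias, which is RW but never executable. */
static void patch_insn_via_linear_alias(u32 *origptr, u32 insn)
{
	u32 *rw = (u32 *)lm_alias(origptr);	/* writable alias of the same page */

	*rw = cpu_to_le32(insn);		/* write via the linear mapping */
	flush_icache_range((uintptr_t)origptr,	/* sync the I-side for the RX alias */
			   (uintptr_t)(origptr + 1));
}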

Reviewed-by: Laura Abbott <labb...@redhat.com>
Reviewed-by: Mark Rutland <mark.rutl...@arm.com>
Tested-by: Mark Rutland <mark.rutl...@arm.com>
Signed-off-by: Ard Biesheuvel <ard.biesheu...@linaro.org>
---
 arch/arm64/include/asm/mmu.h|  1 +
 arch/arm64/kernel/alternative.c |  2 +-
 arch/arm64/kernel/smp.c |  1 +
 arch/arm64/mm/mmu.c | 22 +++-
 4 files changed, 20 insertions(+), 6 deletions(-)

diff --git a/arch/arm64/include/asm/mmu.h b/arch/arm64/include/asm/mmu.h
index 47619411f0ff..5468c834b072 100644
--- a/arch/arm64/include/asm/mmu.h
+++ b/arch/arm64/include/asm/mmu.h
@@ -37,5 +37,6 @@ extern void create_pgd_mapping(struct mm_struct *mm, 
phys_addr_t phys,
   unsigned long virt, phys_addr_t size,
   pgprot_t prot, bool page_mappings_only);
 extern void *fixmap_remap_fdt(phys_addr_t dt_phys);
+extern void mark_linear_text_alias_ro(void);
 
 #endif
diff --git a/arch/arm64/kernel/alternative.c b/arch/arm64/kernel/alternative.c
index 06d650f61da7..8cee29d9bc07 100644
--- a/arch/arm64/kernel/alternative.c
+++ b/arch/arm64/kernel/alternative.c
@@ -128,7 +128,7 @@ static void __apply_alternatives(void *alt_region)
 
for (i = 0; i < nr_inst; i++) {
insn = get_alt_insn(alt, origptr + i, replptr + i);
-   *(origptr + i) = cpu_to_le32(insn);
+   ((u32 *)lm_alias(origptr))[i] = cpu_to_le32(insn);
}
 
flush_icache_range((uintptr_t)origptr,
diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
index a8ec5da530af..d6307e311a10 100644
--- a/arch/arm64/kernel/smp.c
+++ b/arch/arm64/kernel/smp.c
@@ -432,6 +432,7 @@ void __init smp_cpus_done(unsigned int max_cpus)
setup_cpu_features();
hyp_mode_check();
apply_alternatives_all();
+   mark_linear_text_alias_ro();
 }
 
 void __init smp_prepare_boot_cpu(void)
diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index a98419b72a09..b7ce0b9ad096 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -398,16 +398,28 @@ static void __init __map_memblock(pgd_t *pgd, phys_addr_t 
start, phys_addr_t end
 debug_pagealloc_enabled());
 
/*
-* Map the linear alias of the [_text, __init_begin) interval as
-* read-only/non-executable. This makes the contents of the
-* region accessible to subsystems such as hibernate, but
-* protects it from inadvertent modification or execution.
+* Map the linear alias of the [_text, __init_begin) interval
+* as non-executable now, and remove the write permission in
+* mark_linear_text_alias_ro() below (which will be called after
+* alternative patching has completed). This makes the contents
+* of the region accessible to subsystems such as hibernate,
+* but protects it from inadvertent modification or execution.
 */
__create_pgd_mapping(pgd, kernel_start, __phys_to_virt(kernel_start),
-kernel_end - kernel_start, PAGE_KERNEL_RO,
+kernel_end - kernel_start, PAGE_KERNEL,
 early_pgtable_alloc, debug_pagealloc_enabled());
 }
 
+void __init mark_linear_text_alias_ro(void)
+{
+   /*
+* Remove the write permissions from the linear alias of .text/.rodata
+*/
+   update_mapping_prot(__pa_symbol(_text), (unsigned long)lm_alias(_text),
+   (unsigned long)__init_begin - (unsigned long)_text,
+   PAGE_KERNEL_RO);
+}
+
 static void __init map_mem(pgd_t *pgd)
 {
struct memblock_region *reg;
-- 
2.7.4



[PATCH v3 5/5] arm64: mmu: apply strict permissions to .init.text and .init.data

2017-02-14 Thread Ard Biesheuvel
To avoid having mappings that are writable and executable at the same
time, split the init region into a .init.text region that is mapped
read-only, and a .init.data region that is mapped non-executable.

This is possible now that the alternative patching occurs via the linear
mapping, and the linear alias of the init region is always mapped writable
(but never executable).

Since the alternatives descriptions themselves are read-only data, move
those into the .init.text region.

Reviewed-by: Laura Abbott <labb...@redhat.com>
Signed-off-by: Ard Biesheuvel <ard.biesheu...@linaro.org>
---
 arch/arm64/include/asm/sections.h |  3 ++-
 arch/arm64/kernel/vmlinux.lds.S   | 25 +---
 arch/arm64/mm/mmu.c   | 12 ++
 3 files changed, 26 insertions(+), 14 deletions(-)

diff --git a/arch/arm64/include/asm/sections.h 
b/arch/arm64/include/asm/sections.h
index 4e7e7067afdb..22582819b2e5 100644
--- a/arch/arm64/include/asm/sections.h
+++ b/arch/arm64/include/asm/sections.h
@@ -24,7 +24,8 @@ extern char __hibernate_exit_text_start[], 
__hibernate_exit_text_end[];
 extern char __hyp_idmap_text_start[], __hyp_idmap_text_end[];
 extern char __hyp_text_start[], __hyp_text_end[];
 extern char __idmap_text_start[], __idmap_text_end[];
+extern char __initdata_begin[], __initdata_end[];
+extern char __inittext_begin[], __inittext_end[];
 extern char __irqentry_text_start[], __irqentry_text_end[];
 extern char __mmuoff_data_start[], __mmuoff_data_end[];
-
 #endif /* __ASM_SECTIONS_H */
diff --git a/arch/arm64/kernel/vmlinux.lds.S b/arch/arm64/kernel/vmlinux.lds.S
index b8deffa9e1bf..2c93d259046c 100644
--- a/arch/arm64/kernel/vmlinux.lds.S
+++ b/arch/arm64/kernel/vmlinux.lds.S
@@ -143,12 +143,27 @@ SECTIONS
 
. = ALIGN(SEGMENT_ALIGN);
__init_begin = .;
+   __inittext_begin = .;
 
INIT_TEXT_SECTION(8)
.exit.text : {
ARM_EXIT_KEEP(EXIT_TEXT)
}
 
+   . = ALIGN(4);
+   .altinstructions : {
+   __alt_instructions = .;
+   *(.altinstructions)
+   __alt_instructions_end = .;
+   }
+   .altinstr_replacement : {
+   *(.altinstr_replacement)
+   }
+
+   . = ALIGN(PAGE_SIZE);
+   __inittext_end = .;
+   __initdata_begin = .;
+
.init.data : {
INIT_DATA
INIT_SETUP(16)
@@ -164,15 +179,6 @@ SECTIONS
 
PERCPU_SECTION(L1_CACHE_BYTES)
 
-   . = ALIGN(4);
-   .altinstructions : {
-   __alt_instructions = .;
-   *(.altinstructions)
-   __alt_instructions_end = .;
-   }
-   .altinstr_replacement : {
-   *(.altinstr_replacement)
-   }
.rela : ALIGN(8) {
*(.rela .rela*)
}
@@ -181,6 +187,7 @@ SECTIONS
__rela_size = SIZEOF(.rela);
 
. = ALIGN(SEGMENT_ALIGN);
+   __initdata_end = .;
__init_end = .;
 
_data = .;
diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index 70a492b36fe7..cb9f2716c4b6 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -485,14 +485,18 @@ early_param("rodata", parse_rodata);
  */
 static void __init map_kernel(pgd_t *pgd)
 {
-   static struct vm_struct vmlinux_text, vmlinux_rodata, vmlinux_init, 
vmlinux_data;
+   static struct vm_struct vmlinux_text, vmlinux_rodata, vmlinux_inittext,
+   vmlinux_initdata, vmlinux_data;
 
pgprot_t text_prot = rodata_enabled ? PAGE_KERNEL_ROX : 
PAGE_KERNEL_EXEC;
 
map_kernel_segment(pgd, _text, _etext, text_prot, &vmlinux_text);
-   map_kernel_segment(pgd, __start_rodata, __init_begin, PAGE_KERNEL, &vmlinux_rodata);
-   map_kernel_segment(pgd, __init_begin, __init_end, PAGE_KERNEL_EXEC,
-  &vmlinux_init);
+   map_kernel_segment(pgd, __start_rodata, __inittext_begin, PAGE_KERNEL,
+  &vmlinux_rodata);
+   map_kernel_segment(pgd, __inittext_begin, __inittext_end, text_prot,
+  &vmlinux_inittext);
+   map_kernel_segment(pgd, __initdata_begin, __initdata_end, PAGE_KERNEL,
+  &vmlinux_initdata);
map_kernel_segment(pgd, _data, _end, PAGE_KERNEL, &vmlinux_data);
 
if (!pgd_val(*pgd_offset_raw(pgd, FIXADDR_START))) {
-- 
2.7.4



[PATCH v3 0/5] arm64: mmu: avoid writeable-executable mappings

2017-02-14 Thread Ard Biesheuvel
Having memory that is writable and executable at the same time is a
security hazard, and so we tend to avoid those when we can. However,
at boot time, we keep .text mapped writable during the entire init
phase, and the init region itself is mapped rwx as well.

Let's improve the situation by:
- making the alternatives patching use the linear mapping
- splitting the init region into separate text and data regions

This removes all RWX mappings except the really early one created
in head.S (which we could perhaps fix in the future as well)

Changes since v2:
  - ensure that text mappings remain writable under rodata=off
  - rename create_mapping_late() to update_mapping_prot()
  - clarify commit log of #2
  - add acks

Changes since v1:
- add patch to move TLB maintenance into create_mapping_late() and remove it
  from its callers (#2)
- use the true address not the linear alias when patching branch instructions,
  spotted by Suzuki (#3)
- mark mark_linear_text_alias_ro() __init (#3)
- move the .rela section back into __initdata: as it turns out, leaving a hole
  between the segments results in a peculiar situation where other unrelated
  allocations end up right in the middle of the kernel Image, which is
  probably a bad idea (#5). See below for an example.
- add acks

Ard Biesheuvel (5):
  arm: kvm: move kvm_vgic_global_state out of .text section
  arm64: mmu: move TLB maintenance from callers to create_mapping_late()
  arm64: alternatives: apply boot time fixups via the linear mapping
  arm64: mmu: map .text as read-only from the outset
  arm64: mmu: apply strict permissions to .init.text and .init.data

 arch/arm64/include/asm/mmu.h  |  1 +
 arch/arm64/include/asm/sections.h |  3 +-
 arch/arm64/kernel/alternative.c   |  2 +-
 arch/arm64/kernel/smp.c   |  1 +
 arch/arm64/kernel/vmlinux.lds.S   | 25 +---
 arch/arm64/mm/mmu.c   | 61 +---
 virt/kvm/arm/vgic/vgic.c  |  4 +-
 7 files changed, 65 insertions(+), 32 deletions(-)

-- 
2.7.4



[PATCH v3 1/5] arm: kvm: move kvm_vgic_global_state out of .text section

2017-02-14 Thread Ard Biesheuvel
The kvm_vgic_global_state struct contains a static key which is
written to by jump_label_init() at boot time. So in preparation of
making .text regions truly (well, almost truly) read-only, mark
kvm_vgic_global_state __ro_after_init so it moves to the .rodata
section instead.
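
For anyone unfamiliar with the attribute: __ro_after_init data stays writable while init code runs and becomes read-only when the kernel marks rodata read-only at the end of boot. A minimal, hypothetical example of the pattern (not taken from the patch):

#include <linux/cache.h>	/* __ro_after_init */
#include <linux/init.h>

static int boot_chosen_value __ro_after_init;	/* hypothetical variable */

static int __init choose_value(void)
{
	boot_chosen_value = 42;		/* fine: the section is still writable here */
	return 0;
}
early_initcall(choose_value);

/* Once mark_rodata_ro() has run, any further write to boot_chosen_value
 * faults, because .data..ro_after_init is now mapped read-only. */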

Acked-by: Marc Zyngier <marc.zyng...@arm.com>
Reviewed-by: Laura Abbott <labb...@redhat.com>
Reviewed-by: Mark Rutland <mark.rutl...@arm.com>
Tested-by: Mark Rutland <mark.rutl...@arm.com>
Signed-off-by: Ard Biesheuvel <ard.biesheu...@linaro.org>
---
 virt/kvm/arm/vgic/vgic.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/virt/kvm/arm/vgic/vgic.c b/virt/kvm/arm/vgic/vgic.c
index 6440b56ec90e..2f373455ed4e 100644
--- a/virt/kvm/arm/vgic/vgic.c
+++ b/virt/kvm/arm/vgic/vgic.c
@@ -29,7 +29,9 @@
 #define DEBUG_SPINLOCK_BUG_ON(p)
 #endif
 
-struct vgic_global __section(.hyp.text) kvm_vgic_global_state = {.gicv3_cpuif 
= STATIC_KEY_FALSE_INIT,};
+struct vgic_global kvm_vgic_global_state __ro_after_init = {
+   .gicv3_cpuif = STATIC_KEY_FALSE_INIT,
+};
 
 /*
  * Locking order is always:
-- 
2.7.4



Re: [PATCH V10 03/10] efi: parse ARM processor error

2017-02-16 Thread Ard Biesheuvel
On 15 February 2017 at 19:51, Tyler Baicar <tbai...@codeaurora.org> wrote:
> Add support for ARM Common Platform Error Record (CPER).
> UEFI 2.6 specification adds support for ARM specific
> processor error information to be reported as part of the
> CPER records. This provides more detail on processor error logs.
>
> Signed-off-by: Tyler Baicar <tbai...@codeaurora.org>
> Signed-off-by: Jonathan (Zhixiong) Zhang <zjzh...@codeaurora.org>
> Signed-off-by: Naveen Kaje <nk...@codeaurora.org>
> Reviewed-by: James Morse <james.mo...@arm.com>

Reviewed-by: Ard Biesheuvel <ard.biesheu...@linaro.org>

> ---
>  drivers/firmware/efi/cper.c | 133 
> 
>  include/linux/cper.h|  54 ++
>  2 files changed, 187 insertions(+)
>
> diff --git a/drivers/firmware/efi/cper.c b/drivers/firmware/efi/cper.c
> index 8fa4e23..c2b0a12 100644
> --- a/drivers/firmware/efi/cper.c
> +++ b/drivers/firmware/efi/cper.c
> @@ -110,12 +110,15 @@ void cper_print_bits(const char *pfx, unsigned int bits,
>  static const char * const proc_type_strs[] = {
> "IA32/X64",
> "IA64",
> +   "ARM",
>  };
>
>  static const char * const proc_isa_strs[] = {
> "IA32",
> "IA64",
> "X64",
> +   "ARM A32/T32",
> +   "ARM A64",
>  };
>
>  static const char * const proc_error_type_strs[] = {
> @@ -139,6 +142,18 @@ void cper_print_bits(const char *pfx, unsigned int bits,
> "corrected",
>  };
>
> +static const char * const arm_reg_ctx_strs[] = {
> +   "AArch32 general purpose registers",
> +   "AArch32 EL1 context registers",
> +   "AArch32 EL2 context registers",
> +   "AArch32 secure context registers",
> +   "AArch64 general purpose registers",
> +   "AArch64 EL1 context registers",
> +   "AArch64 EL2 context registers",
> +   "AArch64 EL3 context registers",
> +   "Misc. system register structure",
> +};
> +
>  static void cper_print_proc_generic(const char *pfx,
> const struct cper_sec_proc_generic *proc)
>  {
> @@ -184,6 +199,114 @@ static void cper_print_proc_generic(const char *pfx,
> printk("%s""IP: 0x%016llx\n", pfx, proc->ip);
>  }
>
> +static void cper_print_proc_arm(const char *pfx,
> +   const struct cper_sec_proc_arm *proc)
> +{
> +   int i, len, max_ctx_type;
> +   struct cper_arm_err_info *err_info;
> +   struct cper_arm_ctx_info *ctx_info;
> +   char newpfx[64];
> +
> +   printk("%s""section length: %d\n", pfx, proc->section_length);
> +   printk("%s""MIDR: 0x%016llx\n", pfx, proc->midr);
> +
> +   len = proc->section_length - (sizeof(*proc) +
> +   proc->err_info_num * (sizeof(*err_info)));
> +   if (len < 0) {
> +   printk("%s""section length is too small\n", pfx);
> +   printk("%s""firmware-generated error record is incorrect\n", 
> pfx);
> +   printk("%s""ERR_INFO_NUM is %d\n", pfx, proc->err_info_num);
> +   return;
> +   }
> +
> +   if (proc->validation_bits & CPER_ARM_VALID_MPIDR)
> +   printk("%s""MPIDR: 0x%016llx\n", pfx, proc->mpidr);
> +   if (proc->validation_bits & CPER_ARM_VALID_AFFINITY_LEVEL)
> +   printk("%s""error affinity level: %d\n", pfx,
> +   proc->affinity_level);
> +   if (proc->validation_bits & CPER_ARM_VALID_RUNNING_STATE) {
> +   printk("%s""running state: 0x%x\n", pfx, proc->running_state);
> +   printk("%s""PSCI state: %d\n", pfx, proc->psci_state);
> +   }
> +
> +   snprintf(newpfx, sizeof(newpfx), "%s%s", pfx, INDENT_SP);
> +
> +   err_info = (struct cper_arm_err_info *)(proc + 1);
> +   for (i = 0; i < proc->err_info_num; i++) {
> +   printk("%s""Error info structure %d:\n", pfx, i);
> +   printk("%s""version:%d\n", newpfx, err_info->version);
> +   printk("%s""length:%d\n", newpfx, err_info->length);
> +   if (err_info->validation_bits &
> +   CPER_ARM_INFO_VALID_MULTI_ERR) {

Re: [PATCH V10 05/10] acpi: apei: handle SEA notification type for ARMv8

2017-02-16 Thread Ard Biesheuvel
On 15 February 2017 at 19:51, Tyler Baicar  wrote:
> ARM APEI extension proposal added SEA (Synchronous External Abort)
> notification type for ARMv8.
> Add a new GHES error source handling function for SEA. If an error
> source's notification type is SEA, then this function can be registered
> into the SEA exception handler. That way GHES will parse and report
> SEA exceptions when they occur.
> An SEA can interrupt code that had interrupts masked and is treated as
> an NMI. To aid this the page of address space for mapping APEI buffers
> while in_nmi() is always reserved, and ghes_ioremap_pfn_nmi() is
> changed to use the helper methods to find the prot_t to map with in
> the same way as ghes_ioremap_pfn_irq().
>
> Signed-off-by: Tyler Baicar 
> Signed-off-by: Jonathan (Zhixiong) Zhang 
> Signed-off-by: Naveen Kaje 
> ---
>  arch/arm64/Kconfig|  2 ++
>  arch/arm64/mm/fault.c | 13 
>  drivers/acpi/apei/Kconfig | 14 +
>  drivers/acpi/apei/ghes.c  | 77 
> +++
>  include/acpi/ghes.h   |  7 +
>  5 files changed, 107 insertions(+), 6 deletions(-)
>
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index 1117421..8557556 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -53,6 +53,7 @@ config ARM64
> select HANDLE_DOMAIN_IRQ
> select HARDIRQS_SW_RESEND
> select HAVE_ACPI_APEI if (ACPI && EFI)
> +   select HAVE_ACPI_APEI_SEA if (ACPI && EFI)
> select HAVE_ALIGNED_STRUCT_PAGE if SLUB
> select HAVE_ARCH_AUDITSYSCALL
> select HAVE_ARCH_BITREVERSE
> @@ -88,6 +89,7 @@ config ARM64
> select HAVE_IRQ_TIME_ACCOUNTING
> select HAVE_MEMBLOCK
> select HAVE_MEMBLOCK_NODE_MAP if NUMA
> +   select HAVE_NMI if HAVE_ACPI_APEI_SEA
> select HAVE_PATA_PLATFORM
> select HAVE_PERF_EVENTS
> select HAVE_PERF_REGS
> diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
> index d178dc0..4e35c72 100644
> --- a/arch/arm64/mm/fault.c
> +++ b/arch/arm64/mm/fault.c
> @@ -41,6 +41,8 @@
>  #include 
>  #include 
>
> +#include 
> +
>  static const char *fault_name(unsigned int esr);
>
>  #ifdef CONFIG_KPROBES
> @@ -498,6 +500,17 @@ static int do_sea(unsigned long addr, unsigned int esr, 
> struct pt_regs *regs)
> pr_err("Synchronous External Abort: %s (0x%08x) at 0x%016lx\n",
>  fault_name(esr), esr, addr);
>
> +   /*
> +* Synchronous aborts may interrupt code which had interrupts masked.
> +* Before calling out into the wider kernel tell the interested
> +* subsystems.
> +*/
> +   if(IS_ENABLED(HAVE_ACPI_APEI_SEA)) {

Missing space after 'if'

> +   nmi_enter();
> +   ghes_notify_sea();
> +   nmi_exit();
> +   }
> +
> info.si_signo = SIGBUS;
> info.si_errno = 0;
> info.si_code  = 0;
> diff --git a/drivers/acpi/apei/Kconfig b/drivers/acpi/apei/Kconfig
> index b0140c8..ef7f7bd 100644
> --- a/drivers/acpi/apei/Kconfig
> +++ b/drivers/acpi/apei/Kconfig
> @@ -4,6 +4,20 @@ config HAVE_ACPI_APEI
>  config HAVE_ACPI_APEI_NMI
> bool
>
> +config HAVE_ACPI_APEI_SEA

HAVE_xxx Kconfig options are typically non user selectable, so I
suggest to drop the HAVE_ prefix here. Also, you should probably make
it 'default y' rather than select it elsewhere; this will still honour
the dependency on ARM64 && ACPI_APEI_GHES



> +   bool "APEI Synchronous External Abort logging/recovering support"
> +   depends on ARM64 && ACPI_APEI_GHES
> +   help
> + This option should be enabled if the system supports
> + firmware first handling of SEA (Synchronous External Abort).
> + SEA happens with certain faults of data abort or instruction
> + abort synchronous exceptions on ARMv8 systems. If a system
> + supports firmware first handling of SEA, the platform analyzes
> + and handles hardware error notifications from SEA, and it may then
> + form a HW error record for the OS to parse and handle. This
> + option allows the OS to look for such hardware error record, and
> + take appropriate action.
> +
>  config ACPI_APEI
> bool "ACPI Platform Error Interface (APEI)"
> select MISC_FILESYSTEMS
> diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
> index b25e7cf..87045dc 100644
> --- a/drivers/acpi/apei/ghes.c
> +++ b/drivers/acpi/apei/ghes.c
> @@ -114,11 +114,7 @@
>   * Two virtual pages are used, one for IRQ/PROCESS context, the other for
>   * NMI context (optionally).
>   */
> -#ifdef CONFIG_HAVE_ACPI_APEI_NMI
>  #define GHES_IOREMAP_PAGES   2
> -#else
> -#define GHES_IOREMAP_PAGES   1
> -#endif
>  #define GHES_IOREMAP_IRQ_PAGE(base)(base)
>  #define GHES_IOREMAP_NMI_PAGE(base)((base) + PAGE_SIZE)
>
> @@ 

Re: [PATCH V10 02/10] ras: acpi/apei: cper: generic error data entry v3 per ACPI 6.1

2017-02-16 Thread Ard Biesheuvel
On 15 February 2017 at 19:51, Tyler Baicar <tbai...@codeaurora.org> wrote:
> Currently when a RAS error is reported it is not timestamped.
> The ACPI 6.1 spec adds the timestamp field to the generic error
> data entry v3 structure. The timestamp of when the firmware
> generated the error is now being reported.
>
> Signed-off-by: Tyler Baicar <tbai...@codeaurora.org>
> Signed-off-by: Jonathan (Zhixiong) Zhang <zjzh...@codeaurora.org>
> Signed-off-by: Richard Ruigrok <rruig...@codeaurora.org>
> Signed-off-by: Naveen Kaje <nk...@codeaurora.org>
> Reviewed-by: James Morse <james.mo...@arm.com>

Reviewed-by: Ard Biesheuvel <ard.biesheu...@linaro.org>

> ---
>  drivers/acpi/apei/ghes.c|  9 ---
>  drivers/firmware/efi/cper.c | 63 
> +++--
>  include/acpi/ghes.h | 22 
>  3 files changed, 77 insertions(+), 17 deletions(-)
>
> diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
> index 5e1ec41..b25e7cf 100644
> --- a/drivers/acpi/apei/ghes.c
> +++ b/drivers/acpi/apei/ghes.c
> @@ -420,7 +420,8 @@ static void ghes_handle_memory_failure(struct 
> acpi_hest_generic_data *gdata, int
> int flags = -1;
> int sec_sev = ghes_severity(gdata->error_severity);
> struct cper_sec_mem_err *mem_err;
> -   mem_err = (struct cper_sec_mem_err *)(gdata + 1);
> +
> +   mem_err = acpi_hest_generic_data_payload(gdata);
>
> if (!(mem_err->validation_bits & CPER_MEM_VALID_PA))
> return;
> @@ -457,7 +458,8 @@ static void ghes_do_proc(struct ghes *ghes,
> if (!uuid_le_cmp(*(uuid_le *)gdata->section_type,
>  CPER_SEC_PLATFORM_MEM)) {
> struct cper_sec_mem_err *mem_err;
> -   mem_err = (struct cper_sec_mem_err *)(gdata+1);
> +
> +   mem_err = acpi_hest_generic_data_payload(gdata);
> ghes_edac_report_mem_error(ghes, sev, mem_err);
>
> arch_apei_report_mem_error(sev, mem_err);
> @@ -467,7 +469,8 @@ static void ghes_do_proc(struct ghes *ghes,
> else if (!uuid_le_cmp(*(uuid_le *)gdata->section_type,
>   CPER_SEC_PCIE)) {
> struct cper_sec_pcie *pcie_err;
> -   pcie_err = (struct cper_sec_pcie *)(gdata+1);
> +
> +   pcie_err = acpi_hest_generic_data_payload(gdata);
> if (sev == GHES_SEV_RECOVERABLE &&
> sec_sev == GHES_SEV_RECOVERABLE &&
> pcie_err->validation_bits & 
> CPER_PCIE_VALID_DEVICE_ID &&
> diff --git a/drivers/firmware/efi/cper.c b/drivers/firmware/efi/cper.c
> index d425374..8fa4e23 100644
> --- a/drivers/firmware/efi/cper.c
> +++ b/drivers/firmware/efi/cper.c
> @@ -32,6 +32,9 @@
>  #include 
>  #include 
>  #include 
> +#include 
> +#include 
> +#include 
>
>  #define INDENT_SP  " "
>
> @@ -386,13 +389,37 @@ static void cper_print_pcie(const char *pfx, const 
> struct cper_sec_pcie *pcie,
> pfx, pcie->bridge.secondary_status, pcie->bridge.control);
>  }
>
> +static void cper_estatus_print_section_v300(const char *pfx,
> +   const struct acpi_hest_generic_data_v300 *gdata)
> +{
> +   __u8 hour, min, sec, day, mon, year, century, *timestamp;
> +
> +   if (gdata->validation_bits & ACPI_HEST_GEN_VALID_TIMESTAMP) {
> +   timestamp = (__u8 *)&(gdata->time_stamp);
> +   sec = bcd2bin(timestamp[0]);
> +   min = bcd2bin(timestamp[1]);
> +   hour = bcd2bin(timestamp[2]);
> +   day = bcd2bin(timestamp[4]);
> +   mon = bcd2bin(timestamp[5]);
> +   year = bcd2bin(timestamp[6]);
> +   century = bcd2bin(timestamp[7]);
> +   printk("%stime: %7s %02d%02d-%02d-%02d %02d:%02d:%02d\n", pfx,
> +   0x01 & *(timestamp + 3) ? "precise" : "", century,
> +   year, mon, day, hour, min, sec);
> +   }
> +}
> +
>  static void cper_estatus_print_section(
> -   const char *pfx, const struct acpi_hest_generic_data *gdata, int 
> sec_no)
> +   const char *pfx, struct acpi_hest_generic_data *gdata, int sec_no)
>  {
> uuid_le *sec_type = (uuid_le *)gdata->section_type;
> __u16 severity;
> char newpfx[64];
>
> +   if (acpi_hest_generic_data_version(gdata) >= 3)
> +   cper_estatus_print_sec

Re: [PATCH v3 0/2] KVM: ARM: Enable vtimers with user space gic

2016-09-17 Thread Ard Biesheuvel
On 16 September 2016 at 13:44, Alexander Graf  wrote:
>
>> On 16 Sep 2016, at 14:40, Paolo Bonzini  wrote:
>>
>>
>>
>> On 16/09/2016 14:29, Christoffer Dall wrote:
 It may be useful for migrating a gicv2 VM to a gicv3 host without gicv2 
 emulation as well.
>>>
>>> I don't see why you'd do this; the VGIC hardware can perfectly well be
>>> used for nesting as well, and this works rather well.
>>
>> Can GICv3 emulate GICv2 in a guest?
>
> It depends on the gicv3 configuration. As an SOC vendor you can either enable 
> gicv2 compatibility or disable it. ThunderX for example is gicv3 only. LS2085 
> can handle gicv2 in the guest with gicv3 on the host.
>

Note that 'disabled' here means 'not implemented in silicon', so
there is no way you will ever be able to re-enable GICv2 compatibility
on a ThunderX. Another thing to keep in mind is that GICv2
compatibility is disabled on the non-secure side if the secure side
elects to configure its view of the GIC as v3 (i.e., in order to
support >8 cores)


Re: [PATCH v3 0/2] KVM: ARM: Enable vtimers with user space gic

2016-09-17 Thread Ard Biesheuvel
On 17 September 2016 at 16:38, Peter Maydell <peter.mayd...@linaro.org> wrote:
> On 17 September 2016 at 16:28, Ard Biesheuvel <ard.biesheu...@linaro.org> 
> wrote:
>> Another thing to keep in mind is that GICv2
>> compatibility is disabled on the non-secure side if the secure side
>> elects to configure its view of the GIC as v3 (i.e., in order to
>> support >8 cores)
>
> If I'm reading the 'legacy configurations' chapter of the GICv3
> spec correctly, that is true for the NS host OS (ie the one
> handling physical interrupts) but a guest OS can still use
> the old GICv2-compat interface (assuming it was implemented
> in silicon at all).
>

Ah right, apologies for spreading misinformation. But my first point
is still valid.


Re: [PATCH 2/2] arm64: Support systems without FP/ASIMD

2016-10-25 Thread Ard Biesheuvel
Hi Suzuki,

On 25 October 2016 at 14:50, Suzuki K Poulose <suzuki.poul...@arm.com> wrote:
> The arm64 kernel assumes that FP/ASIMD units are always present
> and accesses the FP/ASIMD specific registers unconditionally. This
> could cause problems when they are absent. This patch adds the
> support for kernel handling systems without FP/ASIMD by skipping the
> register access within the kernel. For kvm, we trap the accesses
> to FP/ASIMD and inject an undefined instruction exception to the VM.
>
> The callers of the exported kernel_neon_begin_partial() should
> make sure that the FP/ASIMD is supported.
>
> Cc: Catalin Marinas <catalin.mari...@arm.com>
> Cc: Will Deacon <will.dea...@arm.com>
> Cc: Christoffer Dall <christoffer.d...@linaro.org>
> Cc: Marc Zyngier <marc.zyng...@arm.com>
> Cc: Ard Biesheuvel <ard.biesheu...@linaro.org>
> Signed-off-by: Suzuki K Poulose <suzuki.poul...@arm.com>
> ---
> ---
>  arch/arm64/crypto/aes-ce-ccm-glue.c |  2 +-
>  arch/arm64/crypto/aes-ce-cipher.c   |  2 ++
>  arch/arm64/crypto/ghash-ce-glue.c   |  2 ++
>  arch/arm64/crypto/sha1-ce-glue.c|  2 ++
>  arch/arm64/include/asm/cpufeature.h |  8 +++-
>  arch/arm64/include/asm/neon.h   |  3 ++-
>  arch/arm64/kernel/cpufeature.c  | 15 +++
>  arch/arm64/kernel/fpsimd.c  | 14 ++
>  arch/arm64/kvm/handle_exit.c| 11 +++
>  arch/arm64/kvm/hyp/hyp-entry.S  |  9 -
>  arch/arm64/kvm/hyp/switch.c |  5 -
>  11 files changed, 68 insertions(+), 5 deletions(-)
>
> diff --git a/arch/arm64/crypto/aes-ce-ccm-glue.c 
> b/arch/arm64/crypto/aes-ce-ccm-glue.c
> index f4bf2f2..d001b4e 100644
> --- a/arch/arm64/crypto/aes-ce-ccm-glue.c
> +++ b/arch/arm64/crypto/aes-ce-ccm-glue.c
> @@ -296,7 +296,7 @@ static struct aead_alg ccm_aes_alg = {
>
>  static int __init aes_mod_init(void)
>  {
> -   if (!(elf_hwcap & HWCAP_AES))
> +   if (!(elf_hwcap & HWCAP_AES) || !system_supports_fpsimd())

This looks weird to me. All crypto extension instructions except CRC
operate strictly on FP/ASIMD registers, and so support for FP/ASIMD is
implied by having HWCAP_AES. In other words, I think it makes more
sense to sanity check that the info registers are consistent with each
other in core code than modifying each user (which for HWCAP_xxx
includes userland) to double check that the world is sane.
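
A rough sketch of the kind of core-code sanity check meant here (purely illustrative; the function name is made up and the exact placement is an open question):

/*
 * Hypothetical check: a CPU that advertises the FP/ASIMD-based crypto
 * features must also have FP/ASIMD itself.
 */
static void __init verify_hwcap_consistency(void)
{
	if ((elf_hwcap & (HWCAP_AES | HWCAP_PMULL | HWCAP_SHA1 | HWCAP_SHA2)) &&
	    !system_supports_fpsimd())
		pr_warn("crypto extensions advertised without FP/ASIMD\n");
}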

> return -ENODEV;
> return crypto_register_aead(&ccm_aes_alg);
>  }
> diff --git a/arch/arm64/crypto/aes-ce-cipher.c 
> b/arch/arm64/crypto/aes-ce-cipher.c
> index f7bd9bf..1a43be2 100644
> --- a/arch/arm64/crypto/aes-ce-cipher.c
> +++ b/arch/arm64/crypto/aes-ce-cipher.c
> @@ -253,6 +253,8 @@ static struct crypto_alg aes_alg = {
>
>  static int __init aes_mod_init(void)
>  {
> +   if (!system_supports_fpsimd())
> +   return -ENODEV;
> return crypto_register_alg(&aes_alg);
>  }
>
> diff --git a/arch/arm64/crypto/ghash-ce-glue.c 
> b/arch/arm64/crypto/ghash-ce-glue.c
> index 833ec1e..2bc518d 100644
> --- a/arch/arm64/crypto/ghash-ce-glue.c
> +++ b/arch/arm64/crypto/ghash-ce-glue.c
> @@ -144,6 +144,8 @@ static struct shash_alg ghash_alg = {
>
>  static int __init ghash_ce_mod_init(void)
>  {
> +   if (!system_supports_fpsimd())
> +   return -ENODEV;
> return crypto_register_shash(&ghash_alg);
>  }
>
> diff --git a/arch/arm64/crypto/sha1-ce-glue.c 
> b/arch/arm64/crypto/sha1-ce-glue.c
> index aefda98..9f3427a 100644
> --- a/arch/arm64/crypto/sha1-ce-glue.c
> +++ b/arch/arm64/crypto/sha1-ce-glue.c
> @@ -102,6 +102,8 @@ static struct shash_alg alg = {
>
>  static int __init sha1_ce_mod_init(void)
>  {
> +   if (!system_supports_fpsimd())
> +   return -ENODEV;
> return crypto_register_shash(&alg);
>  }
>
> diff --git a/arch/arm64/include/asm/cpufeature.h 
> b/arch/arm64/include/asm/cpufeature.h
> index ae5e994..63d739c 100644
> --- a/arch/arm64/include/asm/cpufeature.h
> +++ b/arch/arm64/include/asm/cpufeature.h
> @@ -38,8 +38,9 @@
>  #define ARM64_HAS_32BIT_EL013
>  #define ARM64_HYP_OFFSET_LOW   14
>  #define ARM64_MISMATCHED_CACHE_LINE_SIZE   15
> +#define ARM64_HAS_NO_FPSIMD16
>
> -#define ARM64_NCAPS16
> +#define ARM64_NCAPS17
>
>  #ifndef __ASSEMBLY__
>
> @@ -236,6 +237,11 @@ static inline bool system_supports_mixed_endian_el0(void)
> return 
> id_aa64mmfr0_mixed_endian_el0(read_system_reg(SYS_ID_AA64MMFR0_EL1));
>  }
>
> +static inline bool system_supports_fpsimd(void)
> +{
> +   return !cpus_have_

Re: [PATCH v5 10/10] arm64: mm: set the contiguous bit for kernel mappings where appropriate

2017-03-09 Thread Ard Biesheuvel
On 9 March 2017 at 20:33, Mark Rutland <mark.rutl...@arm.com> wrote:
> On Thu, Mar 09, 2017 at 09:25:12AM +0100, Ard Biesheuvel wrote:
>> +static inline u64 pte_cont_addr_end(u64 addr, u64 end)
>> +{
>> + return min((addr + CONT_PTE_SIZE) & CONT_PTE_MASK, end);
>> +}
>> +
>> +static inline u64 pmd_cont_addr_end(u64 addr, u64 end)
>> +{
>> + return min((addr + CONT_PMD_SIZE) & CONT_PMD_MASK, end);
>> +}
>
> These differ structurally from the usual p??_addr_end() macros defined
> in include/asm-generic/pgtable.h. I agree the asm-generic macros aren't
> pretty, but it would be nice to be consistent.
>
> I don't think the above handle a partial contiguous span at the end of
> the address space (e.g. where end is initial PAGE_SIZE away from 2^64),
> whereas the asm-generic form does, AFAICT.
>
> Can we please use:
>
> #define pte_cont_addr_end(addr, end)  
>   \
> ({  unsigned long __boundary = ((addr) + CONT_PTE_SIZE) & CONT_PTE_MASK;  
>   \
> (__boundary - 1 < (end) - 1)? __boundary: (end);  
>   \
> })
>
> #define pmd_cont_addr_end(addr, end)  
>   \
> ({  unsigned long __boundary = ((addr) + CONT_PMD_SIZE) & CONT_PMD_MASK;  
>   \
> (__boundary - 1 < (end) - 1)? __boundary: (end);  
>   \
> })
>
> ... instead?
>

OK, so that's what the -1 is for. Either version is fine by me.
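
To spell the corner case out (standalone illustration, not kernel code; the 64 KB CONT_PTE_SIZE assumes a 4 KB granule):

#include <stdint.h>
#include <stdio.h>

#define CONT_PTE_SIZE	(16ULL * 4096)		/* 64 KB, assuming a 4 KB granule */
#define CONT_PTE_MASK	(~(CONT_PTE_SIZE - 1))

/* the min()-based form from the patch */
static uint64_t cont_end_min(uint64_t addr, uint64_t end)
{
	uint64_t b = (addr + CONT_PTE_SIZE) & CONT_PTE_MASK;
	return b < end ? b : end;
}

/* the asm-generic style form suggested above */
static uint64_t cont_end_generic(uint64_t addr, uint64_t end)
{
	uint64_t b = (addr + CONT_PTE_SIZE) & CONT_PTE_MASK;
	return (b - 1 < end - 1) ? b : end;
}

int main(void)
{
	uint64_t addr = 0xffffffffffff0000ULL;	/* last 64 KB of the address space */
	uint64_t end  = 0xfffffffffffff000ULL;	/* PAGE_SIZE below 2^64 */

	/* addr + CONT_PTE_SIZE wraps to 0, so the min() form returns 0 ... */
	printf("min form:     %#llx\n", (unsigned long long)cont_end_min(addr, end));
	/* ... while the (x - 1) comparison still returns end */
	printf("generic form: %#llx\n", (unsigned long long)cont_end_generic(addr, end));
	return 0;
}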

> [...]
>
>> +static void init_pte(pte_t *pte, unsigned long addr, unsigned long end,
>> +  phys_addr_t phys, pgprot_t prot)
>>  {
>> + do {
>> + pte_t old_pte = *pte;
>> +
>> + set_pte(pte, pfn_pte(__phys_to_pfn(phys), prot));
>> +
>> + /*
>> +  * After the PTE entry has been populated once, we
>> +  * only allow updates to the permission attributes.
>> +  */
>> + BUG_ON(!pgattr_change_is_safe(pte_val(old_pte), 
>> pte_val(*pte)));
>> +
>> + } while (pte++, addr += PAGE_SIZE, phys += PAGE_SIZE, addr != end);
>> +}
>> +
>> +static void alloc_init_cont_pte(pmd_t *pmd, unsigned long addr,
>> + unsigned long end, phys_addr_t phys,
>> + pgprot_t prot,
>> + phys_addr_t (*pgtable_alloc)(void),
>> + int flags)
>> +{
>> + unsigned long next;
>>   pte_t *pte;
>>
>>   BUG_ON(pmd_sect(*pmd));
>> @@ -136,45 +156,30 @@ static void alloc_init_pte(pmd_t *pmd, unsigned long 
>> addr,
>>
>>   pte = pte_set_fixmap_offset(pmd, addr);
>>   do {
>> - pte_t old_pte = *pte;
>> + pgprot_t __prot = prot;
>>
>> - set_pte(pte, pfn_pte(__phys_to_pfn(phys), prot));
>> - phys += PAGE_SIZE;
>> + next = pte_cont_addr_end(addr, end);
>>
>> - /*
>> -  * After the PTE entry has been populated once, we
>> -  * only allow updates to the permission attributes.
>> -  */
>> - BUG_ON(!pgattr_change_is_safe(pte_val(old_pte), 
>> pte_val(*pte)));
>> + /* use a contiguous mapping if the range is suitably aligned */
>> + if ((((addr | next | phys) & ~CONT_PTE_MASK) == 0) &&
>> + (flags & NO_CONT_MAPPINGS) == 0)
>> + __prot = __pgprot(pgprot_val(prot) | PTE_CONT);
>>
>> - } while (pte++, addr += PAGE_SIZE, addr != end);
>> + init_pte(pte, addr, next, phys, __prot);
>> +
>> + phys += next - addr;
>> + pte += (next - addr) / PAGE_SIZE;
>> + } while (addr = next, addr != end);
>>
>>   pte_clear_fixmap();
>>  }
>
> I think it would be preferable to pass the pmd down into
> alloc_init_pte(), so that we don't have to mess with the pte in both
> alloc_init_cont_pte() and alloc_init_pte().
>
> Likewise for alloc_init_cont_pmd() and alloc_init_pmd(), regarding the
> pmd.
>
> I realise we'll redundantly map/unmap the PTE for each contiguous span,
> but I doubt there's a case it has a noticeable impact.
>

OK

> With lots of memory we'll use blocks at a higher level, and for
> debug_pagealloc we'll pass the whole pte down to init_pte() as we
> currently do.
>
> [...]
>
>> + if (pud_none(*pud)) {
>> + phys_addr_t pmd_phys;
>> + BUG_ON(!pgtable

[PATCH v4 2/6] arm64: mmu: move TLB maintenance from callers to create_mapping_late()

2017-03-04 Thread Ard Biesheuvel
In preparation of refactoring the kernel mapping logic so that text regions
are never mapped writable, which would require adding explicit TLB
maintenance to new call sites of create_mapping_late() (which is currently
invoked twice from the same function), move the TLB maintenance from the
call site into create_mapping_late() itself, and change it from a full
TLB flush into a flush by VA, which is more appropriate here.

Also, given that create_mapping_late() has evolved into a routine that only
updates protection bits on existing mappings, rename it to
update_mapping_prot().

Reviewed-by: Mark Rutland <mark.rutl...@arm.com>
Tested-by: Mark Rutland <mark.rutl...@arm.com>
Signed-off-by: Ard Biesheuvel <ard.biesheu...@linaro.org>
---
 arch/arm64/mm/mmu.c | 16 
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index d28dbcf596b6..6cafd8723d1a 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -319,17 +319,20 @@ void __init create_pgd_mapping(struct mm_struct *mm, 
phys_addr_t phys,
 pgd_pgtable_alloc, page_mappings_only);
 }
 
-static void create_mapping_late(phys_addr_t phys, unsigned long virt,
- phys_addr_t size, pgprot_t prot)
+static void update_mapping_prot(phys_addr_t phys, unsigned long virt,
+   phys_addr_t size, pgprot_t prot)
 {
if (virt < VMALLOC_START) {
-   pr_warn("BUG: not creating mapping for %pa at 0x%016lx - 
outside kernel range\n",
+   pr_warn("BUG: not updating mapping for %pa at 0x%016lx - 
outside kernel range\n",
, virt);
return;
}
 
__create_pgd_mapping(init_mm.pgd, phys, virt, size, prot,
 NULL, debug_pagealloc_enabled());
+
+   /* flush the TLBs after updating live kernel mappings */
+   flush_tlb_kernel_range(virt, virt + size);
 }
 
 static void __init __map_memblock(pgd_t *pgd, phys_addr_t start, phys_addr_t 
end)
@@ -402,19 +405,16 @@ void mark_rodata_ro(void)
unsigned long section_size;
 
section_size = (unsigned long)_etext - (unsigned long)_text;
-   create_mapping_late(__pa_symbol(_text), (unsigned long)_text,
+   update_mapping_prot(__pa_symbol(_text), (unsigned long)_text,
section_size, PAGE_KERNEL_ROX);
/*
 * mark .rodata as read only. Use __init_begin rather than __end_rodata
 * to cover NOTES and EXCEPTION_TABLE.
 */
section_size = (unsigned long)__init_begin - (unsigned 
long)__start_rodata;
-   create_mapping_late(__pa_symbol(__start_rodata), (unsigned 
long)__start_rodata,
+   update_mapping_prot(__pa_symbol(__start_rodata), (unsigned 
long)__start_rodata,
section_size, PAGE_KERNEL_RO);
 
-   /* flush the TLBs after updating live kernel mappings */
-   flush_tlb_all();
-
debug_checkwx();
 }
 
-- 
2.7.4



[PATCH v4 1/6] arm: kvm: move kvm_vgic_global_state out of .text section

2017-03-04 Thread Ard Biesheuvel
The kvm_vgic_global_state struct contains a static key which is
written to by jump_label_init() at boot time. So in preparation of
making .text regions truly (well, almost truly) read-only, mark
kvm_vgic_global_state __ro_after_init so it moves to the .rodata
section instead.

Acked-by: Marc Zyngier <marc.zyng...@arm.com>
Reviewed-by: Laura Abbott <labb...@redhat.com>
Reviewed-by: Mark Rutland <mark.rutl...@arm.com>
Tested-by: Mark Rutland <mark.rutl...@arm.com>
Signed-off-by: Ard Biesheuvel <ard.biesheu...@linaro.org>
---
 virt/kvm/arm/vgic/vgic.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/virt/kvm/arm/vgic/vgic.c b/virt/kvm/arm/vgic/vgic.c
index 654dfd40e449..7713d96e85b7 100644
--- a/virt/kvm/arm/vgic/vgic.c
+++ b/virt/kvm/arm/vgic/vgic.c
@@ -29,7 +29,9 @@
 #define DEBUG_SPINLOCK_BUG_ON(p)
 #endif
 
-struct vgic_global __section(.hyp.text) kvm_vgic_global_state = {.gicv3_cpuif 
= STATIC_KEY_FALSE_INIT,};
+struct vgic_global kvm_vgic_global_state __ro_after_init = {
+   .gicv3_cpuif = STATIC_KEY_FALSE_INIT,
+};
 
 /*
  * Locking order is always:
-- 
2.7.4



[PATCH v4 5/6] arm64: mmu: apply strict permissions to .init.text and .init.data

2017-03-04 Thread Ard Biesheuvel
To avoid having mappings that are writable and executable at the same
time, split the init region into a .init.text region that is mapped
read-only, and a .init.data region that is mapped non-executable.

This is possible now that the alternative patching occurs via the linear
mapping, and the linear alias of the init region is always mapped writable
(but never executable).

Since the alternatives descriptions themselves are read-only data, move
those into the .init.text region.

Reviewed-by: Laura Abbott <labb...@redhat.com>
Signed-off-by: Ard Biesheuvel <ard.biesheu...@linaro.org>
---
 arch/arm64/include/asm/sections.h |  3 ++-
 arch/arm64/kernel/vmlinux.lds.S   | 25 +---
 arch/arm64/mm/mmu.c   | 12 ++
 3 files changed, 26 insertions(+), 14 deletions(-)

diff --git a/arch/arm64/include/asm/sections.h 
b/arch/arm64/include/asm/sections.h
index 4e7e7067afdb..22582819b2e5 100644
--- a/arch/arm64/include/asm/sections.h
+++ b/arch/arm64/include/asm/sections.h
@@ -24,7 +24,8 @@ extern char __hibernate_exit_text_start[], 
__hibernate_exit_text_end[];
 extern char __hyp_idmap_text_start[], __hyp_idmap_text_end[];
 extern char __hyp_text_start[], __hyp_text_end[];
 extern char __idmap_text_start[], __idmap_text_end[];
+extern char __initdata_begin[], __initdata_end[];
+extern char __inittext_begin[], __inittext_end[];
 extern char __irqentry_text_start[], __irqentry_text_end[];
 extern char __mmuoff_data_start[], __mmuoff_data_end[];
-
 #endif /* __ASM_SECTIONS_H */
diff --git a/arch/arm64/kernel/vmlinux.lds.S b/arch/arm64/kernel/vmlinux.lds.S
index b8deffa9e1bf..2c93d259046c 100644
--- a/arch/arm64/kernel/vmlinux.lds.S
+++ b/arch/arm64/kernel/vmlinux.lds.S
@@ -143,12 +143,27 @@ SECTIONS
 
. = ALIGN(SEGMENT_ALIGN);
__init_begin = .;
+   __inittext_begin = .;
 
INIT_TEXT_SECTION(8)
.exit.text : {
ARM_EXIT_KEEP(EXIT_TEXT)
}
 
+   . = ALIGN(4);
+   .altinstructions : {
+   __alt_instructions = .;
+   *(.altinstructions)
+   __alt_instructions_end = .;
+   }
+   .altinstr_replacement : {
+   *(.altinstr_replacement)
+   }
+
+   . = ALIGN(PAGE_SIZE);
+   __inittext_end = .;
+   __initdata_begin = .;
+
.init.data : {
INIT_DATA
INIT_SETUP(16)
@@ -164,15 +179,6 @@ SECTIONS
 
PERCPU_SECTION(L1_CACHE_BYTES)
 
-   . = ALIGN(4);
-   .altinstructions : {
-   __alt_instructions = .;
-   *(.altinstructions)
-   __alt_instructions_end = .;
-   }
-   .altinstr_replacement : {
-   *(.altinstr_replacement)
-   }
.rela : ALIGN(8) {
*(.rela .rela*)
}
@@ -181,6 +187,7 @@ SECTIONS
__rela_size = SIZEOF(.rela);
 
. = ALIGN(SEGMENT_ALIGN);
+   __initdata_end = .;
__init_end = .;
 
_data = .;
diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index edd982f88714..0612573ef869 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -459,14 +459,18 @@ early_param("rodata", parse_rodata);
  */
 static void __init map_kernel(pgd_t *pgd)
 {
-   static struct vm_struct vmlinux_text, vmlinux_rodata, vmlinux_init, 
vmlinux_data;
+   static struct vm_struct vmlinux_text, vmlinux_rodata, vmlinux_inittext,
+   vmlinux_initdata, vmlinux_data;
 
pgprot_t text_prot = rodata_enabled ? PAGE_KERNEL_ROX : 
PAGE_KERNEL_EXEC;
 
map_kernel_segment(pgd, _text, _etext, text_prot, &vmlinux_text);
-   map_kernel_segment(pgd, __start_rodata, __init_begin, PAGE_KERNEL, &vmlinux_rodata);
-   map_kernel_segment(pgd, __init_begin, __init_end, PAGE_KERNEL_EXEC,
-  &vmlinux_init);
+   map_kernel_segment(pgd, __start_rodata, __inittext_begin, PAGE_KERNEL,
+  &vmlinux_rodata);
+   map_kernel_segment(pgd, __inittext_begin, __inittext_end, text_prot,
+  &vmlinux_inittext);
+   map_kernel_segment(pgd, __initdata_begin, __initdata_end, PAGE_KERNEL,
+  &vmlinux_initdata);
map_kernel_segment(pgd, _data, _end, PAGE_KERNEL, &vmlinux_data);
 
if (!pgd_val(*pgd_offset_raw(pgd, FIXADDR_START))) {
-- 
2.7.4



[PATCH v4 6/6] arm64: mm: set the contiguous bit for kernel mappings where appropriate

2017-03-04 Thread Ard Biesheuvel
This is the third attempt at enabling the use of contiguous hints for
kernel mappings. The most recent attempt 0bfc445dec9d was reverted after
it turned out that updating permission attributes on live contiguous ranges
may result in TLB conflicts. So this time, the contiguous hint is not set
for .rodata or for the linear alias of .text/.rodata, both of which are
mapped read-write initially, and remapped read-only at a later stage.
(Note that the latter region could also be unmapped and remapped again
with updated permission attributes, given that the region, while live, is
only mapped for the convenience of the hibernation code, but that also
means the TLB footprint is negligible anyway, so why bother)

This enables the following contiguous range sizes for the virtual mapping
of the kernel image, and for the linear mapping:

  granule size |  cont PTE  |  cont PMD  |
  -------------+------------+------------+
     4 KB      |   64 KB    |   32 MB    |
    16 KB      |    2 MB    |    1 GB*   |
    64 KB      |    2 MB    |   16 GB*   |

* Only when built for 3 or more levels of translation. This is due to the
  fact that a 2 level configuration only consists of PGDs and PTEs, and the
  added complexity of dealing with folded PMDs is not justified considering
  that 16 GB contiguous ranges are likely to be ignored by the hardware (and
  16k/2 levels is a niche configuration)
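
The table rows follow directly from the contiguous-range geometry; as a quick sanity check of the arithmetic (the per-granule CONT_PTES/CONT_PMDS values below are the usual ones, quoted from memory):

/*
 *   cont PTE span = CONT_PTES * PAGE_SIZE, cont PMD span = CONT_PMDS * PMD_SIZE
 *
 *    4 KB granule:  16 *  4 KB = 64 KB     16 *   2 MB = 32 MB
 *   16 KB granule: 128 * 16 KB =  2 MB     32 *  32 MB =  1 GB
 *   64 KB granule:  32 * 64 KB =  2 MB     32 * 512 MB = 16 GB
 */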

Signed-off-by: Ard Biesheuvel <ard.biesheu...@linaro.org>
---
 arch/arm64/mm/mmu.c | 86 ++--
 1 file changed, 63 insertions(+), 23 deletions(-)

diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index 0612573ef869..d0ae2f1f44fc 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -109,8 +109,10 @@ static bool pgattr_change_is_safe(u64 old, u64 new)
 static void alloc_init_pte(pmd_t *pmd, unsigned long addr,
  unsigned long end, unsigned long pfn,
  pgprot_t prot,
- phys_addr_t (*pgtable_alloc)(void))
+ phys_addr_t (*pgtable_alloc)(void),
+ bool may_use_cont)
 {
+   pgprot_t __prot = prot;
pte_t *pte;
 
BUG_ON(pmd_sect(*pmd));
@@ -128,7 +130,19 @@ static void alloc_init_pte(pmd_t *pmd, unsigned long addr,
do {
pte_t old_pte = *pte;
 
-   set_pte(pte, pfn_pte(pfn, prot));
+   /*
+* Set the contiguous bit for the subsequent group of PTEs if
+* its size and alignment are appropriate.
+*/
+   if (may_use_cont &&
+   ((addr | PFN_PHYS(pfn)) & ~CONT_PTE_MASK) == 0) {
+   if (end - addr >= CONT_PTE_SIZE)
+   __prot = __pgprot(pgprot_val(prot) | PTE_CONT);
+   else
+   __prot = prot;
+   }
+
+   set_pte(pte, pfn_pte(pfn, __prot));
pfn++;
 
/*
@@ -145,8 +159,10 @@ static void alloc_init_pte(pmd_t *pmd, unsigned long addr,
 static void alloc_init_pmd(pud_t *pud, unsigned long addr, unsigned long end,
  phys_addr_t phys, pgprot_t prot,
  phys_addr_t (*pgtable_alloc)(void),
- bool page_mappings_only)
+ bool page_mappings_only,
+ bool may_use_cont)
 {
+   pgprot_t __prot = prot;
pmd_t *pmd;
unsigned long next;
 
@@ -173,7 +189,19 @@ static void alloc_init_pmd(pud_t *pud, unsigned long addr, 
unsigned long end,
/* try section mapping first */
if (((addr | next | phys) & ~SECTION_MASK) == 0 &&
  !page_mappings_only) {
-   pmd_set_huge(pmd, phys, prot);
+   /*
+* Set the contiguous bit for the subsequent group of
+* PMDs if its size and alignment are appropriate.
+*/
+   if (may_use_cont &&
+   ((addr | phys) & ~CONT_PMD_MASK) == 0) {
+   if (end - addr >= CONT_PMD_SIZE)
+   __prot = __pgprot(pgprot_val(prot) |
+ PTE_CONT);
+   else
+   __prot = prot;
+   }
+   pmd_set_huge(pmd, phys, __prot);
 
/*
 * After the PMD entry has been populated once, we
@@ -183,7 +211,7 @@ static void alloc_init_pmd(pud_t *pud, unsigned long addr, 
unsigned long end,
  pmd_val(*pmd)));
} else {

[PATCH v4 3/6] arm64: alternatives: apply boot time fixups via the linear mapping

2017-03-04 Thread Ard Biesheuvel
One important rule of thumb when designing a secure software system is
that memory should never be writable and executable at the same time.
We mostly adhere to this rule in the kernel, except at boot time, when
regions may be mapped RWX until after we are done applying alternatives
or making other one-off changes.

For the alternative patching, we can improve the situation by applying
the fixups via the linear mapping, which is never mapped with executable
permissions. So map the linear alias of .text with RW- permissions
initially, and remove the write permissions as soon as alternative
patching has completed.

Reviewed-by: Laura Abbott <labb...@redhat.com>
Reviewed-by: Mark Rutland <mark.rutl...@arm.com>
Tested-by: Mark Rutland <mark.rutl...@arm.com>
Signed-off-by: Ard Biesheuvel <ard.biesheu...@linaro.org>
---
 arch/arm64/include/asm/mmu.h|  1 +
 arch/arm64/kernel/alternative.c | 11 +-
 arch/arm64/kernel/smp.c |  1 +
 arch/arm64/mm/mmu.c | 22 +++-
 4 files changed, 25 insertions(+), 10 deletions(-)

diff --git a/arch/arm64/include/asm/mmu.h b/arch/arm64/include/asm/mmu.h
index 47619411f0ff..5468c834b072 100644
--- a/arch/arm64/include/asm/mmu.h
+++ b/arch/arm64/include/asm/mmu.h
@@ -37,5 +37,6 @@ extern void create_pgd_mapping(struct mm_struct *mm, 
phys_addr_t phys,
   unsigned long virt, phys_addr_t size,
   pgprot_t prot, bool page_mappings_only);
 extern void *fixmap_remap_fdt(phys_addr_t dt_phys);
+extern void mark_linear_text_alias_ro(void);
 
 #endif
diff --git a/arch/arm64/kernel/alternative.c b/arch/arm64/kernel/alternative.c
index 06d650f61da7..8840c109c5d6 100644
--- a/arch/arm64/kernel/alternative.c
+++ b/arch/arm64/kernel/alternative.c
@@ -105,11 +105,11 @@ static u32 get_alt_insn(struct alt_instr *alt, u32 
*insnptr, u32 *altinsnptr)
return insn;
 }
 
-static void __apply_alternatives(void *alt_region)
+static void __apply_alternatives(void *alt_region, bool use_linear_alias)
 {
struct alt_instr *alt;
struct alt_region *region = alt_region;
-   u32 *origptr, *replptr;
+   u32 *origptr, *replptr, *updptr;
 
for (alt = region->begin; alt < region->end; alt++) {
u32 insn;
@@ -124,11 +124,12 @@ static void __apply_alternatives(void *alt_region)
 
origptr = ALT_ORIG_PTR(alt);
replptr = ALT_REPL_PTR(alt);
+   updptr = use_linear_alias ? (u32 *)lm_alias(origptr) : origptr;
nr_inst = alt->alt_len / sizeof(insn);
 
for (i = 0; i < nr_inst; i++) {
insn = get_alt_insn(alt, origptr + i, replptr + i);
-   *(origptr + i) = cpu_to_le32(insn);
+   updptr[i] = cpu_to_le32(insn);
}
 
flush_icache_range((uintptr_t)origptr,
@@ -155,7 +156,7 @@ static int __apply_alternatives_multi_stop(void *unused)
isb();
} else {
BUG_ON(patched);
-   __apply_alternatives(&region);
+   __apply_alternatives(&region, true);
/* Barriers provided by the cache flushing */
WRITE_ONCE(patched, 1);
}
@@ -176,5 +177,5 @@ void apply_alternatives(void *start, size_t length)
.end= start + length,
};
 
-   __apply_alternatives(&region);
+   __apply_alternatives(&region, false);
 }
diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
index 827d52d78b67..691831c0ba74 100644
--- a/arch/arm64/kernel/smp.c
+++ b/arch/arm64/kernel/smp.c
@@ -432,6 +432,7 @@ void __init smp_cpus_done(unsigned int max_cpus)
setup_cpu_features();
hyp_mode_check();
apply_alternatives_all();
+   mark_linear_text_alias_ro();
 }
 
 void __init smp_prepare_boot_cpu(void)
diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index 6cafd8723d1a..df377fbe464e 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -372,16 +372,28 @@ static void __init __map_memblock(pgd_t *pgd, phys_addr_t 
start, phys_addr_t end
 debug_pagealloc_enabled());
 
/*
-* Map the linear alias of the [_text, __init_begin) interval as
-* read-only/non-executable. This makes the contents of the
-* region accessible to subsystems such as hibernate, but
-* protects it from inadvertent modification or execution.
+* Map the linear alias of the [_text, __init_begin) interval
+* as non-executable now, and remove the write permission in
+* mark_linear_text_alias_ro() below (which will be called after
+* alternative patching has completed). This makes the contents
+* of the region accessible to subsystems such as hibernate,
+* but protects it from inadvertent modification or execution.
 */
__create_pgd_mapping(pgd, ke

[PATCH v4 4/6] arm64: mmu: map .text as read-only from the outset

2017-03-04 Thread Ard Biesheuvel
Now that alternatives patching code no longer relies on the primary
mapping of .text being writable, we can remove the code that removes
the writable permissions post-init time, and map it read-only from
the outset.

To preserve the existing behavior under rodata=off, which is relied
upon by external debuggers to manage software breakpoints (as pointed
out by Mark), add an early_param() check for rodata=, and use RWX
permissions if it is set to 'off'.

Reviewed-by: Laura Abbott <labb...@redhat.com>
Reviewed-by: Kees Cook <keesc...@chromium.org>
Signed-off-by: Ard Biesheuvel <ard.biesheu...@linaro.org>
---
 arch/arm64/mm/mmu.c | 13 +
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index df377fbe464e..edd982f88714 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -416,9 +416,6 @@ void mark_rodata_ro(void)
 {
unsigned long section_size;
 
-   section_size = (unsigned long)_etext - (unsigned long)_text;
-   update_mapping_prot(__pa_symbol(_text), (unsigned long)_text,
-   section_size, PAGE_KERNEL_ROX);
/*
 * mark .rodata as read only. Use __init_begin rather than __end_rodata
 * to cover NOTES and EXCEPTION_TABLE.
@@ -451,6 +448,12 @@ static void __init map_kernel_segment(pgd_t *pgd, void 
*va_start, void *va_end,
vm_area_add_early(vma);
 }
 
+static int __init parse_rodata(char *arg)
+{
+   return strtobool(arg, &rodata_enabled);
+}
+early_param("rodata", parse_rodata);
+
 /*
  * Create fine-grained mappings for the kernel.
  */
@@ -458,7 +461,9 @@ static void __init map_kernel(pgd_t *pgd)
 {
static struct vm_struct vmlinux_text, vmlinux_rodata, vmlinux_init, 
vmlinux_data;
 
-   map_kernel_segment(pgd, _text, _etext, PAGE_KERNEL_EXEC, &vmlinux_text);
+   pgprot_t text_prot = rodata_enabled ? PAGE_KERNEL_ROX : PAGE_KERNEL_EXEC;
+
+   map_kernel_segment(pgd, _text, _etext, text_prot, &vmlinux_text);
    map_kernel_segment(pgd, __start_rodata, __init_begin, PAGE_KERNEL, &vmlinux_rodata);
    map_kernel_segment(pgd, __init_begin, __init_end, PAGE_KERNEL_EXEC,
                       &vmlinux_init);
-- 
2.7.4

___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


[PATCH v5 09/10] arm64/mmu: replace 'page_mappings_only' parameter with flags argument

2017-03-09 Thread Ard Biesheuvel
In preparation of extending the policy for manipulating kernel mappings
with whether or not contiguous hints may be used in the page tables,
replace the bool 'page_mappings_only' with a flags field and a flag
NO_BLOCK_MAPPINGS.
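
As a standalone sketch of the flags idiom (for illustration only, not code
from the patch; BIT() is redefined locally so the snippet builds in user
space), the point of the change is that further policy bits can be added
later without touching every prototype again:

#include <stdio.h>

#define BIT(n)                  (1U << (n))
#define NO_BLOCK_MAPPINGS       BIT(0)
#define NO_CONT_MAPPINGS        BIT(1)  /* added by the following patch */

static void describe_mapping_policy(unsigned int flags)
{
        int use_blocks = (flags & NO_BLOCK_MAPPINGS) == 0;
        int use_cont   = (flags & NO_CONT_MAPPINGS) == 0;

        printf("block mappings: %s, contiguous hint: %s\n",
               use_blocks ? "yes" : "no", use_cont ? "yes" : "no");
}

int main(void)
{
        describe_mapping_policy(0);                 /* page_mappings_only == false */
        describe_mapping_policy(NO_BLOCK_MAPPINGS); /* page_mappings_only == true */
        describe_mapping_policy(NO_BLOCK_MAPPINGS | NO_CONT_MAPPINGS);
        return 0;
}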

Signed-off-by: Ard Biesheuvel <ard.biesheu...@linaro.org>
---
 arch/arm64/mm/mmu.c | 45 
 1 file changed, 27 insertions(+), 18 deletions(-)

diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index a6d7a86dd2b8..9babafa253cf 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -43,6 +43,8 @@
 #include 
 #include 
 
+#define NO_BLOCK_MAPPINGS  BIT(0)
+
 u64 idmap_t0sz = TCR_T0SZ(VA_BITS);
 
 u64 kimage_voffset __ro_after_init;
@@ -153,7 +155,7 @@ static void alloc_init_pte(pmd_t *pmd, unsigned long addr,
 static void alloc_init_pmd(pud_t *pud, unsigned long addr, unsigned long end,
  phys_addr_t phys, pgprot_t prot,
  phys_addr_t (*pgtable_alloc)(void),
- bool page_mappings_only)
+ int flags)
 {
pmd_t *pmd;
unsigned long next;
@@ -180,7 +182,7 @@ static void alloc_init_pmd(pud_t *pud, unsigned long addr, 
unsigned long end,
 
/* try section mapping first */
if (((addr | next | phys) & ~SECTION_MASK) == 0 &&
- !page_mappings_only) {
+   (flags & NO_BLOCK_MAPPINGS) == 0) {
pmd_set_huge(pmd, phys, prot);
 
/*
@@ -217,7 +219,7 @@ static inline bool use_1G_block(unsigned long addr, 
unsigned long next,
 static void alloc_init_pud(pgd_t *pgd, unsigned long addr, unsigned long end,
  phys_addr_t phys, pgprot_t prot,
  phys_addr_t (*pgtable_alloc)(void),
- bool page_mappings_only)
+ int flags)
 {
pud_t *pud;
unsigned long next;
@@ -239,7 +241,8 @@ static void alloc_init_pud(pgd_t *pgd, unsigned long addr, 
unsigned long end,
/*
 * For 4K granule only, attempt to put down a 1GB block
 */
-   if (use_1G_block(addr, next, phys) && !page_mappings_only) {
+   if (use_1G_block(addr, next, phys) &&
+   (flags & NO_BLOCK_MAPPINGS) == 0) {
pud_set_huge(pud, phys, prot);
 
/*
@@ -250,7 +253,7 @@ static void alloc_init_pud(pgd_t *pgd, unsigned long addr, 
unsigned long end,
  pud_val(*pud)));
} else {
alloc_init_pmd(pud, addr, next, phys, prot,
-  pgtable_alloc, page_mappings_only);
+  pgtable_alloc, flags);
 
BUG_ON(pud_val(old_pud) != 0 &&
   pud_val(old_pud) != pud_val(*pud));
@@ -265,7 +268,7 @@ static void __create_pgd_mapping(pgd_t *pgdir, phys_addr_t 
phys,
 unsigned long virt, phys_addr_t size,
 pgprot_t prot,
 phys_addr_t (*pgtable_alloc)(void),
-bool page_mappings_only)
+int flags)
 {
unsigned long addr, length, end, next;
pgd_t *pgd = pgd_offset_raw(pgdir, virt);
@@ -285,7 +288,7 @@ static void __create_pgd_mapping(pgd_t *pgdir, phys_addr_t 
phys,
do {
next = pgd_addr_end(addr, end);
alloc_init_pud(pgd, addr, next, phys, prot, pgtable_alloc,
-  page_mappings_only);
+  flags);
phys += next - addr;
} while (pgd++, addr = next, addr != end);
 }
@@ -314,17 +317,22 @@ static void __init create_mapping_noalloc(phys_addr_t 
phys, unsigned long virt,
&phys, virt);
return;
}
-   __create_pgd_mapping(init_mm.pgd, phys, virt, size, prot, NULL, false);
+   __create_pgd_mapping(init_mm.pgd, phys, virt, size, prot, NULL, 0);
 }
 
 void __init create_pgd_mapping(struct mm_struct *mm, phys_addr_t phys,
   unsigned long virt, phys_addr_t size,
   pgprot_t prot, bool page_mappings_only)
 {
+   int flags = 0;
+
    BUG_ON(mm == &init_mm);

+   if (page_mappings_only)
+   flags = NO_BLOCK_MAPPINGS;
+
__create_pgd_mapping(mm->pgd, phys, virt, size, prot,
-pgd_pgtable_alloc, page_mappings_only);
+pgd_pgtable_alloc, flags);
 }
 
 static void update_mapping_prot(phys_addr_t phys, unsigned long virt,
@@ -336,7 +344,7 @@ static void update_mapping_prot(phys_addr_t phys, unsigned 
long virt,
   

[PATCH v5 10/10] arm64: mm: set the contiguous bit for kernel mappings where appropriate

2017-03-09 Thread Ard Biesheuvel
This is the third attempt at enabling the use of contiguous hints for
kernel mappings. The most recent attempt 0bfc445dec9d was reverted after
it turned out that updating permission attributes on live contiguous ranges
may result in TLB conflicts. So this time, the contiguous hint is not set
for .rodata or for the linear alias of .text/.rodata, both of which are
mapped read-write initially, and remapped read-only at a later stage.
(Note that the latter region could also be unmapped and remapped again
with updated permission attributes, given that the region, while live, is
only mapped for the convenience of the hibernation code, but that also
means the TLB footprint is negligible anyway, so why bother)

This enables the following contiguous range sizes for the virtual mapping
of the kernel image, and for the linear mapping:

  granule size |  cont PTE  |  cont PMD  |
  -------------+------------+------------+
       4 KB    |    64 KB   |    32 MB   |
      16 KB    |     2 MB   |     1 GB*  |
      64 KB    |     2 MB   |    16 GB*  |

* Only when built for 3 or more levels of translation. This is due to the
  fact that a 2 level configuration only consists of PGDs and PTEs, and the
  added complexity of dealing with folded PMDs is not justified considering
  that 16 GB contiguous ranges are likely to be ignored by the hardware (and
  16k/2 levels is a niche configuration)
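
As a standalone sketch of the eligibility test (illustration only, not the
kernel code; the constants assume the 4 KB granule row of the table, i.e.
16 PTEs covering 64 KB), a group of entries may only carry PTE_CONT when
both the virtual and the physical address are aligned to the contiguous
range size and a full range remains to be mapped:

#include <stdint.h>
#include <stdio.h>

#define CONT_PTE_SIZE   (16ULL * 4096)          /* 16 PTEs x 4 KB = 64 KB */
#define CONT_PTE_MASK   (~(CONT_PTE_SIZE - 1))

static int may_set_pte_cont(uint64_t addr, uint64_t phys, uint64_t end)
{
        return (((addr | phys) & ~CONT_PTE_MASK) == 0) &&
               (end - addr >= CONT_PTE_SIZE);
}

int main(void)
{
        /* aligned VA and PA with a full 64 KB left: hint may be used */
        printf("%d\n", may_set_pte_cont(0xffff000000200000, 0x80200000,
                                        0xffff000000210000));
        /* PA misaligned by one page: fall back to plain PTEs */
        printf("%d\n", may_set_pte_cont(0xffff000000200000, 0x80201000,
                                        0xffff000000210000));
        return 0;
}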

Signed-off-by: Ard Biesheuvel <ard.biesheu...@linaro.org>
---
 arch/arm64/include/asm/pgtable.h |  10 ++
 arch/arm64/mm/mmu.c  | 154 +---
 2 files changed, 114 insertions(+), 50 deletions(-)

diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index 0eef6064bf3b..f10a7bf81849 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -74,6 +74,16 @@ extern unsigned long empty_zero_page[PAGE_SIZE / 
sizeof(unsigned long)];
 #define pte_user_exec(pte) (!(pte_val(pte) & PTE_UXN))
 #define pte_cont(pte)  (!!(pte_val(pte) & PTE_CONT))
 
+static inline u64 pte_cont_addr_end(u64 addr, u64 end)
+{
+   return min((addr + CONT_PTE_SIZE) & CONT_PTE_MASK, end);
+}
+
+static inline u64 pmd_cont_addr_end(u64 addr, u64 end)
+{
+   return min((addr + CONT_PMD_SIZE) & CONT_PMD_MASK, end);
+}
+
 #ifdef CONFIG_ARM64_HW_AFDBM
 #define pte_hw_dirty(pte)  (pte_write(pte) && !(pte_val(pte) & PTE_RDONLY))
 #else
diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index 9babafa253cf..e2ffab56c1a6 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -44,6 +44,7 @@
 #include 
 
 #define NO_BLOCK_MAPPINGS  BIT(0)
+#define NO_CONT_MAPPINGS   BIT(1)
 
 u64 idmap_t0sz = TCR_T0SZ(VA_BITS);
 
@@ -116,11 +117,30 @@ static bool pgattr_change_is_safe(u64 old, u64 new)
return ((old ^ new) & ~mask) == 0;
 }
 
-static void alloc_init_pte(pmd_t *pmd, unsigned long addr,
- unsigned long end, phys_addr_t phys,
- pgprot_t prot,
- phys_addr_t (*pgtable_alloc)(void))
+static void init_pte(pte_t *pte, unsigned long addr, unsigned long end,
+phys_addr_t phys, pgprot_t prot)
 {
+   do {
+   pte_t old_pte = *pte;
+
+   set_pte(pte, pfn_pte(__phys_to_pfn(phys), prot));
+
+   /*
+* After the PTE entry has been populated once, we
+* only allow updates to the permission attributes.
+*/
+   BUG_ON(!pgattr_change_is_safe(pte_val(old_pte), pte_val(*pte)));
+
+   } while (pte++, addr += PAGE_SIZE, phys += PAGE_SIZE, addr != end);
+}
+
+static void alloc_init_cont_pte(pmd_t *pmd, unsigned long addr,
+   unsigned long end, phys_addr_t phys,
+   pgprot_t prot,
+   phys_addr_t (*pgtable_alloc)(void),
+   int flags)
+{
+   unsigned long next;
pte_t *pte;
 
BUG_ON(pmd_sect(*pmd));
@@ -136,45 +156,30 @@ static void alloc_init_pte(pmd_t *pmd, unsigned long addr,
 
pte = pte_set_fixmap_offset(pmd, addr);
do {
-   pte_t old_pte = *pte;
+   pgprot_t __prot = prot;
 
-   set_pte(pte, pfn_pte(__phys_to_pfn(phys), prot));
-   phys += PAGE_SIZE;
+   next = pte_cont_addr_end(addr, end);
 
-   /*
-* After the PTE entry has been populated once, we
-* only allow updates to the permission attributes.
-*/
-   BUG_ON(!pgattr_change_is_safe(pte_val(old_pte), pte_val(*pte)));
+   /* use a contiguous mapping if the range is suitably aligned */
+   if ((((addr | next | phys) & ~CONT_PTE_MASK) == 0) &&
+   (flags & NO_CONT_MAPPINGS) == 0)
+  

[PATCH v5 02/10] arm64: mmu: move TLB maintenance from callers to create_mapping_late()

2017-03-09 Thread Ard Biesheuvel
In preparation of refactoring the kernel mapping logic so that text regions
are never mapped writable, which would require adding explicit TLB
maintenance to new call sites of create_mapping_late() (which is currently
invoked twice from the same function), move the TLB maintenance from the
call site into create_mapping_late() itself, and change it from a full
TLB flush into a flush by VA, which is more appropriate here.

Also, given that create_mapping_late() has evolved into a routine that only
updates protection bits on existing mappings, rename it to
update_mapping_prot().

Reviewed-by: Mark Rutland <mark.rutl...@arm.com>
Tested-by: Mark Rutland <mark.rutl...@arm.com>
Signed-off-by: Ard Biesheuvel <ard.biesheu...@linaro.org>
---
 arch/arm64/mm/mmu.c | 16 
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index d28dbcf596b6..6cafd8723d1a 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -319,17 +319,20 @@ void __init create_pgd_mapping(struct mm_struct *mm, 
phys_addr_t phys,
 pgd_pgtable_alloc, page_mappings_only);
 }
 
-static void create_mapping_late(phys_addr_t phys, unsigned long virt,
- phys_addr_t size, pgprot_t prot)
+static void update_mapping_prot(phys_addr_t phys, unsigned long virt,
+   phys_addr_t size, pgprot_t prot)
 {
if (virt < VMALLOC_START) {
-   pr_warn("BUG: not creating mapping for %pa at 0x%016lx - 
outside kernel range\n",
+   pr_warn("BUG: not updating mapping for %pa at 0x%016lx - 
outside kernel range\n",
, virt);
return;
}
 
__create_pgd_mapping(init_mm.pgd, phys, virt, size, prot,
 NULL, debug_pagealloc_enabled());
+
+   /* flush the TLBs after updating live kernel mappings */
+   flush_tlb_kernel_range(virt, virt + size);
 }
 
 static void __init __map_memblock(pgd_t *pgd, phys_addr_t start, phys_addr_t 
end)
@@ -402,19 +405,16 @@ void mark_rodata_ro(void)
unsigned long section_size;
 
section_size = (unsigned long)_etext - (unsigned long)_text;
-   create_mapping_late(__pa_symbol(_text), (unsigned long)_text,
+   update_mapping_prot(__pa_symbol(_text), (unsigned long)_text,
section_size, PAGE_KERNEL_ROX);
/*
 * mark .rodata as read only. Use __init_begin rather than __end_rodata
 * to cover NOTES and EXCEPTION_TABLE.
 */
section_size = (unsigned long)__init_begin - (unsigned 
long)__start_rodata;
-   create_mapping_late(__pa_symbol(__start_rodata), (unsigned 
long)__start_rodata,
+   update_mapping_prot(__pa_symbol(__start_rodata), (unsigned 
long)__start_rodata,
section_size, PAGE_KERNEL_RO);
 
-   /* flush the TLBs after updating live kernel mappings */
-   flush_tlb_all();
-
debug_checkwx();
 }
 
-- 
2.7.4

___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


[PATCH v5 03/10] arm64: alternatives: apply boot time fixups via the linear mapping

2017-03-09 Thread Ard Biesheuvel
One important rule of thumb when designing a secure software system is
that memory should never be writable and executable at the same time.
We mostly adhere to this rule in the kernel, except at boot time, when
regions may be mapped RWX until after we are done applying alternatives
or making other one-off changes.

For the alternative patching, we can improve the situation by applying
the fixups via the linear mapping, which is never mapped with executable
permissions. So map the linear alias of .text with RW- permissions
initially, and remove the write permissions as soon as alternative
patching has completed.
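
The same idea can be demonstrated from user space (a sketch for illustration
only; it assumes Linux with a memfd_create() wrapper, i.e. glibc 2.27 or
later): two aliases of the same pages, one that is never writable and one
that is never executable, with all "patching" going through the latter:

#define _GNU_SOURCE
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/mman.h>

int main(void)
{
        long page = sysconf(_SC_PAGESIZE);
        int fd = memfd_create("alias-demo", 0);

        if (fd < 0 || ftruncate(fd, page) != 0)
                return 1;

        /* "text" alias: readable/executable, never writable */
        char *rx = mmap(NULL, page, PROT_READ | PROT_EXEC, MAP_SHARED, fd, 0);
        /* "linear map" alias: readable/writable, never executable */
        char *rw = mmap(NULL, page, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);

        if (rx == MAP_FAILED || rw == MAP_FAILED)
                return 1;

        memcpy(rw, "patched", 8);               /* write via the RW alias only */
        printf("seen through the RX alias: %s\n", rx);

        /* a real instruction patcher would also sync the I-cache here */
        return 0;
}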

Reviewed-by: Laura Abbott <labb...@redhat.com>
Reviewed-by: Mark Rutland <mark.rutl...@arm.com>
Tested-by: Mark Rutland <mark.rutl...@arm.com>
Signed-off-by: Ard Biesheuvel <ard.biesheu...@linaro.org>
---
 arch/arm64/include/asm/mmu.h|  1 +
 arch/arm64/kernel/alternative.c | 11 +-
 arch/arm64/kernel/smp.c |  1 +
 arch/arm64/mm/mmu.c | 22 +++-
 4 files changed, 25 insertions(+), 10 deletions(-)

diff --git a/arch/arm64/include/asm/mmu.h b/arch/arm64/include/asm/mmu.h
index 47619411f0ff..5468c834b072 100644
--- a/arch/arm64/include/asm/mmu.h
+++ b/arch/arm64/include/asm/mmu.h
@@ -37,5 +37,6 @@ extern void create_pgd_mapping(struct mm_struct *mm, 
phys_addr_t phys,
   unsigned long virt, phys_addr_t size,
   pgprot_t prot, bool page_mappings_only);
 extern void *fixmap_remap_fdt(phys_addr_t dt_phys);
+extern void mark_linear_text_alias_ro(void);
 
 #endif
diff --git a/arch/arm64/kernel/alternative.c b/arch/arm64/kernel/alternative.c
index 06d650f61da7..8840c109c5d6 100644
--- a/arch/arm64/kernel/alternative.c
+++ b/arch/arm64/kernel/alternative.c
@@ -105,11 +105,11 @@ static u32 get_alt_insn(struct alt_instr *alt, u32 
*insnptr, u32 *altinsnptr)
return insn;
 }
 
-static void __apply_alternatives(void *alt_region)
+static void __apply_alternatives(void *alt_region, bool use_linear_alias)
 {
struct alt_instr *alt;
struct alt_region *region = alt_region;
-   u32 *origptr, *replptr;
+   u32 *origptr, *replptr, *updptr;
 
for (alt = region->begin; alt < region->end; alt++) {
u32 insn;
@@ -124,11 +124,12 @@ static void __apply_alternatives(void *alt_region)
 
origptr = ALT_ORIG_PTR(alt);
replptr = ALT_REPL_PTR(alt);
+   updptr = use_linear_alias ? (u32 *)lm_alias(origptr) : origptr;
nr_inst = alt->alt_len / sizeof(insn);
 
for (i = 0; i < nr_inst; i++) {
insn = get_alt_insn(alt, origptr + i, replptr + i);
-   *(origptr + i) = cpu_to_le32(insn);
+   updptr[i] = cpu_to_le32(insn);
}
 
flush_icache_range((uintptr_t)origptr,
@@ -155,7 +156,7 @@ static int __apply_alternatives_multi_stop(void *unused)
isb();
} else {
BUG_ON(patched);
-   __apply_alternatives(&region);
+   __apply_alternatives(&region, true);
/* Barriers provided by the cache flushing */
WRITE_ONCE(patched, 1);
}
@@ -176,5 +177,5 @@ void apply_alternatives(void *start, size_t length)
.end= start + length,
};
 
-   __apply_alternatives(&region);
+   __apply_alternatives(&region, false);
 }
diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
index ef1caae02110..d4739552da28 100644
--- a/arch/arm64/kernel/smp.c
+++ b/arch/arm64/kernel/smp.c
@@ -434,6 +434,7 @@ void __init smp_cpus_done(unsigned int max_cpus)
setup_cpu_features();
hyp_mode_check();
apply_alternatives_all();
+   mark_linear_text_alias_ro();
 }
 
 void __init smp_prepare_boot_cpu(void)
diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index 6cafd8723d1a..df377fbe464e 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -372,16 +372,28 @@ static void __init __map_memblock(pgd_t *pgd, phys_addr_t 
start, phys_addr_t end
 debug_pagealloc_enabled());
 
/*
-* Map the linear alias of the [_text, __init_begin) interval as
-* read-only/non-executable. This makes the contents of the
-* region accessible to subsystems such as hibernate, but
-* protects it from inadvertent modification or execution.
+* Map the linear alias of the [_text, __init_begin) interval
+* as non-executable now, and remove the write permission in
+* mark_linear_text_alias_ro() below (which will be called after
+* alternative patching has completed). This makes the contents
+* of the region accessible to subsystems such as hibernate,
+* but protects it from inadvertent modification or execution.
 */
__create_pgd_mapping(pgd, ke

[PATCH v5 01/10] arm: kvm: move kvm_vgic_global_state out of .text section

2017-03-09 Thread Ard Biesheuvel
The kvm_vgic_global_state struct contains a static key which is
written to by jump_label_init() at boot time. So in preparation of
making .text regions truly (well, almost truly) read-only, mark
kvm_vgic_global_state __ro_after_init so it moves to the .rodata
section instead.
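
A generic sketch of what __ro_after_init provides (illustrative only, not
taken from the patch; parse_limit and boot_chosen_limit are made-up names):
the object is emitted with the read-only data, but stays writable until
mark_rodata_ro() runs, so one-off boot-time writes such as the static key
initialisation mentioned above still work:

#include <linux/cache.h>
#include <linux/init.h>
#include <linux/kernel.h>

static unsigned long boot_chosen_limit __ro_after_init;

static int __init parse_limit(char *arg)
{
        /* writing is fine here: mark_rodata_ro() has not run yet */
        return kstrtoul(arg, 0, &boot_chosen_limit);
}
early_param("limit", parse_limit);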

Acked-by: Marc Zyngier <marc.zyng...@arm.com>
Reviewed-by: Laura Abbott <labb...@redhat.com>
Reviewed-by: Mark Rutland <mark.rutl...@arm.com>
Tested-by: Mark Rutland <mark.rutl...@arm.com>
Signed-off-by: Ard Biesheuvel <ard.biesheu...@linaro.org>
---
 virt/kvm/arm/vgic/vgic.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/virt/kvm/arm/vgic/vgic.c b/virt/kvm/arm/vgic/vgic.c
index 654dfd40e449..7713d96e85b7 100644
--- a/virt/kvm/arm/vgic/vgic.c
+++ b/virt/kvm/arm/vgic/vgic.c
@@ -29,7 +29,9 @@
 #define DEBUG_SPINLOCK_BUG_ON(p)
 #endif
 
-struct vgic_global __section(.hyp.text) kvm_vgic_global_state = {.gicv3_cpuif 
= STATIC_KEY_FALSE_INIT,};
+struct vgic_global kvm_vgic_global_state __ro_after_init = {
+   .gicv3_cpuif = STATIC_KEY_FALSE_INIT,
+};
 
 /*
  * Locking order is always:
-- 
2.7.4

___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


[PATCH v5 00/10] arm64: mmu: avoid W+X mappings and re-enable PTE_CONT for kernel

2017-03-09 Thread Ard Biesheuvel
Having memory that is writable and executable at the same time is a
security hazard, and so we tend to avoid those when we can. However,
at boot time, we keep .text mapped writable during the entire init
phase, and the init region itself is mapped rwx as well.

Let's improve the situation by:
- making the alternatives patching use the linear mapping
- splitting the init region into separate text and data regions

This removes all RWX mappings except the really early one created
in head.S (which we could perhaps fix in the future as well)

Changes since v4:
- the PTE_CONT patch has now spawned four more preparatory patches that clean
  up some of the page table creation code before reintroducing the contiguous
  attribute management
- add Mark's R-b to #4 and #5

Changes since v3:
- use linear alias only when patching the core kernel, and not for modules
- add patch to reintroduce the use of PTE_CONT for kernel mappings, except
  for regions that are remapped read-only later on (i.e, .rodata and the
  linear alias of .text+.rodata)

Changes since v2:
  - ensure that text mappings remain writable under rodata=off
  - rename create_mapping_late() to update_mapping_prot()
  - clarify commit log of #2
  - add acks

Ard Biesheuvel (10):
  arm: kvm: move kvm_vgic_global_state out of .text section
  arm64: mmu: move TLB maintenance from callers to create_mapping_late()
  arm64: alternatives: apply boot time fixups via the linear mapping
  arm64: mmu: map .text as read-only from the outset
  arm64: mmu: apply strict permissions to .init.text and .init.data
  arm64/mmu: align alloc_init_pte prototype with pmd/pud versions
  arm64/mmu: ignore debug_pagealloc for kernel segments
  arm64/mmu: add contiguous bit to sanity bug check
  arm64/mmu: replace 'page_mappings_only' parameter with flags argument
  arm64: mm: set the contiguous bit for kernel mappings where
appropriate

 arch/arm64/include/asm/mmu.h  |   1 +
 arch/arm64/include/asm/pgtable.h  |  10 +
 arch/arm64/include/asm/sections.h |   2 +
 arch/arm64/kernel/alternative.c   |  11 +-
 arch/arm64/kernel/smp.c   |   1 +
 arch/arm64/kernel/vmlinux.lds.S   |  25 +-
 arch/arm64/mm/mmu.c   | 250 ++--
 virt/kvm/arm/vgic/vgic.c  |   4 +-
 8 files changed, 212 insertions(+), 92 deletions(-)

-- 
2.7.4

___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


[PATCH v5 04/10] arm64: mmu: map .text as read-only from the outset

2017-03-09 Thread Ard Biesheuvel
Now that alternatives patching code no longer relies on the primary
mapping of .text being writable, we can remove the code that removes
the writable permissions post-init time, and map it read-only from
the outset.

To preserve the existing behavior under rodata=off, which is relied
upon by external debuggers to manage software breakpoints (as pointed
out by Mark), add an early_param() check for rodata=, and use RWX
permissions if it is set to 'off'.
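
As a usage note (illustrative, not part of the patch): the behaviour is now
selected purely from the kernel command line. Booting with

    rodata=off

keeps the old writable+executable text mapping so an external debugger can
plant software breakpoints, while the default (or rodata=on) maps .text
read-only from the outset.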

Reviewed-by: Laura Abbott <labb...@redhat.com>
Reviewed-by: Kees Cook <keesc...@chromium.org>
Reviewed-by: Mark Rutland <mark.rutl...@arm.com>
Signed-off-by: Ard Biesheuvel <ard.biesheu...@linaro.org>
---
 arch/arm64/mm/mmu.c | 18 ++
 1 file changed, 14 insertions(+), 4 deletions(-)

diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index df377fbe464e..300e98e8cd63 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -416,9 +416,6 @@ void mark_rodata_ro(void)
 {
unsigned long section_size;
 
-   section_size = (unsigned long)_etext - (unsigned long)_text;
-   update_mapping_prot(__pa_symbol(_text), (unsigned long)_text,
-   section_size, PAGE_KERNEL_ROX);
/*
 * mark .rodata as read only. Use __init_begin rather than __end_rodata
 * to cover NOTES and EXCEPTION_TABLE.
@@ -451,6 +448,12 @@ static void __init map_kernel_segment(pgd_t *pgd, void 
*va_start, void *va_end,
vm_area_add_early(vma);
 }
 
+static int __init parse_rodata(char *arg)
+{
+   return strtobool(arg, &rodata_enabled);
+}
+early_param("rodata", parse_rodata);
+
 /*
  * Create fine-grained mappings for the kernel.
  */
@@ -458,7 +461,14 @@ static void __init map_kernel(pgd_t *pgd)
 {
static struct vm_struct vmlinux_text, vmlinux_rodata, vmlinux_init, 
vmlinux_data;
 
-   map_kernel_segment(pgd, _text, _etext, PAGE_KERNEL_EXEC, &vmlinux_text);
+   /*
+    * External debuggers may need to write directly to the text
+    * mapping to install SW breakpoints. Allow this (only) when
+    * explicitly requested with rodata=off.
+    */
+   pgprot_t text_prot = rodata_enabled ? PAGE_KERNEL_ROX : PAGE_KERNEL_EXEC;
+
+   map_kernel_segment(pgd, _text, _etext, text_prot, &vmlinux_text);
    map_kernel_segment(pgd, __start_rodata, __init_begin, PAGE_KERNEL, &vmlinux_rodata);
    map_kernel_segment(pgd, __init_begin, __init_end, PAGE_KERNEL_EXEC,
                       &vmlinux_init);
-- 
2.7.4

___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


[PATCH v5 08/10] arm64/mmu: add contiguous bit to sanity bug check

2017-03-09 Thread Ard Biesheuvel
A mapping with the contiguous bit cannot be safely manipulated while
live, regardless of whether the bit changes between the old and new
mapping. So take this into account when deciding whether the change
is safe.
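
A standalone sketch of the updated predicate (illustration only; the bit
values below are placeholders for the example rather than the authoritative
arm64 encodings):

#include <stdint.h>
#include <stdio.h>

#define PTE_RDONLY      (1ULL << 7)
#define PTE_WRITE       (1ULL << 51)
#define PTE_CONT        (1ULL << 52)
#define PTE_PXN         (1ULL << 53)

static int pgattr_change_is_safe(uint64_t old, uint64_t new)
{
        const uint64_t mask = PTE_PXN | PTE_RDONLY | PTE_WRITE;

        /* creating or tearing down a mapping is always safe */
        if (old == 0 || new == 0)
                return 1;

        /* live contiguous mappings may not be manipulated at all */
        if ((old | new) & PTE_CONT)
                return 0;

        /* otherwise only the permission bits may change */
        return ((old ^ new) & ~mask) == 0;
}

int main(void)
{
        uint64_t pte = 0x0040000080200701ULL;   /* some valid-looking PTE */

        printf("%d\n", pgattr_change_is_safe(pte, pte | PTE_RDONLY));   /* 1 */
        printf("%d\n", pgattr_change_is_safe(pte | PTE_CONT,
                                             pte | PTE_CONT | PTE_RDONLY)); /* 0 */
        return 0;
}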

Signed-off-by: Ard Biesheuvel <ard.biesheu...@linaro.org>
---
 arch/arm64/mm/mmu.c | 10 +-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index d3fecd20a136..a6d7a86dd2b8 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -103,7 +103,15 @@ static bool pgattr_change_is_safe(u64 old, u64 new)
 */
static const pteval_t mask = PTE_PXN | PTE_RDONLY | PTE_WRITE;
 
-   return old  == 0 || new  == 0 || ((old ^ new) & ~mask) == 0;
+   /* creating or taking down mappings is always safe */
+   if (old == 0 || new == 0)
+   return true;
+
+   /* live contiguous mappings may not be manipulated at all */
+   if ((old | new) & PTE_CONT)
+   return false;
+
+   return ((old ^ new) & ~mask) == 0;
 }
 
 static void alloc_init_pte(pmd_t *pmd, unsigned long addr,
-- 
2.7.4

___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


[PATCH v5 07/10] arm64/mmu: ignore debug_pagealloc for kernel segments

2017-03-09 Thread Ard Biesheuvel
The debug_pagealloc facility manipulates kernel mappings in the linear
region at page granularity to detect out of bounds or use-after-free
accesses. Since the kernel segments are not allocated dynamically,
there is no point in taking the debug_pagealloc_enabled flag into
account for them, and we can use block mappings unconditionally.

Note that this applies equally to the linear alias of text/rodata:
we will never have dynamic allocations there given that the same
memory is statically in use by the kernel image.
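
A rough sketch of the resulting policy (hypothetical helper, not code from
the patch, which open-codes the decision at its call sites): page-granular
mappings are only needed where memory can actually be allocated and freed at
runtime, which is never the case for the kernel image or its linear alias:

#include <linux/mm.h>

static bool need_page_granularity(bool covers_kernel_image)
{
        return !covers_kernel_image && debug_pagealloc_enabled();
}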

Signed-off-by: Ard Biesheuvel <ard.biesheu...@linaro.org>
---
 arch/arm64/mm/mmu.c | 7 +++
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index c3963c592ec3..d3fecd20a136 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -328,8 +328,7 @@ static void update_mapping_prot(phys_addr_t phys, unsigned 
long virt,
return;
}
 
-   __create_pgd_mapping(init_mm.pgd, phys, virt, size, prot,
-NULL, debug_pagealloc_enabled());
+   __create_pgd_mapping(init_mm.pgd, phys, virt, size, prot, NULL, false);
 
/* flush the TLBs after updating live kernel mappings */
flush_tlb_kernel_range(virt, virt + size);
@@ -381,7 +380,7 @@ static void __init __map_memblock(pgd_t *pgd, phys_addr_t 
start, phys_addr_t end
 */
__create_pgd_mapping(pgd, kernel_start, __phys_to_virt(kernel_start),
 kernel_end - kernel_start, PAGE_KERNEL,
-early_pgtable_alloc, debug_pagealloc_enabled());
+early_pgtable_alloc, false);
 }
 
 void __init mark_linear_text_alias_ro(void)
@@ -437,7 +436,7 @@ static void __init map_kernel_segment(pgd_t *pgd, void 
*va_start, void *va_end,
BUG_ON(!PAGE_ALIGNED(size));
 
__create_pgd_mapping(pgd, pa_start, (unsigned long)va_start, size, prot,
-early_pgtable_alloc, debug_pagealloc_enabled());
+early_pgtable_alloc, false);
 
vma->addr   = va_start;
vma->phys_addr  = pa_start;
-- 
2.7.4

___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


[PATCH v5 05/10] arm64: mmu: apply strict permissions to .init.text and .init.data

2017-03-09 Thread Ard Biesheuvel
To avoid having mappings that are writable and executable at the same
time, split the init region into a .init.text region that is mapped
read-only, and a .init.data region that is mapped non-executable.

This is possible now that the alternative patching occurs via the linear
mapping, and the linear alias of the init region is always mapped writable
(but never executable).

Since the alternatives descriptions themselves are read-only data, move
those into the .init.text region.

Reviewed-by: Laura Abbott <labb...@redhat.com>
Reviewed-by: Mark Rutland <mark.rutl...@arm.com>
Signed-off-by: Ard Biesheuvel <ard.biesheu...@linaro.org>
---
 arch/arm64/include/asm/sections.h |  2 ++
 arch/arm64/kernel/vmlinux.lds.S   | 25 +---
 arch/arm64/mm/mmu.c   | 12 ++
 3 files changed, 26 insertions(+), 13 deletions(-)

diff --git a/arch/arm64/include/asm/sections.h 
b/arch/arm64/include/asm/sections.h
index 4e7e7067afdb..941267caa39c 100644
--- a/arch/arm64/include/asm/sections.h
+++ b/arch/arm64/include/asm/sections.h
@@ -24,6 +24,8 @@ extern char __hibernate_exit_text_start[], 
__hibernate_exit_text_end[];
 extern char __hyp_idmap_text_start[], __hyp_idmap_text_end[];
 extern char __hyp_text_start[], __hyp_text_end[];
 extern char __idmap_text_start[], __idmap_text_end[];
+extern char __initdata_begin[], __initdata_end[];
+extern char __inittext_begin[], __inittext_end[];
 extern char __irqentry_text_start[], __irqentry_text_end[];
 extern char __mmuoff_data_start[], __mmuoff_data_end[];
 
diff --git a/arch/arm64/kernel/vmlinux.lds.S b/arch/arm64/kernel/vmlinux.lds.S
index b8deffa9e1bf..2c93d259046c 100644
--- a/arch/arm64/kernel/vmlinux.lds.S
+++ b/arch/arm64/kernel/vmlinux.lds.S
@@ -143,12 +143,27 @@ SECTIONS
 
. = ALIGN(SEGMENT_ALIGN);
__init_begin = .;
+   __inittext_begin = .;
 
INIT_TEXT_SECTION(8)
.exit.text : {
ARM_EXIT_KEEP(EXIT_TEXT)
}
 
+   . = ALIGN(4);
+   .altinstructions : {
+   __alt_instructions = .;
+   *(.altinstructions)
+   __alt_instructions_end = .;
+   }
+   .altinstr_replacement : {
+   *(.altinstr_replacement)
+   }
+
+   . = ALIGN(PAGE_SIZE);
+   __inittext_end = .;
+   __initdata_begin = .;
+
.init.data : {
INIT_DATA
INIT_SETUP(16)
@@ -164,15 +179,6 @@ SECTIONS
 
PERCPU_SECTION(L1_CACHE_BYTES)
 
-   . = ALIGN(4);
-   .altinstructions : {
-   __alt_instructions = .;
-   *(.altinstructions)
-   __alt_instructions_end = .;
-   }
-   .altinstr_replacement : {
-   *(.altinstr_replacement)
-   }
.rela : ALIGN(8) {
*(.rela .rela*)
}
@@ -181,6 +187,7 @@ SECTIONS
__rela_size = SIZEOF(.rela);
 
. = ALIGN(SEGMENT_ALIGN);
+   __initdata_end = .;
__init_end = .;
 
_data = .;
diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index 300e98e8cd63..75e21c33caff 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -459,7 +459,8 @@ early_param("rodata", parse_rodata);
  */
 static void __init map_kernel(pgd_t *pgd)
 {
-   static struct vm_struct vmlinux_text, vmlinux_rodata, vmlinux_init, 
vmlinux_data;
+   static struct vm_struct vmlinux_text, vmlinux_rodata, vmlinux_inittext,
+   vmlinux_initdata, vmlinux_data;
 
/*
 * External debuggers may need to write directly to the text
@@ -469,9 +470,12 @@ static void __init map_kernel(pgd_t *pgd)
    pgprot_t text_prot = rodata_enabled ? PAGE_KERNEL_ROX : PAGE_KERNEL_EXEC;

    map_kernel_segment(pgd, _text, _etext, text_prot, &vmlinux_text);
-   map_kernel_segment(pgd, __start_rodata, __init_begin, PAGE_KERNEL, &vmlinux_rodata);
-   map_kernel_segment(pgd, __init_begin, __init_end, PAGE_KERNEL_EXEC,
-                      &vmlinux_init);
+   map_kernel_segment(pgd, __start_rodata, __inittext_begin, PAGE_KERNEL,
+                      &vmlinux_rodata);
+   map_kernel_segment(pgd, __inittext_begin, __inittext_end, text_prot,
+                      &vmlinux_inittext);
+   map_kernel_segment(pgd, __initdata_begin, __initdata_end, PAGE_KERNEL,
+                      &vmlinux_initdata);
    map_kernel_segment(pgd, _data, _end, PAGE_KERNEL, &vmlinux_data);
 
if (!pgd_val(*pgd_offset_raw(pgd, FIXADDR_START))) {
-- 
2.7.4

___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


[PATCH v5 06/10] arm64/mmu: align alloc_init_pte prototype with pmd/pud versions

2017-03-09 Thread Ard Biesheuvel
Align the function prototype of alloc_init_pte() with its pmd and pud
counterparts by replacing the pfn parameter with the equivalent physical
address.

Signed-off-by: Ard Biesheuvel <ard.biesheu...@linaro.org>
---
 arch/arm64/mm/mmu.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index 75e21c33caff..c3963c592ec3 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -107,7 +107,7 @@ static bool pgattr_change_is_safe(u64 old, u64 new)
 }
 
 static void alloc_init_pte(pmd_t *pmd, unsigned long addr,
- unsigned long end, unsigned long pfn,
+ unsigned long end, phys_addr_t phys,
  pgprot_t prot,
  phys_addr_t (*pgtable_alloc)(void))
 {
@@ -128,8 +128,8 @@ static void alloc_init_pte(pmd_t *pmd, unsigned long addr,
do {
pte_t old_pte = *pte;
 
-   set_pte(pte, pfn_pte(pfn, prot));
-   pfn++;
+   set_pte(pte, pfn_pte(__phys_to_pfn(phys), prot));
+   phys += PAGE_SIZE;
 
/*
 * After the PTE entry has been populated once, we
@@ -182,7 +182,7 @@ static void alloc_init_pmd(pud_t *pud, unsigned long addr, 
unsigned long end,
BUG_ON(!pgattr_change_is_safe(pmd_val(old_pmd),
  pmd_val(*pmd)));
} else {
-   alloc_init_pte(pmd, addr, next, __phys_to_pfn(phys),
+   alloc_init_pte(pmd, addr, next, phys,
   prot, pgtable_alloc);
 
BUG_ON(pmd_val(old_pmd) != 0 &&
-- 
2.7.4

___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH v4 6/6] arm64: mm: set the contiguous bit for kernel mappings where appropriate

2017-03-08 Thread Ard Biesheuvel
On 7 March 2017 at 17:46, Mark Rutland <mark.rutl...@arm.com> wrote:
> Hi,
>
> On Sat, Mar 04, 2017 at 02:30:48PM +, Ard Biesheuvel wrote:
>> This is the third attempt at enabling the use of contiguous hints for
>> kernel mappings. The most recent attempt 0bfc445dec9d was reverted after
>> it turned out that updating permission attributes on live contiguous ranges
>> may result in TLB conflicts. So this time, the contiguous hint is not set
>> for .rodata or for the linear alias of .text/.rodata, both of which are
>> mapped read-write initially, and remapped read-only at a later stage.
>> (Note that the latter region could also be unmapped and remapped again
>> with updated permission attributes, given that the region, while live, is
>> only mapped for the convenience of the hibernation code, but that also
>> means the TLB footprint is negligible anyway, so why bother)
>>
>> This enables the following contiguous range sizes for the virtual mapping
>> of the kernel image, and for the linear mapping:
>>
>>   granule size |  cont PTE  |  cont PMD  |
>>   -------------+------------+------------+
>>        4 KB    |    64 KB   |    32 MB   |
>>       16 KB    |     2 MB   |     1 GB*  |
>>       64 KB    |     2 MB   |    16 GB*  |
>>
>> * Only when built for 3 or more levels of translation. This is due to the
>>   fact that a 2 level configuration only consists of PGDs and PTEs, and the
>>   added complexity of dealing with folded PMDs is not justified considering
>>   that 16 GB contiguous ranges are likely to be ignored by the hardware (and
>>   16k/2 levels is a niche configuration)
>>
>> Signed-off-by: Ard Biesheuvel <ard.biesheu...@linaro.org>
>> ---
>>  arch/arm64/mm/mmu.c | 86 ++--
>>  1 file changed, 63 insertions(+), 23 deletions(-)
>>
>> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
>> index 0612573ef869..d0ae2f1f44fc 100644
>> --- a/arch/arm64/mm/mmu.c
>> +++ b/arch/arm64/mm/mmu.c
>> @@ -109,8 +109,10 @@ static bool pgattr_change_is_safe(u64 old, u64 new)
>>  static void alloc_init_pte(pmd_t *pmd, unsigned long addr,
>> unsigned long end, unsigned long pfn,
>> pgprot_t prot,
>> -   phys_addr_t (*pgtable_alloc)(void))
>> +   phys_addr_t (*pgtable_alloc)(void),
>> +   bool may_use_cont)
>
> Maybe we should invert this as single_pages_only, to keep the same
> polarity as page_mappings_only?
>
> That might make the calls to __create_pgd_mapping() in __map_memblock()
> look a little nicer, for instance.
>

Good point

>>  {
>> + pgprot_t __prot = prot;
>>   pte_t *pte;
>>
>>   BUG_ON(pmd_sect(*pmd));
>> @@ -128,7 +130,19 @@ static void alloc_init_pte(pmd_t *pmd, unsigned long 
>> addr,
>>   do {
>>   pte_t old_pte = *pte;
>>
>> - set_pte(pte, pfn_pte(pfn, prot));
>> + /*
>> +  * Set the contiguous bit for the subsequent group of PTEs if
>> +  * its size and alignment are appropriate.
>> +  */
>> + if (may_use_cont &&
>> + ((addr | PFN_PHYS(pfn)) & ~CONT_PTE_MASK) == 0) {
>> + if (end - addr >= CONT_PTE_SIZE)
>> + __prot = __pgprot(pgprot_val(prot) | PTE_CONT);
>> + else
>> + __prot = prot;
>> + }
>> +
>> + set_pte(pte, pfn_pte(pfn, __prot));
>>   pfn++;
>>
>>   /*
>
> While it would require more code, I think it would be better to add a
> function between alloc_init_pte() and alloc_init_pmd(), handling this in
> the usual fashion. e.g.
>
> >8
> diff --git a/arch/arm64/include/asm/pgtable.h 
> b/arch/arm64/include/asm/pgtable.h
> index 0eef606..2c90925 100644
> --- a/arch/arm64/include/asm/pgtable.h
> +++ b/arch/arm64/include/asm/pgtable.h
> @@ -74,6 +74,11 @@
>  #define pte_user_exec(pte) (!(pte_val(pte) & PTE_UXN))
>  #define pte_cont(pte)  (!!(pte_val(pte) & PTE_CONT))
>
> +#define pte_cont_addr_end(addr, end)                                       \
> +({ unsigned long __boundary = ((addr) + CONT_PTE_SIZE) & CONT_PTE_MASK;    \
> +   (__boundary - 1 < (end) - 1)? __boundary: (end);                        \
> +})
> +
>  #ifdef CONFIG_ARM64_

Re: [PATCH] arm64: Add ASM modifier for xN register operands

2017-04-24 Thread Ard Biesheuvel
On 24 April 2017 at 18:00, Will Deacon  wrote:
> Hi Matthias,
>
> On Thu, Apr 20, 2017 at 11:30:53AM -0700, Matthias Kaehlcke wrote:
>> Many inline assembly statements don't include the 'x' modifier when
>> using xN registers as operands. This is perfectly valid, however it
>> causes clang to raise warnings like this:
>>
>> warning: value size does not match register size specified by the
>>   constraint and modifier [-Wasm-operand-widths]
>> ...
>> arch/arm64/include/asm/barrier.h:62:23: note: expanded from macro
>>   '__smp_store_release'
>> asm volatile ("stlr %1, %0"
>
> If I understand this correctly, then the warning is emitted when we pass
> in a value smaller than 64-bit, but refer to % without a modifier
> in the inline asm.
>
> However, if that's the case then I don't understand why:
>
>> diff --git a/arch/arm64/include/asm/io.h b/arch/arm64/include/asm/io.h
>> index 0c00c87bb9dd..021e1733da0c 100644
>> --- a/arch/arm64/include/asm/io.h
>> +++ b/arch/arm64/include/asm/io.h
>> @@ -39,33 +39,33 @@
>>  #define __raw_writeb __raw_writeb
>>  static inline void __raw_writeb(u8 val, volatile void __iomem *addr)
>>  {
>> - asm volatile("strb %w0, [%1]" : : "rZ" (val), "r" (addr));
>> + asm volatile("strb %w0, [%x1]" : : "rZ" (val), "r" (addr));
>
> is necessary. addr is a pointer type, so is 64-bit.
>
> Given that the scattergun nature of this patch implies that you've been
> fixing the places where warnings are reported, then I'm confused as to
> why a warning is generated for the case above.
>
> What am I missing?
>

AIUI, Clang now always complains for missing register width modifiers,
not just for placeholders that resolve to a 32-bit (or smaller)
quantity.
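
An illustrative example (not taken from the patch; add_one() is made up):
for a 32-bit operand, an unmodified %N placeholder names the 64-bit xN
register, which is exactly what -Wasm-operand-widths objects to, and %wN
resolves it:

static inline unsigned int add_one(unsigned int x)
{
        unsigned int res;

        /*
         * asm("add %0, %1, #1" : "=r" (res) : "r" (x));
         * ^ warns under clang: 32-bit operands, but %0/%1 name x-registers
         */
        asm("add %w0, %w1, #1" : "=r" (res) : "r" (x));
        return res;
}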
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH 16/27] arm64/sve: Preserve SVE registers around kernel-mode NEON use

2017-08-15 Thread Ard Biesheuvel
On 9 August 2017 at 13:05, Dave Martin <dave.mar...@arm.com> wrote:
> Kernel-mode NEON will corrupt the SVE vector registers, due to the
> way they alias the FPSIMD vector registers in the hardware.
>
> This patch ensures that any live SVE register content for the task
> is saved by kernel_neon_begin().  The data will be restored in the
> usual way on return to userspace.
>
> Signed-off-by: Dave Martin <dave.mar...@arm.com>
> ---
>  arch/arm64/kernel/fpsimd.c | 6 --
>  1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
> index 955c873..b7fb836 100644
> --- a/arch/arm64/kernel/fpsimd.c
> +++ b/arch/arm64/kernel/fpsimd.c
> @@ -758,8 +758,10 @@ void kernel_neon_begin(void)
> __this_cpu_write(kernel_neon_busy, true);
>
> /* Save unsaved task fpsimd state, if any: */
> -   if (current->mm && !test_and_set_thread_flag(TIF_FOREIGN_FPSTATE))
> -   fpsimd_save_state(&current->thread.fpsimd_state);
> +   if (current->mm) {
> +   task_fpsimd_save();
> +   set_thread_flag(TIF_FOREIGN_FPSTATE);
> +   }
>
> /* Invalidate any task state remaining in the fpsimd regs: */
> __this_cpu_write(fpsimd_last_state, NULL);

Reviewed-by: Ard Biesheuvel <ard.biesheu...@linaro.org>
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH 11/27] arm64/sve: Core task context handling

2017-08-15 Thread Ard Biesheuvel
Hi Dave,

On 9 August 2017 at 13:05, Dave Martin  wrote:
> This patch adds the core support for switching and managing the SVE
> architectural state of user tasks.
>
> Calls to the existing FPSIMD low-level save/restore functions are
> factored out as new functions task_fpsimd_{save,load}(), since SVE
> now dynamically may or may not need to be handled at these points
> depending on the kernel configuration, hardware features discovered
> at boot, and the runtime state of the task.  To make these
> decisions as fast as possible, const cpucaps are used where
> feasible, via the system_supports_sve() helper.
>
> The SVE registers are only tracked for threads that have explicitly
> used SVE, indicated by the new thread flag TIF_SVE.  Otherwise, the
> FPSIMD view of the architectural state is stored in
> thread.fpsimd_state as usual.
>
> When in use, the SVE registers are not stored directly in
> thread_struct due to their potentially large and variable size.
> Because the task_struct slab allocator must be configured very
> early during kernel boot, it is also tricky to configure it
> correctly to match the maximum vector length provided by the
> hardware, since this depends on examining secondary CPUs as well as
> the primary.  Instead, a pointer sve_state in thread_struct points
> to a dynamically allocated buffer containing the SVE register data,
> and code is added to allocate, duplicate and free this buffer at
> appropriate times.
>
> TIF_SVE is set when taking an SVE access trap from userspace, if
> suitable hardware support has been detected.  This enables SVE for
> the thread: a subsequent return to userspace will disable the trap
> accordingly.  If such a trap is taken without sufficient hardware
> support, SIGILL is sent to the thread instead as if an undefined
> instruction had been executed: this may happen if userspace tries
> to use SVE in a system where not all CPUs support it for example.
>
> The kernel may clear TIF_SVE and disable SVE for the thread
> whenever an explicit syscall is made by userspace, though this is
> considered an optimisation opportunity rather than a deterministic
> guarantee: the kernel may not do this on every syscall, but it is
> permitted to do so.  For backwards compatibility reasons and
> conformance with the spirit of the base AArch64 procedure call
> standard, the subset of the SVE register state that aliases the
> FPSIMD registers is still preserved across a syscall even if this
> happens.
>
> TIF_SVE is also cleared, and SVE disabled, on exec: this is an
> obvious slow path and a hint that we are running a new binary that
> may not use SVE.
>
> Code is added to sync data between thread.fpsimd_state and
> thread.sve_state whenever enabling/disabling SVE, in a manner
> consistent with the SVE architectural programmer's model.
>
> Signed-off-by: Dave Martin 
> ---
>  arch/arm64/include/asm/fpsimd.h  |  19 +++
>  arch/arm64/include/asm/processor.h   |   2 +
>  arch/arm64/include/asm/thread_info.h |   1 +
>  arch/arm64/include/asm/traps.h   |   2 +
>  arch/arm64/kernel/entry.S|  14 +-
>  arch/arm64/kernel/fpsimd.c   | 241 
> ++-
>  arch/arm64/kernel/process.c  |   6 +-
>  arch/arm64/kernel/traps.c|   4 +-
>  8 files changed, 279 insertions(+), 10 deletions(-)
>
> diff --git a/arch/arm64/include/asm/fpsimd.h b/arch/arm64/include/asm/fpsimd.h
> index 026a7c7..72090a1 100644
> --- a/arch/arm64/include/asm/fpsimd.h
> +++ b/arch/arm64/include/asm/fpsimd.h
> @@ -20,6 +20,8 @@
>
>  #ifndef __ASSEMBLY__
>
> +#include 
> +
>  /*
>   * FP/SIMD storage area has:
>   *  - FPSR and FPCR
> @@ -72,6 +74,23 @@ extern void sve_load_state(void const *state, u32 const 
> *pfpsr,
>unsigned long vq_minus_1);
>  extern unsigned int sve_get_vl(void);
>
> +#ifdef CONFIG_ARM64_SVE
> +
> +extern size_t sve_state_size(struct task_struct const *task);
> +
> +extern void sve_alloc(struct task_struct *task);
> +extern void fpsimd_release_thread(struct task_struct *task);
> +extern void fpsimd_dup_sve(struct task_struct *dst,
> +  struct task_struct const *src);
> +
> +#else /* ! CONFIG_ARM64_SVE */
> +
> +static void __maybe_unused sve_alloc(struct task_struct *task) { }
> +static void __maybe_unused fpsimd_release_thread(struct task_struct *task) { 
> }
> +static void __maybe_unused fpsimd_dup_sve(struct task_struct *dst,
> + struct task_struct const *src) { }
> +#endif /* ! CONFIG_ARM64_SVE */
> +
>  /* For use by EFI runtime services calls only */
>  extern void __efi_fpsimd_begin(void);
>  extern void __efi_fpsimd_end(void);
> diff --git a/arch/arm64/include/asm/processor.h 
> b/arch/arm64/include/asm/processor.h
> index b7334f1..969feed 100644
> --- a/arch/arm64/include/asm/processor.h
> +++ b/arch/arm64/include/asm/processor.h
> @@ -85,6 +85,8 @@ struct thread_struct {
> 

Re: [PATCH 17/27] arm64/sve: Preserve SVE registers around EFI runtime service calls

2017-08-15 Thread Ard Biesheuvel
  if (__this_cpu_xchg(efi_fpsimd_state_used, false))
> -   fpsimd_load_state(this_cpu_ptr(&efi_fpsimd_state));
> -   else
> +   if (!__this_cpu_xchg(efi_fpsimd_state_used, false))
> kernel_neon_end();
> +   else
> +   if (system_supports_sve() &&
> +   likely(__this_cpu_read(efi_sve_state_used))) {
> +   char const *sve_state = this_cpu_ptr(efi_sve_state);
> +
> +   sve_load_state(sve_state + sve_ffr_offset(sve_max_vl),
> +      &this_cpu_ptr(&efi_fpsimd_state)->fpsr,
> +  sve_vq_from_vl(sve_get_vl()) - 1);
> +
> +   __this_cpu_write(efi_sve_state_used, false);
> +   } else
> +   fpsimd_load_state(this_cpu_ptr(&efi_fpsimd_state));

Please use braces for non-trivial if/else conditions

>  }
>
>  #endif /* CONFIG_KERNEL_MODE_NEON */
> --
> 2.1.4
>

With those fixed

Reviewed-by: Ard Biesheuvel <ard.biesheu...@linaro.org>
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH 05/27] arm64: fpsimd: Simplify uses of {set, clear}_ti_thread_flag()

2017-08-15 Thread Ard Biesheuvel
On 9 August 2017 at 13:05, Dave Martin <dave.mar...@arm.com> wrote:
> The existing FPSIMD context switch code contains a couple of
> instances of {set,clear}_ti_thread_flag(task_thread_info(task)).  Since
> there are thread flag manipulators that operate directly on
> task_struct, this verbosity isn't strictly needed.
>
> For consistency, this patch simplifies the affected calls.  This
> should have no impact on behaviour.
>
> Signed-off-by: Dave Martin <dave.mar...@arm.com>

Acked-by: Ard Biesheuvel <ard.biesheu...@linaro.org>

> ---
>  arch/arm64/kernel/fpsimd.c | 6 ++
>  1 file changed, 2 insertions(+), 4 deletions(-)
>
> diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
> index 138fcfa..9c1f268e 100644
> --- a/arch/arm64/kernel/fpsimd.c
> +++ b/arch/arm64/kernel/fpsimd.c
> @@ -159,11 +159,9 @@ void fpsimd_thread_switch(struct task_struct *next)
>
> if (__this_cpu_read(fpsimd_last_state) == st
> && st->cpu == smp_processor_id())
> -   clear_ti_thread_flag(task_thread_info(next),
> -TIF_FOREIGN_FPSTATE);
> +   clear_tsk_thread_flag(next, TIF_FOREIGN_FPSTATE);
> else
> -   set_ti_thread_flag(task_thread_info(next),
> -  TIF_FOREIGN_FPSTATE);
> +   set_tsk_thread_flag(next, TIF_FOREIGN_FPSTATE);
> }
>  }
>
> --
> 2.1.4
>
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH 12/15] arm-soc: omap: replace open coded VA->PA calculations

2017-08-10 Thread Ard Biesheuvel
On 9 August 2017 at 22:05, Tony Lindgren <t...@atomide.com> wrote:
> * Ard Biesheuvel <ard.biesheu...@linaro.org> [170809 12:24]:
>> On 9 August 2017 at 20:05, Tony Lindgren <t...@atomide.com> wrote:
>> > * Ard Biesheuvel <ard.biesheu...@linaro.org> [170805 13:54]:
>> >> This replaces a couple of open coded calculations to obtain the
>> >> physical address of a far symbol with calls to the new adr_l etc
>> >> macros.
>> >
>> > I gave this series a quick test and omap3 no longer boots it seems.
>> >
>>
>> Thanks Tony. I will investigate
>
> Thanks. Looks like omap4 still boots with all your patches, but
> omap3 won't boot even with patch 12 left out.
>

Are you using the same image on both? Which .config?
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

