Re: [PATCH 1/3] MAINTAINERS: KVM: arm/arm64: Remove myself as maintainer

2019-05-24 Thread Paolo Bonzini
On 24/05/19 16:47, Marc Zyngier wrote:
> From: Christoffer Dall 
> 
> I no longer have time to actively review patches and manage the tree and
> it's time to make that official.

Thanks, Christoffer, for your work; I hope to meet you anyway at KVM Forum!

Paolo

> Huge thanks to the incredible Linux community and all the contributors
> who have put up with me over the past years.
> 
> I also take this opportunity to remove the website link to the Columbia
> web page, as that information is no longer up to date and I don't know
> who manages that anymore.
> 
> Signed-off-by: Christoffer Dall 
> Signed-off-by: Marc Zyngier 
> ---
>  MAINTAINERS | 2 --
>  1 file changed, 2 deletions(-)
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 5cfbea4ce575..4ba271a8e0ef 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -8611,14 +8611,12 @@ F:arch/x86/include/asm/svm.h
>  F:   arch/x86/kvm/svm.c
>  
>  KERNEL VIRTUAL MACHINE FOR ARM/ARM64 (KVM/arm, KVM/arm64)
> -M:   Christoffer Dall 
>  M:   Marc Zyngier 
>  R:   James Morse 
>  R:   Julien Thierry 
>  R:   Suzuki K Pouloze 
>  L:   linux-arm-ker...@lists.infradead.org (moderated for non-subscribers)
>  L:   kvmarm@lists.cs.columbia.edu
> -W:   http://systems.cs.columbia.edu/projects/kvm-arm
>  T:   git git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm.git
>  S:   Maintained
>  F:   arch/arm/include/uapi/asm/kvm*
> 

___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH] mm, compaction: Make sure we isolate a valid PFN

2019-05-24 Thread Mel Gorman
On Fri, May 24, 2019 at 04:31:48PM +0100, Suzuki K Poulose wrote:
> When we have holes in a normal memory zone, we could end up having
> cached_migrate_pfns which may not necessarily be valid, under heavy memory
> pressure with swapping enabled (via __reset_isolation_suitable(), triggered
> by kswapd).
> 
> Later, if we fail to find a page via fast_isolate_freepages(), we may
> end up using the migrate_pfn we started the search with as a valid
> page. This could lead to a NULL pointer dereference like the one below,
> due to an invalid mem_section pointer.
> 
> Unable to handle kernel NULL pointer dereference at virtual address 
> 0008 [47/1825]
>  Mem abort info:
>ESR = 0x9604
>Exception class = DABT (current EL), IL = 32 bits
>SET = 0, FnV = 0
>EA = 0, S1PTW = 0
>  Data abort info:
>ISV = 0, ISS = 0x0004
>CM = 0, WnR = 0
>  user pgtable: 4k pages, 48-bit VAs, pgdp = 82f94ae9
>  [0008] pgd=
>  Internal error: Oops: 9604 [#1] SMP
>  ...
>  CPU: 10 PID: 6080 Comm: qemu-system-aar Not tainted 510-rc1+ #6
>  Hardware name: AmpereComputing(R) OSPREY EV-883832-X3-0001/OSPREY, BIOS 4819 
> 09/25/2018
>  pstate: 6005 (nZCv daif -PAN -UAO)
>  pc : set_pfnblock_flags_mask+0x58/0xe8
>  lr : compaction_alloc+0x300/0x950
>  [...]
>  Process qemu-system-aar (pid: 6080, stack limit = 0x95070da5)
>  Call trace:
>   set_pfnblock_flags_mask+0x58/0xe8
>   compaction_alloc+0x300/0x950
>   migrate_pages+0x1a4/0xbb0
>   compact_zone+0x750/0xde8
>   compact_zone_order+0xd8/0x118
>   try_to_compact_pages+0xb4/0x290
>   __alloc_pages_direct_compact+0x84/0x1e0
>   __alloc_pages_nodemask+0x5e0/0xe18
>   alloc_pages_vma+0x1cc/0x210
>   do_huge_pmd_anonymous_page+0x108/0x7c8
>   __handle_mm_fault+0xdd4/0x1190
>   handle_mm_fault+0x114/0x1c0
>   __get_user_pages+0x198/0x3c0
>   get_user_pages_unlocked+0xb4/0x1d8
>   __gfn_to_pfn_memslot+0x12c/0x3b8
>   gfn_to_pfn_prot+0x4c/0x60
>   kvm_handle_guest_abort+0x4b0/0xcd8
>   handle_exit+0x140/0x1b8
>   kvm_arch_vcpu_ioctl_run+0x260/0x768
>   kvm_vcpu_ioctl+0x490/0x898
>   do_vfs_ioctl+0xc4/0x898
>   ksys_ioctl+0x8c/0xa0
>   __arm64_sys_ioctl+0x28/0x38
>   el0_svc_common+0x74/0x118
>   el0_svc_handler+0x38/0x78
>   el0_svc+0x8/0xc
>  Code: f8607840 f11f 8b011401 9a801020 (f9400400)
>  ---[ end trace af6a35219325a9b6 ]---
> 
> The issue was reported on an arm64 server with 128GB with holes in the zone
> (e.g., [32GB@4GB, 96GB@544GB]), with a swap device enabled, while running
> 100 KVM guest instances.
> 
> This patch fixes the issue by ensuring that the page belongs to a valid PFN
> when we fall back to using the lower limit of the scan range upon failure in
> fast_isolate_freepages().
> 
> Fixes: 5a811889de10f1eb ("mm, compaction: use free lists to quickly locate a 
> migration target")
> Reported-by: Marc Zyngier 
> Signed-off-by: Suzuki K Poulose 

Reviewed-by: Mel Gorman 

-- 
Mel Gorman
SUSE Labs


[PATCH] mm, compaction: Make sure we isolate a valid PFN

2019-05-24 Thread Suzuki K Poulose
When we have holes in a normal memory zone, we could end up having
cached_migrate_pfns which may not necessarily be valid, under heavy memory
pressure with swapping enabled (via __reset_isolation_suitable(), triggered
by kswapd).

Later, if we fail to find a page via fast_isolate_freepages(), we may
end up using the migrate_pfn we started the search with as a valid
page. This could lead to a NULL pointer dereference like the one below,
due to an invalid mem_section pointer.

Unable to handle kernel NULL pointer dereference at virtual address 
0008 [47/1825]
 Mem abort info:
   ESR = 0x9604
   Exception class = DABT (current EL), IL = 32 bits
   SET = 0, FnV = 0
   EA = 0, S1PTW = 0
 Data abort info:
   ISV = 0, ISS = 0x0004
   CM = 0, WnR = 0
 user pgtable: 4k pages, 48-bit VAs, pgdp = 82f94ae9
 [0008] pgd=
 Internal error: Oops: 9604 [#1] SMP
 ...
 CPU: 10 PID: 6080 Comm: qemu-system-aar Not tainted 510-rc1+ #6
 Hardware name: AmpereComputing(R) OSPREY EV-883832-X3-0001/OSPREY, BIOS 4819 
09/25/2018
 pstate: 6005 (nZCv daif -PAN -UAO)
 pc : set_pfnblock_flags_mask+0x58/0xe8
 lr : compaction_alloc+0x300/0x950
 [...]
 Process qemu-system-aar (pid: 6080, stack limit = 0x95070da5)
 Call trace:
  set_pfnblock_flags_mask+0x58/0xe8
  compaction_alloc+0x300/0x950
  migrate_pages+0x1a4/0xbb0
  compact_zone+0x750/0xde8
  compact_zone_order+0xd8/0x118
  try_to_compact_pages+0xb4/0x290
  __alloc_pages_direct_compact+0x84/0x1e0
  __alloc_pages_nodemask+0x5e0/0xe18
  alloc_pages_vma+0x1cc/0x210
  do_huge_pmd_anonymous_page+0x108/0x7c8
  __handle_mm_fault+0xdd4/0x1190
  handle_mm_fault+0x114/0x1c0
  __get_user_pages+0x198/0x3c0
  get_user_pages_unlocked+0xb4/0x1d8
  __gfn_to_pfn_memslot+0x12c/0x3b8
  gfn_to_pfn_prot+0x4c/0x60
  kvm_handle_guest_abort+0x4b0/0xcd8
  handle_exit+0x140/0x1b8
  kvm_arch_vcpu_ioctl_run+0x260/0x768
  kvm_vcpu_ioctl+0x490/0x898
  do_vfs_ioctl+0xc4/0x898
  ksys_ioctl+0x8c/0xa0
  __arm64_sys_ioctl+0x28/0x38
  el0_svc_common+0x74/0x118
  el0_svc_handler+0x38/0x78
  el0_svc+0x8/0xc
 Code: f8607840 f11f 8b011401 9a801020 (f9400400)
 ---[ end trace af6a35219325a9b6 ]---

The issue was reported on an arm64 server with 128GB with holes in the zone
(e.g., [32GB@4GB, 96GB@544GB]), with a swap device enabled, while running
100 KVM guest instances.

This patch fixes the issue by ensuring that the page belongs to a valid PFN
when we fall back to using the lower limit of the scan range upon failure in
fast_isolate_freepages().

Fixes: 5a811889de10f1eb ("mm, compaction: use free lists to quickly locate a 
migration target")
Reported-by: Marc Zyngier 
Signed-off-by: Suzuki K Poulose 
---
 mm/compaction.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/compaction.c b/mm/compaction.c
index 9febc8c..9e1b9ac 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -1399,7 +1399,7 @@ fast_isolate_freepages(struct compact_control *cc)
page = pfn_to_page(highest);
cc->free_pfn = highest;
} else {
-   if (cc->direct_compaction) {
+   if (cc->direct_compaction && 
pfn_valid(min_pfn)) {
page = pfn_to_page(min_pfn);
cc->free_pfn = min_pfn;
}
-- 
2.7.4



[GIT PULL] KVM/arm updates for 5.2-rc2

2019-05-24 Thread Marc Zyngier
Paolo, Radim,

This is the first batch of KVM/arm fixes for 5.2. The biggest item on
the menu is Christoffer removing himself from the MAINTAINERS
file. He'll be missed. The rest is a set of fixes moving some code
around to prevent KASAN and co from crashing the kernel on non-VHE
systems.

Please pull,

M.

The following changes since commit a188339ca5a396acc588e5851ed7e19f66b0ebd9:

  Linux 5.2-rc1 (2019-05-19 15:47:09 -0700)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm.git 
tags/kvmarm-fixes-for-5.2

for you to fetch changes up to 623e1528d4090bd1abaf93ec46f047dee9a6fb32:

  KVM: arm/arm64: Move cc/it checks under hyp's Makefile to avoid 
instrumentation (2019-05-24 14:53:20 +0100)


KVM/arm updates for 5.2-rc2

- Correctly annotate HYP-callable code to be non-traceable
- Remove Christoffer from the MAINTAINERS file at his request


Christoffer Dall (1):
  MAINTAINERS: KVM: arm/arm64: Remove myself as maintainer

James Morse (2):
  KVM: arm64: Move pmu hyp code under hyp's Makefile to avoid 
instrumentation
  KVM: arm/arm64: Move cc/it checks under hyp's Makefile to avoid 
instrumentation

 MAINTAINERS   |   2 -
 arch/arm/kvm/hyp/Makefile |   1 +
 arch/arm64/include/asm/kvm_host.h |   3 -
 arch/arm64/kvm/hyp/Makefile   |   1 +
 arch/arm64/kvm/hyp/switch.c   |  39 +++
 arch/arm64/kvm/pmu.c  |  38 ---
 virt/kvm/arm/aarch32.c| 121 -
 virt/kvm/arm/hyp/aarch32.c| 136 ++
 8 files changed, 177 insertions(+), 164 deletions(-)
 create mode 100644 virt/kvm/arm/hyp/aarch32.c


[PATCH 1/3] MAINTAINERS: KVM: arm/arm64: Remove myself as maintainer

2019-05-24 Thread Marc Zyngier
From: Christoffer Dall 

I no longer have time to actively review patches and manage the tree and
it's time to make that official.

Huge thanks to the incredible Linux community and all the contributors
who have put up with me over the past years.

I also take this opportunity to remove the website link to the Columbia
web page, as that information is no longer up to date and I don't know
who manages that anymore.

Signed-off-by: Christoffer Dall 
Signed-off-by: Marc Zyngier 
---
 MAINTAINERS | 2 --
 1 file changed, 2 deletions(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index 5cfbea4ce575..4ba271a8e0ef 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -8611,14 +8611,12 @@ F:  arch/x86/include/asm/svm.h
 F: arch/x86/kvm/svm.c
 
 KERNEL VIRTUAL MACHINE FOR ARM/ARM64 (KVM/arm, KVM/arm64)
-M: Christoffer Dall 
 M: Marc Zyngier 
 R: James Morse 
 R: Julien Thierry 
 R: Suzuki K Pouloze 
 L: linux-arm-ker...@lists.infradead.org (moderated for non-subscribers)
 L: kvmarm@lists.cs.columbia.edu
-W: http://systems.cs.columbia.edu/projects/kvm-arm
 T: git git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm.git
 S: Maintained
 F: arch/arm/include/uapi/asm/kvm*
-- 
2.20.1



[PATCH 3/3] KVM: arm/arm64: Move cc/it checks under hyp's Makefile to avoid instrumentation

2019-05-24 Thread Marc Zyngier
From: James Morse 

KVM has helpers to handle the condition codes of trapped aarch32
instructions. These are marked __hyp_text and used from HYP, but they
aren't built by the 'hyp' Makefile, which has all the runes to avoid ASAN
and KCOV instrumentation.

Move this code to a new hyp/aarch32.c to avoid a hyp-panic when starting
an aarch32 guest on a host built with the ASAN/KCOV debug options.

Fixes: 021234ef3752f ("KVM: arm64: Make kvm_condition_valid32() accessible from 
EL2")
Fixes: 8cebe750c4d9a ("arm64: KVM: Make kvm_skip_instr32 available to HYP")
Signed-off-by: James Morse 
Signed-off-by: Marc Zyngier 
---
 arch/arm/kvm/hyp/Makefile   |   1 +
 arch/arm64/kvm/hyp/Makefile |   1 +
 virt/kvm/arm/aarch32.c  | 121 
 virt/kvm/arm/hyp/aarch32.c  | 136 
 4 files changed, 138 insertions(+), 121 deletions(-)
 create mode 100644 virt/kvm/arm/hyp/aarch32.c

diff --git a/arch/arm/kvm/hyp/Makefile b/arch/arm/kvm/hyp/Makefile
index d2b5ec9c4b92..ba88b1eca93c 100644
--- a/arch/arm/kvm/hyp/Makefile
+++ b/arch/arm/kvm/hyp/Makefile
@@ -11,6 +11,7 @@ CFLAGS_ARMV7VE   :=$(call cc-option, 
-march=armv7ve)
 
 obj-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/hyp/vgic-v3-sr.o
 obj-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/hyp/timer-sr.o
+obj-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/hyp/aarch32.o
 
 obj-$(CONFIG_KVM_ARM_HOST) += tlb.o
 obj-$(CONFIG_KVM_ARM_HOST) += cp15-sr.o
diff --git a/arch/arm64/kvm/hyp/Makefile b/arch/arm64/kvm/hyp/Makefile
index 82d1904328ad..ea710f674cb6 100644
--- a/arch/arm64/kvm/hyp/Makefile
+++ b/arch/arm64/kvm/hyp/Makefile
@@ -10,6 +10,7 @@ KVM=../../../../virt/kvm
 
 obj-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/hyp/vgic-v3-sr.o
 obj-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/hyp/timer-sr.o
+obj-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/hyp/aarch32.o
 
 obj-$(CONFIG_KVM_ARM_HOST) += vgic-v2-cpuif-proxy.o
 obj-$(CONFIG_KVM_ARM_HOST) += sysreg-sr.o
diff --git a/virt/kvm/arm/aarch32.c b/virt/kvm/arm/aarch32.c
index 5abbe9b3c652..6880236974b8 100644
--- a/virt/kvm/arm/aarch32.c
+++ b/virt/kvm/arm/aarch32.c
@@ -25,127 +25,6 @@
 #include 
 #include 
 
-/*
- * stolen from arch/arm/kernel/opcodes.c
- *
- * condition code lookup table
- * index into the table is test code: EQ, NE, ... LT, GT, AL, NV
- *
- * bit position in short is condition code: NZCV
- */
-static const unsigned short cc_map[16] = {
-   0xF0F0, /* EQ == Z set*/
-   0x0F0F, /* NE */
-   0xCCCC, /* CS == C set*/
-   0x3333, /* CC */
-   0xFF00, /* MI == N set*/
-   0x00FF, /* PL */
-   0xAAAA, /* VS == V set*/
-   0x5555, /* VC */
-   0x0C0C, /* HI == C set && Z clear */
-   0xF3F3, /* LS == C clear || Z set */
-   0xAA55, /* GE == (N==V)   */
-   0x55AA, /* LT == (N!=V)   */
-   0x0A05, /* GT == (!Z && (N==V))   */
-   0xF5FA, /* LE == (Z || (N!=V))*/
-   0xFFFF, /* AL always  */
-   0   /* NV */
-};
-
-/*
- * Check if a trapped instruction should have been executed or not.
- */
-bool __hyp_text kvm_condition_valid32(const struct kvm_vcpu *vcpu)
-{
-   unsigned long cpsr;
-   u32 cpsr_cond;
-   int cond;
-
-   /* Top two bits non-zero?  Unconditional. */
-   if (kvm_vcpu_get_hsr(vcpu) >> 30)
-   return true;
-
-   /* Is condition field valid? */
-   cond = kvm_vcpu_get_condition(vcpu);
-   if (cond == 0xE)
-   return true;
-
-   cpsr = *vcpu_cpsr(vcpu);
-
-   if (cond < 0) {
-   /* This can happen in Thumb mode: examine IT state. */
-   unsigned long it;
-
-   it = ((cpsr >> 8) & 0xFC) | ((cpsr >> 25) & 0x3);
-
-   /* it == 0 => unconditional. */
-   if (it == 0)
-   return true;
-
-   /* The cond for this insn works out as the top 4 bits. */
-   cond = (it >> 4);
-   }
-
-   cpsr_cond = cpsr >> 28;
-
-   if (!((cc_map[cond] >> cpsr_cond) & 1))
-   return false;
-
-   return true;
-}
-
-/**
- * adjust_itstate - adjust ITSTATE when emulating instructions in IT-block
- * @vcpu:  The VCPU pointer
- *
- * When exceptions occur while instructions are executed in Thumb IF-THEN
- * blocks, the ITSTATE field of the CPSR is not advanced (updated), so we have
- * to do this little bit of work manually. The fields map like this:
- *
- * IT[7:0] -> CPSR[26:25],CPSR[15:10]
- */
-static void __hyp_text kvm_adjust_itstate(struct kvm_vcpu *vcpu)
-{
-   unsigned long itbits, cond;
-   unsigned long 

[PATCH 2/3] KVM: arm64: Move pmu hyp code under hyp's Makefile to avoid instrumentation

2019-05-24 Thread Marc Zyngier
From: James Morse 

KVM's pmu.c contains the __hyp_text needed to switch the pmu registers
between host and guest. Because this isn't covered by the 'hyp' Makefile,
it can be built with kasan and friends when these are enabled in Kconfig.

When starting a guest, this results in:
| Kernel panic - not syncing: HYP panic:
| PS:a3c9 PC:8328ada0 ESR:8607
| FAR:8328ada0 HPFAR:29df5300 PAR:
| VCPU:4e10b7d6
| CPU: 0 PID: 3088 Comm: qemu-system-aar Not tainted 5.2.0-rc1 #11026
| Hardware name: ARM LTD ARM Juno Development Platform/ARM Juno Development Plat
| Call trace:
|  dump_backtrace+0x0/0x200
|  show_stack+0x20/0x30
|  dump_stack+0xec/0x158
|  panic+0x1ec/0x420
|  panic+0x0/0x420
| SMP: stopping secondary CPUs
| Kernel Offset: disabled
| CPU features: 0x002,25006082
| Memory Limit: none
| ---[ end Kernel panic - not syncing: HYP panic:

This is caused by functions in pmu.c calling the instrumented
code, which isn't mapped to hyp. From objdump -r:
| RELOCATION RECORDS FOR [.hyp.text]:
| OFFSET   TYPE  VALUE
| 0010 R_AARCH64_CALL26  __sanitizer_cov_trace_pc
| 0018 R_AARCH64_CALL26  __asan_load4_noabort
| 0024 R_AARCH64_CALL26  __asan_load4_noabort

Move the affected code to a new file under 'hyp's Makefile.

Fixes: 3d91befbb3a0 ("arm64: KVM: Enable !VHE support for :G/:H perf event 
modifiers")
Cc: Andrew Murray 
Signed-off-by: James Morse 
Signed-off-by: Marc Zyngier 
---
 arch/arm64/include/asm/kvm_host.h |  3 ---
 arch/arm64/kvm/hyp/switch.c   | 39 +++
 arch/arm64/kvm/pmu.c  | 38 --
 3 files changed, 39 insertions(+), 41 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_host.h 
b/arch/arm64/include/asm/kvm_host.h
index 2a8d3f8ca22c..4bcd9c1291d5 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -592,9 +592,6 @@ static inline int kvm_arch_vcpu_run_pid_change(struct 
kvm_vcpu *vcpu)
 void kvm_set_pmu_events(u32 set, struct perf_event_attr *attr);
 void kvm_clr_pmu_events(u32 clr);
 
-void __pmu_switch_to_host(struct kvm_cpu_context *host_ctxt);
-bool __pmu_switch_to_guest(struct kvm_cpu_context *host_ctxt);
-
 void kvm_vcpu_pmu_restore_guest(struct kvm_vcpu *vcpu);
 void kvm_vcpu_pmu_restore_host(struct kvm_vcpu *vcpu);
 #else
diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
index 22b4c335e0b2..8799e0c267d4 100644
--- a/arch/arm64/kvm/hyp/switch.c
+++ b/arch/arm64/kvm/hyp/switch.c
@@ -16,6 +16,7 @@
  */
 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -505,6 +506,44 @@ static void __hyp_text 
__set_host_arch_workaround_state(struct kvm_vcpu *vcpu)
 #endif
 }
 
+/**
+ * Disable host events, enable guest events
+ */
+static bool __hyp_text __pmu_switch_to_guest(struct kvm_cpu_context *host_ctxt)
+{
+   struct kvm_host_data *host;
+   struct kvm_pmu_events *pmu;
+
+   host = container_of(host_ctxt, struct kvm_host_data, host_ctxt);
+   pmu = >pmu_events;
+
+   if (pmu->events_host)
+   write_sysreg(pmu->events_host, pmcntenclr_el0);
+
+   if (pmu->events_guest)
+   write_sysreg(pmu->events_guest, pmcntenset_el0);
+
+   return (pmu->events_host || pmu->events_guest);
+}
+
+/**
+ * Disable guest events, enable host events
+ */
+static void __hyp_text __pmu_switch_to_host(struct kvm_cpu_context *host_ctxt)
+{
+   struct kvm_host_data *host;
+   struct kvm_pmu_events *pmu;
+
+   host = container_of(host_ctxt, struct kvm_host_data, host_ctxt);
+   pmu = >pmu_events;
+
+   if (pmu->events_guest)
+   write_sysreg(pmu->events_guest, pmcntenclr_el0);
+
+   if (pmu->events_host)
+   write_sysreg(pmu->events_host, pmcntenset_el0);
+}
+
 /* Switch to the guest for VHE systems running in EL2 */
 int kvm_vcpu_run_vhe(struct kvm_vcpu *vcpu)
 {
diff --git a/arch/arm64/kvm/pmu.c b/arch/arm64/kvm/pmu.c
index 3da94a5bb6b7..e71d00bb5271 100644
--- a/arch/arm64/kvm/pmu.c
+++ b/arch/arm64/kvm/pmu.c
@@ -53,44 +53,6 @@ void kvm_clr_pmu_events(u32 clr)
ctx->pmu_events.events_guest &= ~clr;
 }
 
-/**
- * Disable host events, enable guest events
- */
-bool __hyp_text __pmu_switch_to_guest(struct kvm_cpu_context *host_ctxt)
-{
-   struct kvm_host_data *host;
-   struct kvm_pmu_events *pmu;
-
-   host = container_of(host_ctxt, struct kvm_host_data, host_ctxt);
-   pmu = >pmu_events;
-
-   if (pmu->events_host)
-   write_sysreg(pmu->events_host, pmcntenclr_el0);
-
-   if (pmu->events_guest)
-   write_sysreg(pmu->events_guest, pmcntenset_el0);
-
-   return (pmu->events_host || pmu->events_guest);
-}
-
-/**
- * Disable guest events, enable host events
- */
-void __hyp_text __pmu_switch_to_host(struct kvm_cpu_context *host_ctxt)
-{
-   struct kvm_host_data *host;
-   struct kvm_pmu_events *pmu;
-
-   host = 

Re: [PATCH v2 05/15] arm64: KVM: add access handler for SPE system registers

2019-05-24 Thread Sudeep Holla
On Fri, May 24, 2019 at 12:36:24PM +0100, Julien Thierry wrote:
> Hi Sudeep,
> 
> On 23/05/2019 11:34, Sudeep Holla wrote:
> > SPE Profiling Buffer owning EL is configurable and when MDCR_EL2.E2PB
> > is configured to provide buffer ownership to EL1, the control registers
> > are trapped.
> > 
> > Add access handlers for the Statistical Profiling Extension (SPE)
> > Profiling Buffer control registers. This is needed to support profiling
> > using SPE in the guests.
> > 
> > Signed-off-by: Sudeep Holla 
> > ---
> >  arch/arm64/include/asm/kvm_host.h | 13 
> >  arch/arm64/kvm/sys_regs.c | 35 +++
> >  include/kvm/arm_spe.h | 15 +
> >  3 files changed, 63 insertions(+)
> > 
> > diff --git a/arch/arm64/include/asm/kvm_host.h 
> > b/arch/arm64/include/asm/kvm_host.h
> > index 611a4884fb6c..559aa6931291 100644
> > --- a/arch/arm64/include/asm/kvm_host.h
> > +++ b/arch/arm64/include/asm/kvm_host.h
> > @@ -147,6 +147,19 @@ enum vcpu_sysreg {
> > MDCCINT_EL1,/* Monitor Debug Comms Channel Interrupt Enable Reg */
> > DISR_EL1,   /* Deferred Interrupt Status Register */
> >  
> > +   /* Statistical Profiling Extension Registers */
> > +
> > +   PMSCR_EL1,
> > +   PMSICR_EL1,
> > +   PMSIRR_EL1,
> > +   PMSFCR_EL1,
> > +   PMSEVFR_EL1,
> > +   PMSLATFR_EL1,
> > +   PMSIDR_EL1,
> > +   PMBLIMITR_EL1,
> > +   PMBPTR_EL1,
> > +   PMBSR_EL1,
> > +
> > /* Performance Monitors Registers */
> > PMCR_EL0,   /* Control Register */
> > PMSELR_EL0, /* Event Counter Selection Register */
> > diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
> > index 857b226bcdde..dbf5056828d3 100644
> > --- a/arch/arm64/kvm/sys_regs.c
> > +++ b/arch/arm64/kvm/sys_regs.c
> > @@ -646,6 +646,30 @@ static void reset_pmcr(struct kvm_vcpu *vcpu, const 
> > struct sys_reg_desc *r)
> > __vcpu_sys_reg(vcpu, PMCR_EL0) = val;
> >  }
> >  
> > +static bool access_pmsb_val(struct kvm_vcpu *vcpu, struct sys_reg_params 
> > *p,
> > +   const struct sys_reg_desc *r)
> > +{
> > +   if (p->is_write)
> > +   vcpu_write_sys_reg(vcpu, p->regval, r->reg);
> > +   else
> > +   p->regval = vcpu_read_sys_reg(vcpu, r->reg);
> > +
> > +   return true;
> > +}
> > +
> > +static void reset_pmsb_val(struct kvm_vcpu *vcpu, const struct 
> > sys_reg_desc *r)
> > +{
> > +   if (!kvm_arm_support_spe_v1()) {
> > +   __vcpu_sys_reg(vcpu, r->reg) = 0;
> > +   return;
> > +   }
> > +
> > +   if (r->reg == PMSIDR_EL1)
> 
> If only PMSIDR_EL1 has a non-zero reset value, it feels a bit weird to
> share the reset function for all these registers.
>

Ah, right. Initially I did have a couple of other registers which were not
needed. So I removed them without observing that I could have just used
reset_val(0) for all except PMSIDR_EL1.

> I would suggest only having a reset_pmsidr() function, and just use
> reset_val() with sys_reg_desc->val set to 0 for all the others.
>

Thanks for pointing this out.

--
Regards,
Sudeep


Re: [PATCH] MAINTAINERS: KVM: arm/arm64: Remove myself as maintainer

2019-05-24 Thread Marc Zyngier
Hi Christoffer,

On 21/05/2019 14:25, Christoffer Dall wrote:
> I no longer have time to actively review patches and manage the tree and
> it's time to make that official.
> 
> Huge thanks to the incredible Linux community and all the contributors
> who have put up with me over the past years.
> 
> I also take this opportunity to remove the website link to the Columbia
> web page, as that information is no longer up to date and I don't know
> who manages that anymore.
> 
> Signed-off-by: Christoffer Dall 
> ---
>  MAINTAINERS | 2 --
>  1 file changed, 2 deletions(-)
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 5cfbea4ce575..4ba271a8e0ef 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -8611,14 +8611,12 @@ F:arch/x86/include/asm/svm.h
>  F:   arch/x86/kvm/svm.c
>  
>  KERNEL VIRTUAL MACHINE FOR ARM/ARM64 (KVM/arm, KVM/arm64)
> -M:   Christoffer Dall 
>  M:   Marc Zyngier 
>  R:   James Morse 
>  R:   Julien Thierry 
>  R:   Suzuki K Pouloze 
>  L:   linux-arm-ker...@lists.infradead.org (moderated for non-subscribers)
>  L:   kvmarm@lists.cs.columbia.edu
> -W:   http://systems.cs.columbia.edu/projects/kvm-arm
>  T:   git git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm.git
>  S:   Maintained
>  F:   arch/arm/include/uapi/asm/kvm*
> 

With regrets: applied as a fix for 5.2.

Thanks *a lot* for all the great work you've done over the years, you've
been an awesome co-maintainer. Do remember that we know where to find
you, though! ;-)

M.
-- 
Jazz is not dead. It just smells funny...


Re: mm/compaction: BUG: NULL pointer dereference

2019-05-24 Thread Anshuman Khandual



On 05/24/2019 06:00 PM, Mel Gorman wrote:
> On Fri, May 24, 2019 at 04:26:16PM +0530, Anshuman Khandual wrote:
>>
>>
>> On 05/24/2019 02:50 PM, Suzuki K Poulose wrote:
>>> Hi,
>>>
>>> We are hitting NULL pointer dereferences while running stress tests with 
>>> KVM.
>>> See splat [0]. The test is to spawn 100 VMs all doing standard debian
>>> installation (Thanks to Marc's automated scripts, available here [1] ).
>>> The problem has been reproduced with a better rate of success from 5.1-rc6
>>> onwards.
>>>
>>> The issue is only reproducible with swapping enabled and the entire
>>> memory is used up, when swapping heavily. Also, this issue is reproducible
>>> on only one server with 128GB, which has the following memory layout:
>>>
>>> [32GB@4GB, hole , 96GB@544GB]
>>>
>>> Here is my non-expert analysis of the issue so far.
>>>
>>> Under extreme memory pressure, the kswapd could trigger 
>>> reset_isolation_suitable()
>>> to figure out the cached values for migrate/free pfn for a zone, by 
>>> scanning through
>>> the entire zone. On our server it does so in the range of [ 0x10_, 
>>> 0xa00_ ],
>>> with the following area of holes : [ 0x20_, 0x880_ ].
>>> In the failing case, we end up setting the cached migrate pfn as : 
>>> 0x508_, which
>>> is right in the center of the zone pfn range. i.e ( 0x10_ + 0xa00_ 
>>> ) / 2,
>>> with reset_migrate = 0x88_4e00, reset_free = 0x10_.
>>>
>>> Now these cached values are used by fast_isolate_freepages() to find a
>>> pfn. However, since we can't find anything during the search, we fall back
>>> to using the page belonging to the min_pfn (which is the migrate_pfn),
>>> without proper checks to see if that is a valid PFN or not. This is then
>>> passed on to fast_isolate_around(), which tries to do
>>> set_pageblock_skip(page) on the page, which blows up due to a NULL
>>> mem_section pointer.
>>>
>>> The following patch seems to fix the issue for me, but I am not quite 
>>> convinced that
>>> it is the right fix. Thoughts ?
>>>
>>>
>>> diff --git a/mm/compaction.c b/mm/compaction.c
>>> index 9febc8c..9e1b9ac 100644
>>> --- a/mm/compaction.c
>>> +++ b/mm/compaction.c
>>> @@ -1399,7 +1399,7 @@ fast_isolate_freepages(struct compact_control *cc)
>>> page = pfn_to_page(highest);
>>> cc->free_pfn = highest;
>>> } else {
>>> -   if (cc->direct_compaction) {
>>> +   if (cc->direct_compaction && 
>>> pfn_valid(min_pfn)) {
>>> page = pfn_to_page(min_pfn);
>>
>> pfn_to_online_page() here would be better, as it does not add pfn_valid()
>> cost on architectures which do not subscribe to CONFIG_HOLES_IN_ZONE. But
>> regardless, if the compaction is trying to scan pfns in zone holes, then
>> it should be avoided.
> 
> CONFIG_HOLES_IN_ZONE typically applies in special cases where an arch
> punches holes within a section. As both do a section lookup, the cost is
> similar but pfn_valid in general is less subtle in this case. Normally
> pfn_valid_within is only ok when a pfn_valid check has been made on the
> max_order aligned range as well as a zone boundary check. In this case,
> it's much more straightforward to leave it as pfn_valid.

Sure, makes sense.


Re: mm/compaction: BUG: NULL pointer dereference

2019-05-24 Thread Mel Gorman
On Fri, May 24, 2019 at 04:26:16PM +0530, Anshuman Khandual wrote:
> 
> 
> On 05/24/2019 02:50 PM, Suzuki K Poulose wrote:
> > Hi,
> > 
> > We are hitting NULL pointer dereferences while running stress tests with 
> > KVM.
> > See splat [0]. The test is to spawn 100 VMs all doing standard debian
> > installation (Thanks to Marc's automated scripts, available here [1] ).
> > The problem has been reproduced with a better rate of success from 5.1-rc6
> > onwards.
> > 
> > The issue is only reproducible with swapping enabled and the entire
> > memory is used up, when swapping heavily. Also, this issue is reproducible
> > on only one server with 128GB, which has the following memory layout:
> > 
> > [32GB@4GB, hole , 96GB@544GB]
> > 
> > Here is my non-expert analysis of the issue so far.
> > 
> > Under extreme memory pressure, the kswapd could trigger 
> > reset_isolation_suitable()
> > to figure out the cached values for migrate/free pfn for a zone, by 
> > scanning through
> > the entire zone. On our server it does so in the range of [ 0x10_, 
> > 0xa00_ ],
> > with the following area of holes : [ 0x20_, 0x880_ ].
> > In the failing case, we end up setting the cached migrate pfn as : 
> > 0x508_, which
> > is right in the center of the zone pfn range. i.e ( 0x10_ + 0xa00_ 
> > ) / 2,
> > with reset_migrate = 0x88_4e00, reset_free = 0x10_.
> > 
> > Now these cached values are used by fast_isolate_freepages() to find a
> > pfn. However, since we can't find anything during the search, we fall back
> > to using the page belonging to the min_pfn (which is the migrate_pfn),
> > without proper checks to see if that is a valid PFN or not. This is then
> > passed on to fast_isolate_around(), which tries to do
> > set_pageblock_skip(page) on the page, which blows up due to a NULL
> > mem_section pointer.
> > 
> > The following patch seems to fix the issue for me, but I am not quite 
> > convinced that
> > it is the right fix. Thoughts ?
> > 
> > 
> > diff --git a/mm/compaction.c b/mm/compaction.c
> > index 9febc8c..9e1b9ac 100644
> > --- a/mm/compaction.c
> > +++ b/mm/compaction.c
> > @@ -1399,7 +1399,7 @@ fast_isolate_freepages(struct compact_control *cc)
> > page = pfn_to_page(highest);
> > cc->free_pfn = highest;
> > } else {
> > -   if (cc->direct_compaction) {
> > +   if (cc->direct_compaction && 
> > pfn_valid(min_pfn)) {
> > page = pfn_to_page(min_pfn);
> 
> pfn_to_online_page() here would be better, as it does not add pfn_valid()
> cost on architectures which do not subscribe to CONFIG_HOLES_IN_ZONE. But
> regardless, if the compaction is trying to scan pfns in zone holes, then
> it should be avoided.

CONFIG_HOLES_IN_ZONE typically applies in special cases where an arch
punches holes within a section. As both do a section lookup, the cost is
similar but pfn_valid in general is less subtle in this case. Normally
pfn_valid_within is only ok when a pfn_valid check has been made on the
max_order aligned range as well as a zone boundary check. In this case,
it's much more straightforward to leave it as pfn_valid.

-- 
Mel Gorman
SUSE Labs


Re: [PATCH v2 12/15] KVM: arm64: add a new vcpu device control group for SPEv1

2019-05-24 Thread Marc Zyngier
On 24/05/2019 12:21, Sudeep Holla wrote:
> On Fri, May 24, 2019 at 11:37:51AM +0100, Marc Zyngier wrote:
>> Hi Sudeep,
>>
>> On 23/05/2019 11:34, Sudeep Holla wrote:
>>> To configure the virtual SPEv1 overflow interrupt number, we use the
>>> vcpu kvm_device ioctl, encapsulating the KVM_ARM_VCPU_SPE_V1_IRQ
>>> attribute within the KVM_ARM_VCPU_SPE_V1_CTRL group.
>>>
>>> After configuring the SPEv1, call the vcpu ioctl with attribute
>>> KVM_ARM_VCPU_SPE_V1_INIT to initialize the SPEv1.
>>>
>>> Signed-off-by: Sudeep Holla 
>>> ---
>>>  Documentation/virtual/kvm/devices/vcpu.txt |  28 
>>>  arch/arm64/include/asm/kvm_host.h  |   2 +-
>>>  arch/arm64/include/uapi/asm/kvm.h  |   4 +
>>>  arch/arm64/kvm/Makefile|   1 +
>>>  arch/arm64/kvm/guest.c |   9 ++
>>>  arch/arm64/kvm/reset.c |   3 +
>>>  include/kvm/arm_spe.h  |  35 +
>>>  include/uapi/linux/kvm.h   |   1 +
>>>  virt/kvm/arm/arm.c |   1 +
>>>  virt/kvm/arm/spe.c | 163 +
>>>  10 files changed, 246 insertions(+), 1 deletion(-)
>>>  create mode 100644 virt/kvm/arm/spe.c
>>>
>>> diff --git a/Documentation/virtual/kvm/devices/vcpu.txt 
>>> b/Documentation/virtual/kvm/devices/vcpu.txt
>>> index 2b5dab16c4f2..d1ece488aeee 100644
>>> --- a/Documentation/virtual/kvm/devices/vcpu.txt
>>> +++ b/Documentation/virtual/kvm/devices/vcpu.txt
>>> @@ -60,3 +60,31 @@ time to use the number provided for a given timer, 
>>> overwriting any previously
>>>  configured values on other VCPUs.  Userspace should configure the interrupt
>>>  numbers on at least one VCPU after creating all VCPUs and before running 
>>> any
>>>  VCPUs.
>>> +
>>> +3. GROUP: KVM_ARM_VCPU_SPE_V1_CTRL
>>> +Architectures: ARM64
>>> +
>>> +1.1. ATTRIBUTE: KVM_ARM_VCPU_SPE_V1_IRQ
>>> +Parameters: in kvm_device_attr.addr the address for SPE buffer overflow 
>>> interrupt
>>> +   is a pointer to an int
>>> +Returns: -EBUSY: The SPE overflow interrupt is already set
>>> + -ENXIO: The overflow interrupt not set when attempting to get it
>>> + -ENODEV: SPEv1 not supported
>>> + -EINVAL: Invalid SPE overflow interrupt number supplied or
>>> +  trying to set the IRQ number without using an in-kernel
>>> +  irqchip.
>>> +
>>> +A value describing the SPEv1 (Statistical Profiling Extension v1) overflow
>>> +interrupt number for this vcpu. This interrupt should be PPI and the 
>>> interrupt
>>> +type and number must be same for each vcpu.
>>> +
>>> +1.2 ATTRIBUTE: KVM_ARM_VCPU_SPE_V1_INIT
>>> +Parameters: no additional parameter in kvm_device_attr.addr
>>> +Returns: -ENODEV: SPEv1 not supported or GIC not initialized
>>> + -ENXIO: SPEv1 not properly configured or in-kernel irqchip not
>>> + configured as required prior to calling this attribute
>>> + -EBUSY: SPEv1 already initialized
>>> +
>>> +Request the initialization of the SPEv1.  If using the SPEv1 with an 
>>> in-kernel
>>> +virtual GIC implementation, this must be done after initializing the 
>>> in-kernel
>>> +irqchip.
>>> diff --git a/arch/arm64/include/asm/kvm_host.h 
>>> b/arch/arm64/include/asm/kvm_host.h
>>> index 6921fdfd477b..fc4ead0774b3 100644
>>> --- a/arch/arm64/include/asm/kvm_host.h
>>> +++ b/arch/arm64/include/asm/kvm_host.h
>>> @@ -50,7 +50,7 @@
>>>  
>>>  #define KVM_MAX_VCPUS VGIC_V3_MAX_CPUS
>>>  
>>> -#define KVM_VCPU_MAX_FEATURES 7
>>> +#define KVM_VCPU_MAX_FEATURES 8
>>>  
>>>  #define KVM_REQ_SLEEP \
>>> KVM_ARCH_REQ_FLAGS(0, KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
>>> diff --git a/arch/arm64/include/uapi/asm/kvm.h 
>>> b/arch/arm64/include/uapi/asm/kvm.h
>>> index 7b7ac0f6cec9..4c9e168de896 100644
>>> --- a/arch/arm64/include/uapi/asm/kvm.h
>>> +++ b/arch/arm64/include/uapi/asm/kvm.h
>>> @@ -106,6 +106,7 @@ struct kvm_regs {
>>>  #define KVM_ARM_VCPU_SVE   4 /* enable SVE for this CPU */
>>>  #define KVM_ARM_VCPU_PTRAUTH_ADDRESS   5 /* VCPU uses address 
>>> authentication */
>>>  #define KVM_ARM_VCPU_PTRAUTH_GENERIC   6 /* VCPU uses generic 
>>> authentication */
>>> +#define KVM_ARM_VCPU_SPE_V17 /* Support guest SPEv1 */
>>>  
>>>  struct kvm_vcpu_init {
>>> __u32 target;
>>> @@ -306,6 +307,9 @@ struct kvm_vcpu_events {
>>>  #define KVM_ARM_VCPU_TIMER_CTRL1
>>>  #define   KVM_ARM_VCPU_TIMER_IRQ_VTIMER0
>>>  #define   KVM_ARM_VCPU_TIMER_IRQ_PTIMER1
>>> +#define KVM_ARM_VCPU_SPE_V1_CTRL   2
>>> +#define   KVM_ARM_VCPU_SPE_V1_IRQ  0
>>> +#define   KVM_ARM_VCPU_SPE_V1_INIT 1
>>>  
>>>  /* KVM_IRQ_LINE irq field index values */
>>>  #define KVM_ARM_IRQ_TYPE_SHIFT 24
>>> diff --git a/arch/arm64/kvm/Makefile b/arch/arm64/kvm/Makefile
>>> index 3ac1a64d2fb9..1ba6154dd8e1 100644
>>> --- a/arch/arm64/kvm/Makefile
>>> +++ b/arch/arm64/kvm/Makefile
>>> @@ -35,3 +35,4 @@ 

Re: [PATCH v2 12/15] KVM: arm64: add a new vcpu device control group for SPEv1

2019-05-24 Thread Sudeep Holla
On Fri, May 24, 2019 at 11:37:51AM +0100, Marc Zyngier wrote:
> Hi Sudeep,
> 
> On 23/05/2019 11:34, Sudeep Holla wrote:
> > To configure the virtual SPEv1 overflow interrupt number, we use the
> > vcpu kvm_device ioctl, encapsulating the KVM_ARM_VCPU_SPE_V1_IRQ
> > attribute within the KVM_ARM_VCPU_SPE_V1_CTRL group.
> > 
> > After configuring the SPEv1, call the vcpu ioctl with attribute
> > KVM_ARM_VCPU_SPE_V1_INIT to initialize the SPEv1.
> > 
> > Signed-off-by: Sudeep Holla 
> > ---
> >  Documentation/virtual/kvm/devices/vcpu.txt |  28 
> >  arch/arm64/include/asm/kvm_host.h  |   2 +-
> >  arch/arm64/include/uapi/asm/kvm.h  |   4 +
> >  arch/arm64/kvm/Makefile|   1 +
> >  arch/arm64/kvm/guest.c |   9 ++
> >  arch/arm64/kvm/reset.c |   3 +
> >  include/kvm/arm_spe.h  |  35 +
> >  include/uapi/linux/kvm.h   |   1 +
> >  virt/kvm/arm/arm.c |   1 +
> >  virt/kvm/arm/spe.c | 163 +
> >  10 files changed, 246 insertions(+), 1 deletion(-)
> >  create mode 100644 virt/kvm/arm/spe.c
> > 
> > diff --git a/Documentation/virtual/kvm/devices/vcpu.txt 
> > b/Documentation/virtual/kvm/devices/vcpu.txt
> > index 2b5dab16c4f2..d1ece488aeee 100644
> > --- a/Documentation/virtual/kvm/devices/vcpu.txt
> > +++ b/Documentation/virtual/kvm/devices/vcpu.txt
> > @@ -60,3 +60,31 @@ time to use the number provided for a given timer, 
> > overwriting any previously
> >  configured values on other VCPUs.  Userspace should configure the interrupt
> >  numbers on at least one VCPU after creating all VCPUs and before running 
> > any
> >  VCPUs.
> > +
> > +3. GROUP: KVM_ARM_VCPU_SPE_V1_CTRL
> > +Architectures: ARM64
> > +
> > +1.1. ATTRIBUTE: KVM_ARM_VCPU_SPE_V1_IRQ
> > +Parameters: in kvm_device_attr.addr the address for SPE buffer overflow 
> > interrupt
> > +   is a pointer to an int
> > +Returns: -EBUSY: The SPE overflow interrupt is already set
> > + -ENXIO: The overflow interrupt not set when attempting to get it
> > + -ENODEV: SPEv1 not supported
> > + -EINVAL: Invalid SPE overflow interrupt number supplied or
> > +  trying to set the IRQ number without using an in-kernel
> > +  irqchip.
> > +
> > +A value describing the SPEv1 (Statistical Profiling Extension v1) overflow
> > +interrupt number for this vcpu. This interrupt should be PPI and the 
> > interrupt
> > +type and number must be same for each vcpu.
> > +
> > +1.2 ATTRIBUTE: KVM_ARM_VCPU_SPE_V1_INIT
> > +Parameters: no additional parameter in kvm_device_attr.addr
> > +Returns: -ENODEV: SPEv1 not supported or GIC not initialized
> > + -ENXIO: SPEv1 not properly configured or in-kernel irqchip not
> > + configured as required prior to calling this attribute
> > + -EBUSY: SPEv1 already initialized
> > +
> > +Request the initialization of the SPEv1.  If using the SPEv1 with an 
> > in-kernel
> > +virtual GIC implementation, this must be done after initializing the 
> > in-kernel
> > +irqchip.
> > diff --git a/arch/arm64/include/asm/kvm_host.h 
> > b/arch/arm64/include/asm/kvm_host.h
> > index 6921fdfd477b..fc4ead0774b3 100644
> > --- a/arch/arm64/include/asm/kvm_host.h
> > +++ b/arch/arm64/include/asm/kvm_host.h
> > @@ -50,7 +50,7 @@
> >  
> >  #define KVM_MAX_VCPUS VGIC_V3_MAX_CPUS
> >  
> > -#define KVM_VCPU_MAX_FEATURES 7
> > +#define KVM_VCPU_MAX_FEATURES 8
> >  
> >  #define KVM_REQ_SLEEP \
> > KVM_ARCH_REQ_FLAGS(0, KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
> > diff --git a/arch/arm64/include/uapi/asm/kvm.h 
> > b/arch/arm64/include/uapi/asm/kvm.h
> > index 7b7ac0f6cec9..4c9e168de896 100644
> > --- a/arch/arm64/include/uapi/asm/kvm.h
> > +++ b/arch/arm64/include/uapi/asm/kvm.h
> > @@ -106,6 +106,7 @@ struct kvm_regs {
> >  #define KVM_ARM_VCPU_SVE   4 /* enable SVE for this CPU */
> >  #define KVM_ARM_VCPU_PTRAUTH_ADDRESS   5 /* VCPU uses address 
> > authentication */
> >  #define KVM_ARM_VCPU_PTRAUTH_GENERIC   6 /* VCPU uses generic 
> > authentication */
> > +#define KVM_ARM_VCPU_SPE_V17 /* Support guest SPEv1 */
> >  
> >  struct kvm_vcpu_init {
> > __u32 target;
> > @@ -306,6 +307,9 @@ struct kvm_vcpu_events {
> >  #define KVM_ARM_VCPU_TIMER_CTRL1
> >  #define   KVM_ARM_VCPU_TIMER_IRQ_VTIMER0
> >  #define   KVM_ARM_VCPU_TIMER_IRQ_PTIMER1
> > +#define KVM_ARM_VCPU_SPE_V1_CTRL   2
> > +#define   KVM_ARM_VCPU_SPE_V1_IRQ  0
> > +#define   KVM_ARM_VCPU_SPE_V1_INIT 1
> >  
> >  /* KVM_IRQ_LINE irq field index values */
> >  #define KVM_ARM_IRQ_TYPE_SHIFT 24
> > diff --git a/arch/arm64/kvm/Makefile b/arch/arm64/kvm/Makefile
> > index 3ac1a64d2fb9..1ba6154dd8e1 100644
> > --- a/arch/arm64/kvm/Makefile
> > +++ b/arch/arm64/kvm/Makefile
> > @@ -35,3 +35,4 @@ kvm-$(CONFIG_KVM_ARM_HOST) += 

Re: mm/compaction: BUG: NULL pointer dereference

2019-05-24 Thread Anshuman Khandual



On 05/24/2019 02:50 PM, Suzuki K Poulose wrote:
> Hi,
> 
> We are hitting NULL pointer dereferences while running stress tests with KVM.
> See splat [0]. The test is to spawn 100 VMs all doing standard debian
> installation (Thanks to Marc's automated scripts, available here [1] ).
> The problem has been reproduced with a better rate of success from 5.1-rc6
> onwards.
> 
> The issue is only reproducible with swapping enabled and the entire
> memory is used up, when swapping heavily. Also this issue is only reproducible
> on only one server with 128GB, which has the following memory layout:
> 
> [32GB@4GB, hole , 96GB@544GB]
> 
> Here is my non-expert analysis of the issue so far.
> 
> Under extreme memory pressure, the kswapd could trigger 
> reset_isolation_suitable()
> to figure out the cached values for migrate/free pfn for a zone, by scanning 
> through
> the entire zone. On our server it does so in the range of [ 0x10_, 
> 0xa00_ ],
> with the following area of holes : [ 0x20_, 0x880_ ].
> In the failing case, we end up setting the cached migrate pfn as : 
> 0x508_, which
> is right in the center of the zone pfn range. i.e ( 0x10_ + 0xa00_ ) 
> / 2,
> with reset_migrate = 0x88_4e00, reset_free = 0x10_.
> 
> Now these cached values are used by the fast_isolate_freepages() to find a 
> pfn. However,
> since we cant find anything during the search we fall back to using the page 
> belonging
> to the min_pfn (which is the migrate_pfn), without proper checks to see if 
> that is valid
> PFN or not. This is then passed on to fast_isolate_around() which tries to do 
> :
> set_pageblock_skip(page) on the page which blows up due to an NULL 
> mem_section pointer.
> 
> The following patch seems to fix the issue for me, but I am not quite 
> convinced that
> it is the right fix. Thoughts ?
> 
> 
> diff --git a/mm/compaction.c b/mm/compaction.c
> index 9febc8c..9e1b9ac 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -1399,7 +1399,7 @@ fast_isolate_freepages(struct compact_control *cc)
>   page = pfn_to_page(highest);
>   cc->free_pfn = highest;
>   } else {
> - if (cc->direct_compaction) {
> + if (cc->direct_compaction && 
> pfn_valid(min_pfn)) {
>   page = pfn_to_page(min_pfn);

pfn_to_online_page() here would be better, as it does not add pfn_valid() cost on
architectures which do not subscribe to CONFIG_HOLES_IN_ZONE. But regardless,
if the compaction is trying to scan pfns in zone holes, that should be avoided.


Re: mm/compaction: BUG: NULL pointer dereference

2019-05-24 Thread Suzuki K Poulose

Hi Mel,

Thanks for your quick response.

On 24/05/2019 11:39, Mel Gorman wrote:
> On Fri, May 24, 2019 at 10:20:19AM +0100, Suzuki K Poulose wrote:
>> Hi,
>>
>> We are hitting NULL pointer dereferences while running stress tests with KVM.
>> See splat [0]. The test is to spawn 100 VMs all doing standard debian
>> installation (Thanks to Marc's automated scripts, available here [1] ).
>> The problem has been reproduced with a better rate of success from 5.1-rc6
>> onwards.
>>
>> The issue is only reproducible with swapping enabled, when the entire
>> memory is used up and the system is swapping heavily. It also reproduces on
>> only one server with 128GB, which has the following memory layout:
>>
>> [32GB@4GB, hole , 96GB@544GB]
>>
>> Here is my non-expert analysis of the issue so far.
>>
>> Under extreme memory pressure, the kswapd could trigger reset_isolation_suitable()
>> to figure out the cached values for migrate/free pfn for a zone, by scanning
>> through the entire zone. On our server it does so in the range of
>> [ 0x10_, 0xa00_ ], with the following area of holes : [ 0x20_, 0x880_ ].
>> In the failing case, we end up setting the cached migrate pfn as : 0x508_,
>> which is right in the center of the zone pfn range. i.e ( 0x10_ + 0xa00_ ) / 2,
>> with reset_migrate = 0x88_4e00, reset_free = 0x10_.
>>
>> Now these cached values are used by fast_isolate_freepages() to find a pfn.
>> However, since we can't find anything during the search, we fall back to using
>> the page belonging to the min_pfn (which is the migrate_pfn), without proper
>> checks to see whether that is a valid PFN. This is then passed on to
>> fast_isolate_around(), which tries to do set_pageblock_skip(page) on the page,
>> which blows up due to a NULL mem_section pointer.
>>
>> The following patch seems to fix the issue for me, but I am not quite
>> convinced that it is the right fix. Thoughts ?
>>
>
> I think the patch is valid and the alternatives would be unnecessarily
> complicated. During a normal scan for free pages to isolate, there
> is a check for pageblock_pfn_to_page() which uses a pfn_valid check
> for non-contiguous zones in __pageblock_pfn_to_page. Now, while the

I had the initial version with the pageblock_pfn_to_page(), but as you said,
it is a complicated way of performing the same check as pfn_valid().

> non-contiguous check could be made in the area you highlight, it would be a
> relatively small optimisation that would be unmeasurable overall. However,
> it is definitely the case that if the PFN you highlight is invalid that
> badness happens. If you want to express this as a signed-off patch with
> an adjusted changelog then I'd be happy to add

Sure, will send it right away.

> Reviewed-by: Mel Gorman 



Thanks.

Suzuki


Re: mm/compaction: BUG: NULL pointer dereference

2019-05-24 Thread Mel Gorman
On Fri, May 24, 2019 at 10:20:19AM +0100, Suzuki K Poulose wrote:
> Hi,
> 
> We are hitting NULL pointer dereferences while running stress tests with KVM.
> See splat [0]. The test is to spawn 100 VMs all doing standard debian
> installation (Thanks to Marc's automated scripts, available here [1] ).
> The problem has been reproduced with a better rate of success from 5.1-rc6
> onwards.
> 
> The issue is only reproducible with swapping enabled and the entire
> memory is used up, when swapping heavily. Also this issue is only reproducible
> on only one server with 128GB, which has the following memory layout:
> 
> [32GB@4GB, hole , 96GB@544GB]
> 
> Here is my non-expert analysis of the issue so far.
> 
> Under extreme memory pressure, the kswapd could trigger 
> reset_isolation_suitable()
> to figure out the cached values for migrate/free pfn for a zone, by scanning 
> through
> the entire zone. On our server it does so in the range of [ 0x10_, 
> 0xa00_ ],
> with the following area of holes : [ 0x20_, 0x880_ ].
> In the failing case, we end up setting the cached migrate pfn as : 
> 0x508_, which
> is right in the center of the zone pfn range. i.e ( 0x10_ + 0xa00_ ) 
> / 2,
> with reset_migrate = 0x88_4e00, reset_free = 0x10_.
> 
> Now these cached values are used by the fast_isolate_freepages() to find a 
> pfn. However,
> since we cant find anything during the search we fall back to using the page 
> belonging
> to the min_pfn (which is the migrate_pfn), without proper checks to see if 
> that is valid
> PFN or not. This is then passed on to fast_isolate_around() which tries to do 
> :
> set_pageblock_skip(page) on the page which blows up due to an NULL 
> mem_section pointer.
> 
> The following patch seems to fix the issue for me, but I am not quite 
> convinced that
> it is the right fix. Thoughts ?
> 

I think the patch is valid and the alternatives would be unnecessarily
complicated. During a normal scan for free pages to isolate, there
is a check for pageblock_pfn_to_page() which uses a pfn_valid check
for non-contiguous zones in __pageblock_pfn_to_page. Now, while the
non-contiguous check could be made in the area you highlight, it would be a
relatively small optimisation that would be unmeasurable overall. However,
it is definitely the case that if the PFN you highlight is invalid that
badness happens. If you want to express this as a signed-off patch with
an adjusted changelog then I'd be happy to add

Reviewed-by: Mel Gorman 

If you are not comfortable with rewriting the changelog and formatting
it as a patch then I can do it on your behalf and preserve your
Signed-off-by. Just let me know.

Thanks for researching this. I think it also affects other people, but I had
not found the time to track it down.

-- 
Mel Gorman
SUSE Labs


Re: [PATCH v2 12/15] KVM: arm64: add a new vcpu device control group for SPEv1

2019-05-24 Thread Marc Zyngier
Hi Sudeep,

On 23/05/2019 11:34, Sudeep Holla wrote:
> To configure the virtual SPEv1 overflow interrupt number, we use the
> vcpu kvm_device ioctl, encapsulating the KVM_ARM_VCPU_SPE_V1_IRQ
> attribute within the KVM_ARM_VCPU_SPE_V1_CTRL group.
> 
> After configuring the SPEv1, call the vcpu ioctl with attribute
> KVM_ARM_VCPU_SPE_V1_INIT to initialize the SPEv1.
> 
> Signed-off-by: Sudeep Holla 
> ---
>  Documentation/virtual/kvm/devices/vcpu.txt |  28 
>  arch/arm64/include/asm/kvm_host.h  |   2 +-
>  arch/arm64/include/uapi/asm/kvm.h  |   4 +
>  arch/arm64/kvm/Makefile|   1 +
>  arch/arm64/kvm/guest.c |   9 ++
>  arch/arm64/kvm/reset.c |   3 +
>  include/kvm/arm_spe.h  |  35 +
>  include/uapi/linux/kvm.h   |   1 +
>  virt/kvm/arm/arm.c |   1 +
>  virt/kvm/arm/spe.c | 163 +
>  10 files changed, 246 insertions(+), 1 deletion(-)
>  create mode 100644 virt/kvm/arm/spe.c
> 
> diff --git a/Documentation/virtual/kvm/devices/vcpu.txt 
> b/Documentation/virtual/kvm/devices/vcpu.txt
> index 2b5dab16c4f2..d1ece488aeee 100644
> --- a/Documentation/virtual/kvm/devices/vcpu.txt
> +++ b/Documentation/virtual/kvm/devices/vcpu.txt
> @@ -60,3 +60,31 @@ time to use the number provided for a given timer, 
> overwriting any previously
>  configured values on other VCPUs.  Userspace should configure the interrupt
>  numbers on at least one VCPU after creating all VCPUs and before running any
>  VCPUs.
> +
> +3. GROUP: KVM_ARM_VCPU_SPE_V1_CTRL
> +Architectures: ARM64
> +
> +1.1. ATTRIBUTE: KVM_ARM_VCPU_SPE_V1_IRQ
> +Parameters: in kvm_device_attr.addr the address for SPE buffer overflow 
> interrupt
> + is a pointer to an int
> +Returns: -EBUSY: The SPE overflow interrupt is already set
> + -ENXIO: The overflow interrupt not set when attempting to get it
> + -ENODEV: SPEv1 not supported
> + -EINVAL: Invalid SPE overflow interrupt number supplied or
> +  trying to set the IRQ number without using an in-kernel
> +  irqchip.
> +
> +A value describing the SPEv1 (Statistical Profiling Extension v1) overflow
> +interrupt number for this vcpu. This interrupt should be PPI and the 
> interrupt
> +type and number must be same for each vcpu.
> +
> +1.2 ATTRIBUTE: KVM_ARM_VCPU_SPE_V1_INIT
> +Parameters: no additional parameter in kvm_device_attr.addr
> +Returns: -ENODEV: SPEv1 not supported or GIC not initialized
> + -ENXIO: SPEv1 not properly configured or in-kernel irqchip not
> + configured as required prior to calling this attribute
> + -EBUSY: SPEv1 already initialized
> +
> +Request the initialization of the SPEv1.  If using the SPEv1 with an 
> in-kernel
> +virtual GIC implementation, this must be done after initializing the 
> in-kernel
> +irqchip.
> diff --git a/arch/arm64/include/asm/kvm_host.h 
> b/arch/arm64/include/asm/kvm_host.h
> index 6921fdfd477b..fc4ead0774b3 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -50,7 +50,7 @@
>  
>  #define KVM_MAX_VCPUS VGIC_V3_MAX_CPUS
>  
> -#define KVM_VCPU_MAX_FEATURES 7
> +#define KVM_VCPU_MAX_FEATURES 8
>  
>  #define KVM_REQ_SLEEP \
>   KVM_ARCH_REQ_FLAGS(0, KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
> diff --git a/arch/arm64/include/uapi/asm/kvm.h 
> b/arch/arm64/include/uapi/asm/kvm.h
> index 7b7ac0f6cec9..4c9e168de896 100644
> --- a/arch/arm64/include/uapi/asm/kvm.h
> +++ b/arch/arm64/include/uapi/asm/kvm.h
> @@ -106,6 +106,7 @@ struct kvm_regs {
>  #define KVM_ARM_VCPU_SVE 4 /* enable SVE for this CPU */
>  #define KVM_ARM_VCPU_PTRAUTH_ADDRESS 5 /* VCPU uses address authentication */
>  #define KVM_ARM_VCPU_PTRAUTH_GENERIC 6 /* VCPU uses generic authentication */
> +#define KVM_ARM_VCPU_SPE_V1  7 /* Support guest SPEv1 */
>  
>  struct kvm_vcpu_init {
>   __u32 target;
> @@ -306,6 +307,9 @@ struct kvm_vcpu_events {
>  #define KVM_ARM_VCPU_TIMER_CTRL  1
>  #define   KVM_ARM_VCPU_TIMER_IRQ_VTIMER  0
>  #define   KVM_ARM_VCPU_TIMER_IRQ_PTIMER  1
> +#define KVM_ARM_VCPU_SPE_V1_CTRL 2
> +#define   KVM_ARM_VCPU_SPE_V1_IRQ0
> +#define   KVM_ARM_VCPU_SPE_V1_INIT   1
>  
>  /* KVM_IRQ_LINE irq field index values */
>  #define KVM_ARM_IRQ_TYPE_SHIFT   24
> diff --git a/arch/arm64/kvm/Makefile b/arch/arm64/kvm/Makefile
> index 3ac1a64d2fb9..1ba6154dd8e1 100644
> --- a/arch/arm64/kvm/Makefile
> +++ b/arch/arm64/kvm/Makefile
> @@ -35,3 +35,4 @@ kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/vgic/vgic-debug.o
>  kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/irqchip.o
>  kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/arch_timer.o
>  kvm-$(CONFIG_KVM_ARM_PMU) += $(KVM)/arm/pmu.o
> +kvm-$(CONFIG_KVM_ARM_SPE) += $(KVM)/arm/spe.o
> diff --git a/arch/arm64/kvm/guest.c b/arch/arm64/kvm/guest.c
> index 

mm/compaction: BUG: NULL pointer dereference

2019-05-24 Thread Suzuki K Poulose
Hi,

We are hitting NULL pointer dereferences while running stress tests with KVM.
See splat [0]. The test is to spawn 100 VMs all doing standard debian
installation (Thanks to Marc's automated scripts, available here [1] ).
The problem has been reproduced with a better rate of success from 5.1-rc6
onwards.

The issue is only reproducible with swapping enabled, when the entire
memory is used up and the system is swapping heavily. It also reproduces on
only one server with 128GB, which has the following memory layout:

[32GB@4GB, hole , 96GB@544GB]

Here is my non-expert analysis of the issue so far.

Under extreme memory pressure, the kswapd could trigger 
reset_isolation_suitable()
to figure out the cached values for migrate/free pfn for a zone, by scanning 
through
the entire zone. On our server it does so in the range of [ 0x10_, 
0xa00_ ],
with the following area of holes : [ 0x20_, 0x880_ ].
In the failing case, we end up setting the cached migrate pfn as : 0x508_, 
which
is right in the center of the zone pfn range. i.e ( 0x10_ + 0xa00_ ) / 
2,
with reset_migrate = 0x88_4e00, reset_free = 0x10_.

Now these cached values are used by fast_isolate_freepages() to find a pfn.
However, since we can't find anything during the search, we fall back to using
the page belonging to the min_pfn (which is the migrate_pfn), without proper
checks to see whether that is a valid PFN. This is then passed on to
fast_isolate_around(), which tries to do set_pageblock_skip(page) on the page,
which blows up due to a NULL mem_section pointer.

The following patch seems to fix the issue for me, but I am not quite convinced 
that
it is the right fix. Thoughts ?


diff --git a/mm/compaction.c b/mm/compaction.c
index 9febc8c..9e1b9ac 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -1399,7 +1399,7 @@ fast_isolate_freepages(struct compact_control *cc)
page = pfn_to_page(highest);
cc->free_pfn = highest;
} else {
-   if (cc->direct_compaction) {
+   if (cc->direct_compaction && 
pfn_valid(min_pfn)) {
page = pfn_to_page(min_pfn);
cc->free_pfn = min_pfn;
}


Suzuki


[ 0 ] Kernel splat
 Unable to handle kernel NULL pointer dereference at virtual address 
0008 [47/1825]
 Mem abort info:
   ESR = 0x9604
   Exception class = DABT (current EL), IL = 32 bits
   SET = 0, FnV = 0
   EA = 0, S1PTW = 0
 Data abort info:
   ISV = 0, ISS = 0x0004
   CM = 0, WnR = 0
 user pgtable: 4k pages, 48-bit VAs, pgdp = 82f94ae9
 [0008] pgd=
 Internal error: Oops: 9604 [#1] SMP
 ...
 CPU: 10 PID: 6080 Comm: qemu-system-aar Not tainted 510-rc1+ #6
 Hardware name: AmpereComputing(R) OSPREY EV-883832-X3-0001/OSPREY, BIOS 4819 
09/25/2018
 pstate: 6005 (nZCv daif -PAN -UAO)
 pc : set_pfnblock_flags_mask+0x58/0xe8
 lr : compaction_alloc+0x300/0x950
 sp : 1fc03010
 x29: 1fc03010 x28:  
 x27:  x26: 10bf7000 
 x25: 06445000 x24: 06444e00 
 x23: 7e018f138000 x22: 0003 
 x21: 0001 x20: 06444e00 
 x19: 0001 x18:  
 x17:  x16: 809f7fe97268 
 x15: 000191138000 x14:  
 x13: 0070 x12:  
 x11: 1fc03108 x10:  
 x9 : 09222400 x8 : 0187 
 x7 : 063c4e00 x6 : 06444e00 
 x5 : 0008 x4 : 0001 
 x3 : 0003 x2 : 809f7fe92840 
 x1 : 0220 x0 :  
 Process qemu-system-aar (pid: 6080, stack limit = 0x95070da5)
 Call trace:
  set_pfnblock_flags_mask+0x58/0xe8
  compaction_alloc+0x300/0x950
  migrate_pages+0x1a4/0xbb0
  compact_zone+0x750/0xde8
  compact_zone_order+0xd8/0x118
  try_to_compact_pages+0xb4/0x290
  __alloc_pages_direct_compact+0x84/0x1e0
  __alloc_pages_nodemask+0x5e0/0xe18
  alloc_pages_vma+0x1cc/0x210
  do_huge_pmd_anonymous_page+0x108/0x7c8
  __handle_mm_fault+0xdd4/0x1190
  handle_mm_fault+0x114/0x1c0
  __get_user_pages+0x198/0x3c0
  get_user_pages_unlocked+0xb4/0x1d8
  __gfn_to_pfn_memslot+0x12c/0x3b8
  gfn_to_pfn_prot+0x4c/0x60
  kvm_handle_guest_abort+0x4b0/0xcd8
  handle_exit+0x140/0x1b8
  kvm_arch_vcpu_ioctl_run+0x260/0x768
  kvm_vcpu_ioctl+0x490/0x898
  do_vfs_ioctl+0xc4/0x898
  ksys_ioctl+0x8c/0xa0
  __arm64_sys_ioctl+0x28/0x38
  el0_svc_common+0x74/0x118
  el0_svc_handler+0x38/0x78
  el0_svc+0x8/0xc
 Code: f8607840 f11f 8b011401 9a801020 (f9400400) 
 ---[ end trace af6a35219325a9b6 ]---


[1] https://git.kernel.org/pub/scm/linux/kernel/git/maz/vminstall.git/