[Kernel-packages] [Bug 1837810] Re: KVM: Fix zero_page reference counter overflow when using KSM on KVM compute host

2023-04-11 Thread norman shen
Thank you very much for the reply. Another question: when try_get_page()
fails, follow_page_pte() returns -ENOMEM, but the KVM error is "Bad address",
which corresponds to EFAULT. Why does the QEMU error log say "Bad address"?
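
For reference, the two error codes in this question map to different
user-visible strings. The sketch below only looks up the standard libc
messages for ENOMEM and EFAULT; describe_errno is a hypothetical helper name,
and the snippet does not show where in the kernel one code is turned into
the other.

```python
import errno
import os


def describe_errno(code):
    """Return (symbolic name, libc message) for a positive errno value."""
    return errno.errorcode[code], os.strerror(code)


if __name__ == "__main__":
    # ENOMEM is what follow_page_pte() returns in the thread;
    # EFAULT is the code whose message QEMU printed.
    for code in (errno.ENOMEM, errno.EFAULT):
        name, message = describe_errno(code)
        print(f"{name} ({code}): {message}")
```

On Linux, the EFAULT entry is the one that reads "Bad address", which matches
the "kvm run failed" line in the QEMU log.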

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1837810

Title:
  KVM: Fix zero_page reference counter overflow when using KSM on KVM
  compute host

Status in linux package in Ubuntu:
  Fix Released
Status in linux source package in Bionic:
  Fix Released
Status in linux source package in Focal:
  Fix Released

Bug description:
  BugLink: https://bugs.launchpad.net/bugs/1837810

  [Impact]

  We are seeing a problem on OpenStack compute nodes, and KVM hosts,
  where a kernel oops is generated, and all running KVM machines are
  placed into the pause state.

  This is caused by the kernel's reserved zero_page reference counter
  overflowing from a positive number to a negative number, and hitting a
  (WARN_ON_ONCE(page_ref_count(page) <= 0)) condition in try_get_page().

  This only happens if the machine has Kernel Samepage Merging (KSM)
  enabled, with "use_zero_pages" turned on. Each time a new VM starts
  and the kernel does a KSM merge run during an EPT violation, the
  reference counter for the zero_page is incremented in try_async_pf()
  and never decremented. Eventually, the reference counter will
  overflow, causing the KVM subsystem to fail.
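
  The overflow described above can be illustrated with a small user-space
  model. This is a sketch only, assuming the page reference count behaves
  like a signed 32-bit integer; Page, wrap_int32 and the Python
  try_get_page are illustrative stand-ins, not kernel code.

```python
INT32_MIN = -2**31
INT32_MAX = 2**31 - 1


def wrap_int32(value):
    """Reduce an integer to the signed 32-bit two's complement range."""
    return (value - INT32_MIN) % 2**32 + INT32_MIN


class Page:
    """Toy stand-in for struct page; only the reference count is modelled."""

    def __init__(self, refcount=1):
        self.refcount = refcount


def try_get_page(page):
    """Take a reference unless the count is already zero or negative."""
    if page.refcount <= 0:
        return False  # the kernel check also fires WARN_ON_ONCE here
    page.refcount = wrap_int32(page.refcount + 1)
    return True


if __name__ == "__main__":
    # Start just below the limit instead of looping ~2 billion times.
    zero_page = Page(refcount=INT32_MAX - 1)
    for _ in range(3):
        ok = try_get_page(zero_page)
        print(ok, zero_page.refcount)
```

  Once references are leaked often enough to push the count past the
  positive limit, it wraps to a negative value and every later
  try_get_page() call is refused.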

  Syslog:
  error : qemuMonitorJSONCheckError:392 : internal error: unable to execute QEMU command 'cont': Resetting the Virtual Machine is required

  QEMU Logs:
  error: kvm run failed Bad address
  EAX=000afe00 EBX=000b ECX=0080 EDX=0cfe
  ESI=0003fe00 EDI=000afe00 EBP=0007 ESP=6d74
  EIP=000ee344 EFL=00010002 [---] CPL=0 II=0 A20=1 SMM=0 HLT=0
  ES =0010   00c09300 DPL=0 DS   [-WA]
  CS =0008   00c09b00 DPL=0 CS32 [-RA]
  SS =0010   00c09300 DPL=0 DS   [-WA]
  DS =0010   00c09300 DPL=0 DS   [-WA]
  FS =0010   00c09300 DPL=0 DS   [-WA]
  GS =0010   00c09300 DPL=0 DS   [-WA]
  LDT=   8200 DPL=0 LDT
  TR =   8b00 DPL=0 TSS32-busy
  GDT= 000f7040 0037
  IDT= 000f707e 
  CR0=0011 CR2= CR3= CR4=
  DR0= DR1= DR2= DR3=
  DR6=0ff0 DR7=0400
  EFER=
  Code=c3 57 56 b8 00 fe 0a 00 be 00 fe 03 00 b9 80 00 00 00 89 c7  a5 a1 00 80 03 00 8b 15 04 80 03 00 a3 00 80 0a 00 89 15 04 80 0a 00 b8 ae e2 00 00 31

  Kernel Oops:

  [  167.695986] WARNING: CPU: 1 PID: 3016 at /build/linux-hwe-FEhT7y/linux-hwe-4.15.0/include/linux/mm.h:852 follow_page_pte+0x6f4/0x710
  [  167.696023] CPU: 1 PID: 3016 Comm: CPU 0/KVM Tainted: G   OE 4.15.0-106-generic #107~16.04.1-Ubuntu
  [  167.696023] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1 04/01/2014
  [  167.696025] RIP: 0010:follow_page_pte+0x6f4/0x710
  [  167.696026] RSP: 0018:a81802023908 EFLAGS: 00010286
  [  167.696027] RAX: ed8786e33a80 RBX: ed878c6d21b0 RCX: 8000
  [  167.696027] RDX:  RSI: 3000 RDI: 8001b8cea225
  [  167.696028] RBP: a81802023970 R08: 8001b8cea225 R09: 90c4d55fa340
  [  167.696028] R10:  R11:  R12: ed8786e33a80
  [  167.696029] R13: 0326 R14: 90c4db94fc50 R15: 90c4d55fa340
  [  167.696030] FS:  7f6a7798c700() GS:90c4edc8() knlGS:
  [  167.696030] CS:  0010 DS:  ES:  CR0: 80050033
  [  167.696031] CR2:  CR3: 000315580002 CR4: 00162ee0
  [  167.696033] Call Trace:
  [  167.696047]  follow_pmd_mask+0x273/0x630
  [  167.696049]  follow_page_mask+0x178/0x230
  [  167.696051]  __get_user_pages+0xb8/0x740
  [  167.696052]  get_user_pages+0x42/0x50
  [  167.696068]  __gfn_to_pfn_memslot+0x18b/0x3b0 [kvm]
  [  167.696079]  ? mmu_set_spte+0x1dd/0x3a0 [kvm]
  [  167.696090]  try_async_pf+0x66/0x220 [kvm]
  [  167.696101]  tdp_page_fault+0x14b/0x2b0 [kvm]
  [  167.696104]  ? vmexit_fill_RSB+0x10/0x40 [kvm_intel]
  [  167.696114]  kvm_mmu_page_fault+0x62/0x180 [kvm]
  [  167.696117]  handle_ept_violation+0xbc/0x160 [kvm_intel]
  [  167.696119]  vmx_handle_exit+0xa5/0x580 [kvm_intel]
  [  167.696129]  vcpu_enter_guest+0x414/0x1260 [kvm]
  [  167.696138]  ? kvm_arch_vcpu_load+0x4d/0x280 [kvm]
  [  167.696148]  kvm_arch_vcpu_ioctl_run+0xd9/0x3d0 [kvm]
  [  167.696157]  ? kvm_arch_vcpu_ioctl_run+0xd9/0x3d0 [kvm]
  [  167.696165]  kvm_vcpu_ioctl+0x33a/0x610 [kvm]
  [  167.696166]  ? do_futex+0x129/0x590
  [  167.696171]  ? __switch_to+0x34c/0x4e0
  [  167.696174]  ? __switch_to_asm+0x35/0x70
  [  167.696176]  do_vfs_ioctl+0xa4/0x600
  [  167.696177]  SyS_ioctl+0x79/0x90
  

[Kernel-packages] [Bug 1837810] Re: KVM: Fix zero_page reference counter overflow when using KSM on KVM compute host

2021-05-17 Thread Matthew Ruffell
Hi Jiatong,

Thanks for emailing me, happy to answer questions anytime.

> 1. why linux-hwe-4.15.0 source code is used?

If you look closely at the oops in the description, the customer I was
working with was running:

4.15.0-106-generic #107~16.04.1-Ubuntu
 
This is the Xenial (16.04) HWE kernel. I was using the linux-hwe-4.15.0 source
code to make sure it matched the debug symbol package exactly.

In your case:

4.15.0-72-generic #81-Ubuntu

you are running the 4.15 kernel on normal Bionic (18.04), so we can use
the normal linux-4.15.0 source code.

> 2. we are using linux-4.15.0-unsigned and by skimming through the
> source code, looks like try_get_page is not defined at that time?

Yes! You are correct, the original mainline 4.15 kernel did not have
try_get_page() defined at:

https://elixir.bootlin.com/linux/v4.15/source/mm/gup.c#L156

But if you look closely at the actual kernel sources for
4.15.0-72-generic:

https://git.launchpad.net/~ubuntu-kernel/ubuntu/+source/linux/+git/bionic/tree/mm/gup.c?h=Ubuntu-4.15.0-72.81#n156

We see that try_get_page() is there. That is because we backported:

commit 8fde12ca79aff9b5ba951fce1a2641901b8d8e64
Author: Linus Torvalds 
Date:   Thu Apr 11 10:49:19 2019 -0700
Subject: mm: prevent get_user_pages() from overflowing page refcount
Link: https://github.com/torvalds/linux/commit/8fde12ca79aff9b5ba951fce1a2641901b8d8e64

Ubuntu 4.15 backport link: https://paste.ubuntu.com/p/2bF5WWQy2r/

That commit first turned up in 4.15.0-59-generic, via upstream-stable.

Anyway, let's have a look at your stack trace:

4.15.0-72-generic #81-Ubuntu
RIP: 0010:follow_page_pte+0x663/0x6d0

I downloaded the debug symbols:

http://ddebs.ubuntu.com/ubuntu/pool/main/l/linux/linux-image-unsigned-4.15.0-72-generic-dbgsym_4.15.0-72.81_amd64.ddeb

Extracted them:

dpkg -x linux-image-unsigned-4.15.0-72-generic-dbgsym_4.15.0-72.81_amd64.ddeb debug

and looked up:

$ eu-addr2line -e ./vmlinux-4.15.0-72-generic -f follow_page_pte+0x663
try_get_page inlined at /build/linux-E6MDAa/linux-4.15.0/mm/gup.c:156 in follow_page_pte
/build/linux-E6MDAa/linux-4.15.0/mm/gup.c:138
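
The symbol+offset argument passed to eu-addr2line comes straight from the
RIP line of the warning. A minimal sketch of extracting it is below;
parse_symbol_offset and addr2line_argument are hypothetical helper names, and
the regular expression is an assumption about the usual symbol+0xoffset/0xsize
layout rather than a full oops parser.

```python
import re

# Matches e.g. "follow_page_pte+0x663/0x6d0"; the "/0xsize" part is optional.
_SYMBOL_RE = re.compile(
    r"(?P<symbol>[A-Za-z_][\w.]*)"
    r"\+0x(?P<offset>[0-9a-fA-F]+)"
    r"(?:/0x(?P<size>[0-9a-fA-F]+))?"
)


def parse_symbol_offset(text):
    """Return (symbol, offset, size) from the first symbol+0x.. token found."""
    match = _SYMBOL_RE.search(text)
    if match is None:
        raise ValueError(f"no symbol+offset found in {text!r}")
    size = match.group("size")
    return (
        match.group("symbol"),
        int(match.group("offset"), 16),
        int(size, 16) if size is not None else None,
    )


def addr2line_argument(text):
    """Build the 'symbol+0xoffset' string used on the command line above."""
    symbol, offset, _size = parse_symbol_offset(text)
    return f"{symbol}+{offset:#x}"


if __name__ == "__main__":
    print(addr2line_argument("RIP: 0010:follow_page_pte+0x663/0x6d0"))
```

The function size after the slash is dropped, since only the symbol and
offset are needed for the lookup.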

We see that you hit try_get_page() in mm/gup.c:156

 155         if (flags & FOLL_GET) {
 156                 if (unlikely(!try_get_page(page))) {
 157                         page = ERR_PTR(-ENOMEM);
 158                         goto out;
 159                 }
 
Looking at try_get_page() in include/linux/mm.h:

 854 static inline __must_check bool try_get_page(struct page *page)
 855 {
 856         page = compound_head(page);
 857         if (WARN_ON_ONCE(page_ref_count(page) <= 0))
 858                 return false;
 859         page_ref_inc(page);
 860         return true;
 861 }
 
We see that you hit the exact same WARN_ON_ONCE(page_ref_count(page) <= 0).

So, whatever page you are trying to access has its reference counter in
the negatives, which suggests that it has either wrapped around or
been decremented too many times.

Looking at your error log, I can't tell for sure if it is the zero_page,
but it's quite likely to be. The zero_page is a frequently used
page in the system, and it is used outside of ksm; it's just that ksm is
a heavy user of the zero_page. If you are constantly allocating large
amounts of new memory, you will be using the zero_page much like
ksm does, and the reference counter will eventually overflow.

I think there is a good chance that the fix I submitted in
4.15.0-118-generic will solve your problems. Please do an "apt update"
and "apt upgrade" and upgrade to a newer kernel, the newer the better,
and it will most likely fix the problem.

Let me know if you have any more questions.

Thanks,
Matthew

[Kernel-packages] [Bug 1837810] Re: KVM: Fix zero_page reference counter overflow when using KSM on KVM compute host

2021-05-15 Thread norman shen
Interestingly, I hit this warning without enabling ksm:

```console
# cat /sys/kernel/mm/ksm/run
0
# uname -a
Linux compute12 4.15.0-72-generic #81-Ubuntu SMP Tue Nov 26 12:20:02 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
# lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 18.04.3 LTS
Release:        18.04
Codename:       bionic
```
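
The same check can be scripted. The sketch below reads the "run" and
"use_zero_pages" files under /sys/kernel/mm/ksm; read_ksm_settings and
zero_page_merging_active are hypothetical helper names, and the directory is
a parameter so the code can be pointed at a test directory.

```python
from pathlib import Path

KSM_DIR = Path("/sys/kernel/mm/ksm")


def read_ksm_settings(ksm_dir=KSM_DIR):
    """Read the KSM 'run' and 'use_zero_pages' knobs as integers.

    A knob that is missing or unreadable is reported as None.
    """
    settings = {}
    for name in ("run", "use_zero_pages"):
        path = Path(ksm_dir) / name
        try:
            settings[name] = int(path.read_text().strip())
        except (OSError, ValueError):
            settings[name] = None
    return settings


def zero_page_merging_active(settings):
    """True when KSM is running and use_zero_pages is turned on."""
    return settings.get("run") == 1 and settings.get("use_zero_pages") == 1


if __name__ == "__main__":
    current = read_ksm_settings()
    print(current, zero_page_merging_active(current))
```

With "run" at 0, as in the console output above, the second helper reports
False.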

The log is:

[Sat May 15 11:28:32 2021] WARNING: CPU: 31 PID: 3196546 at /build/linux-E6MDAa/linux-4.15.0/include/linux/mm.h:857 follow_page_pte+0x663/0x6d0
[Sat May 15 11:28:32 2021] Modules linked in: nls_iso8859_1 act_police cls_u32 
sch_ingress cls_fw sch_sfq sch_htb ip6table_raw xt_CT xt_mac vhost_net vhost 
tap ebtable_filter ebtables ip6table_filter devlink vxlan ip6_udp_tunnel 
udp_tunnel ip_gre gre xt_multiport xt_set iptable_raw iptable_mangle 
ip_set_hash_net ip_set_hash_ip ip_set ipip tunnel4 ip_tunnel veth xt_statistic 
xt_physdev xt_nat xt_recent ipt_REJECT nf_reject_ipv4 xt_tcpudp xt_addrtype 
ip_vs_sh ip_vs_wrr ip_vs_rr ip_vs ip6table_nat ip6_tables xt_comment xt_mark 
iptable_filter xt_conntrack nf_conntrack_netlink nfnetlink xfrm_user xfrm_algo 
ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat aufs rbd libceph overlay 
openvswitch nsh nf_conntrack_ipv6 nf_nat_ipv6 nf_conntrack_ipv4 nf_defrag_ipv4 
nf_nat_ipv4 nf_defrag_ipv6 nf_nat bonding dm_service_time dm_multipath
[Sat May 15 11:28:32 2021]  scsi_dh_rdac scsi_dh_emc scsi_dh_alua intel_rapl 
skx_edac x86_pkg_temp_thermal coretemp kvm_intel kvm irqbypass intel_cstate 
intel_rapl_perf ipmi_ssif ioatdma joydev input_leds acpi_power_meter mei_me mei 
shpchp mac_hid ipmi_si ipmi_devintf ipmi_msghandler lpc_ich sch_fq_codel 
nf_conntrack ib_iser rdma_cm iw_cm ib_cm ib_core iscsi_tcp libiscsi_tcp 
libiscsi scsi_transport_iscsi br_netfilter bridge stp llc ip_tables x_tables 
autofs4 btrfs zstd_compress raid10 raid456 async_raid6_recov async_memcpy 
async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear 
ses enclosure scsi_transport_sas hid_generic crct10dif_pclmul crc32_pclmul 
usbhid ghash_clmulni_intel hid pcbc lpfc aesni_intel aes_x86_64 nvmet_fc 
crypto_simd ast glue_helper nvmet cryptd nvme_fc ttm nvme_fabrics
[Sat May 15 11:28:32 2021]  igb nvme_core drm_kms_helper dca scsi_transport_fc 
syscopyarea i2c_algo_bit sysfillrect sysimgblt i40e aacraid fb_sys_fops drm ptp 
pps_core ahci libahci wmi
[Sat May 15 11:28:32 2021] CPU: 31 PID: 3196546 Comm: CPU 2/KVM Not tainted 4.15.0-72-generic #81-Ubuntu
[Sat May 15 11:28:32 2021] Hardware name: Inspur NF5280M5/YZMB-00882-104, BIOS 4.0.8 10/17/2018
[Sat May 15 11:28:32 2021] RIP: 0010:follow_page_pte+0x663/0x6d0
[Sat May 15 11:28:32 2021] RSP: 0018:b1eff4e5b8f8 EFLAGS: 00010286
[Sat May 15 11:28:32 2021] RAX: e041b58cba40 RBX: e043fed90cf0 RCX: 8000
[Sat May 15 11:28:32 2021] RDX: e041b58cba40 RSI: 7f7306766000 RDI: 800d632e9225
[Sat May 15 11:28:32 2021] RBP: b1eff4e5b960 R08: 800d632e9225 R09: a0249cceb1e0
[Sat May 15 11:28:32 2021] R10:  R11: b1eff4e5ba8c R12: e041b58cba40
[Sat May 15 11:28:32 2021] R13: 3000 R14: 0326 R15: a076af75a198
[Sat May 15 11:28:32 2021] FS:  7f73f48ee700() GS:a0947f2c() knlGS:f88001e81000
[Sat May 15 11:28:32 2021] CS:  0010 DS:  ES:  CR0: 80050033
[Sat May 15 11:28:32 2021] CR2: f8a016819000 CR3: 004e72518004 CR4: 007626e0
[Sat May 15 11:28:32 2021] DR0:  DR1:  DR2:
[Sat May 15 11:28:32 2021] DR3:  DR6: fffe0ff0 DR7: 0400
[Sat May 15 11:28:32 2021] PKRU: 5554
[Sat May 15 11:28:32 2021] Call Trace:
[Sat May 15 11:28:32 2021]  follow_pmd_mask+0x209/0x640
[Sat May 15 11:28:32 2021]  follow_page_mask+0x17a/0x210
[Sat May 15 11:28:32 2021]  __get_user_pages+0x18c/0x720
[Sat May 15 11:28:32 2021]  get_user_pages+0x42/0x50
[Sat May 15 11:28:32 2021]  __gfn_to_pfn_memslot+0x126/0x410 [kvm]
[Sat May 15 11:28:32 2021]  try_async_pf+0x66/0x1f0 [kvm]
[Sat May 15 11:28:32 2021]  tdp_page_fault+0x138/0x290 [kvm]
[Sat May 15 11:28:32 2021]  ? vmexit_fill_RSB+0x1c/0x40 [kvm_intel]
[Sat May 15 11:28:32 2021]  kvm_mmu_page_fault+0x62/0x160 [kvm]
[Sat May 15 11:28:32 2021]  handle_ept_violation+0xbb/0x150 [kvm_intel]
[Sat May 15 11:28:32 2021]  vmx_handle_exit+0xb3/0xe80 [kvm_intel]
[Sat May 15 11:28:32 2021]  ? vmexit_fill_RSB+0x1c/0x40 [kvm_intel]
[Sat May 15 11:28:32 2021]  ? vmexit_fill_RSB+0x10/0x40 [kvm_intel]
[Sat May 15 11:28:32 2021]  ? vmexit_fill_RSB+0x1c/0x40 [kvm_intel]
[Sat May 15 11:28:32 2021]  ? vmexit_fill_RSB+0x10/0x40 [kvm_intel]
[Sat May 15 11:28:32 2021]  ? vmx_vcpu_run+0x3fa/0x600 [kvm_intel]
[Sat May 15 11:28:32 2021]  vcpu_enter_guest+0x424/0x1260 [kvm]
[Sat May 15 11:28:32 2021]  ? __schedule+0x256/0x880
[Sat May 15 11:28:32 2021]  kvm_arch_vcpu_ioctl_run+0x203/0x3e0 [kvm]
[Sat May 15 11:28:32 2021]  ? 

[Kernel-packages] [Bug 1837810] Re: KVM: Fix zero_page reference counter overflow when using KSM on KVM compute host

2020-09-21 Thread Launchpad Bug Tracker
This bug was fixed in the package linux - 5.4.0-48.52

---
linux (5.4.0-48.52) focal; urgency=medium

  * focal/linux: 5.4.0-48.52 -proposed tracker (LP: #1894654)

  * mm/slub kernel oops on focal kernel 5.4.0-45 (LP: #1895109)
- SAUCE: Revert "mm/slub: fix a memory leak in sysfs_slab_add()"

  * Packaging resync (LP: #1786013)
- update dkms package versions
- update dkms package versions

  * Introduce the new NVIDIA 450-server and the 450 UDA series (LP: #1887674)
- [packaging] add signed modules for nvidia 450 and 450-server

  * [UBUNTU 20.04] zPCI attach/detach issues with PF/VF linking support
(LP: #1892849)
- s390/pci: fix zpci_bus_link_virtfn()
- s390/pci: re-introduce zpci_remove_device()
- s390/pci: fix PF/VF linking on hot plug

  * [UBUNTU 20.04] kernel: s390/cpum_cf,perf: changeDFLT_CCERROR counter name
(LP: #1891454)
- s390/cpum_cf, perf: change DFLT_CCERROR counter name

  * [UBUNTU 20.04] zPCI: Enabling of a reserved PCI function regression
introduced by multi-function support (LP: #1891437)
- s390/pci: fix enabling a reserved PCI function

  * CVE-2020-12888
- vfio/type1: Support faulting PFNMAP vmas
- vfio-pci: Fault mmaps to enable vma tracking
- vfio-pci: Invalidate mmaps and block MMIO access on disabled memory

  *  [Hyper-V] VSS and File Copy daemons intermittently fails to start
(LP: #1891224)
- [Packaging] Bind hv_vss_daemon startup to hv_vss device
- [Packaging] bind hv_fcopy_daemon startup to hv_fcopy device

  * alsa/hdmi: support nvidia mst hdmi/dp audio (LP: #1867704)
- ALSA: hda - Rename snd_hda_pin_sense to snd_hda_jack_pin_sense
- ALSA: hda - Add DP-MST jack support
- ALSA: hda - Add DP-MST support for non-acomp codecs
- ALSA: hda - Add DP-MST support for NVIDIA codecs
- ALSA: hda: hdmi - fix regression in connect list handling
- ALSA: hda: hdmi - fix kernel oops caused by invalid PCM idx
- ALSA: hda: hdmi - preserve non-MST PCM routing for Intel platforms
- ALSA: hda: hdmi - Keep old slot assignment behavior for Intel platforms
- ALSA: hda - Fix DP-MST support for NVIDIA codecs

  * Focal update: v5.4.60 upstream stable release (LP: #1892899)
- smb3: warn on confusing error scenario with sec=krb5
- genirq/affinity: Make affinity setting if activated opt-in
- genirq/PM: Always unlock IRQ descriptor in rearm_wake_irq()
- PCI: hotplug: ACPI: Fix context refcounting in acpiphp_grab_context()
- PCI: Add device even if driver attach failed
- PCI: qcom: Define some PARF params needed for ipq8064 SoC
- PCI: qcom: Add support for tx term offset for rev 2.1.0
- btrfs: allow use of global block reserve for balance item deletion
- btrfs: free anon block device right after subvolume deletion
- btrfs: don't allocate anonymous block device for user invisible roots
- btrfs: ref-verify: fix memory leak in add_block_entry
- btrfs: stop incremening log_batch for the log root tree when syncing log
- btrfs: remove no longer needed use of log_writers for the log root tree
- btrfs: don't traverse into the seed devices in show_devname
- btrfs: open device without device_list_mutex
- btrfs: move the chunk_mutex in btrfs_read_chunk_tree
- btrfs: relocation: review the call sites which can be interrupted by signal
- btrfs: add missing check for nocow and compression inode flags
- btrfs: avoid possible signal interruption of btrfs_drop_snapshot() on
  relocation tree
- btrfs: sysfs: use NOFS for device creation
- btrfs: don't WARN if we abort a transaction with EROFS
- btrfs: fix race between page release and a fast fsync
- btrfs: fix messages after changing compression level by remount
- btrfs: only search for left_info if there is no right_info in
  try_merge_free_space
- btrfs: inode: fix NULL pointer dereference if inode doesn't need compression
- btrfs: fix memory leaks after failure to lookup checksums during inode
  logging
- btrfs: make sure SB_I_VERSION doesn't get unset by remount
- btrfs: fix return value mixup in btrfs_get_extent
- arm64: perf: Correct the event index in sysfs
- dt-bindings: iio: io-channel-mux: Fix compatible string in example code
- iio: dac: ad5592r: fix unbalanced mutex unlocks in ad5592r_read_raw()
- xtensa: add missing exclusive access state management
- xtensa: fix xtensa_pmu_setup prototype
- cifs: Fix leak when handling lease break for cached root fid
- powerpc/ptdump: Fix build failure in hashpagetable.c
- powerpc: Allow 4224 bytes of stack expansion for the signal frame
- powerpc: Fix circular dependency between percpu.h and mmu.h
- pinctrl: ingenic: Enhance support for IRQ_TYPE_EDGE_BOTH
- media: vsp1: dl: Fix NULL pointer dereference on unbind
- net: ethernet: stmmac: Disable hardware multicast filter
- net: stmmac: dwmac1000: provide multicast filter fallback

[Kernel-packages] [Bug 1837810] Re: KVM: Fix zero_page reference counter overflow when using KSM on KVM compute host

2020-09-21 Thread Launchpad Bug Tracker
This bug was fixed in the package linux - 4.15.0-118.119

---
linux (4.15.0-118.119) bionic; urgency=medium

  * bionic/linux: 4.15.0-118.119 -proposed tracker (LP: #1894697)

  * Packaging resync (LP: #1786013)
- update dkms package versions

  * Introduce the new NVIDIA 450-server and the 450 UDA series (LP: #1887674)
- [packaging] add signed modules for nvidia 450 and 450-server

  * cgroup refcount is bogus when cgroup_sk_alloc is disabled (LP: #1886860)
- cgroup: add missing skcd->no_refcnt check in cgroup_sk_clone()

  * CVE-2020-12888
- vfio/type1: Support faulting PFNMAP vmas
- vfio-pci: Fault mmaps to enable vma tracking
- vfio-pci: Invalidate mmaps and block MMIO access on disabled memory

  *  [Hyper-V] VSS and File Copy daemons intermittently fails to start
(LP: #1891224)
- [Packaging] Bind hv_vss_daemon startup to hv_vss device
- [Packaging] bind hv_fcopy_daemon startup to hv_fcopy device

  * KVM: Fix zero_page reference counter overflow when using KSM on KVM compute
host (LP: #1837810)
- KVM: fix overflow of zero page refcount with ksm running

  * Fix false-negative return value for rtnetlink.sh in kselftests/net
(LP: #1890136)
- selftests: rtnetlink: correct the final return value for the test
- selftests: rtnetlink: make kci_test_encap() return sub-test result

  * Bionic update: upstream stable patchset 2020-08-18 (LP: #1892091)
- USB: serial: qcserial: add EM7305 QDL product ID
- USB: iowarrior: fix up report size handling for some devices
- usb: xhci: define IDs for various ASMedia host controllers
- usb: xhci: Fix ASMedia ASM1142 DMA addressing
- Revert "ALSA: hda: call runtime_allow() for all hda controllers"
- ALSA: seq: oss: Serialize ioctls
- staging: android: ashmem: Fix lockdep warning for write operation
- Bluetooth: Fix slab-out-of-bounds read in hci_extended_inquiry_result_evt()
- Bluetooth: Prevent out-of-bounds read in hci_inquiry_result_evt()
- Bluetooth: Prevent out-of-bounds read in hci_inquiry_result_with_rssi_evt()
- omapfb: dss: Fix max fclk divider for omap36xx
- binder: Prevent context manager from incrementing ref 0
- vgacon: Fix for missing check in scrollback handling
- mtd: properly check all write ioctls for permissions
- leds: wm831x-status: fix use-after-free on unbind
- leds: da903x: fix use-after-free on unbind
- leds: lm3533: fix use-after-free on unbind
- leds: 88pm860x: fix use-after-free on unbind
- net/9p: validate fds in p9_fd_open
- drm/nouveau/fbcon: fix module unload when fbcon init has failed for some
  reason
- drm/nouveau/fbcon: zero-initialise the mode_cmd2 structure
- i2c: slave: improve sanity check when registering
- i2c: slave: add sanity check when unregistering
- usb: hso: check for return value in hso_serial_common_create()
- firmware: Fix a reference count leak.
- cfg80211: check vendor command doit pointer before use
- igb: reinit_locked() should be called with rtnl_lock
- atm: fix atm_dev refcnt leaks in atmtcp_remove_persistent
- tools lib traceevent: Fix memory leak in process_dynamic_array_len
- Drivers: hv: vmbus: Ignore CHANNELMSG_TL_CONNECT_RESULT(23)
- xattr: break delegations in {set,remove}xattr
- ipv4: Silence suspicious RCU usage warning
- ipv6: fix memory leaks on IPV6_ADDRFORM path
- net: ethernet: mtk_eth_soc: fix MTU warnings
- vxlan: Ensure FDB dump is performed under RCU
- net: lan78xx: replace bogus endpoint lookup
- hv_netvsc: do not use VF device if link is down
- net: gre: recompute gre csum for sctp over gre tunnels
- openvswitch: Prevent kernel-infoleak in ovs_ct_put_key()
- Revert "vxlan: fix tos value before xmit"
- selftests/net: relax cpu affinity requirement in msg_zerocopy test
- rxrpc: Fix race between recvmsg and sendmsg on immediate call failure
- i40e: add num_vectors checker in iwarp handler
- i40e: Wrong truncation from u16 to u8
- i40e: Memory leak in i40e_config_iwarp_qvlist
- Smack: fix use-after-free in smk_write_relabel_self()

  * Bionic update: upstream stable patchset 2020-08-11 (LP: #1891228)
- AX.25: Fix out-of-bounds read in ax25_connect()
- AX.25: Prevent out-of-bounds read in ax25_sendmsg()
- dev: Defer free of skbs in flush_backlog
- drivers/net/wan/x25_asy: Fix to make it work
- net-sysfs: add a newline when printing 'tx_timeout' by sysfs
- net: udp: Fix wrong clean up for IS_UDPLITE macro
- rxrpc: Fix sendmsg() returning EPIPE due to recvmsg() returning ENODATA
- AX.25: Prevent integer overflows in connect and sendmsg
- ip6_gre: fix null-ptr-deref in ip6gre_init_net()
- rtnetlink: Fix memory(net_device) leak when ->newlink fails
- tcp: allow at most one TLP probe per flight
- regmap: debugfs: check count when read regmap file
- qrtr: orphan socket in qrtr_release()
- sctp: 

[Kernel-packages] [Bug 1837810] Re: KVM: Fix zero_page reference counter overflow when using KSM on KVM compute host

2020-09-08 Thread Matthew Ruffell
As promised, I have an update on the lab machine I left running
ksm_refcnt_overflow.sh for a week straight.

The machine was running 4.15.0-116-generic from -proposed:

$ uname -rv
4.15.0-116-generic #117-Ubuntu SMP Fri Aug 28 16:04:22 UTC 2020
$ uptime
 04:36:14 up 7 days, 1 min,  1 user,  load average: 3.47, 3.14, 2.97
 
In that time it has created and destroyed 32,950 virtual machines:

$ virsh list
 Id      Name         State
------------------------------
 32945   instance-0   running
 32946   instance-1   running
 32947   instance-2   running
 32948   instance-3   running
 32949   instance-4   running
 
If we look at the current value of the reference counter, it is still set to 1:

$ cat /proc/zero_page_refcount
Zero Page Refcount: 0x1 or 1
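
For longer test runs, the line printed by the zero_page_refcount test module
can be checked automatically. The sketch below assumes the exact
"Zero Page Refcount: 0x<hex> or <decimal>" layout shown above;
parse_zero_page_refcount is a hypothetical helper name.

```python
import re

# Layout assumed from the sample output: "Zero Page Refcount: 0x1 or 1"
_REFCOUNT_RE = re.compile(
    r"^Zero Page Refcount:\s*0x([0-9a-fA-F]+)\s+or\s+(-?\d+)$"
)


def parse_zero_page_refcount(line):
    """Return (hex_value, decimal_value) parsed from the module's output line."""
    match = _REFCOUNT_RE.match(line.strip())
    if match is None:
        raise ValueError(f"unexpected format: {line!r}")
    return int(match.group(1), 16), int(match.group(2))


if __name__ == "__main__":
    print(parse_zero_page_refcount("Zero Page Refcount: 0x1 or 1"))
```

A monitoring loop could call this periodically and flag any decimal value
that keeps growing between VM create and destroy cycles.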

I checked /var/log/kern.log, /var/log/syslog and journalctl. There are
no oops messages, and the KVM subsystem is stable.

I am shutting the lab machine down now, as I am convinced the patch is
stable. This SRU is still verified.

[Kernel-packages] [Bug 1837810] Re: KVM: Fix zero_page reference counter overflow when using KSM on KVM compute host

2020-09-01 Thread Matthew Ruffell
As requested by the kernel team (in
https://lists.ubuntu.com/archives/kernel-team/2020-August/112775.html), I will
do some additional testing for this SRU to really make sure it won't cause any
regressions.

I provisioned a lab machine on segmaas, running Bionic. I installed the
4.15.0-116-generic kernel from -proposed on it.

I built the zero_page_refcount.c kernel module, and inserted it into the
running kernel.

I then got ksm_refcnt_overflow.sh running in a screen session, creating
and destroying virtual machines in an infinite loop.

This way we will know the code path has been exercised a fair amount.

I will leave this running, creating and destroying virtual machines, for a
week or so, and I will report back with the results.

[Kernel-packages] [Bug 1837810] Re: KVM: Fix zero_page reference counter overflow when using KSM on KVM compute host

2020-09-01 Thread Matthew Ruffell
Verification steps for focal:

Again, I made sure I could reproduce the problem on the existing
5.4.0-42-generic kernel.

I copied ksm_refcnt_overflow.sh and zero_page_refcount.c to the VM,
built the kernel module, and inserted it into the kernel:

$ sudo insmod zero_page_refcount.ko
$ cat /proc/zero_page_refcount
Zero Page Refcount: 0x1 or 1

From there, I started running the ksm_refcnt_overflow.sh script in another
terminal. I checked to ensure VMs were running:

$ virsh list
 Id   Name         State
----------------------------
 1    instance-0   running
 2    instance-1   running
 3    instance-2   running

From there, we can see the reference counter increment:

$ cat /proc/zero_page_refcount
Zero Page Refcount: 0x1bd9 or 7129
$ cat /proc/zero_page_refcount
Zero Page Refcount: 0x1f9e or 8094
$ cat /proc/zero_page_refcount
Zero Page Refcount: 0x1fb0 or 8112

From there, I set the reference counter in an attempt to make it
overflow:

$ cat /proc/zero_page_refcount
Zero Page Refcount: 0x7fffff15 or 2147483413
$ cat /proc/zero_page_refcount
Zero Page Refcount: 0x80000000 or -2147483648
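
The jump from a large positive value to -2147483648 is ordinary 32-bit
two's-complement wraparound. Below is a minimal Python sketch of it, assuming
the counter behaves like a signed 32-bit integer and that try_get_page()
refuses a page whose count is <= 0, as the bug description states. The names
to_int32 and try_get_page_model are illustrative only and are not kernel APIs.

```python
INT32_MAX = 2**31 - 1


def to_int32(value):
    """Interpret the low 32 bits of value as a signed 32-bit integer."""
    value &= 0xFFFFFFFF
    return value - 2**32 if value > INT32_MAX else value


def try_get_page_model(refcount):
    """Toy model: refuse the reference when the count is <= 0, else increment.

    Returns (ok, new_refcount). Illustrative only, not the kernel function.
    """
    if refcount <= 0:
        return False, refcount
    return True, to_int32(refcount + 1)


# One more reference at INT32_MAX wraps the counter to a negative value.
ok, count = try_get_page_model(INT32_MAX)
print(ok, hex(count & 0xFFFFFFFF), count)

# Every later attempt is then refused because the count is <= 0.
ok_after, count_after = try_get_page_model(count)
print(ok_after, count_after)
```

In this model, once the counter has wrapped, no further reference can be
taken, which matches the point at which the VMs become paused.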

From there, all VMs became paused:

$ virsh list
 Id    Name         State
-----------------------------
 137   instance-0   paused
 138   instance-1   paused
 139   instance-2   paused

We see the following oops in dmesg:

https://paste.ubuntu.com/p/3Dc73k9VYy/

I then rebooted the machine, enabled -proposed and installed
5.4.0-46-generic.

$ uname -rv
5.4.0-46-generic #50-Ubuntu SMP Fri Aug 28 15:33:36 UTC 2020

I rebooted, and built a new kernel module with the new headers, and
inserted it into the running kernel:

$ sudo insmod zero_page_refcount.ko 
[sudo] password for ubuntu: 
ubuntu@ubuntu:~/module$ cat /proc/zero_page_refcount
Zero Page Refcount: 0x1 or 1

Again, I started the ksm_refcnt_overflow.sh script in another terminal,
and checked to see that VMs were being created:

$ virsh list
 Id   Name         State
----------------------------
 1    instance-0   running
 2    instance-1   running

When we check the value of the reference counter, it is still 1 and not 
incrementing:

$ cat /proc/zero_page_refcount
Zero Page Refcount: 0x1 or 1
$ cat /proc/zero_page_refcount
Zero Page Refcount: 0x1 or 1

When I attempt to trigger overflow:

$ cat /proc/zero_page_refcount_set
Zero Page Refcount set to 0x1F000

$ cat /proc/zero_page_refcount
Zero Page Refcount: 0x7fffff00 or 2147483392
$ cat /proc/zero_page_refcount
Zero Page Refcount: 0x7fffff00 or 2147483392

We never overflow. The problem is fixed. Marking the bug as verified for
focal.

** Tags removed: verification-needed-focal
** Tags added: verification-done-focal

Status in linux package in Ubuntu:
  Fix Released
Status in linux source package in Bionic:
  Fix Committed
Status in linux source package in Focal:
  Fix Committed

[Kernel-packages] [Bug 1837810] Re: KVM: Fix zero_page reference counter overflow when using KSM on KVM compute host

2020-09-01 Thread Matthew Ruffell
Verification steps for Bionic:

First, I made sure I could reproduce the problem on 4.15.0-115-generic.

I made a fresh Bionic VM, and copied over the ksm_refcnt_overflow.sh and
zero_page_refcount.c files.

I built the kernel module, and inserted it into the kernel.

From there, I checked the zero_page reference counter.

$ sudo insmod zero_page_refcount.ko
[sudo] password for ubuntu: 
ubuntu@ubuntu:~/module$ cat /proc/zero_page_refcount
Zero Page Refcount: 0x1 or 1

From there, in another terminal, I ran the script ksm_refcnt_overflow.sh, and
checked to see that VMs were running:

$ virsh list
 Id   Name         State
----------------------------
 1    instance-0   running
 2    instance-1   running
 3    instance-2   running
 4    instance-3   running
 5    instance-4   running

From there, we can see the reference counter increment:

$ cat /proc/zero_page_refcount
Zero Page Refcount: 0x1158 or 4440
ubuntu@ubuntu:~/module$ cat /proc/zero_page_refcount
Zero Page Refcount: 0x1622 or 5666
ubuntu@ubuntu:~/module$ cat /proc/zero_page_refcount
Zero Page Refcount: 0x163a or 5690

I issued the set command, to get it ready to overflow:

$ cat /proc/zero_page_refcount_set
Zero Page Refcount set to 0x1F000

I then checked and saw it overflow:

ubuntu@ubuntu:~/module$ cat /proc/zero_page_refcount
Zero Page Refcount: 0x7fffff27 or 2147483431
ubuntu@ubuntu:~/module$ cat /proc/zero_page_refcount
Zero Page Refcount: 0x7fffff92 or 2147483538
ubuntu@ubuntu:~/module$ cat /proc/zero_page_refcount
Zero Page Refcount: 0x80000000 or -2147483648
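
To see how little headroom was left at each of the samples above, the
remaining increments can be computed directly. This is a small sketch that
assumes a signed 32-bit counter; increments_until_wrap is a hypothetical
helper name, and the inputs are the decimal values quoted above.

```python
INT32_MAX = 2**31 - 1


def increments_until_wrap(count):
    """Further increments before a signed 32-bit counter turns negative."""
    if count < 0:
        return 0  # already wrapped
    return INT32_MAX - count + 1


# Decimal samples taken from the transcript above.
for sample in (2147483431, 2147483538, -2147483648):
    print(sample, increments_until_wrap(sample))
```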

Instances became paused, and virtualisation was broken:

$ virsh list
 Id   Name         State
---------------------------
 5    instance-4   paused
 6    instance-5   paused
 7    instance-6   paused
 8    instance-7   paused
 9    instance-0   paused
 10   instance-1   paused
 11   instance-2   paused
 12   instance-3   paused

From there, we see the usual call trace in dmesg:

https://paste.ubuntu.com/p/wpJkGCH3fJ/

I rebooted, and enabled -proposed. I then installed the
4.15.0-116-generic kernel, and rebooted again.

I rebuilt the zero_page_refcount kernel module with the new headers, and
inserted it into the running kernel.

$ uname -rv
4.15.0-116-generic #117-Ubuntu SMP Fri Aug 28 16:04:22 UTC 2020
$ sudo insmod zero_page_refcount.ko
ubuntu@ubuntu:~/module$ cat /proc/zero_page_refcount
Zero Page Refcount: 0x1 or 1

From there, I started the script ksm_refcnt_overflow.sh in another
terminal.

We can see that VMs are running:

$ virsh list
 Id   Name         State
----------------------------
 1    instance-1   running
 2    instance-2   running
 3    instance-3   running
 4    instance-4   running

Checking the value of the zero_page reference counter:

$ cat /proc/zero_page_refcount
Zero Page Refcount: 0x1 or 1

We are still at 1. Now attempting to trigger overflow:

$ cat /proc/zero_page_refcount_set
Zero Page Refcount set to 0x1F000

$ cat /proc/zero_page_refcount
Zero Page Refcount: 0x7fffff00 or 2147483392
ubuntu@ubuntu:~/module$ cat /proc/zero_page_refcount
Zero Page Refcount: 0x7fffff00 or 2147483392
ubuntu@ubuntu:~/module$ cat /proc/zero_page_refcount
Zero Page Refcount: 0x7fffff00 or 2147483392

The reference counter is never incremented, and will not overflow.
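
The difference between the two kernels can be pictured with a toy model. The
sketch below assumes, as the bug description says, that the unpatched path
takes a zero_page reference on each fault and never drops it. The "balanced"
case only illustrates a counter that ends where it started; it is not a
description of the actual patch. The function name and the fault count are
hypothetical.

```python
def simulate_faults(start, faults, release):
    """Return the counter after `faults` reference acquisitions.

    If `release` is True, each acquisition is paired with a release;
    otherwise every reference is leaked.
    """
    count = start
    for _ in range(faults):
        count += 1          # reference taken while handling the fault
        if release:
            count -= 1      # reference dropped afterwards
    return count


# An arbitrary number of faults, chosen only for illustration.
leaked = simulate_faults(start=1, faults=1000, release=False)
balanced = simulate_faults(start=1, faults=1000, release=True)
print(leaked, balanced)
```

With leaked references the counter grows with every fault, whereas in the
balanced case it stays at its starting value, which is the behaviour observed
on the -proposed kernel.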

The problem is solved, and I am happy to mark this bug as verified for
bionic.

** Tags removed: verification-needed-bionic
** Tags added: verification-done-bionic

Status in linux package in Ubuntu:
  Fix Released
Status in linux source package in Bionic:
  Fix Committed
Status in linux source package in Focal:
  Fix Committed

[Kernel-packages] [Bug 1837810] Re: KVM: Fix zero_page reference counter overflow when using KSM on KVM compute host

2020-09-01 Thread Ubuntu Kernel Bot
This bug is awaiting verification that the kernel in -proposed solves
the problem. Please test the kernel and update this bug with the
results. If the problem is solved, change the tag
'verification-needed-bionic' to 'verification-done-bionic'. If the
problem still exists, change the tag 'verification-needed-bionic' to
'verification-failed-bionic'.

If verification is not done within 5 working days from today, this fix
will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on
how to enable and use -proposed. Thank you!


** Tags added: verification-needed-bionic

Status in linux package in Ubuntu:
  Fix Released
Status in linux source package in Bionic:
  Fix Committed
Status in linux source package in Focal:
  Fix Committed

[Kernel-packages] [Bug 1837810] Re: KVM: Fix zero_page reference counter overflow when using KSM on KVM compute host

2020-08-31 Thread Ubuntu Kernel Bot
This bug is awaiting verification that the kernel in -proposed solves
the problem. Please test the kernel and update this bug with the
results. If the problem is solved, change the tag
'verification-needed-focal' to 'verification-done-focal'. If the
problem still exists, change the tag 'verification-needed-focal' to
'verification-failed-focal'.

If verification is not done within 5 working days from today, this fix
will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on
how to enable and use -proposed. Thank you!


** Tags added: verification-needed-focal

Status in linux package in Ubuntu:
  Fix Released
Status in linux source package in Bionic:
  Fix Committed
Status in linux source package in Focal:
  Fix Committed

[Kernel-packages] [Bug 1837810] Re: KVM: Fix zero_page reference counter overflow when using KSM on KVM compute host

2020-08-25 Thread Ian
** Changed in: linux (Ubuntu Bionic)
   Status: In Progress => Fix Committed

Status in linux package in Ubuntu:
  Fix Released
Status in linux source package in Bionic:
  Fix Committed
Status in linux source package in Focal:
  Fix Committed

[Kernel-packages] [Bug 1837810] Re: KVM: Fix zero_page reference counter overflow when using KSM on KVM compute host

2020-08-20 Thread Ian
** Changed in: linux (Ubuntu Focal)
   Status: In Progress => Fix Committed

Status in linux package in Ubuntu:
  Fix Released
Status in linux source package in Bionic:
  In Progress
Status in linux source package in Focal:
  Fix Committed

[Kernel-packages] [Bug 1837810] Re: KVM: Fix zero_page reference counter overflow when using KSM on KVM compute host

2020-08-16 Thread Matthew Ruffell
Attached is a kernel module which lets you view the zero_page reference
counter and set it close to overflow.

** Attachment added: "kernel module to view zero_page reference counter"
   https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1837810/+attachment/5402014/+files/zero_page_refcount.c
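
When watching the counter from a script, the module's output line can be
parsed and sanity-checked. The following is a rough sketch that assumes the
line format "Zero Page Refcount: 0x<hex> or <decimal>" seen in the transcripts
in this thread, and assumes the counter is 32 bits wide. The helper names are
hypothetical and not part of the attached module.

```python
import re

# Format assumed from the /proc/zero_page_refcount output quoted in this thread.
LINE_RE = re.compile(r"Zero Page Refcount: 0x([0-9a-fA-F]+) or (-?\d+)")


def parse_refcount(line):
    """Return (raw_hex_value, signed_value) from one line of module output."""
    match = LINE_RE.fullmatch(line.strip())
    if match is None:
        raise ValueError(f"unrecognised line: {line!r}")
    return int(match.group(1), 16), int(match.group(2))


def is_consistent(raw, signed):
    """True when raw is the 32-bit two's-complement form of signed."""
    return raw == signed & 0xFFFFFFFF


raw, signed = parse_refcount("Zero Page Refcount: 0x1bd9 or 7129")
print(raw, signed, is_consistent(raw, signed))
```

A check like is_consistent also helps spot transcripts where the hexadecimal
field has been mangled in transit, since the decimal field then no longer
matches it.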

Status in linux package in Ubuntu:
  Fix Released
Status in linux source package in Bionic:
  In Progress
Status in linux source package in Focal:
  In Progress

[Kernel-packages] [Bug 1837810] Re: KVM: Fix zero_page reference counter overflow when using KSM on KVM compute host

2020-08-16 Thread Matthew Ruffell
Attached is a script to create and destroy VMs in a loop, to try and
increment the zero_page reference counter.

** Attachment added: "Reproducer script to create and destroy VMs"
   
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1837810/+attachment/5402013/+files/ksm_refcnt_overflow.sh

Status in linux package in Ubuntu:
  Fix Released
Status in linux source package in Bionic:
  In Progress
Status in linux source package in Focal:
  In Progress

[Kernel-packages] [Bug 1837810] Re: KVM: Fix zero_page reference counter overflow when using KSM on KVM compute host

2020-08-13 Thread Pooja Ghumre
Thanks for fixing it @mruffell!

Yes, we did have KSM enabled on the hypervisor where we hit this issue.

Status in linux package in Ubuntu:
  Fix Released
Status in linux source package in Bionic:
  In Progress
Status in linux source package in Focal:
  In Progress

[Kernel-packages] [Bug 1837810] Re: KVM: Fix zero_page reference counter overflow when using KSM on KVM compute host

2020-08-13 Thread Matthew Ruffell
** Summary changed:

- qemu instance gets paused with error: kvm run failed Bad address
+ KVM: Fix zero_page reference counter overflow when using KSM on KVM compute host

** Description changed:

- We are occasionally running into the below qemu error on our Ubuntu16
- KVM hypervisors managed using Openstack, where the qemu guest instance
- gets paused unexpectedly.
- 
- pooja@kvm14:~$ sudo cat 
/var/log/libvirt/qemu/f8a20654-4c96-4446-95b3-24b8d28fab7f.log.1
+ BugLink: https://bugs.launchpad.net/bugs/1837810
+ 
+ [Impact]
+ 
+ We are seeing a problem on OpenStack compute nodes, and KVM hosts, where
+ a kernel oops is generated, and all running KVM machines are placed into
+ the pause state.
+ 
+ This is caused by the kernel's reserved zero_page reference counter
+ overflowing from a positive number to a negative number, and hitting a
+ (WARN_ON_ONCE(page_ref_count(page) <= 0)) condition in try_get_page().
+ 
+ This only happens if the machine has Kernel Samepage Merging (KSM)
+ enabled, with "use_zero_pages" turned on. Each time a new VM starts and
+ the kernel does a KSM merge run during an EPT violation, the reference
+ counter for the zero_page is incremented in try_async_pf() and never
+ decremented. Eventually, the reference counter will overflow, causing
+ the KVM subsystem to fail.
+ 
+ Syslog:
+ error : qemuMonitorJSONCheckError:392 : internal error: unable to execute QEMU command 'cont': Resetting the Virtual Machine is required
+ 
+ QEMU Logs:
  error: kvm run failed Bad address
- RAX= RBX=8d12ffc01b00 RCX= 
RDX=8d14d111c040
- RSI=000e RDI=8d15bffda000 RBP=8d156afbb898 
RSP=8d156afbb870
- R8 =001e R9 = R10=001d 
R11=ffd0
- R12=8d14d111c000 R13=f7de0d444700 R14=8d14d111c000 
R15=0001
- RIP=9ba1911b RFL=00010082 [--S] CPL=0 II=0 A20=1 SMM=0 HLT=0
- ES =  000f 
- CS =0010   00a09b00 DPL=0 CS64 [-RA]
- SS =   00c0
- DS =  000f 
- FS = 7f8c7e8ac700 000f 
- GS = 8d15bfd0 000f 
- LDT=  000f 
- TR =0040 8d15bfd04000 2087 8b00 DPL=0 TSS64-busy
- GDT= 8d15bfd0c000 007f
- IDT= ff528000 0fff
- CR0=80050033 CR2=1699e002 CR3=b0d54000 CR4=003606e0
- DR0= DR1= DR2= 
DR3=
- DR6=fffe0ff0 DR7=0400
- EFER=0d01
- Code=44 39 f8 48 63 43 20 0f 8e e1 00 00 00 48 63 53 18 4c 01 e2 <49> 89 14 
04 41 0f b7 55 1a 48 63 43 18 41 83 c7 01 66 81 e2 ff 7f 49 01 c4 0f b7 c2 44 39
- 2019-07-16T23:59:30.354240Z qemu-system-x86_64: terminating on signal 15 from 
pid 7487 (/usr/sbin/libvirtd)
- 2019-07-16 23:59:31.549+: shutting down, reason=destroyed
- 
- 
- We also saw some swap related errors in /var/log/syslog previously:
- 
- Jul  4 08:40:18 kvm14 kernel: \[8084318.769268] audit: type=1400
- audit(1562229618.904:4385): apparmor="STATUS"
- operation="profile_replace" profile="unconfined" name="libvirt-a89c8a67
- -24ae-45cb-a885-8880b49f86fd" pid=53770 comm="apparmor_parser"
- 
- Jul  4 08:40:18 kvm14 libvirtd\[20274]: 2019-07-04 08:40:18.911+:
- 20277: warning : AppArmorSetFDLabel:1164 : could not find path for
- descriptor /proc/self/fd/29, skippingJul  4 08:40:19 kvm13 kernel:
- \[8084318.930088] print_req_error: critical medium error, dev nvme0n1,
- sector 392710192
- 
- Jul  4 08:40:19 kvm14 kernel: \[8084318.937754] Read-error on swap-
- device (259:0:392710200)
- 
- Jul  4 08:40:19 kvm14 kernel: \[8084318.943157] Read-error on swap-
- device (259:0:392710208)
- 
- Wondering if these swap read errors happening intermittently, cause qemu
- to pause the guest instance due to memory overcommitment issues.
- 
- pooja@kvm14:~$ lsb_release -a
- No LSB modules are available.
- Distributor ID: Ubuntu
- Description:Ubuntu 16.04.6 LTS
- Release:16.04
- Codename:   xenial
- 
- pooja@kvm14:~$ /usr/bin/qemu-system-x86_64 --version
- QEMU emulator version 2.11.1(Debian 1:2.11+dfsg-1ubuntu7.10~cloud0)
- Copyright (c) 2003-2017 Fabrice Bellard and the QEMU Project developers
- 
- pooja@kvm14:~$ dpkg -l | grep libvirt
- ii  libvirt-bin   4.0.0-1ubuntu8.8~cloud0 
  amd64programs for the libvirt library
- 
- 
- Sample qemu process args for an instance created using Openstack Nova (Newton 
release):
- 
- pooja@kvm14:~$ sudo ps -ef | grep qemu | tail -1
- libvirt+ 52655 1 83 Jun27 ?22-23:37:19 qemu-system-x86_64 
-enable-kvm -name guest=157b8c3d-87c6-4a62-a05b-8d1aae47e890,debug-threads=on 
-S -object 
secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-567-157b8c3d-87c6-4a62-a/master-key.aes
 -machine