------- Comment From [email protected] 2017-11-10 12:18 EDT-------
So far I've only been able to reproduce this in 2 ways:

a) Booting up a Debian Jessie guest (kernel 3.16). Generally the crash happens some time after boot, but in some situations it needs some "help", like running "useradd <newuser>".

b) Booting up an Ubuntu 16.04 guest, which doesn't seem to ever trigger the issue itself, then chrooting into that same Debian Jessie image (attached as a 2nd virtio disk) and running that same "useradd <newuser>".

Using these test cases, the crash appears to happen during the first call to compound_head(page) within mm/gup.c:gup_pte_range():

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/mm/gup.c?h=v4.13#n1312

The compound_head(page) call dereferences a struct page pointer, via page->compound_head, and that generates a page fault which leads to the crash. The address of *page in one instance of the crash was 0xf00000000783af20:

[95667.639406] Unable to handle kernel paging request for data at address 0xf00000000783af20
[95667.639518] Faulting instruction address: 0xc000000000309714
90:mon> t
[c000001e3c4db900] c00000000030a3d0 get_user_pages_fast+0x110/0x160
[c000001e3c4db950] d0000000181be21c kvmppc_book3s_hv_page_fault+0x384/0xc60 [kvm_hv]
[c000001e3c4dba40] d0000000181ba94c kvmppc_vcpu_run_hv+0x314/0x790 [kvm_hv]
[c000001e3c4dbb10] d0000000181059ec kvmppc_vcpu_run+0x34/0x48 [kvm]
[c000001e3c4dbb30] d000000018101aa0 kvm_arch_vcpu_ioctl_run+0x108/0x320 [kvm]
[c000001e3c4dbbd0] d0000000180f5018 kvm_vcpu_ioctl+0x400/0x7c8 [kvm]
[c000001e3c4dbd40] c0000000003bd6e4 do_vfs_ioctl+0xd4/0xa00
[c000001e3c4dbde0] c0000000003be0d4 SyS_ioctl+0xc4/0x130
[c000001e3c4dbe30] c00000000000b184 system_call+0x58/0x6c
--- Exception: c01 (System Call) at 000079d53a595550
SP (79d5354ede40) is in userspace

The 0xf address corresponds to the vmemmap area, where page structs are allocated sequentially for all PFNs in the system, so it isn't obviously a bad address. Some of our kernel folks took a look at this and worked out that, with a 64-byte sizeof(struct page), 0xf00000000783af20 corresponds to 0x783af20 / 64 = PFN 1969852.
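
The address arithmetic in this part of the analysis (vmemmap offset to PFN, then PFN to physical address with 64K pages) can be double-checked with a short sketch. This is only an illustration, not kernel code: the names VMEMMAP_BASE, vmemmap_addr_to_pfn and pfn_to_phys are made up for the example, and the vmemmap base of 0xf000000000000000 is an assumption inferred from the 0xf prefix of the faulting address.

```python
# Sanity check of the vmemmap arithmetic described in the comment.
# Assumptions (not taken from kernel headers): the vmemmap region starts at
# 0xf000000000000000, and struct page entries are laid out one per PFN.
VMEMMAP_BASE = 0xF000000000000000
STRUCT_PAGE_SIZE = 64       # sizeof(struct page) as stated in the comment
PAGE_SIZE = 64 * 1024       # 64K pages as stated in the comment


def vmemmap_addr_to_pfn(addr):
    """Return (pfn, byte offset inside that PFN's struct page)."""
    offset = addr - VMEMMAP_BASE
    return divmod(offset, STRUCT_PAGE_SIZE)


def pfn_to_phys(pfn):
    """Return the physical address of the start of the page for this PFN."""
    return pfn * PAGE_SIZE


fault_addr = 0xF00000000783AF20  # from the "Unable to handle kernel paging request" line
pfn, offset_in_struct = vmemmap_addr_to_pfn(fault_addr)
phys = pfn_to_phys(pfn)

print(f"PFN {pfn}, {offset_in_struct} bytes into its struct page")
print(f"physical address {phys:#x}, about {phys / 2**30:.1f} GiB")
```

The division is not exact: the faulting address lies 32 bytes past the start of the struct page for PFN 1969852, which would fit a read of a field inside the struct rather than of its first byte. The resulting physical address is a little over 120 GiB, below the 128GB of memory on the host.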

For a 64K page size this corresponds to 1969852 * 64K, an address at around 120GB, which is within the range of physical memory on the system (0-128GB in this case).

Since the *page address appeared valid, it was suggested that the issue was with the vmemmap area being "unbolted" by KVM, leading to a page fault for an address that should always be pinned/bolted within the host, and the following fix was suggested:

commit 67f8a8c1151c9ef3d1285905d1e66ebb769ecdf7
Author: Paul Mackerras <[email protected]>
Date:   Tue Sep 12 13:47:23 2017 +1000

    KVM: PPC: Book3S HV: Fix bug causing host SLB to be restored incorrectly

I've tested this patch against kernel 4.13.0-16-generic, and at least for test cases a) and b) above, this does appear to resolve the issue.

So it looks like we need kernel commit 67f8a8c115 pulled into 17.10 to resolve this bug.

-- 
You received this bug notification because you are a member of Ubuntu Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1725350

Title:
  KVM on 17.10 crashes the machine
