** Description changed:

  [Impact]
  
  We have some Ubuntu 16.04 hosts (in Hyper-V) being used to test Ubuntu
20.04 containers. As part of that testing we attempted to take a memory
dump of a container running SQL Server on Ubuntu 20.04 on the Ubuntu
16.04 host, and we started seeing a kernel panic and core dump. It
started happening after a specific Xenial kernel update on the host:
  4.4.0-204-generic - systems that crash
  4.4.0-201-generic - systems that are able to capture the dump
  
  A note from the developer indicates the following logging shows up.
  ----
  Now, the following is output right after I attempt to start the dump
(gdb, attach ###, generate-core-file /var/opt/mssql/log/rdorr.delme.core).
  
  [Fri Mar 19 20:01:38 2021] systemd-journald[581]: Successfully sent stream 
file descriptor to service manager.
  [Fri Mar 19 20:01:41 2021] cni0: port 9(vethdec5d2b7) entered forwarding state
  [Fri Mar 19 20:02:42 2021] systemd-journald[581]: Successfully sent stream 
file descriptor to service manager.
  [Fri Mar 19 20:03:04 2021] ------------[ cut here ]------------
  [Fri Mar 19 20:03:04 2021] kernel BUG at 
/build/linux-qlAbvR/linux-4.4.0/mm/memory.c:3214!
  [Fri Mar 19 20:03:04 2021] invalid opcode: 0000 [#1] SMP
  [Fri Mar 19 20:03:04 2021] Modules linked in: veth vxlan ip6_udp_tunnel 
udp_tunnel xt_statistic xt_nat ipt_REJECT nf_reject_ipv4 xt_tcpudp ip_vs_sh 
ip_vs_wrr ip_vs_rr ip_vs libcrc32c ip6table_nat nf_conntrack_ipv6 
nf_defrag_ipv6 nf_nat_ipv6 ip6_tables xt_comment xt_mark xt_conntrack 
ipt_MASQUERADE nf_nat_masquerade_ipv4 nf_conntrack_netlink nfnetlink xfrm_user 
xfrm_algo xt_addrtype iptable_filter iptable_nat nf_conntrack_ipv4 
nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack ip_tables x_tables br_netfilter 
bridge stp llc aufs overlay nls_utf8 isofs crct10dif_pclmul crc32_pclmul 
ghash_clmulni_intel aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper 
cryptd input_leds serio_raw i2c_piix4 hv_balloon hyperv_fb 8250_fintek joydev 
mac_hid autofs4 hid_generic hv_utils hid_hyperv ptp hv_netvsc hid hv_storvsc 
pps_core
  [Fri Mar 19 20:03:04 2021] hyperv_keyboard scsi_transport_fc psmouse 
pata_acpi hv_vmbus floppy fjes
  [Fri Mar 19 20:03:04 2021] CPU: 1 PID: 24869 Comm: gdb Tainted: G W 
4.4.0-204-generic #236-Ubuntu
  [Fri Mar 19 20:03:04 2021] Hardware name: Microsoft Corporation Virtual 
Machine/Virtual Machine, BIOS 090007 05/18/2018
  [Fri Mar 19 20:03:04 2021] task: ffff880db9229c80 ti: ffff880d93b9c000 
task.ti: ffff880d93b9c000
  [Fri Mar 19 20:03:04 2021] RIP: 0010:[<ffffffff811cd93e>] 
[<ffffffff811cd93e>] handle_mm_fault+0x13de/0x1b80
  [Fri Mar 19 20:03:04 2021] RSP: 0018:ffff880d93b9fc28 EFLAGS: 00010246
  [Fri Mar 19 20:03:04 2021] RAX: 0000000000000100 RBX: 0000000000000000 RCX: 
0000000000000120
  [Fri Mar 19 20:03:04 2021] RDX: ffff880ea635f3e8 RSI: 00003ffffffff000 RDI: 
0000000000000000
  [Fri Mar 19 20:03:04 2021] RBP: ffff880d93b9fce8 R08: 00003ff32179a120 R09: 
000000000000007d
  [Fri Mar 19 20:03:04 2021] R10: ffff8800000003e8 R11: 00000000000003e8 R12: 
ffff8800ea672708
  [Fri Mar 19 20:03:04 2021] R13: 0000000000000000 R14: 000000010247d000 R15: 
ffff8800f27fe400
  [Fri Mar 19 20:03:04 2021] FS: 00007fdc26061600(0000) 
GS:ffff881025640000(0000) knlGS:0000000000000000
  [Fri Mar 19 20:03:04 2021] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  [Fri Mar 19 20:03:04 2021] CR2: 000055e3a0011290 CR3: 0000000d93ba4000 CR4: 
0000000000160670
  [Fri Mar 19 20:03:04 2021] Stack:
  [Fri Mar 19 20:03:04 2021] ffffffff81082929 fffffffffffffffd ffffffff81082252 
ffff880d93b9fca8
  [Fri Mar 19 20:03:04 2021] ffffffff811c7bca ffff8800f27fe400 000000010247d000 
ffff880e74a88090
  [Fri Mar 19 20:03:04 2021] 000000003a98d7f0 ffff880e00000001 ffff8800000003e8 
0000000000000017
  [Fri Mar 19 20:03:04 2021] Call Trace:
  [Fri Mar 19 20:03:04 2021] [<ffffffff81082929>] ? mm_access+0x79/0xa0
  [Fri Mar 19 20:03:04 2021] [<ffffffff81082252>] ? mmput+0x12/0x130
  [Fri Mar 19 20:03:04 2021] [<ffffffff811c7bca>] ? follow_page_pte+0x1ca/0x3d0
  [Fri Mar 19 20:03:04 2021] [<ffffffff811c7fe4>] ? follow_page_mask+0x214/0x3a0
  [Fri Mar 19 20:03:04 2021] [<ffffffff811c82a0>] __get_user_pages+0x130/0x680
  [Fri Mar 19 20:03:04 2021] [<ffffffff8122b248>] ? path_openat+0x348/0x1360
  [Fri Mar 19 20:03:04 2021] [<ffffffff811c8b74>] get_user_pages+0x34/0x40
  [Fri Mar 19 20:03:04 2021] [<ffffffff811c90f4>] __access_remote_vm+0xe4/0x2d0
  [Fri Mar 19 20:03:04 2021] [<ffffffff811ef6ac>] ? 
alloc_pages_current+0x8c/0x110
  [Fri Mar 19 20:03:04 2021] [<ffffffff811cfe3f>] access_remote_vm+0x1f/0x30
  [Fri Mar 19 20:03:04 2021] [<ffffffff8128d3fa>] mem_rw.isra.16+0xfa/0x190
  [Fri Mar 19 20:03:04 2021] [<ffffffff8128d4c8>] mem_read+0x18/0x20
  [Fri Mar 19 20:03:04 2021] [<ffffffff8121c89b>] __vfs_read+0x1b/0x40
  [Fri Mar 19 20:03:04 2021] [<ffffffff8121d016>] vfs_read+0x86/0x130
  [Fri Mar 19 20:03:04 2021] [<ffffffff8121df65>] SyS_pread64+0x95/0xb0
  [Fri Mar 19 20:03:04 2021] [<ffffffff8186acdb>] 
entry_SYSCALL_64_fastpath+0x22/0xd0
  [Fri Mar 19 20:03:04 2021] Code: d4 ee ff ff 48 8b 7d 98 89 45 88 e8 2d c7 fd 
ff 8b 45 88 89 c3 e9 be ee ff ff 48 8b bd 70 ff ff ff e8 c7 cf 69 00 e9 ad ee 
ff ff <0f> 0b 4c 89 e7 4c 89 9d 70 ff ff ff e8 f1 c9 00 00 85 c0 4c 8b
  [Fri Mar 19 20:03:04 2021] RIP [<ffffffff811cd93e>] 
handle_mm_fault+0x13de/0x1b80
  [Fri Mar 19 20:03:04 2021] RSP <ffff880d93b9fc28>
  [Fri Mar 19 20:03:04 2021] ---[ end trace 9d28a7e662aea7df ]---
  [Fri Mar 19 20:03:04 2021] systemd-journald[581]: Compressed data object 806 
-> 548 using XZ
  
  ------------------------
  
  We think the following code may be relevant to the crashing behavior.
  This appears to be the relevant source for Ubuntu 4.4.0-204 (BTW, are
you sure this is Ubuntu 20.04? 4.4.0 is a Xenial kernel): mm/memory.c in
the ~ubuntu-kernel/ubuntu/+source/linux/+git/xenial tree on launchpad.net.
  
  static int do_numa_page(struct mm_struct *mm, struct vm_area_struct *vma,
                          unsigned long addr, pte_t pte, pte_t *ptep, pmd_t *pmd)
  {
  ...
          /* A PROT_NONE fault should not end up here */
          BUG_ON(!(vma->vm_flags & (VM_READ | VM_EXEC | VM_WRITE)));   /* line 3214 */
  
  We see the following fix but are not certain yet whether it is relevant.
  This is interesting: "mm: check VMA flags to avoid invalid PROT_NONE
NUMA balancing" (torvalds/linux@38e0885 on GitHub):
  
  mm: check VMA flags to avoid invalid PROT_NONE NUMA balancing
  The NUMA balancing logic uses an arch-specific PROT_NONE page table flag
  defined by pte_protnone() or pmd_protnone() to mark PTEs or huge page
  PMDs respectively as requiring balancing upon a subsequent page fault.
  User-defined PROT_NONE memory regions which also have this flag set will
  not normally invoke the NUMA balancing code as do_page_fault() will send
  a segfault to the process before handle_mm_fault() is even called.
  
  However if access_remote_vm() is invoked to access a PROT_NONE region of
  memory, handle_mm_fault() is called via faultin_page() and
  __get_user_pages() without any access checks being performed, meaning
  the NUMA balancing logic is incorrectly invoked on a non-NUMA memory
  region.
  
  A simple means of triggering this problem is to access PROT_NONE mmap'd
  memory using /proc/self/mem which reliably results in the NUMA handling
  functions being invoked when CONFIG_NUMA_BALANCING is set.
  
  This issue was reported in bugzilla (issue 99101) which includes some
  simple repro code.
  
  There are BUG_ON() checks in do_numa_page() and do_huge_pmd_numa_page()
  added at commit c0e7cad to avoid accidentally provoking strange
  behavior by attempting to apply NUMA balancing to pages that are in
  fact PROT_NONE. The BUG_ON()'s are consistently triggered by the repro.
  
  This patch moves the PROT_NONE check into mm/memory.c rather than
  invoking BUG_ON() as faulting in these pages via faultin_page() is a
  valid reason for reaching the NUMA check with the PROT_NONE page table
  flag set and is therefore not always a bug.
  Link: https://bugzilla.kernel.org/show_bug.cgi?id=99101
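
  To make the shape of that fix concrete, here is a hedged sketch based
only on the commit message above (not the verbatim upstream diff): the
BUG_ON() checks are dropped from do_numa_page()/do_huge_pmd_numa_page(),
and the caller in mm/memory.c only hands a protnone fault to the NUMA
path when the VMA actually grants some access:

  /* Sketch of the idea only; see torvalds/linux@38e0885 for the real change. */
  if (pte_protnone(entry) && (vma->vm_flags & (VM_READ | VM_EXEC | VM_WRITE)))
          return do_numa_page(mm, vma, address, entry, pte, pmd);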
  
- We need help in understanding how to prevent core dump/kernel panic
- while taking memory dump of a focal container on a xenial host.
- 
  [Test Plan]
  
  To test on a 16.04 Azure instance, follow these steps:
  
  $ echo 'GRUB_FLAVOUR_ORDER="generic"' | sudo tee -a /etc/default/grub.d/99-custom.cfg
  
  $ sudo apt install linux-generic
  
  $ sudo reboot
  
  # log in again and confirm the system has booted the 4.4 kernel (check with uname -r)
  
  $ sudo apt install docker.io gdb
  
  $ sudo docker pull mcr.microsoft.com/mssql/server:2019-latest
  
  $ sudo docker run -e "ACCEPT_EULA=Y" -e "SA_PASSWORD=<YourStrong@Passw0rd>" \
     -p 1433:1433 --name sql1 -h sql1 \
     -d mcr.microsoft.com/mssql/server:2019-latest
  
  $ ps -ef | grep sqlservr   # note the PID of the sqlservr process
  
  $ sudo gdb -p $PID -ex generate-core-file   # $PID is the sqlservr PID from the previous step
  
  # A kernel BUG should be triggered
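
  Alternatively, a smaller, container-free check (a sketch only, based on
the trigger described in the upstream commit message quoted above; it is
not the exact bugzilla 99101 repro, and the file name and build line are
just for illustration): populate an anonymous mapping, mark it PROT_NONE,
and then access it through /proc/self/mem, which exercises the
access_remote_vm() path shown in the oops. On an affected kernel this may
crash the host, so run it only in a disposable test VM.

  /*
   * repro-sketch.c - hedged illustration only, not the bugzilla 99101 code.
   * Build: gcc -O2 -o repro-sketch repro-sketch.c
   * WARNING: on an affected kernel (e.g. 4.4.0-204 with
   * CONFIG_NUMA_BALANCING) this may hit the BUG_ON() shown above and
   * crash the machine.
   */
  #define _GNU_SOURCE
  #include <fcntl.h>
  #include <stdint.h>
  #include <stdio.h>
  #include <string.h>
  #include <sys/mman.h>
  #include <unistd.h>

  int main(void)
  {
          size_t len = 4096;

          /* Populate an anonymous private mapping. */
          char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                         MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
          if (p == MAP_FAILED) {
                  perror("mmap");
                  return 1;
          }
          memset(p, 0x5a, len);

          /* Make the whole VMA inaccessible, clearing VM_READ/VM_WRITE/
           * VM_EXEC - exactly the state the BUG_ON() asserts can never
           * reach do_numa_page(). */
          if (mprotect(p, len, PROT_NONE) != 0) {
                  perror("mprotect");
                  return 1;
          }

          /* Access the region through /proc/self/mem.  This goes through
           * access_remote_vm()/__get_user_pages() in the kernel, the same
           * path gdb's generate-core-file takes in the oops above,
           * bypassing the usual access checks. */
          int fd = open("/proc/self/mem", O_RDWR);
          if (fd < 0) {
                  perror("open /proc/self/mem");
                  return 1;
          }

          char buf[64] = { 0 };
          ssize_t n = pread(fd, buf, sizeof(buf), (off_t)(uintptr_t)p);
          printf("pread returned %zd\n", n);

          /* A forced write exercises the same fault path even more
           * directly; which access actually trips the bug depends on the
           * kernel. */
          n = pwrite(fd, buf, sizeof(buf), (off_t)(uintptr_t)p);
          printf("pwrite returned %zd\n", n);

          close(fd);
          munmap(p, len);
          return 0;
  }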
  
  [Where problems could occur]
  
  The patch touches the mm subsystem, so there is always the potential
for significant regressions; in that case a revert and a re-spin would
probably be necessary.

  On the other hand, this patch has been included in the mainline
kernel since 4.8 without problems.

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1921211

Title:
  Taking a memory dump of user mode process on Xenial hosts causes
  bugcheck/kernel panic and core dump

