[Kernel-packages] [Bug 1687512] Re: Kernel panics on Xenial when using cgroups and strict CFS limits
** Changed in: linux (Ubuntu)
       Status: Triaged => Fix Released

--
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1687512

Title:
  Kernel panics on Xenial when using cgroups and strict CFS limits

Status in linux package in Ubuntu:
  Fix Released
Status in linux source package in Xenial:
  Fix Released

Bug description:
  SRU Justification
  -

  [Impact]
  Apache Mesos and Kubernetes workloads on Xenial cause a panic
  (NULL pointer dereference) in the completely fair scheduler.

  These panics are in pick_next_entity and include pick_next_task_fair
  in the call stack.

  [Fix]
  Cherry-picking both
  754bd598be9bbc953bc709a9e8ed7f3188bfb9d7
  (http://lkml.kernel.org/r/146608183552.21905.15924473394414832071.stgit@buzz)
  and
  094f469172e00d6ab0a3130b0e01c83b3cf3a98d
  (http://lkml.kernel.org/r/146608182119.21870.8439834428248129633.stgit@buzz)
  fix the crash.
  They appear to be intended as a series - they were posted to LKML at
  the same time.

  [Testcase]
  The fix has been validated by the user who reported the bug

  Bug description
  ---

  We see a number of kernel panics on servers running Apache Mesos using
  cgroups with small (0.1-0.2) cpu limits.

  These all appear as NULL pointer dereferences in and around
  pick_next_entity and pick_next_task_fair, for example:

  [24334.493331] BUG: unable to handle kernel NULL pointer dereference at 0050
  [24334.501611] IP: [] pick_next_entity+0x7f/0x160
  [24334.507868] PGD 3eacfa067 PUD 3eacfb067 PMD 0
  [24334.512806] Oops: [#1] SMP
  [24334.516420] Modules linked in: ipvlan xt_nat xt_tcpudp veth ipt_MASQUERADE nf_nat_masquerade_ipv4 xfrm_user xfrm_algo iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 xt_addrtype iptable_filter ip_tables xt_conntrack x_tables nf_nat nf_conntrack br_netfilter bridge stp llc aufs tcp_diag inet_diag nfsd auth_rpcgss nfs_acl nfs lockd grace sunrpc fscache dm_crypt ppdev input_leds mac_hid i2c_piix4 8250_fintek parport_pc pvpanic parport serio_raw crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd psmouse virtio_scsi
  [24334.576359] CPU: 2 PID: 0 Comm: swapper/2 Not tainted 4.4.0-66-generic #87~14.04.1-Ubuntu
  [24334.584748] Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
  [24334.594188] task: 8803ee671c00 ti: 8803ee67c000 task.ti: 8803ee67c000
  [24334.601799] RIP: 0010:[] [] pick_next_entity+0x7f/0x160
  [24334.610490] RSP: 0018:8803ee67fdd8 EFLAGS: 00010086
  [24334.615924] RAX: 8803ebed4c00 RBX: 880036529800 RCX:
  [24334.623190] RDX: 0225341f RSI: RDI:
  [24334.630479] RBP: 8803ee67fe00 R08: 0004 R09:
  [24334.637758] R10: 8803e7ed7600 R11: 0001 R12:
  [24334.645153] R13: R14: 0009067729c4 R15: 8803ee672178
  [24334.652512] FS: () GS:8803ffd0() knlGS:
  [24334.660721] CS: 0010 DS: ES: CR0: 80050033
  [24334.666587] CR2: 0050 CR3: 0003eacf9000 CR4: 001406e0
  [24334.673851] Stack:
  [24334.675980] 8803ffd16e00 8803ffd16e00 8803e855a200 880036529800
  [24334.683995] 0002 8803ee67fe68 810b98a6 8803ffd16e70
  [24334.692024] 00016e00 8803e7ed7600 8803ee671c00
  [24334.700172] Call Trace:
  [24334.702750] [] pick_next_task_fair+0x66/0x4b0
  [24334.708886] [] __schedule+0x7f4/0x980
  [24334.714349] [] schedule+0x35/0x80
  [24334.719445] [] schedule_preempt_disabled+0xe/0x10
  [24334.725962] [] cpu_startup_entry+0x18a/0x350
  [24334.732012] [] start_secondary+0x149/0x170
  [24334.737895] Code: 8b 70 50 4d 2b 74 24 50 4d 85 f6 7e 59 4c 89 e7 e8 67 ff ff ff 49 39 c6 7f 04 4c 8b 6b 48 48 8b 43 40 48 85 c0 74 1f 4c 8b 70 50 <4d> 2b 74 24 50 4d 85 f6 7e 2c 4c 89 e7 e8 3f ff ff ff 49 39 c6
  [24334.765124] RIP [] pick_next_entity+0x7f/0x160
  [24334.771473] RSP
  [24334.775077] CR2: 0050
  [24334.779121] ---[ end trace 05d941efb97b7bae ]---

  and

  [155852.028575] BUG: unable to handle kernel NULL pointer dereference at 0050
  [155852.036931] IP: [] pick_next_entity+0x7f/0x160
  [155852.043491] PGD 3ebae8067 PUD 3ebae9067 PMD 0
  [155852.048550] Oops: [#1] SMP
  [155852.052437] Modules linked in: ipvlan veth xt_nat xt_tcpudp ipt_MASQUERADE nf_nat_masquerade_ipv4 xfrm_user xfrm_algo iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 xt_addrtype iptable_filter ip_tables xt_conntrack x_tables nf_nat nf_conntrack br_netfilter bridge stp llc aufs nfsd auth_rpcgss nfs_acl nfs lockd grace sunrpc fscache dm_crypt ppdev input_leds mac_hid i2c_piix4 parport_pc
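As background for the "small (0.1-0.2) cpu limits" mentioned in the bug description: under the cgroup v1 cpu controller, a fractional CPU limit is normally enforced as a runtime quota per period, written to cpu.cfs_quota_us and cpu.cfs_period_us. The following is a minimal sketch of that arithmetic only. The helper name cpu_limit_to_quota_us and the 100 ms period are assumptions made for illustration; neither is taken from this bug report or from the Mesos configuration of the affected servers.

```python
# Hypothetical helper, not part of Mesos or the kernel: convert a
# fractional CPU limit into a CFS bandwidth quota in microseconds.

# Assumed enforcement period of 100 ms. The real value is whatever is
# written to cpu.cfs_period_us for the cgroup in question.
ASSUMED_PERIOD_US = 100_000


def cpu_limit_to_quota_us(cpus, period_us=ASSUMED_PERIOD_US):
    """Return the runtime quota (microseconds per period) for a CPU limit."""
    if cpus <= 0:
        raise ValueError("cpus must be positive")
    if period_us <= 0:
        raise ValueError("period_us must be positive")
    # A limit of 0.1 CPU means the group may run for 10% of each period.
    return int(round(cpus * period_us))


for limit in (0.1, 0.2):
    quota = cpu_limit_to_quota_us(limit)
    print(f"{limit} CPU -> {quota} us of runtime per {ASSUMED_PERIOD_US} us period")
```

With quotas this small a task group exhausts its runtime quickly and is throttled often, which is consistent with the crash being reported under "strict CFS limits".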
[Kernel-packages] [Bug 1687512] Re: Kernel panics on Xenial when using cgroups and strict CFS limits
This bug was fixed in the package linux - 4.4.0-79.100

---
linux (4.4.0-79.100) xenial; urgency=low

  * linux: 4.4.0-79.100 -proposed tracker (LP: #1691180)

  * linux-aws/linux-gke incorrectly producing and using
    linux-*-tools-common/linux-*-cloud-tools-common (LP: #1688579)
    - [Config] make linux-tools-common and linux-cloud-tools-common provide linux-gke versions
    - [Config] make linux-tools-common and linux-cloud-tools-common provide linux-aws versions
    - [Packaging] prevent linux-*-tools-common from being produced from non linux packages

  * CVE-2017-0605
    - tracing: Use strlcpy() instead of strcpy() in __trace_find_cmdline()

  * i915-bpo crashes on external hdmi input (LP: #1580272)
    - SAUCE: i915_bpo: Silence the warning about watermark entries not changing

  * Kernel panics on Xenial when using cgroups and strict CFS limits (LP: #1687512)
    - sched/fair: Initialize throttle_count for new task-groups lazily
    - sched/fair: Do not announce throttled next buddy in dequeue_task_fair()

  * bonding - mlx5 - speed changed to 0 after changing ring size (LP: #1687877)
    - bonding: allow notifications for bond_set_slave_link_state

  * Xenial update to 4.4.67 stable release (LP: #1689296)
    - timerfd: Protect the might cancel mechanism proper
    - Handle mismatched open calls
    - ASoC: intel: Fix PM and non-atomic crash in bytcr drivers
    - ALSA: ppc/awacs: shut up maybe-uninitialized warning
    - drbd: avoid redefinition of BITS_PER_PAGE
    - mtd: avoid stack overflow in MTD CFI code
    - net: tg3: avoid uninitialized variable warning
    - netlink: Allow direct reclaim for fallback allocation
    - IB/qib: rename BITS_PER_PAGE to RVT_BITS_PER_PAGE
    - IB/ehca: fix maybe-uninitialized warnings
    - ext4: require encryption feature for EXT4_IOC_SET_ENCRYPTION_POLICY
    - ext4 crypto: revalidate dentry after adding or removing the key
    - ext4 crypto: use dget_parent() in ext4_d_revalidate()
    - ext4/fscrypto: avoid RCU lookup in d_revalidate
    - nfsd4: minor NFSv2/v3 write decoding cleanup
    - nfsd: stricter decoding of write-like NFSv2/v3 ops
    - dm ioctl: prevent stack leak in dm ioctl call
    - Linux 4.4.67

  * Precision Rack failed to resume from S4 (LP: #1686061)
    - x86 / hibernate: Use hlt_play_dead() when resuming from hibernation
    - x86/boot: Split out kernel_ident_mapping_init()
    - x86/power/64: Always create temporary identity mapping correctly

  * Xenial update to 4.4.66 stable release (LP: #1688505)
    - f2fs: do more integrity verification for superblock
    - xc2028: unlock on error in xc2028_set_config()
    - ARM: OMAP2+: timer: add probe for clocksources
    - clk: sunxi: Add apb0 gates for H3
    - crypto: testmgr - fix out of bound read in __test_aead()
    - drm/amdgpu: fix array out of bounds
    - ext4: check if in-inode xattr is corrupted in ext4_expand_extra_isize_ea()
    - md:raid1: fix a dead loop when read from a WriteMostly disk
    - MIPS: Fix crash registers on non-crashing CPUs
    - net: cavium: liquidio: Avoid dma_unmap_single on uninitialized ndata
    - net_sched: close another race condition in tcf_mirred_release()
    - RDS: Fix the atomicity for congestion map update
    - regulator: core: Clear the supply pointer if enabling fails
    - usb: gadget: f_midi: Fixed a bug when buflen was smaller than wMaxPacketSize
    - xen/x86: don't lose event interrupts
    - sparc64: kern_addr_valid regression
    - sparc64: Fix kernel panic due to erroneous #ifdef surrounding pmd_write()
    - net: neigh: guard against NULL solicit() method
    - net: phy: handle state correctly in phy_stop_machine
    - l2tp: purge socket queues in the .destruct() callback
    - l2tp: take reference on sessions being dumped
    - l2tp: fix PPP pseudo-wire auto-loading
    - net: ipv4: fix multipath RTM_GETROUTE behavior when iif is given
    - sctp: listen on the sock only when it's state is listening or closed
    - tcp: clear saved_syn in tcp_disconnect()
    - dp83640: don't recieve time stamps twice
    - net: ipv6: RTF_PCPU should not be settable from userspace
    - netpoll: Check for skb->queue_mapping
    - ip6mr: fix notification device destruction
    - macvlan: Fix device ref leak when purging bc_queue
    - ipv6: check skb->protocol before lookup for nexthop
    - ipv6: check raw payload size correctly in ioctl
    - ALSA: firewire-lib: fix inappropriate assignment between signed/unsigned type
    - ALSA: seq: Don't break snd_use_lock_sync() loop by timeout
    - MIPS: KGDB: Use kernel context for sleeping threads
    - MIPS: Avoid BUG warning in arch_check_elf
    - p9_client_readdir() fix
    - Input: i8042 - add Clevo P650RS to the i8042 reset list
    - nfsd: check for oversized NFSv2/v3 arguments
    - ARCv2: save r30 on kernel entry as gcc uses it for code-gen
    - ftrace/x86: Fix triple fault with graph tracing and suspend-to-ram
    - Linux 4.4.66

  * Xenial
[Kernel-packages] [Bug 1687512] Re: Kernel panics on Xenial when using cgroups and strict CFS limits
Customer has verified that 4.4.0-79-generic resolves the issue in their
environment that would previously panic.

** Tags removed: verification-needed-xenial
** Tags added: verification-done-xenial

Status in linux package in Ubuntu:
  Triaged
Status in linux source package in Xenial:
  Fix Committed
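When checking other hosts against the verified kernel, the ABI number in the kernel release string can be compared with the fixed build. The sketch below assumes release strings of the form `4.4.0-79-generic`, as they appear in this thread. The function names are hypothetical, and the check deliberately returns None for anything outside the Xenial 4.4.0 series, since this thread says nothing about other kernels.

```python
import re

# Values taken from this thread: the fix shipped in linux 4.4.0-79.100,
# and the panics were reported on 4.4.0-66-generic.
FIXED_SERIES = (4, 4, 0)
FIXED_ABI = 79

_RELEASE_RE = re.compile(r"^(\d+)\.(\d+)\.(\d+)-(\d+)")


def parse_release(release):
    """Split a release string like '4.4.0-79-generic' into (series, abi)."""
    match = _RELEASE_RE.match(release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    major, minor, patch, abi = (int(part) for part in match.groups())
    return (major, minor, patch), abi


def has_cfs_panic_fix(release):
    """Return True/False for 4.4.0 kernels, None for any other series."""
    series, abi = parse_release(release)
    if series != FIXED_SERIES:
        return None  # outside the scope of this bug report
    return abi >= FIXED_ABI


print(has_cfs_panic_fix("4.4.0-66-generic"))
print(has_cfs_panic_fix("4.4.0-79-generic"))
```

On a running system the input would typically come from `platform.release()` or `uname -r`.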
[Kernel-packages] [Bug 1687512] Re: Kernel panics on Xenial when using cgroups and strict CFS limits
This bug is awaiting verification that the kernel in -proposed solves
the problem. Please test the kernel and update this bug with the
results. If the problem is solved, change the tag
'verification-needed-xenial' to 'verification-done-xenial'. If the
problem still exists, change the tag 'verification-needed-xenial' to
'verification-failed-xenial'.

If verification is not done by 5 working days from today, this fix will
be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how
to enable and use -proposed. Thank you!

** Tags added: verification-needed-xenial

Status in linux package in Ubuntu:
  Triaged
Status in linux source package in Xenial:
  Fix Committed
[Kernel-packages] [Bug 1687512] Re: Kernel panics on Xenial when using cgroups and strict CFS limits
Hi,

I didn't realise this had hit -proposed; I am in the process of
verifying the fix and will let you know ASAP.

Regards,
Daniel

Status in linux package in Ubuntu:
  Triaged
Status in linux source package in Xenial:
  Fix Committed
[Kernel-packages] [Bug 1687512] Re: Kernel panics on Xenial when using cgroups and strict CFS limits
** Changed in: linux (Ubuntu Xenial)
       Status: Triaged => Fix Committed

Status in linux package in Ubuntu:
  Triaged
Status in linux source package in Xenial:
  Fix Committed
[Kernel-packages] [Bug 1687512] Re: Kernel panics on Xenial when using cgroups and strict CFS limits
** Description changed:

+ SRU Justification
+ -
+ 
+ [Impact]
+ Apache Mesos and Kubernetes workloads on Xenial cause a panic
+ (NULL pointer dereference) in the completely fair scheduler.
+ 
+ These panics are in pick_next_entity and include pick_next_task_fair
+ in the call stack.
+ 
+ [Fix]
+ Cherry-picking both
+ 754bd598be9bbc953bc709a9e8ed7f3188bfb9d7
+ (http://lkml.kernel.org/r/146608183552.21905.15924473394414832071.stgit@buzz)
+ and
+ 094f469172e00d6ab0a3130b0e01c83b3cf3a98d
+ (http://lkml.kernel.org/r/146608182119.21870.8439834428248129633.stgit@buzz)
+ fix the crash.
+ They appear to be intended as a series - they were posted to LKML at
+ the same time.
+ 
+ [Testcase]
+ The fix has been validated by the user who reported the bug
+ 
+ Bug description
+ ---
+ 
  We see a number of kernel panics on servers running Apache Mesos using
  cgroups with small (0.1-0.2) cpu limits.

  These all appear as NULL pointer dereferences in and around
  pick_next_entity and pick_next_task_fair, for example:

  [24334.493331] BUG: unable to handle kernel NULL pointer dereference at 0050
  [24334.501611] IP: [] pick_next_entity+0x7f/0x160
  [24334.507868] PGD 3eacfa067 PUD 3eacfb067 PMD 0
  [24334.512806] Oops: [#1] SMP
  [24334.516420] Modules linked in: ipvlan xt_nat xt_tcpudp veth ipt_MASQUERADE nf_nat_masquerade_ipv4 xfrm_user xfrm_algo iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 xt_addrtype iptable_filter ip_tables xt_conntrack x_tables nf_nat nf_conntrack br_netfilter bridge stp llc aufs tcp_diag inet_diag nfsd auth_rpcgss nfs_acl nfs lockd grace sunrpc fscache dm_crypt ppdev input_leds mac_hid i2c_piix4 8250_fintek parport_pc pvpanic parport serio_raw crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd psmouse virtio_scsi
  [24334.576359] CPU: 2 PID: 0 Comm: swapper/2 Not tainted 4.4.0-66-generic #87~14.04.1-Ubuntu
  [24334.584748] Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
  [24334.594188] task: 8803ee671c00 ti: 8803ee67c000 task.ti: 8803ee67c000
  [24334.601799] RIP: 0010:[] [] pick_next_entity+0x7f/0x160
  [24334.610490] RSP: 0018:8803ee67fdd8 EFLAGS: 00010086
  [24334.615924] RAX: 8803ebed4c00 RBX: 880036529800 RCX:
  [24334.623190] RDX: 0225341f RSI: RDI:
  [24334.630479] RBP: 8803ee67fe00 R08: 0004 R09:
  [24334.637758] R10: 8803e7ed7600 R11: 0001 R12:
  [24334.645153] R13: R14: 0009067729c4 R15: 8803ee672178
  [24334.652512] FS: () GS:8803ffd0() knlGS:
  [24334.660721] CS: 0010 DS: ES: CR0: 80050033
  [24334.666587] CR2: 0050 CR3: 0003eacf9000 CR4: 001406e0
  [24334.673851] Stack:
  [24334.675980] 8803ffd16e00 8803ffd16e00 8803e855a200 880036529800
  [24334.683995] 0002 8803ee67fe68 810b98a6 8803ffd16e70
  [24334.692024] 00016e00 8803e7ed7600 8803ee671c00
  [24334.700172] Call Trace:
  [24334.702750] [] pick_next_task_fair+0x66/0x4b0
  [24334.708886] [] __schedule+0x7f4/0x980
  [24334.714349] [] schedule+0x35/0x80
  [24334.719445] [] schedule_preempt_disabled+0xe/0x10
  [24334.725962] [] cpu_startup_entry+0x18a/0x350
  [24334.732012] [] start_secondary+0x149/0x170
  [24334.737895] Code: 8b 70 50 4d 2b 74 24 50 4d 85 f6 7e 59 4c 89 e7 e8 67 ff ff ff 49 39 c6 7f 04 4c 8b 6b 48 48 8b 43 40 48 85 c0 74 1f 4c 8b 70 50 <4d> 2b 74 24 50 4d 85 f6 7e 2c 4c 89 e7 e8 3f ff ff ff 49 39 c6
  [24334.765124] RIP [] pick_next_entity+0x7f/0x160
  [24334.771473] RSP
  [24334.775077] CR2: 0050
  [24334.779121] ---[ end trace 05d941efb97b7bae ]---

  and

  [155852.028575] BUG: unable to handle kernel NULL pointer dereference at 0050
  [155852.036931] IP: [] pick_next_entity+0x7f/0x160
  [155852.043491] PGD 3ebae8067 PUD 3ebae9067 PMD 0
  [155852.048550] Oops: [#1] SMP
  [155852.052437] Modules linked in: ipvlan veth xt_nat xt_tcpudp ipt_MASQUERADE nf_nat_masquerade_ipv4 xfrm_user xfrm_algo iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 xt_addrtype iptable_filter ip_tables xt_conntrack x_tables nf_nat nf_conntrack br_netfilter bridge stp llc aufs nfsd auth_rpcgss nfs_acl nfs lockd grace sunrpc fscache dm_crypt ppdev input_leds mac_hid i2c_piix4 parport_pc 8250_fintek pvpanic parport serio_raw crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd psmouse virtio_scsi
  [155852.109847] CPU: 1 PID: 2215 Comm: ruby Not tainted 4.4.0-66-generic #87~14.04.1-Ubuntu
  [155852.118233] Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
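To triage other crash logs against the signature described in [Impact] (a NULL pointer dereference with pick_next_entity and pick_next_task_fair in the trace), the `symbol+0xoffset/0xsize` entries can be pulled out of an oops with a regular expression. This is a rough sketch, not an established tool: the function names are hypothetical, and a matching signature only suggests the same bug, it does not prove it.

```python
import re

# Matches entries such as "pick_next_entity+0x7f/0x160" as they appear in
# the IP, RIP and Call Trace lines quoted in this thread.
FRAME_RE = re.compile(r"([A-Za-z_][A-Za-z0-9_.]*)\+0x([0-9a-f]+)/0x([0-9a-f]+)")

# Symbols named in the [Impact] section of the bug description.
SIGNATURE_SYMBOLS = {"pick_next_entity", "pick_next_task_fair"}


def extract_frames(oops_text):
    """Return (symbol, offset, size) tuples in order of appearance."""
    return [
        (symbol, int(offset, 16), int(size, 16))
        for symbol, offset, size in FRAME_RE.findall(oops_text)
    ]


def matches_cfs_panic_signature(oops_text):
    """True if the text looks like the panic described in this bug."""
    symbols = {symbol for symbol, _, _ in extract_frames(oops_text)}
    return (
        "NULL pointer dereference" in oops_text
        and SIGNATURE_SYMBOLS <= symbols
    )


sample = """\
[24334.493331] BUG: unable to handle kernel NULL pointer dereference at 0050
[24334.501611] IP: [] pick_next_entity+0x7f/0x160
[24334.702750] [] pick_next_task_fair+0x66/0x4b0
[24334.708886] [] __schedule+0x7f4/0x980
"""

for frame in extract_frames(sample):
    print(frame)
print(matches_cfs_panic_signature(sample))
```

The sample lines are copied from the first trace above; the empty `[]` brackets are kept as they appear in this archive.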
[Kernel-packages] [Bug 1687512] Re: Kernel panics on Xenial when using cgroups and strict CFS limits
** Tags added: kernel-da-key

** Changed in: linux (Ubuntu)
   Importance: Undecided => High

** Also affects: linux (Ubuntu Xenial)
   Importance: Undecided
       Status: New

** Changed in: linux (Ubuntu Xenial)
       Status: New => Triaged

** Changed in: linux (Ubuntu Xenial)
   Importance: Undecided => High

** Changed in: linux (Ubuntu)
       Status: Confirmed => Triaged

Status in linux package in Ubuntu:
  Triaged
Status in linux source package in Xenial:
  Triaged
parport serio_raw crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd psmouse virtio_scsi [155852.109847] CPU: 1 PID: 2215 Comm: ruby Not tainted 4.4.0-66-generic #87~14.04.1-Ubuntu [155852.118233] Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 [155852.127661] task: 8803ed29aa00 ti: 8800bbb1 task.ti: 8800bbb1