Public bug reported:

We are seeing a number of kernel panics on servers running Apache Mesos
with cgroups that have small (0.1-0.2) CPU limits.

These all appear as NULL pointer dereferences in and around
pick_next_entity and pick_next_task_fair, for example:

[24334.493331] BUG: unable to handle kernel NULL pointer dereference at 
0000000000000050
[24334.501611] IP: [<ffffffff810b2f0f>] pick_next_entity+0x7f/0x160
[24334.507868] PGD 3eacfa067 PUD 3eacfb067 PMD 0
[24334.512806] Oops: 0000 [#1] SMP
[24334.516420] Modules linked in: ipvlan xt_nat xt_tcpudp veth ipt_MASQUERADE 
nf_nat_masquerade_ipv4 xfrm_user xfrm_algo iptable_nat nf_conntrack_ipv4 
nf_defrag_ipv4 nf_nat_ipv4 xt_addrtype iptable_filter ip_tables xt_conntrack 
x_tables nf_nat nf_conntrack br_netfilter bridge stp llc aufs tcp_diag 
inet_diag nfsd auth_rpcgss nfs_acl nfs lockd grace sunrpc fscache dm_crypt 
ppdev input_leds mac_hid i2c_piix4 8250_fintek parport_pc pvpanic parport 
serio_raw crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel 
aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd psmouse virtio_scsi
[24334.576359] CPU: 2 PID: 0 Comm: swapper/2 Not tainted 4.4.0-66-generic 
#87~14.04.1-Ubuntu
[24334.584748] Hardware name: Google Google Compute Engine/Google Compute 
Engine, BIOS Google 01/01/2011
[24334.594188] task: ffff8803ee671c00 ti: ffff8803ee67c000 task.ti: 
ffff8803ee67c000
[24334.601799] RIP: 0010:[<ffffffff810b2f0f>] [<ffffffff810b2f0f>] 
pick_next_entity+0x7f/0x160
[24334.610490] RSP: 0018:ffff8803ee67fdd8 EFLAGS: 00010086
[24334.615924] RAX: ffff8803ebed4c00 RBX: ffff880036529800 RCX: 0000000000000000
[24334.623190] RDX: 000000000225341f RSI: 0000000000000000 RDI: 0000000000000000
[24334.630479] RBP: ffff8803ee67fe00 R08: 0000000000000004 R09: 0000000000000000
[24334.637758] R10: ffff8803e7ed7600 R11: 0000000000000001 R12: 0000000000000000
[24334.645153] R13: 0000000000000000 R14: 00000009067729c4 R15: ffff8803ee672178
[24334.652512] FS: 0000000000000000(0000) GS:ffff8803ffd00000(0000) 
knlGS:0000000000000000
[24334.660721] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[24334.666587] CR2: 0000000000000050 CR3: 00000003eacf9000 CR4: 00000000001406e0
[24334.673851] Stack:
[24334.675980] ffff8803ffd16e00 ffff8803ffd16e00 ffff8803e855a200 
ffff880036529800
[24334.683995] 0000000000000002 ffff8803ee67fe68 ffffffff810b98a6 
ffff8803ffd16e70
[24334.692024] 0000000000016e00 ffff8803e7ed7600 ffff8803ee671c00 
0000000000000000
[24334.700172] Call Trace:
[24334.702750] [<ffffffff810b98a6>] pick_next_task_fair+0x66/0x4b0
[24334.708886] [<ffffffff818043c4>] __schedule+0x7f4/0x980
[24334.714349] [<ffffffff81804585>] schedule+0x35/0x80
[24334.719445] [<ffffffff8180481e>] schedule_preempt_disabled+0xe/0x10
[24334.725962] [<ffffffff810bf9fa>] cpu_startup_entry+0x18a/0x350
[24334.732012] [<ffffffff8104f3d9>] start_secondary+0x149/0x170
[24334.737895] Code: 8b 70 50 4d 2b 74 24 50 4d 85 f6 7e 59 4c 89 e7 e8 67 ff 
ff ff 49 39 c6 7f 04 4c 8b 6b 48 48 8b 43 40 48 85 c0 74 1f 4c 8b 70 50 <4d> 2b 
74 24 50 4d 85 f6 7e 2c 4c 89 e7 e8 3f ff ff ff 49 39 c6
[24334.765124] RIP [<ffffffff810b2f0f>] pick_next_entity+0x7f/0x160
[24334.771473] RSP <ffff8803ee67fdd8>
[24334.775077] CR2: 0000000000000050
[24334.779121] ---[ end trace 05d941efb97b7bae ]---

and

[155852.028575] BUG: unable to handle kernel NULL pointer dereference at 
0000000000000050
[155852.036931] IP: [<ffffffff810b2f0f>] pick_next_entity+0x7f/0x160
[155852.043491] PGD 3ebae8067 PUD 3ebae9067 PMD 0
[155852.048550] Oops: 0000 [#1] SMP
[155852.052437] Modules linked in: ipvlan veth xt_nat xt_tcpudp ipt_MASQUERADE 
nf_nat_masquerade_ipv4 xfrm_user xfrm_algo iptable_nat nf_conntrack_ipv4 
nf_defrag_ipv4 nf_nat_ipv4 xt_addrtype iptable_filter ip_tables xt_conntrack 
x_tables nf_nat nf_conntrack br_netfilter bridge stp llc aufs nfsd auth_rpcgss 
nfs_acl nfs lockd grace sunrpc fscache dm_crypt ppdev input_leds mac_hid 
i2c_piix4 parport_pc 8250_fintek pvpanic parport serio_raw crct10dif_pclmul 
crc32_pclmul ghash_clmulni_intel aesni_intel aes_x86_64 lrw gf128mul 
glue_helper ablk_helper cryptd psmouse virtio_scsi
[155852.109847] CPU: 1 PID: 2215 Comm: ruby Not tainted 4.4.0-66-generic 
#87~14.04.1-Ubuntu
[155852.118233] Hardware name: Google Google Compute Engine/Google Compute 
Engine, BIOS Google 01/01/2011
[155852.127661] task: ffff8803ed29aa00 ti: ffff8800bbb10000 task.ti: 
ffff8800bbb10000
[155852.135347] RIP: 0010:[<ffffffff810b2f0f>] [<ffffffff810b2f0f>] 
pick_next_entity+0x7f/0x160
[155852.144120] RSP: 0018:ffff8800bbb13ce0 EFLAGS: 00010086
[155852.149631] RAX: ffff8801725b5c00 RBX: ffff8800bb777600 RCX: 
ffff8800bb777400
[155852.156970] RDX: ffff8803ffc96e70 RSI: 0000000000000000 RDI: 
0000000000000000
[155852.164384] RBP: ffff8800bbb13d08 R08: ffff8803eb92e800 R09: 
ffff8803ed29aa00
[155852.171718] R10: 0000000000000001 R11: 00000000000003cb R12: 
0000000000000000
[155852.179052] R13: 0000000000000000 R14: 000009ad6846ff10 R15: 
0000000000000001
[155852.186387] FS: 00007f387d1c9700(0000) GS:ffff8803ffc80000(0000) 
knlGS:0000000000000000
[155852.194677] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[155852.200626] CR2: 0000000000000050 CR3: 00000003eb706000 CR4: 
00000000001406e0
[155852.207967] Stack:
[155852.210180] ffffffff810369c9 ffff8803ffc96e00 ffff8800bb777600 
0000000000000000
[155852.218278] 00000000000012a4 ffff8800bbb13d70 ffffffff810b9b65 
ffff8803ffc96e70
[155852.226402] 0000000000016e00 00008dbf20ccb260 ffff8803ed29aa00 
0000000000000001
[155852.234506] Call Trace:
[155852.237156] [<ffffffff810369c9>] ? sched_clock+0x9/0x10
[155852.242673] [<ffffffff810b9b65>] pick_next_task_fair+0x325/0x4b0
[155852.248968] [<ffffffff81803cd9>] __schedule+0x109/0x980
[155852.254491] [<ffffffff81804585>] schedule+0x35/0x80
[155852.259667] [<ffffffff8180727c>] schedule_hrtimeout_range_clock+0xac/0x130
[155852.266838] [<ffffffff810e9fb0>] ? hrtimer_init+0x180/0x180
[155852.272712] [<ffffffff81807270>] ? schedule_hrtimeout_range_clock+0xa0/0x130
[155852.280052] [<ffffffff81807313>] schedule_hrtimeout_range+0x13/0x20
[155852.288558] [<ffffffff812479b9>] ep_poll+0x249/0x310
[155852.293817] [<ffffffff810a8c30>] ? wake_up_q+0x80/0x80
[155852.299271] [<ffffffff81248efc>] SyS_epoll_wait+0xbc/0xe0
[155852.304967] [<ffffffff81807df6>] entry_SYSCALL_64_fastpath+0x16/0x75
[155852.311618] Code: 8b 70 50 4d 2b 74 24 50 4d 85 f6 7e 59 4c 89 e7 e8 67 ff 
ff ff 49 39 c6 7f 04 4c 8b 6b 48 48 8b 43 40 48 85 c0 74 1f 4c 8b 70 50 <4d> 2b 
74 24 50 4d 85 f6 7e 2c 4c 89 e7 e8 3f ff ff ff 49 39 c6
[155852.338852] RIP [<ffffffff810b2f0f>] pick_next_entity+0x7f/0x160
[155852.345270] RSP <ffff8800bbb13ce0>
[155852.348958] CR2: 0000000000000050
[155852.353086] ---[ end trace 8ce693b2314611c4 ]---

Similar issues have been reported in the community for kernels based on
4.4: https://github.com/kubernetes/kops/issues/874

These panics occur in the CFS code when a next buddy is set on an entity
that is not on a run-queue.  pick_next_entity then ends up with
curr == left == NULL and calls wakeup_preempt_entity() with a valid next
buddy and a NULL left.  wakeup_preempt_entity() dereferences left, which
causes the panic.
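
The failure mode described above can be sketched as a small userspace
model in C. This is an illustration only, not the kernel source: the
structs are cut down to the fields the argument needs, the names
model_pick_next_entity and PICK_NULL_LEFT are hypothetical, and the
wakeup granularity is treated as zero. Where the kernel would read a
field through the NULL left pointer (which would be consistent with the
faulting address 0000000000000050 in the oopses), the model returns an
error code instead.

```c
#include <stddef.h>

/* Simplified userspace model; NOT the kernel's definitions. */
struct sched_entity {
	long long vruntime;
};

struct cfs_rq {
	struct sched_entity *leftmost; /* first entity in the tree, or NULL */
	struct sched_entity *next;     /* next buddy, or NULL */
};

enum pick_result { PICK_OK, PICK_NULL_LEFT };

/*
 * Models the next-buddy check in pick_next_entity(). Instead of
 * dereferencing a NULL left, as the kernel does, it reports
 * PICK_NULL_LEFT.
 */
static enum pick_result model_pick_next_entity(const struct cfs_rq *cfs_rq,
					       struct sched_entity *curr,
					       struct sched_entity **picked)
{
	struct sched_entity *left = cfs_rq->leftmost;

	/* Fall back to curr if the tree is empty or curr sorts first. */
	if (!left || (curr && curr->vruntime < left->vruntime))
		left = curr;

	*picked = left;

	if (cfs_rq->next) {
		if (!left)
			return PICK_NULL_LEFT; /* the kernel would oops here */
		/* Assumption: granularity of zero, so the buddy wins
		 * only if it is not ahead of left. */
		if (cfs_rq->next->vruntime - left->vruntime <= 0)
			*picked = cfs_rq->next;
	}
	return PICK_OK;
}
```

With an empty tree, a NULL curr and a next buddy still set, the model
returns PICK_NULL_LEFT; with at least one queued entity it picks
normally.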

We confirmed this by adding a WARN_ON_ONCE to set_next_buddy that fires
when a sched_entity in the hierarchy is not on_rq, as per
https://marc.info/?l=linux-kernel&m=146651668921468&w=2
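
The idea behind that check can be sketched in userspace C. This is a
model under stated assumptions, not the debugging patch itself: the
structs and the name model_set_next_buddy are hypothetical, and the
function returns the offending entity where the kernel check would
print a warning.

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified userspace model; NOT the kernel's definitions. */
struct cfs_rq;

struct sched_entity {
	bool on_rq;                  /* queued on its run-queue? */
	struct sched_entity *parent; /* group entity one level up, or NULL */
	struct cfs_rq *cfs_rq;       /* run-queue this entity belongs to */
};

struct cfs_rq {
	struct sched_entity *next;   /* next buddy */
};

/*
 * Models set_next_buddy() with the extra check: walk from the task's
 * entity up through its group entities, marking each as the next buddy
 * of its run-queue. Returns the first entity found that is not on a
 * run-queue (where the WARN_ON_ONCE would fire), or NULL if none.
 */
static struct sched_entity *model_set_next_buddy(struct sched_entity *se)
{
	struct sched_entity *offender = NULL;

	for (; se; se = se->parent) {
		if (!se->on_rq && !offender)
			offender = se;
		se->cfs_rq->next = se;
	}
	return offender;
}
```

In this model, a task whose group entity is not on_rq leaves the parent
run-queue with a next buddy that is not queued on it, which is the
precondition for the NULL dereference described above.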

The stack-trace for the WARN is quite involved:

Apr 25 14:14:48 (none) kernel: [ 5339.764597] ------------[ cut here 
]------------
Apr 25 14:14:48 (none) kernel: [ 5339.764606] WARNING: CPU: 1 PID: 13121 at 
/build/linux-PwPelj/linux-4.4.0/kernel/sched/fair.c:5170 
set_next_buddy+0x55/0x70()
Apr 25 14:14:48 (none) kernel: [ 5339.764608] Modules linked in: xt_nat 
xt_tcpudp ipvlan ipt_MASQUERADE nf_nat_masquerade_ipv4 xfrm_user xfrm_algo 
iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 xt_addrtype 
iptable_filter ip_tables xt_conntrack x_tables nf_nat nf_conntrack br_netfilter 
bridge stp llc aufs nfsd auth_rpcgss nfs_acl nfs dm_crypt lockd grace sunrpc 
fscache ppdev input_leds serio_raw parport_pc 8250_fintek parport pvpanic 
mac_hid i2c_piix4 crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel 
aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd psmouse virtio_scsi
Apr 25 14:14:48 (none) kernel: [ 5339.764644] CPU: 1 PID: 13121 Comm: executor 
Not tainted 4.4.0-72-generic #93+hf135461v20170420b2-Ubuntu
Apr 25 14:14:48 (none) kernel: [ 5339.764646] Hardware name: Google Google 
Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Apr 25 14:14:48 (none) kernel: [ 5339.764647]  0000000000000086 
00000000d5fbe9e0 ffff8803ed947608 ffffffff813f83c3
Apr 25 14:14:48 (none) kernel: [ 5339.764650]  0000000000000000 
ffffffff81cbae20 ffff8803ed947640 ffffffff81081302
Apr 25 14:14:48 (none) kernel: [ 5339.764652]  ffff8800bb5fc800 
ffff8803e7c9f000 0000000000000008 ffff8800ba1bd400
Apr 25 14:14:48 (none) kernel: [ 5339.764655] Call Trace:
Apr 25 14:14:48 (none) kernel: [ 5339.764665]  [<ffffffff813f83c3>] 
dump_stack+0x63/0x90
Apr 25 14:14:48 (none) kernel: [ 5339.764669]  [<ffffffff81081302>] 
warn_slowpath_common+0x82/0xc0
Apr 25 14:14:48 (none) kernel: [ 5339.764672]  [<ffffffff8108144a>] 
warn_slowpath_null+0x1a/0x20
Apr 25 14:14:48 (none) kernel: [ 5339.764674]  [<ffffffff810b52b5>] 
set_next_buddy+0x55/0x70
Apr 25 14:14:48 (none) kernel: [ 5339.764676]  [<ffffffff810b59a4>] 
check_preempt_wakeup+0x244/0x250
Apr 25 14:14:48 (none) kernel: [ 5339.764679]  [<ffffffff810ab580>] 
check_preempt_curr+0x80/0x90
Apr 25 14:14:48 (none) kernel: [ 5339.764682]  [<ffffffff810b42eb>] 
attach_task+0x4b/0x60
Apr 25 14:14:48 (none) kernel: [ 5339.764685]  [<ffffffff810be067>] 
load_balance+0x5b7/0x980
Apr 25 14:14:48 (none) kernel: [ 5339.764688]  [<ffffffff810be6e1>] 
pick_next_task_fair+0x2b1/0x4f0
Apr 25 14:14:48 (none) kernel: [ 5339.764692]  [<ffffffff81837c5f>] 
__schedule+0x15f/0xa30
Apr 25 14:14:48 (none) kernel: [ 5339.764694]  [<ffffffff81838565>] 
schedule+0x35/0x80
Apr 25 14:14:48 (none) kernel: [ 5339.764697]  [<ffffffff8183ba85>] 
schedule_hrtimeout_range_clock+0xc5/0x1b0
Apr 25 14:14:48 (none) kernel: [ 5339.764700]  [<ffffffff810ef880>] ? 
__hrtimer_init+0x90/0x90
Apr 25 14:14:48 (none) kernel: [ 5339.764703]  [<ffffffff8183ba79>] ? 
schedule_hrtimeout_range_clock+0xb9/0x1b0
Apr 25 14:14:48 (none) kernel: [ 5339.764705]  [<ffffffff8183bb83>] 
schedule_hrtimeout_range+0x13/0x20
Apr 25 14:14:48 (none) kernel: [ 5339.764709]  [<ffffffff81223914>] 
poll_schedule_timeout+0x44/0x70
Apr 25 14:14:48 (none) kernel: [ 5339.764711]  [<ffffffff81224407>] 
do_select+0x727/0x810
Apr 25 14:14:48 (none) kernel: [ 5339.764715]  [<ffffffff811fb932>] ? 
page_counter_uncharge+0x22/0x40
Apr 25 14:14:48 (none) kernel: [ 5339.764718]  [<ffffffff811fdb1c>] ? 
drain_stock.isra.33+0x6c/0xa0
Apr 25 14:14:48 (none) kernel: [ 5339.764720]  [<ffffffff810b5349>] ? 
update_curr+0x79/0x160
Apr 25 14:14:48 (none) kernel: [ 5339.764722]  [<ffffffff810b550c>] ? 
update_cfs_shares+0xbc/0x100
Apr 25 14:14:48 (none) kernel: [ 5339.764724]  [<ffffffff810b742b>] ? 
dequeue_entity+0x41b/0xa80
Apr 25 14:14:48 (none) kernel: [ 5339.764729]  [<ffffffff810719f7>] ? 
gup_pud_range+0x127/0x220
Apr 25 14:14:48 (none) kernel: [ 5339.764731]  [<ffffffff810baa9c>] ? 
set_next_entity+0x9c/0xb0
Apr 25 14:14:48 (none) kernel: [ 5339.764736]  [<ffffffff8102d66c>] ? 
__switch_to+0x1dc/0x5c0
Apr 25 14:14:48 (none) kernel: [ 5339.764740]  [<ffffffff81401304>] ? 
timerqueue_del+0x24/0x70
Apr 25 14:14:48 (none) kernel: [ 5339.764742]  [<ffffffff810efa3c>] ? 
__remove_hrtimer+0x3c/0x90
Apr 25 14:14:48 (none) kernel: [ 5339.764744]  [<ffffffff810efb61>] ? 
hrtimer_try_to_cancel+0xd1/0x130
Apr 25 14:14:48 (none) kernel: [ 5339.764746]  [<ffffffff810efbd9>] ? 
hrtimer_cancel+0x19/0x20
Apr 25 14:14:48 (none) kernel: [ 5339.764751]  [<ffffffff81101166>] ? 
futex_wait+0x206/0x280
Apr 25 14:14:48 (none) kernel: [ 5339.764753]  [<ffffffff810ab5a9>] ? 
ttwu_do_wakeup+0x19/0xe0
Apr 25 14:14:48 (none) kernel: [ 5339.764756]  [<ffffffff812246bf>] 
core_sys_select+0x1cf/0x2f0
Apr 25 14:14:48 (none) kernel: [ 5339.764758]  [<ffffffff810ef880>] ? 
__hrtimer_init+0x90/0x90
Apr 25 14:14:48 (none) kernel: [ 5339.764762]  [<ffffffff81128447>] ? 
audit_filter_rules+0x217/0xe30
Apr 25 14:14:48 (none) kernel: [ 5339.764764]  [<ffffffff81103860>] ? 
do_futex+0x120/0x540
Apr 25 14:14:48 (none) kernel: [ 5339.764768]  [<ffffffff8106428e>] ? 
kvm_clock_get_cycles+0x1e/0x20
Apr 25 14:14:48 (none) kernel: [ 5339.764772]  [<ffffffff810f53aa>] ? 
ktime_get_ts64+0x4a/0xf0
Apr 25 14:14:48 (none) kernel: [ 5339.764774]  [<ffffffff8122489a>] 
SyS_select+0xba/0x110
Apr 25 14:14:48 (none) kernel: [ 5339.764777]  [<ffffffff8183c672>] 
entry_SYSCALL_64_fastpath+0x16/0x71
Apr 25 14:14:48 (none) kernel: [ 5339.764779] ---[ end trace ace97b626b47e1f9 
]---

Cherry-picking both 754bd598be9bbc953bc709a9e8ed7f3188bfb9d7
(http://lkml.kernel.org/r/146608183552.21905.15924473394414832071.stgit@buzz)
and 094f469172e00d6ab0a3130b0e01c83b3cf3a98d
(http://lkml.kernel.org/r/146608182119.21870.8439834428248129633.stgit@buzz)
fixes the crash. The two commits appear to be intended as a series.

We are just waiting on final confirmation that the fix works before
beginning the SRU process.

** Affects: linux (Ubuntu)
     Importance: Undecided
     Assignee: Daniel Axtens (daxtens)
         Status: Confirmed

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1687512

Title:
  Kernel panics on Xenial when using cgroups and strict CFS limits

Status in linux package in Ubuntu:
  Confirmed
