[Kernel-packages] [Bug 1687512] Re: Kernel panics on Xenial when using cgroups and strict CFS limits

2017-08-20 Thread Daniel Axtens
** Changed in: linux (Ubuntu)
   Status: Triaged => Fix Released

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1687512

Title:
  Kernel panics on Xenial when using cgroups and strict CFS limits

Status in linux package in Ubuntu:
  Fix Released
Status in linux source package in Xenial:
  Fix Released

Bug description:
  SRU Justification
  -----------------

  [Impact]
  Apache Mesos and Kubernetes workloads on Xenial cause a panic
  (NULL pointer dereference) in the completely fair scheduler.

  These panics are in pick_next_entity and include pick_next_task_fair
  in the call stack.

  [Fix]
  Cherry-picking both
  754bd598be9bbc953bc709a9e8ed7f3188bfb9d7
  (http://lkml.kernel.org/r/146608183552.21905.15924473394414832071.stgit@buzz)
  and
  094f469172e00d6ab0a3130b0e01c83b3cf3a98d
  (http://lkml.kernel.org/r/146608182119.21870.8439834428248129633.stgit@buzz)
  fix the crash.
  They appear to be intended as a series - they were posted to LKML at
  the same time.
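
As a rough illustration of the backport step described above, this Python sketch builds the `git cherry-pick` command lines for the two commits. The helper name `cherry_pick_commands` is hypothetical, and the order shown (lower-numbered LKML message ID first) is only an assumption about the intended series order; the thread itself does not state an order.

```python
# Commit IDs exactly as quoted in the [Fix] section above.
# Assumption: apply in the order of their LKML message IDs.
FIX_COMMITS = [
    "094f469172e00d6ab0a3130b0e01c83b3cf3a98d",
    "754bd598be9bbc953bc709a9e8ed7f3188bfb9d7",
]


def cherry_pick_commands(commits):
    """Return one `git cherry-pick -x` argv list per commit, in order.

    `-x` appends a "(cherry picked from commit ...)" line to the message.
    """
    return [["git", "cherry-pick", "-x", sha] for sha in commits]


if __name__ == "__main__":
    # Print the commands rather than running them, so nothing is modified.
    for argv in cherry_pick_commands(FIX_COMMITS):
        print(" ".join(argv))
```

The commands are only printed here; running them against a real kernel tree, and resolving any conflicts, is left to whoever does the backport.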

  [Testcase]
  The fix has been validated by the user who reported the bug

  Bug description
  ---------------

  We see a number of kernel panics on servers running Apache Mesos using
  cgroups with small (0.1-0.2) cpu limits.
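
To make the "small (0.1-0.2) cpu limits" concrete, here is a minimal sketch of how a fractional CPU limit maps onto the cgroup v1 CFS bandwidth knobs `cpu.cfs_quota_us` and `cpu.cfs_period_us`. It assumes the kernel default period of 100000 microseconds; the helper name `cfs_quota_us` is hypothetical, and whether the reporter's Mesos setup uses this period is not stated in the report.

```python
# Assumption: the kernel default for cpu.cfs_period_us (100 ms).
DEFAULT_CFS_PERIOD_US = 100_000


def cfs_quota_us(cpus, period_us=DEFAULT_CFS_PERIOD_US):
    """Microseconds of CPU time allowed per period for a fractional CPU limit."""
    if cpus <= 0:
        raise ValueError("cpus must be positive")
    return int(round(cpus * period_us))


if __name__ == "__main__":
    for limit in (0.1, 0.2):
        print(limit, "cpus ->", cfs_quota_us(limit), "us per",
              DEFAULT_CFS_PERIOD_US, "us period")
```

Under these assumptions a 0.1 CPU limit leaves a task group runnable for only a tenth of each period, so it is throttled and unthrottled very frequently.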

  These all appear as NULL pointer dereferences in and around
  pick_next_entity and pick_next_task_fair, for example:

  [24334.493331] BUG: unable to handle kernel NULL pointer dereference at 0050
  [24334.501611] IP: [] pick_next_entity+0x7f/0x160
  [24334.507868] PGD 3eacfa067 PUD 3eacfb067 PMD 0
  [24334.512806] Oops:  [#1] SMP
  [24334.516420] Modules linked in: ipvlan xt_nat xt_tcpudp veth ipt_MASQUERADE nf_nat_masquerade_ipv4 xfrm_user xfrm_algo iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 xt_addrtype iptable_filter ip_tables xt_conntrack x_tables nf_nat nf_conntrack br_netfilter bridge stp llc aufs tcp_diag inet_diag nfsd auth_rpcgss nfs_acl nfs lockd grace sunrpc fscache dm_crypt ppdev input_leds mac_hid i2c_piix4 8250_fintek parport_pc pvpanic parport serio_raw crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd psmouse virtio_scsi
  [24334.576359] CPU: 2 PID: 0 Comm: swapper/2 Not tainted 4.4.0-66-generic #87~14.04.1-Ubuntu
  [24334.584748] Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
  [24334.594188] task: 8803ee671c00 ti: 8803ee67c000 task.ti: 8803ee67c000
  [24334.601799] RIP: 0010:[] [] pick_next_entity+0x7f/0x160
  [24334.610490] RSP: 0018:8803ee67fdd8 EFLAGS: 00010086
  [24334.615924] RAX: 8803ebed4c00 RBX: 880036529800 RCX:
  [24334.623190] RDX: 0225341f RSI:  RDI:
  [24334.630479] RBP: 8803ee67fe00 R08: 0004 R09:
  [24334.637758] R10: 8803e7ed7600 R11: 0001 R12:
  [24334.645153] R13:  R14: 0009067729c4 R15: 8803ee672178
  [24334.652512] FS: () GS:8803ffd0() knlGS:
  [24334.660721] CS: 0010 DS:  ES:  CR0: 80050033
  [24334.666587] CR2: 0050 CR3: 0003eacf9000 CR4: 001406e0
  [24334.673851] Stack:
  [24334.675980] 8803ffd16e00 8803ffd16e00 8803e855a200 880036529800
  [24334.683995] 0002 8803ee67fe68 810b98a6 8803ffd16e70
  [24334.692024] 00016e00 8803e7ed7600 8803ee671c00
  [24334.700172] Call Trace:
  [24334.702750] [] pick_next_task_fair+0x66/0x4b0
  [24334.708886] [] __schedule+0x7f4/0x980
  [24334.714349] [] schedule+0x35/0x80
  [24334.719445] [] schedule_preempt_disabled+0xe/0x10
  [24334.725962] [] cpu_startup_entry+0x18a/0x350
  [24334.732012] [] start_secondary+0x149/0x170
  [24334.737895] Code: 8b 70 50 4d 2b 74 24 50 4d 85 f6 7e 59 4c 89 e7 e8 67 ff ff ff 49 39 c6 7f 04 4c 8b 6b 48 48 8b 43 40 48 85 c0 74 1f 4c 8b 70 50 <4d> 2b 74 24 50 4d 85 f6 7e 2c 4c 89 e7 e8 3f ff ff ff 49 39 c6
  [24334.765124] RIP [] pick_next_entity+0x7f/0x160
  [24334.771473] RSP
  [24334.775077] CR2: 0050
  [24334.779121] ---[ end trace 05d941efb97b7bae ]---

  and

  [155852.028575] BUG: unable to handle kernel NULL pointer dereference at 0050
  [155852.036931] IP: [] pick_next_entity+0x7f/0x160
  [155852.043491] PGD 3ebae8067 PUD 3ebae9067 PMD 0
  [155852.048550] Oops:  [#1] SMP
  [155852.052437] Modules linked in: ipvlan veth xt_nat xt_tcpudp ipt_MASQUERADE nf_nat_masquerade_ipv4 xfrm_user xfrm_algo iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 xt_addrtype iptable_filter ip_tables xt_conntrack x_tables nf_nat nf_conntrack br_netfilter bridge stp llc aufs nfsd auth_rpcgss nfs_acl nfs lockd grace sunrpc fscache dm_crypt ppdev input_leds mac_hid i2c_piix4 parport_pc
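
When comparing several reports like the traces above, it can help to pull out just the `symbol+offset/size` entries. The following is a minimal sketch, not a tool used in this thread; the names `FRAME_RE` and `parse_frames` are hypothetical, and the sample lines are copied from the first trace in this report.

```python
import re

# Matches entries such as "pick_next_task_fair+0x66/0x4b0" (symbol+offset/size).
FRAME_RE = re.compile(r"([A-Za-z_][\w.]*)\+0x([0-9a-f]+)/0x([0-9a-f]+)")


def parse_frames(log_text):
    """Return (symbol, offset, size) tuples in the order they appear."""
    return [
        (m.group(1), int(m.group(2), 16), int(m.group(3), 16))
        for m in FRAME_RE.finditer(log_text)
    ]


sample = """\
[24334.501611] IP: [] pick_next_entity+0x7f/0x160
[24334.702750] [] pick_next_task_fair+0x66/0x4b0
[24334.708886] [] __schedule+0x7f4/0x980
"""

if __name__ == "__main__":
    for symbol, offset, size in parse_frames(sample):
        print(f"{symbol} at offset {offset:#x} of {size:#x}")
```

Two reports that share the same faulting symbol and offset, as both traces here do (`pick_next_entity+0x7f/0x160`), are likely to be the same failure.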

[Kernel-packages] [Bug 1687512] Re: Kernel panics on Xenial when using cgroups and strict CFS limits

2017-06-06 Thread Launchpad Bug Tracker
This bug was fixed in the package linux - 4.4.0-79.100

---
linux (4.4.0-79.100) xenial; urgency=low

  * linux: 4.4.0-79.100 -proposed tracker (LP: #1691180)

  * linux-aws/linux-gke incorrectly producing and using linux-*-tools-common/linux-*-cloud-tools-common (LP: #1688579)
- [Config] make linux-tools-common and linux-cloud-tools-common provide linux-gke versions
- [Config] make linux-tools-common and linux-cloud-tools-common provide linux-aws versions
- [Packaging] prevent linux-*-tools-common from being produced from non linux packages

  * CVE-2017-0605
- tracing: Use strlcpy() instead of strcpy() in __trace_find_cmdline()

  * i915-bpo crashes on external hdmi input (LP: #1580272)
- SAUCE: i915_bpo: Silence the warning about watermark entries not changing

  * Kernel panics on Xenial when using cgroups and strict CFS limits (LP: #1687512)
- sched/fair: Initialize throttle_count for new task-groups lazily
- sched/fair: Do not announce throttled next buddy in dequeue_task_fair()

  * bonding - mlx5 - speed changed to 0 after changing ring size  (LP: #1687877)
- bonding: allow notifications for bond_set_slave_link_state

  * Xenial update to 4.4.67 stable release (LP: #1689296)
- timerfd: Protect the might cancel mechanism proper
- Handle mismatched open calls
- ASoC: intel: Fix PM and non-atomic crash in bytcr drivers
- ALSA: ppc/awacs: shut up maybe-uninitialized warning
- drbd: avoid redefinition of BITS_PER_PAGE
- mtd: avoid stack overflow in MTD CFI code
- net: tg3: avoid uninitialized variable warning
- netlink: Allow direct reclaim for fallback allocation
- IB/qib: rename BITS_PER_PAGE to RVT_BITS_PER_PAGE
- IB/ehca: fix maybe-uninitialized warnings
- ext4: require encryption feature for EXT4_IOC_SET_ENCRYPTION_POLICY
- ext4 crypto: revalidate dentry after adding or removing the key
- ext4 crypto: use dget_parent() in ext4_d_revalidate()
- ext4/fscrypto: avoid RCU lookup in d_revalidate
- nfsd4: minor NFSv2/v3 write decoding cleanup
- nfsd: stricter decoding of write-like NFSv2/v3 ops
- dm ioctl: prevent stack leak in dm ioctl call
- Linux 4.4.67

  * Precision Rack failed to resume from S4 (LP: #1686061)
- x86 / hibernate: Use hlt_play_dead() when resuming from hibernation
- x86/boot: Split out kernel_ident_mapping_init()
- x86/power/64: Always create temporary identity mapping correctly

  * Xenial update to 4.4.66 stable release (LP: #1688505)
- f2fs: do more integrity verification for superblock
- xc2028: unlock on error in xc2028_set_config()
- ARM: OMAP2+: timer: add probe for clocksources
- clk: sunxi: Add apb0 gates for H3
- crypto: testmgr - fix out of bound read in __test_aead()
- drm/amdgpu: fix array out of bounds
- ext4: check if in-inode xattr is corrupted in ext4_expand_extra_isize_ea()
- md:raid1: fix a dead loop when read from a WriteMostly disk
- MIPS: Fix crash registers on non-crashing CPUs
- net: cavium: liquidio: Avoid dma_unmap_single on uninitialized ndata
- net_sched: close another race condition in tcf_mirred_release()
- RDS: Fix the atomicity for congestion map update
- regulator: core: Clear the supply pointer if enabling fails
- usb: gadget: f_midi: Fixed a bug when buflen was smaller than wMaxPacketSize
- xen/x86: don't lose event interrupts
- sparc64: kern_addr_valid regression
- sparc64: Fix kernel panic due to erroneous #ifdef surrounding pmd_write()
- net: neigh: guard against NULL solicit() method
- net: phy: handle state correctly in phy_stop_machine
- l2tp: purge socket queues in the .destruct() callback
- l2tp: take reference on sessions being dumped
- l2tp: fix PPP pseudo-wire auto-loading
- net: ipv4: fix multipath RTM_GETROUTE behavior when iif is given
- sctp: listen on the sock only when it's state is listening or closed
- tcp: clear saved_syn in tcp_disconnect()
- dp83640: don't recieve time stamps twice
- net: ipv6: RTF_PCPU should not be settable from userspace
- netpoll: Check for skb->queue_mapping
- ip6mr: fix notification device destruction
- macvlan: Fix device ref leak when purging bc_queue
- ipv6: check skb->protocol before lookup for nexthop
- ipv6: check raw payload size correctly in ioctl
- ALSA: firewire-lib: fix inappropriate assignment between signed/unsigned type
- ALSA: seq: Don't break snd_use_lock_sync() loop by timeout
- MIPS: KGDB: Use kernel context for sleeping threads
- MIPS: Avoid BUG warning in arch_check_elf
- p9_client_readdir() fix
- Input: i8042 - add Clevo P650RS to the i8042 reset list
- nfsd: check for oversized NFSv2/v3 arguments
- ARCv2: save r30 on kernel entry as gcc uses it for code-gen
- ftrace/x86: Fix triple fault with graph tracing and suspend-to-ram
- Linux 4.4.66

  * Xenial 

[Kernel-packages] [Bug 1687512] Re: Kernel panics on Xenial when using cgroups and strict CFS limits

2017-05-26 Thread Jay Vosburgh
Customer has verified that 4.4.0-79-generic resolves the issue in their
environment that would previously panic.
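
For anyone checking their own hosts, a minimal sketch of comparing a kernel release string against the fixed version named in this thread (4.4.0-79) might look like the following. The helper names `kernel_abi` and `has_fix` are hypothetical, and the comparison is only meaningful for the Xenial 4.4.0 kernel series; other series use different version numbering.

```python
import re

# ABI of the fixed Xenial kernel, from "4.4.0-79" in this thread.
FIXED_ABI = (4, 4, 0, 79)


def kernel_abi(release):
    """Parse a release string such as '4.4.0-66-generic' into (4, 4, 0, 66)."""
    m = re.match(r"(\d+)\.(\d+)\.(\d+)-(\d+)", release)
    if m is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    return tuple(int(g) for g in m.groups())


def has_fix(release):
    """True if a 4.4.0-series release is at or past the fixed ABI.

    Assumption: the caller has already established that the host runs the
    Xenial 4.4.0 series; this function does not check that.
    """
    return kernel_abi(release) >= FIXED_ABI


if __name__ == "__main__":
    for rel in ("4.4.0-66-generic", "4.4.0-79-generic"):
        print(rel, "->", "fixed" if has_fix(rel) else "affected")
```

On a live system the release string could be taken from `platform.release()` in the Python standard library, which reports the same value as `uname -r`.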


** Tags removed: verification-needed-xenial
** Tags added: verification-done-xenial

Status in linux package in Ubuntu:
  Triaged
Status in linux source package in Xenial:
  Fix Committed

[Kernel-packages] [Bug 1687512] Re: Kernel panics on Xenial when using cgroups and strict CFS limits

2017-05-25 Thread Thadeu Lima de Souza Cascardo
This bug is awaiting verification that the kernel in -proposed solves
the problem. Please test the kernel and update this bug with the
results. If the problem is solved, change the tag 'verification-needed-
xenial' to 'verification-done-xenial'. If the problem still exists,
change the tag 'verification-needed-xenial' to 'verification-failed-
xenial'.

If verification is not done by 5 working days from today, this fix will
be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on
how to enable and use -proposed. Thank you!


** Tags added: verification-needed-xenial

Status in linux package in Ubuntu:
  Triaged
Status in linux source package in Xenial:
  Fix Committed

[Kernel-packages] [Bug 1687512] Re: Kernel panics on Xenial when using cgroups and strict CFS limits

2017-05-24 Thread Daniel Axtens
Hi,

I didn't realise this had hit -proposed; I am in the process of
verifying the fix and will let you know ASAP.

Regards,
Daniel

Status in linux package in Ubuntu:
  Triaged
Status in linux source package in Xenial:
  Fix Committed

[Kernel-packages] [Bug 1687512] Re: Kernel panics on Xenial when using cgroups and strict CFS limits

2017-05-17 Thread Kleber Sacilotto de Souza
** Changed in: linux (Ubuntu Xenial)
   Status: Triaged => Fix Committed

Status in linux package in Ubuntu:
  Triaged
Status in linux source package in Xenial:
  Fix Committed

[Kernel-packages] [Bug 1687512] Re: Kernel panics on Xenial when using cgroups and strict CFS limits

2017-05-03 Thread Daniel Axtens
** Description changed:

+ SRU Justification
+ -
+ 
+ [Impact]
+ Apache Mesos and Kubernetes workloads on Xenial cause a panic
+ (NULL pointer dereference) in the completely fair scheduler.
+ 
+ These panics are in pick_next_entity and include pick_next_task_fair
+ in the call stack.
+ 
+ [Fix]
+ Cherry-picking both
+ 754bd598be9bbc953bc709a9e8ed7f3188bfb9d7
+ (http://lkml.kernel.org/r/146608183552.21905.15924473394414832071.stgit@buzz)
+ and
+ 094f469172e00d6ab0a3130b0e01c83b3cf3a98d
+ (http://lkml.kernel.org/r/146608182119.21870.8439834428248129633.stgit@buzz)
+ fix the crash.
+ They appear to be intended as a series - they were posted to LKML at
+ the same time.
+ 
+ [Testcase]
+ The fix has been validated by the user who reported the bug
+ 
+ Bug description
+ ---
+ 
  We see a number of kernel panics on servers running Apache Mesos using
  cgroups with small (0.1-0.2) cpu limits.
  
  These all appear as NULL pointer dereferences in and around
  pick_next_entity and pick_next_task_fair, for example:
  
  [24334.493331] BUG: unable to handle kernel NULL pointer dereference at 0050
  [24334.501611] IP: [] pick_next_entity+0x7f/0x160
  [24334.507868] PGD 3eacfa067 PUD 3eacfb067 PMD 0
  [24334.512806] Oops:  [#1] SMP
  [24334.516420] Modules linked in: ipvlan xt_nat xt_tcpudp veth ipt_MASQUERADE nf_nat_masquerade_ipv4 xfrm_user xfrm_algo iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 xt_addrtype iptable_filter ip_tables xt_conntrack x_tables nf_nat nf_conntrack br_netfilter bridge stp llc aufs tcp_diag inet_diag nfsd auth_rpcgss nfs_acl nfs lockd grace sunrpc fscache dm_crypt ppdev input_leds mac_hid i2c_piix4 8250_fintek parport_pc pvpanic parport serio_raw crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd psmouse virtio_scsi
  [24334.576359] CPU: 2 PID: 0 Comm: swapper/2 Not tainted 4.4.0-66-generic #87~14.04.1-Ubuntu
  [24334.584748] Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
  [24334.594188] task: 8803ee671c00 ti: 8803ee67c000 task.ti: 8803ee67c000
  [24334.601799] RIP: 0010:[] [] pick_next_entity+0x7f/0x160
  [24334.610490] RSP: 0018:8803ee67fdd8 EFLAGS: 00010086
  [24334.615924] RAX: 8803ebed4c00 RBX: 880036529800 RCX:
  [24334.623190] RDX: 0225341f RSI:  RDI:
  [24334.630479] RBP: 8803ee67fe00 R08: 0004 R09:
  [24334.637758] R10: 8803e7ed7600 R11: 0001 R12:
  [24334.645153] R13:  R14: 0009067729c4 R15: 8803ee672178
  [24334.652512] FS: () GS:8803ffd0() knlGS:
  [24334.660721] CS: 0010 DS:  ES:  CR0: 80050033
  [24334.666587] CR2: 0050 CR3: 0003eacf9000 CR4: 001406e0
  [24334.673851] Stack:
  [24334.675980] 8803ffd16e00 8803ffd16e00 8803e855a200 880036529800
  [24334.683995] 0002 8803ee67fe68 810b98a6 8803ffd16e70
  [24334.692024] 00016e00 8803e7ed7600 8803ee671c00
  [24334.700172] Call Trace:
  [24334.702750] [] pick_next_task_fair+0x66/0x4b0
  [24334.708886] [] __schedule+0x7f4/0x980
  [24334.714349] [] schedule+0x35/0x80
  [24334.719445] [] schedule_preempt_disabled+0xe/0x10
  [24334.725962] [] cpu_startup_entry+0x18a/0x350
  [24334.732012] [] start_secondary+0x149/0x170
  [24334.737895] Code: 8b 70 50 4d 2b 74 24 50 4d 85 f6 7e 59 4c 89 e7 e8 67 ff ff ff 49 39 c6 7f 04 4c 8b 6b 48 48 8b 43 40 48 85 c0 74 1f 4c 8b 70 50 <4d> 2b 74 24 50 4d 85 f6 7e 2c 4c 89 e7 e8 3f ff ff ff 49 39 c6
  [24334.765124] RIP [] pick_next_entity+0x7f/0x160
  [24334.771473] RSP
  [24334.775077] CR2: 0050
  [24334.779121] ---[ end trace 05d941efb97b7bae ]---
  
  and
  
  [155852.028575] BUG: unable to handle kernel NULL pointer dereference at 0050
  [155852.036931] IP: [] pick_next_entity+0x7f/0x160
  [155852.043491] PGD 3ebae8067 PUD 3ebae9067 PMD 0
  [155852.048550] Oops:  [#1] SMP
  [155852.052437] Modules linked in: ipvlan veth xt_nat xt_tcpudp 
ipt_MASQUERADE nf_nat_masquerade_ipv4 xfrm_user xfrm_algo iptable_nat 
nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 xt_addrtype iptable_filter 
ip_tables xt_conntrack x_tables nf_nat nf_conntrack br_netfilter bridge stp llc 
aufs nfsd auth_rpcgss nfs_acl nfs lockd grace sunrpc fscache dm_crypt ppdev 
input_leds mac_hid i2c_piix4 parport_pc 8250_fintek pvpanic parport serio_raw 
crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel aes_x86_64 lrw 
gf128mul glue_helper ablk_helper cryptd psmouse virtio_scsi
  [155852.109847] CPU: 1 PID: 2215 Comm: ruby Not tainted 4.4.0-66-generic #87~14.04.1-Ubuntu
  [155852.118233] Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
  

[Kernel-packages] [Bug 1687512] Re: Kernel panics on Xenial when using cgroups and strict CFS limits

2017-05-02 Thread Joseph Salisbury
** Tags added: kernel-da-key

** Changed in: linux (Ubuntu)
   Importance: Undecided => High

** Also affects: linux (Ubuntu Xenial)
   Importance: Undecided
   Status: New

** Changed in: linux (Ubuntu Xenial)
   Status: New => Triaged

** Changed in: linux (Ubuntu Xenial)
   Importance: Undecided => High

** Changed in: linux (Ubuntu)
   Status: Confirmed => Triaged

Title:
  Kernel panics on Xenial when using cgroups and strict CFS limits

Status in linux package in Ubuntu:
  Triaged
Status in linux source package in Xenial:
  Triaged

Bug description:
  We see a number of kernel panics on servers running Apache Mesos using
  cgroups with small (0.1-0.2) cpu limits.
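
  As a rough illustration of what such limits mean to the scheduler
  (not part of the original report): assuming the limits are enforced
  through CFS bandwidth control with the default 100000 us period, a
  fractional CPU limit maps to a cpu.cfs_quota_us value as sketched
  below. The helper name cfs_quota_us() is hypothetical, and the period
  actually used on the affected hosts is not stated in this report.

```python
# Default CFS bandwidth period in microseconds (cpu.cfs_period_us).
# Assumption: the affected hosts use this default; the report does not say.
DEFAULT_CFS_PERIOD_US = 100_000


def cfs_quota_us(cpu_limit, period_us=DEFAULT_CFS_PERIOD_US):
    """Return the cpu.cfs_quota_us value for a fractional CPU limit.

    cpu_limit is the fraction of one CPU the cgroup may use per period,
    e.g. 0.1 for a tenth of a CPU.
    """
    if cpu_limit <= 0:
        raise ValueError("cpu_limit must be positive")
    # quota = limit * period, rounded to whole microseconds
    return int(round(cpu_limit * period_us))


if __name__ == "__main__":
    for limit in (0.1, 0.2):
        print(f"cpu limit {limit}: quota {cfs_quota_us(limit)} us "
              f"per {DEFAULT_CFS_PERIOD_US} us period")
```

  With limits this small, a busy task exhausts its quota early in each
  period, so its cgroup is throttled and unthrottled very frequently.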

  These all appear as NULL pointer dereferences in and around
  pick_next_entity and pick_next_task_fair, for example:

  [24334.493331] BUG: unable to handle kernel NULL pointer dereference at 0050
  [24334.501611] IP: [] pick_next_entity+0x7f/0x160
  [24334.507868] PGD 3eacfa067 PUD 3eacfb067 PMD 0
  [24334.512806] Oops:  [#1] SMP
  [24334.516420] Modules linked in: ipvlan xt_nat xt_tcpudp veth ipt_MASQUERADE 
nf_nat_masquerade_ipv4 xfrm_user xfrm_algo iptable_nat nf_conntrack_ipv4 
nf_defrag_ipv4 nf_nat_ipv4 xt_addrtype iptable_filter ip_tables xt_conntrack 
x_tables nf_nat nf_conntrack br_netfilter bridge stp llc aufs tcp_diag 
inet_diag nfsd auth_rpcgss nfs_acl nfs lockd grace sunrpc fscache dm_crypt 
ppdev input_leds mac_hid i2c_piix4 8250_fintek parport_pc pvpanic parport 
serio_raw crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel 
aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd psmouse virtio_scsi
  [24334.576359] CPU: 2 PID: 0 Comm: swapper/2 Not tainted 4.4.0-66-generic #87~14.04.1-Ubuntu
  [24334.584748] Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
  [24334.594188] task: 8803ee671c00 ti: 8803ee67c000 task.ti: 8803ee67c000
  [24334.601799] RIP: 0010:[] [] pick_next_entity+0x7f/0x160
  [24334.610490] RSP: 0018:8803ee67fdd8 EFLAGS: 00010086
  [24334.615924] RAX: 8803ebed4c00 RBX: 880036529800 RCX: 
  [24334.623190] RDX: 0225341f RSI:  RDI: 
  [24334.630479] RBP: 8803ee67fe00 R08: 0004 R09: 
  [24334.637758] R10: 8803e7ed7600 R11: 0001 R12: 
  [24334.645153] R13:  R14: 0009067729c4 R15: 8803ee672178
  [24334.652512] FS: () GS:8803ffd0() knlGS:
  [24334.660721] CS: 0010 DS:  ES:  CR0: 80050033
  [24334.666587] CR2: 0050 CR3: 0003eacf9000 CR4: 001406e0
  [24334.673851] Stack:
  [24334.675980] 8803ffd16e00 8803ffd16e00 8803e855a200 880036529800
  [24334.683995] 0002 8803ee67fe68 810b98a6 8803ffd16e70
  [24334.692024] 00016e00 8803e7ed7600 8803ee671c00 
  [24334.700172] Call Trace:
  [24334.702750] [] pick_next_task_fair+0x66/0x4b0
  [24334.708886] [] __schedule+0x7f4/0x980
  [24334.714349] [] schedule+0x35/0x80
  [24334.719445] [] schedule_preempt_disabled+0xe/0x10
  [24334.725962] [] cpu_startup_entry+0x18a/0x350
  [24334.732012] [] start_secondary+0x149/0x170
  [24334.737895] Code: 8b 70 50 4d 2b 74 24 50 4d 85 f6 7e 59 4c 89 e7 e8 67 ff ff ff 49 39 c6 7f 04 4c 8b 6b 48 48 8b 43 40 48 85 c0 74 1f 4c 8b 70 50 <4d> 2b 74 24 50 4d 85 f6 7e 2c 4c 89 e7 e8 3f ff ff ff 49 39 c6
  [24334.765124] RIP [] pick_next_entity+0x7f/0x160
  [24334.771473] RSP 
  [24334.775077] CR2: 0050
  [24334.779121] ---[ end trace 05d941efb97b7bae ]---

  and

  [155852.028575] BUG: unable to handle kernel NULL pointer dereference at 0050
  [155852.036931] IP: [] pick_next_entity+0x7f/0x160
  [155852.043491] PGD 3ebae8067 PUD 3ebae9067 PMD 0
  [155852.048550] Oops:  [#1] SMP
  [155852.052437] Modules linked in: ipvlan veth xt_nat xt_tcpudp 
ipt_MASQUERADE nf_nat_masquerade_ipv4 xfrm_user xfrm_algo iptable_nat 
nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 xt_addrtype iptable_filter 
ip_tables xt_conntrack x_tables nf_nat nf_conntrack br_netfilter bridge stp llc 
aufs nfsd auth_rpcgss nfs_acl nfs lockd grace sunrpc fscache dm_crypt ppdev 
input_leds mac_hid i2c_piix4 parport_pc 8250_fintek pvpanic parport serio_raw 
crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel aes_x86_64 lrw 
gf128mul glue_helper ablk_helper cryptd psmouse virtio_scsi
  [155852.109847] CPU: 1 PID: 2215 Comm: ruby Not tainted 4.4.0-66-generic #87~14.04.1-Ubuntu
  [155852.118233] Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
  [155852.127661] task: 8803ed29aa00 ti: 8800bbb1 task.ti: 8800bbb1