** Description changed:
[Impact]
- The following Oops was discovered by user:
+ * Line discipline code is racy when a buffer is being flushed while the
+ tty is being initialized or reinitialized. For the first problem, there
+ has been an upstream patch since January 2018: b027e2298bd5 ("tty: fix
+ data race between tty_init_dev and flush of buf"). It is present in
+ Ubuntu kernels 4.15 and later, but not in Ubuntu kernel 4.4.
- [684766.666639] BUG: unable to handle kernel paging request at
0000000000002268
- [684766.667642] IP: [<ffffffff814e2a5a>] n_tty_receive_buf_common+0x6a/0xae0
- [684766.668487] PGD 80000019574fe067 PUD 19574ff067 PMD 0
- [684766.669194] Oops: 0000 [#1] SMP
- [684766.669687] Modules linked in: xt_nat dccp_diag dccp tcp_diag udp_diag
inet_diag unix_diag xt_connmark ipt_REJECT nf_reject_ipv4 nf_conntrack_netlink
nfnetlink veth ip6table_filter ip6_tables xt_tcpmss xt_multiport xt_conntrack
iptable_filter xt_CHECKSUM xt_tcpudp iptable_mangle xt_CT iptable_raw
ipt_MASQUERADE nf_nat_masquerade_ipv4 xt_comment iptable_nat ip_tables x_tables
target_core_mod configfs softdog scini(POE) ib_iser rdma_cm iw_cm ib_cm ib_sa
ib_mad ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi
openvswitch(OE) nf_nat_ipv6 nf_nat_ipv4 nf_nat gre kvm_intel kvm irqbypass ttm
crct10dif_pclmul drm_kms_helper crc32_pclmul ghash_clmulni_intel drm
aesni_intel aes_x86_64 i2c_piix4 lrw gf128mul fb_sys_fops syscopyarea
glue_helper sysfillrect ablk_helper cryptd sysimgblt joydev
- [684766.679406] input_leds mac_hid serio_raw 8250_fintek br_netfilter bridge
stp llc nf_conntrack_proto_gre nf_conntrack_ipv6 nf_defrag_ipv6
nf_conntrack_ipv4 nf_defrag_ipv4 nf_conntrack xfs raid10 raid456
async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq
libcrc32c raid1 raid0 psmouse multipath floppy pata_acpi linear dm_multipath
- [684766.683585] CPU: 15 PID: 7470 Comm: kworker/u40:1 Tainted: P OE
4.4.0-124-generic #148~14.04.1-Ubuntu
- [684766.684967] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
Bochs 01/01/2011
- [684766.686062] Workqueue: events_unbound flush_to_ldisc
- [684766.686703] task: ffff88165e5d8000 ti: ffff88170dc2c000 task.ti:
ffff88170dc2c000
- [684766.687670] RIP: 0010:[<ffffffff814e2a5a>] [<ffffffff814e2a5a>]
n_tty_receive_buf_common+0x6a/0xae0
- [684766.688870] RSP: 0018:ffff88170dc2fd28 EFLAGS: 00010202
- [684766.689521] RAX: 0000000000000000 RBX: ffff88162c895000 RCX:
0000000000000001
- [684766.690488] RDX: 0000000000000000 RSI: ffff88162c895020 RDI:
ffff8819c2d3d4d8
- [684766.691518] RBP: ffff88170dc2fdc0 R08: 0000000000000001 R09:
ffffffff81ec2ba0
- [684766.692480] R10: 0000000000000004 R11: 0000000000000000 R12:
ffff8819c2d3d400
- [684766.693423] R13: ffff8819c45b2670 R14: ffff8816a358c028 R15:
ffff8819c2d3d400
- [684766.694390] FS: 0000000000000000(0000) GS:ffff8819d73c0000(0000)
knlGS:0000000000000000
- [684766.695484] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
- [684766.696182] CR2: 0000000000002268 CR3: 0000001957520000 CR4:
0000000000360670
- [684766.697141] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
0000000000000000
- [684766.698114] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7:
0000000000000400
- [684766.699079] Stack:
- [684766.699412] 0000000000000000 ffff8819c2d3d4d8 0000000000000000
ffff8819c2d3d648
- [684766.700467] ffff8819c2d3d620 ffff8819c9c10400 ffff88170dc2fd68
ffffffff8106312e
- [684766.701501] ffff88170dc2fd78 0000000000000001 0000000000000000
ffff88162c895020
- [684766.702534] Call Trace:
- [684766.702905] [<ffffffff8106312e>] ? kvm_sched_clock_read+0x1e/0x30
- [684766.703685] [<ffffffff814e34e4>] n_tty_receive_buf2+0x14/0x20
- [684766.704505] [<ffffffff814e5f05>] flush_to_ldisc+0xd5/0x120
- [684766.705269] [<ffffffff81099506>] process_one_work+0x156/0x400
- [684766.706008] [<ffffffff81099eea>] worker_thread+0x11a/0x480
- [684766.706686] [<ffffffff81099dd0>] ? rescuer_thread+0x310/0x310
- [684766.707386] [<ffffffff8109f3b8>] kthread+0xd8/0xf0
- [684766.707993] [<ffffffff8109f2e0>] ? kthread_park+0x60/0x60
- [684766.708664] [<ffffffff8181a9b5>] ret_from_fork+0x55/0x80
- [684766.709335] [<ffffffff8109f2e0>] ? kthread_park+0x60/0x60
- [684766.709998] Code: 85 70 ff ff ff e8 97 5f 33 00 49 8d 87 20 02 00 00 c7
45 b4 00 00 00 00 48 89 45 88 49 8d 87 48 02 00 00 48 89 45 80 48 8b 45 b8 <48>
8b b0 68 22 00 00 48 8b 08 89 f0 29 c8 41 f6 87 30 01 00 00
- [684766.713290] RIP [<ffffffff814e2a5a>] n_tty_receive_buf_common+0x6a/0xae0
- [684766.714105] RSP <ffff88170dc2fd28>
- [684766.714609] CR2: 0000000000002268
+ * For the race between the buffer flush and the tty being reopened, a
+ patch addressing the issue was recently merged for 5.0-rc1:
+ 83d817f41070 ("tty: Hold tty_ldisc_lock() during tty_reopen()"). No
+ Ubuntu kernel currently contains this patch, hence this SRU request.
+ The complete upstream patch series is in [0].
- The issue happened in a VM
- KDUMP was configured, so a full Kernel crashdump was created
+ * The approaches of both patches are similar: they rely on
+ locking/semaphores to prevent race conditions. Some additional patches
+ are necessary to prevent related issues, such as a potential deadlock
+ caused by prioritizing I/O servicing over releasing tty_ldisc_lock();
+ refer to c96cf923a98d ("tty: Don't block on IO when ldisc change is
+ pending"). All the necessary fixes are grouped in this SRU request.
- User has Ubuntu Trusty, Kernel 4.4.0-124 on its VM
+ * The symptom of the race condition between the buffer flush and the tty
+ reopen routine is a kernel crash with the following trace:
+
+ BUG: unable to handle kernel paging request at 0000000000002268
+ IP: [<addr>] n_tty_receive_buf_common+0x6a/0xae0
+ [...]
+ Call Trace:
+ [<addr>] ? kvm_sched_clock_read+0x1e/0x30
+ [<addr>] n_tty_receive_buf2+0x14/0x20
+ [<addr>] flush_to_ldisc+0xd5/0x120
+ [<addr>] process_one_work+0x156/0x400
+ [<addr>] worker_thread+0x11a/0x480
+ [...]
+
+ * A kernel crash dump was collected from a user; the analysis is in
+ comment #4 of this LP.
+
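The locking idea described in [Impact] can be pictured with a small user-space analogy. This is only a sketch under stated assumptions: it is Python, not kernel code, and the class ToyTty and its methods are hypothetical names that do not correspond to kernel symbols. The lock merely plays the role that tty_ldisc_lock() plays in the patches: the reopen path reinitializes the line discipline state while holding it, so the flush path can never see the half-initialized state.

```python
import threading


class ToyTty:
    """User-space stand-in for a tty (hypothetical, not kernel code)."""

    def __init__(self):
        # Plays the role of tty_ldisc_lock() in this analogy.
        self.ldisc_lock = threading.Lock()
        # Per-ldisc state that the flush path dereferences.
        self.ldisc_data = {"received": 0}

    def reopen(self):
        # Reinitialize the ldisc state with the lock held, so the flush
        # path can never observe the intermediate None value.
        with self.ldisc_lock:
            self.ldisc_data = None
            self.ldisc_data = {"received": 0}

    def flush(self, nbytes):
        # Flush path: takes the same lock before touching ldisc state.
        with self.ldisc_lock:
            self.ldisc_data["received"] += nbytes


def run_demo(iterations=2000):
    """Run flush and reopen concurrently; return any errors seen by flush."""
    tty = ToyTty()
    errors = []

    def flusher():
        for _ in range(iterations):
            try:
                tty.flush(1)
            except TypeError as exc:  # would fire if ldisc_data were None
                errors.append(exc)

    def reopener():
        for _ in range(iterations):
            tty.reopen()

    threads = [threading.Thread(target=flusher),
               threading.Thread(target=reopener)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return errors
```

Removing the lock from reopen() in this toy model would let flush() occasionally hit the None state, which is loosely analogous to the paging-request Oops in the trace above.
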
[Test Case]
- * Deploy a Trusty KVM instance with a LTS Xenial kernel (v4.4 series)
- * SSH in frequently while system is under load, send commands before the
prompt has returned.
- ----
+ * It is not trivial to trigger this fault, but the usual recipe is to
+ keep accessing a machine through SSH (or an IPMI serial console) and
+ somehow run commands before the terminal is ready on that machine (for
+ example, by echoing into ttySx or a pts in an infinite loop).
- Check comment #5 for a summary about the upstream proposals to resolve
- this issue.
+ * We have reports from users who could reproduce this issue in their
+ production environments; with the patches present in this SRU request
+ the problem was fixed.
+
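The recipe in [Test Case] could be approximated along the following lines. This is a hedged sketch, not the reproducer used for this bug: the function name poke_pty and its parameters are hypothetical, it uses a locally created pty pair instead of an SSH session or ttySx, and the loop is bounded rather than infinite. It writes to the pty master while the slave is being opened again by path, which presumably exercises the reopen path. It is not expected to crash a kernel on its own; as noted, the real fault needs load and an unpatched kernel.

```python
import os
import pty


def poke_pty(iterations=50, payload=b"echo hi\n"):
    """Write to a pty master while reopening its slave; return bytes written."""
    master_fd, slave_fd = pty.openpty()
    slave_path = os.ttyname(slave_fd)
    # Non-blocking writes, so the loop never hangs if the tty buffer fills.
    os.set_blocking(master_fd, False)
    written = 0
    try:
        for _ in range(iterations):
            # Open the already-open slave again, then drop the extra fd.
            extra_fd = os.open(slave_path, os.O_RDWR | os.O_NOCTTY)
            try:
                written += os.write(master_fd, payload)
            except BlockingIOError:
                pass  # buffer full; skip this round
            finally:
                os.close(extra_fd)
    finally:
        os.close(slave_fd)
        os.close(master_fd)
    return written
```

Calling poke_pty() in a loop from several processes, on a test machine that is already under load, would be closer to the conditions described above.
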
+
+ [Regression Potential]
+
+ * The tty subsystem is central to the kernel and patches in that area
+ are always delicate. For example, the upstream series [0] is a re-spin
+ (V6) prompted by a hard-to-reproduce issue reported on the PA-RISC
+ architecture, which was found in the V5 iteration [1] and fixed by
+ patch c96cf923a98d, present in this SRU request.
+
+ * The patchset [0] has been in the tty-next tree since mid-November, and
+ patch b027e2298bd5 has been available upstream since January 2018 (it
+ is in both Ubuntu kernels 4.15 and 4.18), so the overall likelihood of
+ regressions is low.
+
+ * These patches were sniff-tested on the three kernel versions (4.4,
+ 4.15 and 4.18) and showed no issues.
+
+
+ [0] https://marc.info/?l=linux-kernel&m=154103190111795
+ [1] https://marc.info/?l=linux-kernel&m=153737852618183
--
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1791758
Title:
ldisc crash on reopened tty