** Description changed:

  [Impact]
  
- The following Oops was discovered by user:
+ * The line discipline code is racy when a buffer is flushed while the
+ tty is being initialized or reinitialized. For the initialization case,
+ an upstream patch has been available since January 2018: b027e2298bd5
+ ("tty: fix data race between tty_init_dev and flush of buf"). It is not
+ in the Ubuntu 4.4 kernel, only in kernels 4.15 and later.
  
- [684766.666639] BUG: unable to handle kernel paging request at 
0000000000002268
- [684766.667642] IP: [<ffffffff814e2a5a>] n_tty_receive_buf_common+0x6a/0xae0
- [684766.668487] PGD 80000019574fe067 PUD 19574ff067 PMD 0
- [684766.669194] Oops: 0000 [#1] SMP
- [684766.669687] Modules linked in: xt_nat dccp_diag dccp tcp_diag udp_diag 
inet_diag unix_diag xt_connmark ipt_REJECT nf_reject_ipv4 nf_conntrack_netlink 
nfnetlink veth ip6table_filter ip6_tables xt_tcpmss xt_multiport xt_conntrack 
iptable_filter xt_CHECKSUM xt_tcpudp iptable_mangle xt_CT iptable_raw 
ipt_MASQUERADE nf_nat_masquerade_ipv4 xt_comment iptable_nat ip_tables x_tables 
target_core_mod configfs softdog scini(POE) ib_iser rdma_cm iw_cm ib_cm ib_sa 
ib_mad ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi 
openvswitch(OE) nf_nat_ipv6 nf_nat_ipv4 nf_nat gre kvm_intel kvm irqbypass ttm 
crct10dif_pclmul drm_kms_helper crc32_pclmul ghash_clmulni_intel drm 
aesni_intel aes_x86_64 i2c_piix4 lrw gf128mul fb_sys_fops syscopyarea 
glue_helper sysfillrect ablk_helper cryptd sysimgblt joydev
- [684766.679406]  input_leds mac_hid serio_raw 8250_fintek br_netfilter bridge 
stp llc nf_conntrack_proto_gre nf_conntrack_ipv6 nf_defrag_ipv6 
nf_conntrack_ipv4 nf_defrag_ipv4 nf_conntrack xfs raid10 raid456 
async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq 
libcrc32c raid1 raid0 psmouse multipath floppy pata_acpi linear dm_multipath
- [684766.683585] CPU: 15 PID: 7470 Comm: kworker/u40:1 Tainted: P           OE 
  4.4.0-124-generic #148~14.04.1-Ubuntu
- [684766.684967] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
Bochs 01/01/2011
- [684766.686062] Workqueue: events_unbound flush_to_ldisc
- [684766.686703] task: ffff88165e5d8000 ti: ffff88170dc2c000 task.ti: 
ffff88170dc2c000
- [684766.687670] RIP: 0010:[<ffffffff814e2a5a>]  [<ffffffff814e2a5a>] 
n_tty_receive_buf_common+0x6a/0xae0
- [684766.688870] RSP: 0018:ffff88170dc2fd28  EFLAGS: 00010202
- [684766.689521] RAX: 0000000000000000 RBX: ffff88162c895000 RCX: 
0000000000000001
- [684766.690488] RDX: 0000000000000000 RSI: ffff88162c895020 RDI: 
ffff8819c2d3d4d8
- [684766.691518] RBP: ffff88170dc2fdc0 R08: 0000000000000001 R09: 
ffffffff81ec2ba0
- [684766.692480] R10: 0000000000000004 R11: 0000000000000000 R12: 
ffff8819c2d3d400
- [684766.693423] R13: ffff8819c45b2670 R14: ffff8816a358c028 R15: 
ffff8819c2d3d400
- [684766.694390] FS:  0000000000000000(0000) GS:ffff8819d73c0000(0000) 
knlGS:0000000000000000
- [684766.695484] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
- [684766.696182] CR2: 0000000000002268 CR3: 0000001957520000 CR4: 
0000000000360670
- [684766.697141] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 
0000000000000000
- [684766.698114] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 
0000000000000400
- [684766.699079] Stack:
- [684766.699412]  0000000000000000 ffff8819c2d3d4d8 0000000000000000 
ffff8819c2d3d648
- [684766.700467]  ffff8819c2d3d620 ffff8819c9c10400 ffff88170dc2fd68 
ffffffff8106312e
- [684766.701501]  ffff88170dc2fd78 0000000000000001 0000000000000000 
ffff88162c895020
- [684766.702534] Call Trace:
- [684766.702905]  [<ffffffff8106312e>] ? kvm_sched_clock_read+0x1e/0x30
- [684766.703685]  [<ffffffff814e34e4>] n_tty_receive_buf2+0x14/0x20
- [684766.704505]  [<ffffffff814e5f05>] flush_to_ldisc+0xd5/0x120
- [684766.705269]  [<ffffffff81099506>] process_one_work+0x156/0x400
- [684766.706008]  [<ffffffff81099eea>] worker_thread+0x11a/0x480
- [684766.706686]  [<ffffffff81099dd0>] ? rescuer_thread+0x310/0x310
- [684766.707386]  [<ffffffff8109f3b8>] kthread+0xd8/0xf0
- [684766.707993]  [<ffffffff8109f2e0>] ? kthread_park+0x60/0x60
- [684766.708664]  [<ffffffff8181a9b5>] ret_from_fork+0x55/0x80
- [684766.709335]  [<ffffffff8109f2e0>] ? kthread_park+0x60/0x60
- [684766.709998] Code: 85 70 ff ff ff e8 97 5f 33 00 49 8d 87 20 02 00 00 c7 
45 b4 00 00 00 00 48 89 45 88 49 8d 87 48 02 00 00 48 89 45 80 48 8b 45 b8 <48> 
8b b0 68 22 00 00 48 8b 08 89 f0 29 c8 41 f6 87 30 01 00 00
- [684766.713290] RIP  [<ffffffff814e2a5a>] n_tty_receive_buf_common+0x6a/0xae0
- [684766.714105]  RSP <ffff88170dc2fd28>
- [684766.714609] CR2: 0000000000002268
+ * For the race between the buffer flush and the tty reopen, a patch
+ was recently merged for 5.0-rc1: 83d817f41070 ("tty: Hold
+ tty_ldisc_lock() during tty_reopen()"). No Ubuntu kernel currently
+ contains this patch, hence this SRU request. The complete upstream
+ patch series is in [0].
  
- The issue happened in a VM
- KDUMP was configured, so a full Kernel crashdump was created
+ * Both patches take a similar approach: they rely on locking/semaphores
+ to prevent the race conditions. Some additional patches are necessary
+ to prevent related issues, such as a potential deadlock caused by
+ prioritizing I/O servicing over the release of tty_ldisc_lock(); refer
+ to c96cf923a98d ("tty: Don't block on IO when ldisc change is
+ pending"). All the necessary fixes are grouped in this SRU request.
  
- User has Ubuntu Trusty, Kernel 4.4.0-124 on its VM
+ * The symptom of the race condition between the buffer flush and the tty
+ reopen routine is a kernel crash with the following trace:
+ 
+ BUG: unable to handle kernel paging request at 0000000000002268
+ IP: [<addr>] n_tty_receive_buf_common+0x6a/0xae0
+ [...]
+ Call Trace:
+ [<addr>] ? kvm_sched_clock_read+0x1e/0x30
+ [<addr>] n_tty_receive_buf2+0x14/0x20
+ [<addr>] flush_to_ldisc+0xd5/0x120
+ [<addr>] process_one_work+0x156/0x400
+ [<addr>] worker_thread+0x11a/0x480
+ [...]
+ 
+ * A kernel crash dump was collected from a user; the analysis is in
+ comment #4 of this LP bug.
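
The locking idea described in the [Impact] items above can be illustrated with a small user-space analogy. This is only a sketch, not kernel code: the `Tty` class, its `ldisc_lock` and `ldisc_data` fields, and the `stress` helper are hypothetical names, and a plain mutex stands in for the kernel's ldisc semaphore. It shows that when the reopen path and the flush path take the same lock, the flush never sees the half-reinitialized state.

```python
import threading


class Tty:
    """Toy stand-in for a tty (hypothetical; not the kernel structure)."""

    def __init__(self):
        # Simplification: a plain mutex instead of the kernel's ldisc semaphore.
        self.ldisc_lock = threading.Lock()
        self.ldisc_data = {"received": 0}

    def reopen(self):
        # Reinitialize the line discipline state while holding the lock,
        # so a concurrent flush cannot observe the torn-down state.
        with self.ldisc_lock:
            self.ldisc_data = None               # state is briefly invalid
            self.ldisc_data = {"received": 0}

    def flush_to_ldisc(self, nbytes):
        # The flush worker takes the same lock before touching ldisc_data.
        with self.ldisc_lock:
            self.ldisc_data["received"] += nbytes


def stress(iterations=2000):
    """Run a flusher and a reopener concurrently; return any errors seen."""
    tty = Tty()
    errors = []

    def flusher():
        for _ in range(iterations):
            try:
                tty.flush_to_ldisc(1)
            except TypeError as exc:             # would mean ldisc_data was None
                errors.append(exc)

    def reopener():
        for _ in range(iterations):
            tty.reopen()

    threads = [threading.Thread(target=flusher),
               threading.Thread(target=reopener)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return errors
```

Removing the `with self.ldisc_lock:` line from `reopen()` in this toy model reopens the window in which the flusher can dereference the invalid state, which is loosely analogous to the invalid access in n_tty_receive_buf_common shown in the trace.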
+ 
  
  [Test Case]
  
- * Deploy a Trusty KVM instance with a LTS Xenial kernel (v4.4 series)
- * SSH in frequently while system is under load, send commands before the 
prompt has returned.
- ----
+ * This fault is not trivial to trigger, but the usual recipe is to
+ keep accessing a machine through SSH (or an IPMI serial console) and
+ somehow run commands before the terminal is ready on that machine (for
+ example, by echoing data into ttySx or a pts device in an infinite loop).
  
- Check comment #5 for a summary about the upstream proposals to resolve
- this issue.
+ * Users have reported reproducing this issue in their production
+ environments; with the patches in this SRU request applied, the
+ problem was fixed.
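
The recipe above (pushing data at a terminal while it is being reopened) can be approximated in user space. The following is a minimal sketch, assuming a Linux system with pseudo-terminals available; the function name `stress_pty_reopen` and its parameters are hypothetical. It is not a guaranteed reproducer, and on an unpatched kernel it should only be run on a disposable test machine.

```python
import os
import pty


def stress_pty_reopen(iterations=200, payload=b"echo hi\n"):
    """Write to a pty master while its slave side is repeatedly reopened.

    Returns the number of iterations completed.
    """
    master_fd, slave_fd = pty.openpty()
    slave_path = os.ttyname(slave_fd)
    os.set_blocking(master_fd, False)    # never stall if the pty buffer fills
    completed = 0
    try:
        for _ in range(iterations):
            os.close(slave_fd)           # last close of the slave side
            slave_fd = -1
            try:
                os.write(master_fd, payload)   # queue input while it is closed
            except OSError:
                pass                     # full buffer or I/O error: keep going
            # Reopen the slave; O_NOCTTY avoids acquiring a controlling tty.
            slave_fd = os.open(slave_path, os.O_RDWR | os.O_NOCTTY)
            try:
                os.read(master_fd, 4096)       # drain any echoed output
            except OSError:
                pass                     # nothing to read yet
            completed += 1
    finally:
        if slave_fd >= 0:
            os.close(slave_fd)
        os.close(master_fd)
    return completed
```

Running several copies of this loop in parallel while the system is under load is closer to the conditions described in the reports; the iteration count and payload are arbitrary choices.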
+ 
+ 
+ [Regression Potential]
+ 
+ * The tty subsystem is central to the kernel, and patches in that area
+ are always delicate. For example, the upstream series [0] is a re-spin
+ (V6) prompted by a hard-to-reproduce issue reported on the PA-RISC
+ architecture, which was found in the V5 iteration [1] and fixed by
+ patch c96cf923a98d, present in this SRU request.
+ 
+ * The patchset [0] has been in the tty-next tree since mid-November, and
+ patch b027e2298bd5 has been upstream since January 2018 (it is
+ available in both the Ubuntu 4.15 and 4.18 kernels), so the overall
+ likelihood of regressions is low.
+ 
+ * These patches were sniff-tested on all three kernel versions (4.4,
+ 4.15 and 4.18) and showed no issues.
+ 
+ 
+ [0] https://marc.info/?l=linux-kernel&m=154103190111795
+ [1] https://marc.info/?l=linux-kernel&m=153737852618183

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1791758

Title:
  ldisc crash on reopened tty
