[Group.of.nepali.translators] [Bug 1802021] Re: [Hyper-V] srcu: Lock srcu_data structure in srcu_gp_start()

2022-04-25 Thread Mauricio Faria de Oliveira
** Changed in: linux (Ubuntu Xenial)
   Status: Confirmed => New

** Changed in: linux (Ubuntu)
   Status: Confirmed => Fix Released

** Changed in: linux (Ubuntu Xenial)
   Status: New => Invalid

-- 
You received this bug notification because you are a member of नेपाली
भाषा समायोजकहरुको समूह, which is subscribed to Xenial.
Matching subscriptions: Ubuntu 16.04 Bugs
https://bugs.launchpad.net/bugs/1802021

Title:
  [Hyper-V] srcu: Lock srcu_data structure in srcu_gp_start()

Status in linux package in Ubuntu:
  Fix Released
Status in linux-azure package in Ubuntu:
  Fix Released
Status in linux source package in Xenial:
  Invalid
Status in linux-azure source package in Xenial:
  Fix Released
Status in linux source package in Bionic:
  Fix Released
Status in linux-azure source package in Bionic:
  Fix Released
Status in linux source package in Cosmic:
  Fix Released
Status in linux-azure source package in Cosmic:
  Fix Released

Bug description:
  We had a customer seeing traces like the following:

  tack trace from kern.log:
  2018-10-10T04:43:08.542464+00:00 hbp2ann-2 kernel: INFO: task 
kworker/u16:0:16678 blocked for more than 120 seconds.
  2018-10-10T04:43:08.542503+00:00 hbp2ann-2 kernel: Not tainted 
4.15.0-1023-azure #24~16.04.1-Ubuntu
  2018-10-10T04:43:08.542513+00:00 hbp2ann-2 kernel: "echo 0 > 
/proc/sys/kernel/hung_task_timeout_secs" disables this message.
  2018-10-10T04:43:08.547366+00:00 hbp2ann-2 kernel: kworker/u16:0 D 0 16678 2 
0x8000
  2018-10-10T04:43:08.547386+00:00 hbp2ann-2 kernel: Workqueue: events_unbound 
fsnotify_mark_destroy_workfn
  2018-10-10T04:43:08.547395+00:00 hbp2ann-2 kernel: Call Trace:
  2018-10-10T04:43:08.547413+00:00 hbp2ann-2 kernel: __schedule+0x3d6/0x8b0
  2018-10-10T04:43:08.547422+00:00 hbp2ann-2 kernel: ? 
check_preempt_wakeup+0xfb/0x240
  2018-10-10T04:43:08.547431+00:00 hbp2ann-2 kernel: ? 
sched_clock_local+0x17/0x90
  2018-10-10T04:43:08.547440+00:00 hbp2ann-2 kernel: schedule+0x36/0x80
  2018-10-10T04:43:08.547448+00:00 hbp2ann-2 kernel: 
schedule_timeout+0x1db/0x370
  2018-10-10T04:43:08.547458+00:00 hbp2ann-2 kernel: ? 
__enqueue_entity+0x5c/0x60
  2018-10-10T04:43:08.547467+00:00 hbp2ann-2 kernel: ? 
enqueue_entity+0x112/0x670
  2018-10-10T04:43:08.547477+00:00 hbp2ann-2 kernel: 
wait_for_completion+0xb4/0x140
  2018-10-10T04:43:08.547486+00:00 hbp2ann-2 kernel: ? wake_up_q+0x70/0x70
  2018-10-10T04:43:08.547510+00:00 hbp2ann-2 kernel: 
__synchronize_srcu.part.13+0x85/0xb0
  2018-10-10T04:43:08.547535+00:00 hbp2ann-2 kernel: ? 
trace_raw_output_rcu_utilization+0x50/0x50
  2018-10-10T04:43:08.547560+00:00 hbp2ann-2 kernel: synchronize_srcu+0xd3/0xe0
  2018-10-10T04:43:08.547594+00:00 hbp2ann-2 kernel: ? 
synchronize_srcu+0xd3/0xe0
  2018-10-10T04:43:08.547604+00:00 hbp2ann-2 kernel: 
fsnotify_mark_destroy_workfn+0x7c/0xe0
  2018-10-10T04:43:08.547612+00:00 hbp2ann-2 kernel: 
process_one_work+0x14d/0x410
  2018-10-10T04:43:08.547620+00:00 hbp2ann-2 kernel: worker_thread+0x4b/0x460
  2018-10-10T04:43:08.547628+00:00 hbp2ann-2 kernel: kthread+0x105/0x140
  2018-10-10T04:43:08.547637+00:00 hbp2ann-2 kernel: ? 
process_one_work+0x410/0x410
  2018-10-10T04:43:08.547645+00:00 hbp2ann-2 kernel: ? 
kthread_destroy_worker+0x50/0x50
  2018-10-10T04:43:08.547654+00:00 hbp2ann-2 kernel: ? do_syscall_64+0x73/0x130
  2018-10-10T04:43:08.547677+00:00 hbp2ann-2 kernel: ? SyS_exit_group+0x14/0x20
  2018-10-10T04:43:08.547685+00:00 hbp2ann-2 kernel: ret_from_fork+0x35/0x40

  Error Code: INFO: task kworker/u16:0:16678 blocked for more than 120
  seconds.

  We are seeing more issue with fsnotify related callbacks. These are
  not a soft/hard lockup but seem to significantly degrade the
  responsiveness of systemd (and from there everything else).

  The following upstream commit may fix this issue, but it is in Paul's
  RCU tree and not in linux-next or upstream yet:

  https://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-
  rcu.git/commit/?h=dev=1a05c0cd2fee234a10362cc8f66057557cbb291f

  srcu: Lock srcu_data structure in srcu_gp_start()
  The srcu_gp_start() function is called with the srcu_struct structure's
  ->lock held, but not with the srcu_data structure's ->lock.  This is
  problematic because this function accesses and updates the srcu_data
  structure's ->srcu_cblist, which is protected by that lock.  Failing to
  hold this lock can result in corruption of the SRCU callback lists,
  which in turn can result in arbitrarily bad results.

  This commit therefore makes srcu_gp_start() acquire the srcu_data
  structure's ->lock across the calls to rcu_segcblist_advance() and
  rcu_segcblist_accelerate(), thus preventing this corruption.

  Please investigate this issue and evaluate the proposed fix.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1802021/+subscriptions


___
Mailing list: https://launchpad.net/~group.of.nepali.translators
Post 

[Group.of.nepali.translators] [Bug 1802021] Re: [Hyper-V] srcu: Lock srcu_data structure in srcu_gp_start()

2019-04-02 Thread Launchpad Bug Tracker
This bug was fixed in the package linux - 4.15.0-47.50

---
linux (4.15.0-47.50) bionic; urgency=medium

  * linux: 4.15.0-47.50 -proposed tracker (LP: #1819716)

  * Packaging resync (LP: #1786013)
- [Packaging] resync getabis
- [Packaging] update helper scripts
- [Packaging] resync retpoline extraction

  * C++ demangling support missing from perf (LP: #1396654)
- [Packaging] fix a mistype

  * arm-smmu-v3 arm-smmu-v3.3.auto: CMD_SYNC timeout (LP: #1818162)
- iommu/arm-smmu-v3: Fix unexpected CMD_SYNC timeout

  * Crash in nvme_irq_check() when using threaded interrupts (LP: #1818747)
- nvme-pci: fix out of bounds access in nvme_cqe_pending

  * CVE-2019-9213
- mm: enforce min addr even if capable() in expand_downwards()

  * CVE-2019-3460
- Bluetooth: Check L2CAP option sizes returned from l2cap_get_conf_opt

  * amdgpu with mst WARNING on blanking (LP: #1814308)
- drm/amd/display: Don't use dc_link in link_encoder
- drm/amd/display: Move wait for hpd ready out from edp power control.
- drm/amd/display: eDP sequence BL off first then DP blank.
- drm/amd/display: Fix unused variable compilation error
- drm/amd/display: Fix warning about misaligned code
- drm/amd/display: Fix MST dp_blank REG_WAIT timeout

  * tun/tap: unable to manage carrier state from userland (LP: #1806392)
- tun: implement carrier change

  * CVE-2019-8980
- exec: Fix mem leak in kernel_read_file

  * raw_skew in timer from the ubuntu_kernel_selftests failed on Bionic
(LP: #1811194)
- selftest: timers: Tweak raw_skew to SKIP when ADJ_OFFSET/other clock
  adjustments are in progress

  * [Packaging] Allow overlay of config annotations (LP: #1752072)
- [Packaging] config-check: Add an include directive

  * CVE-2019-7308
- bpf: move {prev_,}insn_idx into verifier env
- bpf: move tmp variable into ax register in interpreter
- bpf: enable access to ax register also from verifier rewrite
- bpf: restrict map value pointer arithmetic for unprivileged
- bpf: restrict stack pointer arithmetic for unprivileged
- bpf: restrict unknown scalars of mixed signed bounds for unprivileged
- bpf: fix check_map_access smin_value test when pointer contains offset
- bpf: prevent out of bounds speculation on pointer arithmetic
- bpf: fix sanitation of alu op with pointer / scalar type from different
  paths
- bpf: add various test cases to selftests

  * CVE-2017-5753
- bpf: properly enforce index mask to prevent out-of-bounds speculation
- bpf: fix inner map masking to prevent oob under speculation

  * BPF: kernel pointer leak to unprivileged userspace (LP: #1815259)
- bpf/verifier: disallow pointer subtraction

  * squashfs hardening (LP: #1816756)
- squashfs: more metadata hardening
- squashfs metadata 2: electric boogaloo
- squashfs: more metadata hardening
- Squashfs: Compute expected length from inode size rather than block length

  * efi/arm/arm64: Allow SetVirtualAddressMap() to be omitted (LP: #1814982)
- efi/arm/arm64: Allow SetVirtualAddressMap() to be omitted

  * Update ENA driver to version 2.0.3K (LP: #1816806)
- net: ena: update driver version from 2.0.2 to 2.0.3
- net: ena: fix race between link up and device initalization
- net: ena: fix crash during failed resume from hibernation

  * ipset kernel error: 4.15.0-43-generic (LP: #1811394)
- netfilter: ipset: Fix wraparound in hash:*net* types

  * Silent "Unknown key" message when pressing keyboard backlight hotkey
(LP: #1817063)
- platform/x86: dell-wmi: Ignore new keyboard backlight change event

  * CVE-2018-18021
- arm64: KVM: Tighten guest core register access from userspace
- KVM: arm/arm64: Introduce vcpu_el1_is_32bit
- arm64: KVM: Sanitize PSTATE.M when being set from userspace

  * CVE-2018-14678
- x86/entry/64: Remove %ebx handling from error_entry/exit

  * CVE-2018-19824
- ALSA: usb-audio: Fix UAF decrement if card has no live interfaces in 
card.c

  * CVE-2019-3459
- Bluetooth: Verify that l2cap_get_conf_opt provides large enough buffer

  * Bionic update: upstream stable patchset 2019-02-08 (LP: #1815234)
- fork: unconditionally clear stack on fork
- spi: spi-s3c64xx: Fix system resume support
- Input: elan_i2c - add ACPI ID for lenovo ideapad 330
- Input: i8042 - add Lenovo LaVie Z to the i8042 reset list
- Input: elan_i2c - add another ACPI ID for Lenovo Ideapad 330-15AST
- kvm, mm: account shadow page tables to kmemcg
- delayacct: fix crash in delayacct_blkio_end() after delayacct init failure
- tracing: Fix double free of event_trigger_data
- tracing: Fix possible double free in event_enable_trigger_func()
- kthread, tracing: Don't expose half-written comm when creating kthreads
- tracing/kprobes: Fix trace_probe flags on enable_trace_kprobe() failure
- tracing: Quiet gcc warning about maybe unused link 

[Group.of.nepali.translators] [Bug 1802021] Re: [Hyper-V] srcu: Lock srcu_data structure in srcu_gp_start()

2019-03-05 Thread Launchpad Bug Tracker
This bug was fixed in the package linux-azure - 4.15.0-1040.44

---
linux-azure (4.15.0-1040.44) xenial; urgency=medium

  * linux-azure: 4.15.0-1040.44 -proposed tracker (LP: #1817038)

  * Packaging resync (LP: #1786013)
- [Packaging] resync retpoline extraction

  * CONFIG_SECURITY_SELINUX_DISABLE should be disabled on 4.15/4.18 Azure
(LP: #1813866)
- [Config]: disable CONFIG_SECURITY_SELINUX_DISABLE
- [Config] Update configs

  * Allow I/O schedulers to be loaded with modprobe in linux-azure
(LP: #1813211)
- [Config] linux-azure: Enable all IO schedulers as modules

  * [Hyper-V] srcu: Lock srcu_data structure in srcu_gp_start() (LP: #1802021)
- srcu: Prohibit call_srcu() use under raw spinlocks
- srcu: Lock srcu_data structure in srcu_gp_start()

  [ Ubuntu: 4.15.0-46.49 ]

  * linux: 4.15.0-46.49 -proposed tracker (LP: #1814726)
  * mprotect fails on ext4 with dax (LP: #1799237)
- x86/speculation/l1tf: Exempt zeroed PTEs from inversion
  * kernel BUG at /build/linux-vxxS7y/linux-4.15.0/mm/slub.c:296! (LP: #1812086)
- iscsi target: fix session creation failure handling
- scsi: iscsi: target: Set conn->sess to NULL when 
iscsi_login_set_conn_values
  fails
- scsi: iscsi: target: Fix conn_ops double free
  * user_copy in user from ubuntu_kernel_selftests failed on KVM kernel
(LP: #1812198)
- selftests: user: return Kselftest Skip code for skipped tests
- selftests: kselftest: change KSFT_SKIP=4 instead of KSFT_PASS
- selftests: kselftest: Remove outdated comment
  * RTL8822BE WiFi Disabled in Kernel 4.18.0-12 (LP: #1806472)
- SAUCE: staging: rtlwifi: allow RTLWIFI_DEBUG_ST to be disabled
- [Config] CONFIG_RTLWIFI_DEBUG_ST=n
- SAUCE: Add r8822be to signature inclusion list
  * kernel oops in bcache module (LP: #1793901)
- SAUCE: bcache: never writeback a discard operation
  * CVE-2018-18397
- userfaultfd: use ENOENT instead of EFAULT if the atomic copy user fails
- userfaultfd: shmem: allocate anonymous memory for MAP_PRIVATE shmem
- userfaultfd: shmem/hugetlbfs: only allow to register VM_MAYWRITE vmas
- userfaultfd: shmem: add i_size checks
- userfaultfd: shmem: UFFDIO_COPY: set the page dirty if VM_WRITE is not set
  * Ignore "incomplete report" from Elan touchpanels (LP: #1813733)
- HID: i2c-hid: Ignore input report if there's no data present on Elan
  touchpanels
  * Vsock connect fails with ENODEV for large CID (LP: #1813934)
- vhost/vsock: fix vhost vsock cid hashing inconsistent
  * SRU: Fix thinkpad 11e 3rd boot hang (LP: #1804604)
- ACPI / LPSS: Force LPSS quirks on boot
  * Bionic update: upstream stable patchset 2019-01-17 (LP: #1812229)
- scsi: sd_zbc: Fix variable type and bogus comment
- KVM/Eventfd: Avoid crash when assign and deassign specific eventfd in
  parallel.
- x86/apm: Don't access __preempt_count with zeroed fs
- x86/events/intel/ds: Fix bts_interrupt_threshold alignment
- x86/MCE: Remove min interval polling limitation
- fat: fix memory allocation failure handling of match_strdup()
- ALSA: hda/realtek - Add Panasonic CF-SZ6 headset jack quirk
- ARCv2: [plat-hsdk]: Save accl reg pair by default
- ARC: Fix CONFIG_SWAP
- ARC: configs: Remove CONFIG_INITRAMFS_SOURCE from defconfigs
- ARC: mm: allow mprotect to make stack mappings executable
- mm: memcg: fix use after free in mem_cgroup_iter()
- mm/huge_memory.c: fix data loss when splitting a file pmd
- cpufreq: intel_pstate: Register when ACPI PCCH is present
- vfio/pci: Fix potential Spectre v1
- stop_machine: Disable preemption when waking two stopper threads
- drm/i915: Fix hotplug irq ack on i965/g4x
- drm/nouveau: Use drm_connector_list_iter_* for iterating connectors
- drm/nouveau: Avoid looping through fake MST connectors
- gen_stats: Fix netlink stats dumping in the presence of padding
- ipv4: Return EINVAL when ping_group_range sysctl doesn't map to user ns
- ipv6: fix useless rol32 call on hash
- ipv6: ila: select CONFIG_DST_CACHE
- lib/rhashtable: consider param->min_size when setting initial table size
- net: diag: Don't double-free TCP_NEW_SYN_RECV sockets in tcp_abort
- net: Don't copy pfmemalloc flag in __copy_skb_header()
- skbuff: Unconditionally copy pfmemalloc in __skb_clone()
- net/ipv4: Set oif in fib_compute_spec_dst
- net: phy: fix flag masking in __set_phy_supported
- ptp: fix missing break in switch
- qmi_wwan: add support for Quectel EG91
- tg3: Add higher cpu clock for 5762.
- hv_netvsc: Fix napi reschedule while receive completion is busy
- net/mlx4_en: Don't reuse RX page when XDP is set
- net: systemport: Fix CRC forwarding check for SYSTEMPORT Lite
- ipv6: make DAD fail with enhanced DAD when nonce length differs
- net: usb: asix: replace mii_nway_restart in resume path
- alpha: fix osf_wait4() breakage
- 

[Group.of.nepali.translators] [Bug 1802021] Re: [Hyper-V] srcu: Lock srcu_data structure in srcu_gp_start()

2019-02-25 Thread Launchpad Bug Tracker
This bug was fixed in the package linux-azure - 4.18.0-1011.11

---
linux-azure (4.18.0-1011.11) cosmic; urgency=medium

  * linux-azure: 4.18.0-1011.11 -proposed tracker (LP: #1816081)

  * 4.15.0-1037 does not see all PCI devices on GPU VMs (LP: #1816106)
- Revert "PCI: hv: Make sure the bus domain is really unique"

linux-azure (4.18.0-1009.9) cosmic; urgency=medium

  * Allow I/O schedulers to be loaded with modprobe in linux-azure
(LP: #1813211)
- [Config] linux-azure: Enable all IO schedulers as modules

  * [Hyper-V] srcu: Lock srcu_data structure in srcu_gp_start() (LP: #1802021)
- srcu: Lock srcu_data structure in srcu_gp_start()

  * CONFIG_SECURITY_SELINUX_DISABLE should be disabled on 4.15/4.18 Azure
(LP: #1813866)
- [Config]: disable CONFIG_SECURITY_SELINUX_DISABLE

  [ Ubuntu: 4.18.0-15.16 ]

  * Ubuntu boot failure. 4.18.0-14 boot stalls. (does not boot) (LP: #1814555)
- Revert "drm/i915/ringbuffer: Delay after EMIT_INVALIDATE for gen4/gen5"
  * Userspace break as a result of missing patch backport (LP: #1813873)
- tty: Don't hold ldisc lock in tty_reopen() if ldisc present

 -- Stefan Bader   Fri, 15 Feb 2019 17:16:24
+0100

** Changed in: linux-azure (Ubuntu)
   Status: Confirmed => Fix Released

-- 
You received this bug notification because you are a member of नेपाली
भाषा समायोजकहरुको समूह, which is subscribed to Xenial.
Matching subscriptions: Ubuntu 16.04 Bugs
https://bugs.launchpad.net/bugs/1802021

Title:
  [Hyper-V] srcu: Lock srcu_data structure in srcu_gp_start()

Status in linux package in Ubuntu:
  Confirmed
Status in linux-azure package in Ubuntu:
  Fix Released
Status in linux source package in Xenial:
  New
Status in linux-azure source package in Xenial:
  Fix Committed
Status in linux source package in Bionic:
  Confirmed
Status in linux-azure source package in Bionic:
  Fix Released
Status in linux source package in Cosmic:
  Confirmed
Status in linux-azure source package in Cosmic:
  Fix Released

Bug description:
  We had a customer seeing traces like the following:

  tack trace from kern.log:
  2018-10-10T04:43:08.542464+00:00 hbp2ann-2 kernel: INFO: task 
kworker/u16:0:16678 blocked for more than 120 seconds.
  2018-10-10T04:43:08.542503+00:00 hbp2ann-2 kernel: Not tainted 
4.15.0-1023-azure #24~16.04.1-Ubuntu
  2018-10-10T04:43:08.542513+00:00 hbp2ann-2 kernel: "echo 0 > 
/proc/sys/kernel/hung_task_timeout_secs" disables this message.
  2018-10-10T04:43:08.547366+00:00 hbp2ann-2 kernel: kworker/u16:0 D 0 16678 2 
0x8000
  2018-10-10T04:43:08.547386+00:00 hbp2ann-2 kernel: Workqueue: events_unbound 
fsnotify_mark_destroy_workfn
  2018-10-10T04:43:08.547395+00:00 hbp2ann-2 kernel: Call Trace:
  2018-10-10T04:43:08.547413+00:00 hbp2ann-2 kernel: __schedule+0x3d6/0x8b0
  2018-10-10T04:43:08.547422+00:00 hbp2ann-2 kernel: ? 
check_preempt_wakeup+0xfb/0x240
  2018-10-10T04:43:08.547431+00:00 hbp2ann-2 kernel: ? 
sched_clock_local+0x17/0x90
  2018-10-10T04:43:08.547440+00:00 hbp2ann-2 kernel: schedule+0x36/0x80
  2018-10-10T04:43:08.547448+00:00 hbp2ann-2 kernel: 
schedule_timeout+0x1db/0x370
  2018-10-10T04:43:08.547458+00:00 hbp2ann-2 kernel: ? 
__enqueue_entity+0x5c/0x60
  2018-10-10T04:43:08.547467+00:00 hbp2ann-2 kernel: ? 
enqueue_entity+0x112/0x670
  2018-10-10T04:43:08.547477+00:00 hbp2ann-2 kernel: 
wait_for_completion+0xb4/0x140
  2018-10-10T04:43:08.547486+00:00 hbp2ann-2 kernel: ? wake_up_q+0x70/0x70
  2018-10-10T04:43:08.547510+00:00 hbp2ann-2 kernel: 
__synchronize_srcu.part.13+0x85/0xb0
  2018-10-10T04:43:08.547535+00:00 hbp2ann-2 kernel: ? 
trace_raw_output_rcu_utilization+0x50/0x50
  2018-10-10T04:43:08.547560+00:00 hbp2ann-2 kernel: synchronize_srcu+0xd3/0xe0
  2018-10-10T04:43:08.547594+00:00 hbp2ann-2 kernel: ? 
synchronize_srcu+0xd3/0xe0
  2018-10-10T04:43:08.547604+00:00 hbp2ann-2 kernel: 
fsnotify_mark_destroy_workfn+0x7c/0xe0
  2018-10-10T04:43:08.547612+00:00 hbp2ann-2 kernel: 
process_one_work+0x14d/0x410
  2018-10-10T04:43:08.547620+00:00 hbp2ann-2 kernel: worker_thread+0x4b/0x460
  2018-10-10T04:43:08.547628+00:00 hbp2ann-2 kernel: kthread+0x105/0x140
  2018-10-10T04:43:08.547637+00:00 hbp2ann-2 kernel: ? 
process_one_work+0x410/0x410
  2018-10-10T04:43:08.547645+00:00 hbp2ann-2 kernel: ? 
kthread_destroy_worker+0x50/0x50
  2018-10-10T04:43:08.547654+00:00 hbp2ann-2 kernel: ? do_syscall_64+0x73/0x130
  2018-10-10T04:43:08.547677+00:00 hbp2ann-2 kernel: ? SyS_exit_group+0x14/0x20
  2018-10-10T04:43:08.547685+00:00 hbp2ann-2 kernel: ret_from_fork+0x35/0x40

  Error Code: INFO: task kworker/u16:0:16678 blocked for more than 120
  seconds.

  We are seeing more issue with fsnotify related callbacks. These are
  not a soft/hard lockup but seem to significantly degrade the
  responsiveness of systemd (and from there everything else).

  The following upstream commit may fix this issue, but it is in Paul's
  RCU tree and not in linux-next or upstream yet:

  

[Group.of.nepali.translators] [Bug 1802021] Re: [Hyper-V] srcu: Lock srcu_data structure in srcu_gp_start()

2019-02-21 Thread Kleber Sacilotto de Souza
** Also affects: linux (Ubuntu Xenial)
   Importance: Undecided
   Status: New

** Also affects: linux-azure (Ubuntu Xenial)
   Importance: Undecided
   Status: New

** Changed in: linux-azure (Ubuntu Xenial)
   Status: New => Fix Committed

-- 
You received this bug notification because you are a member of नेपाली
भाषा समायोजकहरुको समूह, which is subscribed to Xenial.
Matching subscriptions: Ubuntu 16.04 Bugs
https://bugs.launchpad.net/bugs/1802021

Title:
  [Hyper-V] srcu: Lock srcu_data structure in srcu_gp_start()

Status in linux package in Ubuntu:
  Confirmed
Status in linux-azure package in Ubuntu:
  Confirmed
Status in linux source package in Xenial:
  New
Status in linux-azure source package in Xenial:
  Fix Committed
Status in linux source package in Bionic:
  Confirmed
Status in linux-azure source package in Bionic:
  Fix Released
Status in linux source package in Cosmic:
  Confirmed
Status in linux-azure source package in Cosmic:
  Fix Released

Bug description:
  We had a customer seeing traces like the following:

  tack trace from kern.log:
  2018-10-10T04:43:08.542464+00:00 hbp2ann-2 kernel: INFO: task 
kworker/u16:0:16678 blocked for more than 120 seconds.
  2018-10-10T04:43:08.542503+00:00 hbp2ann-2 kernel: Not tainted 
4.15.0-1023-azure #24~16.04.1-Ubuntu
  2018-10-10T04:43:08.542513+00:00 hbp2ann-2 kernel: "echo 0 > 
/proc/sys/kernel/hung_task_timeout_secs" disables this message.
  2018-10-10T04:43:08.547366+00:00 hbp2ann-2 kernel: kworker/u16:0 D 0 16678 2 
0x8000
  2018-10-10T04:43:08.547386+00:00 hbp2ann-2 kernel: Workqueue: events_unbound 
fsnotify_mark_destroy_workfn
  2018-10-10T04:43:08.547395+00:00 hbp2ann-2 kernel: Call Trace:
  2018-10-10T04:43:08.547413+00:00 hbp2ann-2 kernel: __schedule+0x3d6/0x8b0
  2018-10-10T04:43:08.547422+00:00 hbp2ann-2 kernel: ? 
check_preempt_wakeup+0xfb/0x240
  2018-10-10T04:43:08.547431+00:00 hbp2ann-2 kernel: ? 
sched_clock_local+0x17/0x90
  2018-10-10T04:43:08.547440+00:00 hbp2ann-2 kernel: schedule+0x36/0x80
  2018-10-10T04:43:08.547448+00:00 hbp2ann-2 kernel: 
schedule_timeout+0x1db/0x370
  2018-10-10T04:43:08.547458+00:00 hbp2ann-2 kernel: ? 
__enqueue_entity+0x5c/0x60
  2018-10-10T04:43:08.547467+00:00 hbp2ann-2 kernel: ? 
enqueue_entity+0x112/0x670
  2018-10-10T04:43:08.547477+00:00 hbp2ann-2 kernel: 
wait_for_completion+0xb4/0x140
  2018-10-10T04:43:08.547486+00:00 hbp2ann-2 kernel: ? wake_up_q+0x70/0x70
  2018-10-10T04:43:08.547510+00:00 hbp2ann-2 kernel: 
__synchronize_srcu.part.13+0x85/0xb0
  2018-10-10T04:43:08.547535+00:00 hbp2ann-2 kernel: ? 
trace_raw_output_rcu_utilization+0x50/0x50
  2018-10-10T04:43:08.547560+00:00 hbp2ann-2 kernel: synchronize_srcu+0xd3/0xe0
  2018-10-10T04:43:08.547594+00:00 hbp2ann-2 kernel: ? 
synchronize_srcu+0xd3/0xe0
  2018-10-10T04:43:08.547604+00:00 hbp2ann-2 kernel: 
fsnotify_mark_destroy_workfn+0x7c/0xe0
  2018-10-10T04:43:08.547612+00:00 hbp2ann-2 kernel: 
process_one_work+0x14d/0x410
  2018-10-10T04:43:08.547620+00:00 hbp2ann-2 kernel: worker_thread+0x4b/0x460
  2018-10-10T04:43:08.547628+00:00 hbp2ann-2 kernel: kthread+0x105/0x140
  2018-10-10T04:43:08.547637+00:00 hbp2ann-2 kernel: ? 
process_one_work+0x410/0x410
  2018-10-10T04:43:08.547645+00:00 hbp2ann-2 kernel: ? 
kthread_destroy_worker+0x50/0x50
  2018-10-10T04:43:08.547654+00:00 hbp2ann-2 kernel: ? do_syscall_64+0x73/0x130
  2018-10-10T04:43:08.547677+00:00 hbp2ann-2 kernel: ? SyS_exit_group+0x14/0x20
  2018-10-10T04:43:08.547685+00:00 hbp2ann-2 kernel: ret_from_fork+0x35/0x40

  Error Code: INFO: task kworker/u16:0:16678 blocked for more than 120
  seconds.

  We are seeing more issue with fsnotify related callbacks. These are
  not a soft/hard lockup but seem to significantly degrade the
  responsiveness of systemd (and from there everything else).

  The following upstream commit may fix this issue, but it is in Paul's
  RCU tree and not in linux-next or upstream yet:

  https://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-
  rcu.git/commit/?h=dev=1a05c0cd2fee234a10362cc8f66057557cbb291f

  srcu: Lock srcu_data structure in srcu_gp_start()
  The srcu_gp_start() function is called with the srcu_struct structure's
  ->lock held, but not with the srcu_data structure's ->lock.  This is
  problematic because this function accesses and updates the srcu_data
  structure's ->srcu_cblist, which is protected by that lock.  Failing to
  hold this lock can result in corruption of the SRCU callback lists,
  which in turn can result in arbitrarily bad results.

  This commit therefore makes srcu_gp_start() acquire the srcu_data
  structure's ->lock across the calls to rcu_segcblist_advance() and
  rcu_segcblist_accelerate(), thus preventing this corruption.

  Please investigate this issue and evaluate the proposed fix.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1802021/+subscriptions

___
Mailing list: