[Kernel-packages] [Bug 1938013] Re: 4.15.0-151 is freezing intel 5th gen ThinkPad (T450)

2021-07-28 Thread Stéphane Lesimple
Same thing here since I upgraded to 4.15.0-151-generic on my laptop. It
previously had 150+ days of uptime; now it crashes every few days,
sometimes up to 3 times a day.

The behaviour is very similar to what Martin reported just above: it
can crash easily when logging in to XFCE, or simply while using the
desktop. When it happens it is a hard freeze: the watchdog doesn't
trigger (I waited for 1 hour), and even the sysrq keys no longer work
(yes, they are enabled).
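
For anyone hitting the same freeze, a quick sanity check using the
standard sysctl/procfs interfaces (nothing specific to this bug) to
confirm sysrq is enabled and, while the console still responds, dump
the blocked tasks to the kernel log:

  sysctl kernel.sysrq                        # 1 = all sysrq functions enabled
  echo 1 | sudo tee /proc/sys/kernel/sysrq   # enable everything until next reboot
  echo w | sudo tee /proc/sysrq-trigger      # dump stacks of blocked (D-state) tasks
  dmesg | tail -n 100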

Hardware: DELL Latitude 5300.

To add some data points, here are my last two crash reports:

==> /var/crash/linux-image-4.15.0-151-generic.31533.crash <==
ProblemType: KernelOops
Annotation: Your system might become unstable now and might need to be restarted.
Date: Fri Jul 23 15:41:43 2021
Failure: oops
OopsText:
 BUG: Bad page map in process Renderer  pte:809fb753c258 pmd:3993e9067
 addr:b8067c8b vm_flags:08f9 anon_vma:  (null) mapping:73ae2b58 index:648
 file:memfd:mozilla-ipc fault:shmem_fault mmap:shmem_mmap readpage:  (null)
 CPU: 0 PID: 5797 Comm: Renderer Not tainted 4.15.0-151-generic #157-Ubuntu
 Hardware name: Dell Inc. Latitude 5300/0932VT, BIOS 1.8.1 12/16/2019
 Call Trace:
 
Package: linux-image-4.15.0-151-generic 4.15.0-151.157
SourcePackage: linux
Tags: kernel-oops
Uname: Linux 4.15.0-151-generic x86_64

==> /var/crash/linux-image-4.15.0-151-generic.45463.crash <==
ProblemType: KernelOops
Annotation: Your system might become unstable now and might need to be restarted.
Date: Fri Jul 23 15:41:42 2021
Failure: oops
OopsText:
 BUG: Bad page cache in process Renderer  pfn:2d5fe5
 page:ed164b57f940 count:3 mapcount:1 mapping:8fdc5041b488 index:0x648
 flags: 0x17c004002d(locked|referenced|uptodate|lru|swapbacked)
 raw: 0017c004002d 8fdc5041b488 0648 0003
 raw: ed164d96dbe0 ed1650092820  8fdda9018000
 page dumped because: still mapped when deleted
 page->mem_cgroup:8fdda9018000
 CPU: 0 PID: 5797 Comm: Renderer Tainted: G    B    4.15.0-151-generic #157-Ubuntu
 Hardware name: Dell Inc. Latitude 5300/0932VT, BIOS 1.8.1 12/16/2019
 Call Trace:
 
Package: linux-image-4.15.0-151-generic 4.15.0-151.157
SourcePackage: linux
Tags: kernel-oops
Uname: Linux 4.15.0-151-generic x86_64
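
The two reports above are just the apport files from /var/crash shown
with head-style "==> file <==" headers. If the truncated OopsText is
not enough, the full report fields can be extracted with apport-unpack
(the target directory below is only an example):

  apport-unpack /var/crash/linux-image-4.15.0-151-generic.31533.crash /tmp/crash-31533
  cat /tmp/crash-31533/OopsText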

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1938013

Title:
  4.15.0-151 is freezing intel 5th gen ThinkPad (T450)

Status in linux package in Ubuntu:
  Confirmed
Status in linux source package in Bionic:
  Confirmed

Bug description:
  From: https://askubuntu.com/questions/1353859/ubuntu-18-04-05-lts-desktop-hangs-with-since-kernel-4-15-0-151-and-systemd-237-3

  Several crashes in /var/crash, here's the last one:-

  ProblemType: KernelOops
  Annotation: Your system might become unstable now and might need to be restarted.
  Date: Fri Jul 23 18:10:54 2021
  Failure: oops
  OopsText:
   BUG: Bad rss-counter state mm:c098a229 idx:2 val:-1
   usblp0: removed
   usblp 1-5:1.0: usblp0: USB Bidirectional printer dev 3 if 0 alt 0 proto 2 vid 0x04F9 pid 0x02EC
   <44>[   18.329026] systemd-journald[358]: File /var/log/journal/b022dca21fd4480baeeb84f47ab439d3/user-1000.journal corrupted or uncleanly shut down, renaming and replacing.
   vboxdrv: loading out-of-tree module taints kernel.
   vboxdrv: module verification failed: signature and/or required key missing - tainting kernel
   vboxdrv: Found 8 processor cores
   vboxdrv: TSC mode is Invariant, tentative frequency 2303999142 Hz
   vboxdrv: Successfully loaded version 6.1.24 r145767 (interface 0x0030)
   VBoxNetFlt: Successfully started.
   VBoxNetAdp: Successfully started.
   Bluetooth: RFCOMM TTY layer initialized
   Bluetooth: RFCOMM socket layer initialized
   Bluetooth: RFCOMM ver 1.11
   rfkill: input handler disabled
   [UFW BLOCK] IN=enp3s0f1 OUT= MAC=01:00:5e:00:00:01:80:20:da:95:bc:56:08:00 SRC=192.168.1.254 DST=224.0.0.1 LEN=36 TOS=0x00 PREC=0x00 TTL=1 ID=0 DF PROTO=2
   [UFW BLOCK] IN=wlp2s0 OUT= MAC=01:00:5e:00:00:01:80:20:da:95:bc:56:08:00 SRC=192.168.1.254 DST=224.0.0.1 LEN=36 TOS=0x00 PREC=0x00 TTL=1 ID=0 DF PROTO=2
   [UFW BLOCK] IN=enp3s0f1 OUT= MAC=01:00:5e:00:00:01:80:20:da:95:bc:56:08:00 SRC=192.168.1.254 DST=224.0.0.1 LEN=36 TOS=0x00 PREC=0x00 TTL=1 ID=0 DF PROTO=2
   [UFW BLOCK] IN=wlp2s0 OUT= MAC=01:00:5e:00:00:01:80:20:da:95:bc:56:08:00 SRC=192.168.1.254 DST=224.0.0.1 LEN=36 TOS=0x00 PREC=0x00 TTL=1 ID=0 DF PROTO=2
   [UFW BLOCK] IN=enp3s0f1 OUT= MAC=01:00:5e:00:00:01:80:20:da:95:bc:56:08:00 SRC=192.168.1.254 DST=224.0.0.1 LEN=36 TOS=0x00 PREC=0x00 TTL=1 ID=0 DF PROTO=2
   [UFW BLOCK] IN=wlp2s0 OUT= MAC=01:00:5e:00:00:01:80:20:da:95:bc:56:08:00 SRC=192.168.1.254 DST=224.0.0.1 LEN=36 TOS=0x00 PREC=0x00 TTL=1 ID=0 DF PROTO=2
   
  Package: linux-image-4.15.0-151-generic 4.15.0-151.157
  SourcePackage: linux
  Tags: kernel-oops
  Uname: Linux 

[Kernel-packages] [Bug 1765998] Re: FS access deadlock with btrfs quotas enabled

2019-03-03 Thread Stéphane Lesimple
I see, thanks for the info.

I'll report my findings so far here, in case they turn out to be
useful to someone landing on this bug in the future:

The call trace of the deadlocked btrfs-cleaner kthread is as follows.

  Tainted: P   OE    4.15.0-45-generic #48-Ubuntu
btrfs-cleaner   D    0  7969      2 0x8000
Call Trace:
 __schedule+0x291/0x8a0
 schedule+0x2c/0x80
 btrfs_tree_read_lock+0xcc/0x120 [btrfs]
 ? wait_woken+0x80/0x80
 find_parent_nodes+0x295/0xe90 [btrfs]
 ? _cond_resched+0x19/0x40
 btrfs_find_all_roots_safe+0xb0/0x120 [btrfs]
 ? btrfs_find_all_roots_safe+0xb0/0x120 [btrfs]
 btrfs_find_all_roots+0x61/0x80 [btrfs]
 btrfs_qgroup_trace_extent_post+0x37/0x60 [btrfs]
[...]

I'm not including the bottom of the call trace because it varies; the
common part is identical from btrfs_qgroup_trace_extent_post upwards.
The caller of btrfs_qgroup_trace_extent_post can be either:
- btrfs_qgroup_trace_extent+0xee/0x110 [btrfs], or
- btrfs_add_delayed_tree_ref+0x1c6/0x1f0 [btrfs], or
- btrfs_add_delayed_data_ref+0x30a/0x340 [btrfs]

This happens on 4.15 (Ubuntu flavor), 4.18 (Ubuntu "HWE" flavor), and
4.20.13 (vanilla).

On 4.20.0 (vanilla) and 5.0-rc8 (vanilla), there is also a deadlock
under similar conditions, but the call trace of the deadlocked
btrfs-transaction kthread looks different:

  Tainted: P   OE 4.20.0-042000-generic #201812232030
btrfs-transacti D    0  8665      2 0x8000
Call Trace:
 __schedule+0x29e/0x840
 ? btrfs_free_path+0x13/0x20 [btrfs]
 schedule+0x2c/0x80
 btrfs_commit_transaction+0x715/0x840 [btrfs]
 ? wait_woken+0x80/0x80
 transaction_kthread+0x15c/0x190 [btrfs]
 kthread+0x120/0x140
 ? btrfs_cleanup_transaction+0x560/0x560 [btrfs]
 ? __kthread_parkme+0x70/0x70
 ret_from_fork+0x35/0x40

Other userspace threads are locked at the same time.
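
To see which tasks are stuck and where, something like this is enough
(the PID comes from the trace above; reading /proc/<pid>/stack needs
root):

  # list tasks in uninterruptible sleep (D state) with the kernel symbol they wait in
  ps -eo pid,stat,wchan:32,comm | awk '$2 ~ /^D/'
  # kernel stack of one stuck task, e.g. the btrfs-transaction kthread
  sudo cat /proc/8665/stack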

So we seem to be dealing with at least 2 different deadlock cases, both
of which happen with lots of subvolumes and/or snapshots and with
quotas enabled. All of this disappears when quotas are disabled.

For the record, the main btrfs qgroups developer seems to have a lot of
pending changes/fixes coming in this area, expected for 5.1 or 5.2.
Stay tuned...

I have disabled quotas for now; I only enable them for a short period
when I need size information about my subvolumes and snapshots.
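
For reference, the corresponding btrfs-progs commands are the standard
ones (replace /mnt/pool with the actual mountpoint):

  sudo btrfs quota disable /mnt/pool     # the workaround
  # temporarily, when per-subvolume size information is needed:
  sudo btrfs quota enable /mnt/pool
  sudo btrfs quota rescan -w /mnt/pool   # wait for the accounting rescan to finish
  sudo btrfs qgroup show /mnt/pool
  sudo btrfs quota disable /mnt/pool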

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1765998

Title:
  FS access deadlock with btrfs quotas enabled

Status in linux package in Ubuntu:
  Triaged
Status in linux source package in Bionic:
  Triaged

Bug description:
  I'm running into an issue on Ubuntu Bionic (but not Xenial) where
  shortly after boot, under heavy load from many LXD containers starting
  at once, access to the btrfs filesystem that the containers are on
  deadlocks.

  The issue is quite hard to reproduce on other systems, quite likely
  related to the size of the filesystem involved (4 devices with a total
  of 8TB, millions of files, ~20 subvolumes with tens of snapshots each)
  and the access pattern from many LXD containers at once. It definitely
  goes away when disabling btrfs quotas though. Another prerequisite to
  trigger this bug may be the container subvolumes sharing extents (from
  their parent image or due to deduplication).

  I can only reliably reproduce it on a production system that I can only
  do very limited testing on; however, I have been able to gather the
  following information:
  - Many threads are stuck trying to acquire locks on various tree roots,
    which are never released by their current holders.
  - There always seem to be (at least) two threads executing rmdir
    syscalls which create the circular dependency: one of them is in
    btrfs_cow_block => ... => btrfs_qgroup_trace_extent_post => ... =>
    find_parent_nodes and wants to acquire a lock that was already
    acquired by btrfs_search_slot of the other rmdir.
  - Reverting this patch seems to prevent it from happening:
    https://patchwork.kernel.org/patch/9573267/
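
  A rough, untested reproduction sketch based on the elements above
  (quotas enabled, many snapshots sharing extents, concurrent
  rmdir-heavy deletions); the mountpoint, counts and data set are
  purely illustrative:

    # scratch btrfs filesystem assumed to be mounted at /mnt/scratch
    sudo btrfs quota enable /mnt/scratch
    sudo btrfs subvolume create /mnt/scratch/base
    sudo cp -a /usr /mnt/scratch/base/   # lots of files, extents shared by the snapshots below
    for i in $(seq 1 20); do
        sudo btrfs subvolume snapshot /mnt/scratch/base /mnt/scratch/snap$i
    done
    # concurrent deletions exercise the rmdir path while qgroup accounting runs
    for i in $(seq 1 20); do
        sudo rm -rf /mnt/scratch/snap$i/usr &
    done
    wait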

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1765998/+subscriptions

-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp


[Kernel-packages] [Bug 1765998] Re: FS access deadlock with btrfs quotas enabled

2019-02-26 Thread Stéphane Lesimple
I can confirm I'm having this problem too since migrating from Xenial
to Bionic.

My setup is somewhat similar to the original reporter's: I have 5
drives (22 TB) in RAID1, organized into around 10 subvolumes with up to
20 snapshots per subvolume.

After some hours of running normally, [btrfs-transaction] goes into D
state at some point, and everything btrfs-related slowly grinds to a
stall, with any program trying to touch the filesystem ending up in D
state too.

The call trace I have also references btrfs_qgroup_trace_extent_post.

I'm currently testing 5.0-rc8 from the Ubuntu mainline kernel PPA, to
see if the problem is still there.
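
For anyone wanting to run the same test: the mainline builds are plain
.deb packages published per tag under
https://kernel.ubuntu.com/~kernel-ppa/mainline/ (e.g. the v5.0-rc8
directory). Exact filenames vary per version and architecture, so this
is only a sketch:

  # download the linux-image, linux-modules and linux-headers debs for
  # your architecture into an empty directory, then:
  sudo dpkg -i ./linux-*.deb
  sudo reboot
  uname -r   # confirm the test kernel is running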

Michael, did you end up reporting the problem upstream? I would be
keen to do it on the btrfs mailing list as soon as I know whether this
is fixed in 5.0 or not.
