Re: Reproducible system lockup, extracting files into XFS on dm-raid5 on dm-integrity on HDD

2024-06-06 Thread Dave Chinner
On Thu, Jun 06, 2024 at 11:48:57AM -0400, Zack Weinberg wrote: > On Wed, Jun 5, 2024, at 7:05 PM, Dave Chinner wrote: > > On Wed, Jun 05, 2024 at 02:40:45PM -0400, Zack Weinberg wrote: > >> I am experimenting with the use of dm-integrity underneath dm-raid, > >> to g

Re: Reproducible system lockup, extracting files into XFS on dm-raid5 on dm-integrity on HDD

2024-06-05 Thread Dave Chinner
in time...

> [ 2213.651425] 2 locks held by kworker/25:3/13498:
> [ 2213.651426] #0: 9aa0c7bfe758 ((wq_completion)xfs-sync/md126p1){+.+.}-{0:0}, at: process_one_work+0x3cc/0x640
> [ 2213.651436] #1: b848e259be58 ((work_completion)(&(>l_work)->work)){+.+.}-{0:0}, at: process_one_work+0x1ca/0x640

And that's the periodic log worker that generated the log force which triggered the hung task timer.

> [ 2213.651465] =
> [ 2213.651467] Kernel panic - not syncing: hung_task: blocked tasks
> [ 2213.652654] Kernel Offset: 0x700 from 0x8100 (relocation range: 0x8000-0xbfff)

So the system has been up for just under an hour, and it's configured to panic on warnings and/or hung tasks. If you want the untar to complete, turn off the hung task timer or push it out so far that it doesn't trigger (i.e. 12-24 hours). Don't expect it to finish quickly, and when it does there's probably still hours of metadata writeback pending which will block unmount until it's done.

-Dave. -- Dave Chinner da...@fromorbit.com
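The advice to turn off the hung task timer or push it out can be sketched as below. This is a minimal illustration rather than anything from the thread: it assumes the timer is controlled through the standard kernel.hung_task_timeout_secs sysctl (exposed under /proc/sys), where a value of 0 disables the check. The function name set_hung_task_timeout and the path macro are hypothetical helpers introduced only for this example, and writing to the real sysctl file requires root.

```c
#include <assert.h>
#include <stdio.h>

/* procfs location of the kernel.hung_task_timeout_secs sysctl. */
#define HUNG_TASK_TIMEOUT_PATH "/proc/sys/kernel/hung_task_timeout_secs"

/*
 * Hypothetical helper: write a timeout in seconds to the given sysctl file.
 * A value of 0 disables the hung task check.
 * Returns 0 on success, -1 on failure (for example, when not run as root).
 */
int set_hung_task_timeout(const char *path, unsigned long secs)
{
    assert(path != NULL);

    FILE *f = fopen(path, "w");
    if (!f)
        return -1;

    int ok = fprintf(f, "%lu\n", secs) > 0;
    if (fclose(f) != 0)     /* the write may only fail at flush time */
        ok = 0;
    return ok ? 0 : -1;
}
```

Calling set_hung_task_timeout(HUNG_TASK_TIMEOUT_PATH, 24UL * 60 * 60) as root would push the timer out to 24 hours, and passing 0 would turn it off. The path is a parameter so the helper can be exercised against an ordinary file.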

Re: [PATCH v20 06/12] fs, block: copy_file_range for def_blk_ops for direct block device

2024-05-25 Thread Dave Chinner
d be. It may well be that the application falls back to "copy through the page cache", but that is an application policy choice, not something the kernel offload driver should be making mandatory. Userspace has to handle copy offload failure anyway, so they need a fallback path regardless of whether copy_file_range() works on block devices or not... Cheers, Dave. -- Dave Chinner da...@fromorbit.com
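The point that userspace needs its own fallback path can be illustrated with a small sketch. It assumes Linux with glibc's copy_file_range() wrapper. The function names copy_fallback and copy_with_offload, the 64 KiB buffer, and the particular set of errno values treated as "offload not available" are assumptions made for this example; which errors justify a fallback is exactly the application policy choice described above.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <unistd.h>

/* Hypothetical fallback: plain read()/write() copy of up to len bytes.
 * Returns the number of bytes copied, or -1 on error. */
ssize_t copy_fallback(int fd_in, int fd_out, size_t len)
{
    char buf[65536];
    size_t done = 0;

    while (done < len) {
        size_t want = len - done < sizeof(buf) ? len - done : sizeof(buf);
        ssize_t n = read(fd_in, buf, want);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        if (n == 0)
            break;                      /* end of input */
        for (ssize_t off = 0; off < n; ) {
            ssize_t w = write(fd_out, buf + off, (size_t)(n - off));
            if (w < 0) {
                if (errno == EINTR)
                    continue;
                return -1;
            }
            off += w;
        }
        done += (size_t)n;
    }
    return (ssize_t)done;
}

/* Try copy_file_range() first; fall back to the buffered copy if the
 * kernel reports that it cannot do the copy for these descriptors. */
ssize_t copy_with_offload(int fd_in, int fd_out, size_t len)
{
    size_t done = 0;

    assert(fd_in >= 0 && fd_out >= 0);
    while (done < len) {
        ssize_t n = copy_file_range(fd_in, NULL, fd_out, NULL, len - done, 0);
        if (n > 0) {
            done += (size_t)n;
            continue;
        }
        if (n == 0)
            break;                      /* end of input */
        switch (errno) {
        case ENOSYS:
        case EXDEV:
        case EINVAL:
        case EOPNOTSUPP: {
            /* Assumed policy: these mean "no offload here", so copy the
             * remainder by hand from the current file offsets. */
            ssize_t rest = copy_fallback(fd_in, fd_out, len - done);
            return rest < 0 ? -1 : (ssize_t)done + rest;
        }
        default:
            return -1;                  /* real I/O error: report it */
        }
    }
    return (ssize_t)done;
}
```

Because NULL offsets are passed, the kernel advances the file positions itself, so the fallback simply continues from wherever the offload attempt stopped.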

Re: [PATCH v20 05/12] fs/read_write: Enable copy_file_range for block device.

2024-05-25 Thread Dave Chinner
s to the rest of generic_copy_file_checks() when block devices are used. Is this correct? If so, this needs a pair of comments (one for each function) to explain why the specific inode used for these functions is correct for block devices. -Dave. -- Dave Chinner da...@fromorbit.com

Re: [PATCH 1/5] zonefs: pass GFP_KERNEL to blkdev_zone_mgmt() call

2024-01-23 Thread Dave Chinner
> (_I(inode)->i_truncate_mutex); - so, this
> function is called with the mutex held - could it happen that the
> GFP_KERNEL allocation recurses into the filesystem and attempts to take
> i_truncate_mutex as well?
>
> i.e. GFP_KERNEL -> iomap_do_writepage -> zonefs_write_map_blocks ->
> zonefs_write_iomap_begin -> mutex_lock(>i_truncate_mutex)

zonefs doesn't have a ->writepage method, so writeback can't be called from memory reclaim like this.

-Dave. -- Dave Chinner da...@fromorbit.com

Re: [PATCH 04/11] lib/dlock-list: Make sibling CPUs share the same linked list

2023-12-06 Thread Dave Chinner
On Thu, Dec 07, 2023 at 12:42:59AM -0500, Kent Overstreet wrote: > On Wed, Dec 06, 2023 at 05:05:33PM +1100, Dave Chinner wrote: > > From: Waiman Long > > > > The dlock list needs one list for each of the CPUs available. However, > > for sibling CPUs, they are sha

Re: [PATCH 08/11] vfs: inode cache conversion to hash-bl

2023-12-06 Thread Dave Chinner
On Wed, Dec 06, 2023 at 11:58:44PM -0500, Kent Overstreet wrote: > On Wed, Dec 06, 2023 at 05:05:37PM +1100, Dave Chinner wrote: > > From: Dave Chinner > > > > Scalability of the global inode_hash_lock really sucks for > > filesystems that use the vfs inode cac

Re: [PATCH 03/11] vfs: Use dlock list for superblock's inode list

2023-12-06 Thread Dave Chinner
On Thu, Dec 07, 2023 at 02:40:24AM +, Al Viro wrote: > On Wed, Dec 06, 2023 at 05:05:32PM +1100, Dave Chinner wrote: > > > @@ -303,6 +303,7 @@ static void destroy_unused_super(struct super_block *s) > > super_unlock_excl(s); > > list_l

Re: [PATCH 10/11] list_bl: don't use bit locks for PREEMPT_RT or lockdep

2023-12-06 Thread Dave Chinner
On Wed, Dec 06, 2023 at 11:16:50PM -0500, Kent Overstreet wrote: > On Wed, Dec 06, 2023 at 05:05:39PM +1100, Dave Chinner wrote: > > From: Dave Chinner > > > > hash-bl nests spinlocks inside the bit locks. This causes problems > > for CONFIG_PREEMPT_RT which converts s

Re: [PATCH 05/11] selinux: use dlist for isec inode list

2023-12-06 Thread Dave Chinner
On Wed, Dec 06, 2023 at 04:52:42PM -0500, Paul Moore wrote: > On Wed, Dec 6, 2023 at 1:07 AM Dave Chinner wrote: > > > > From: Dave Chinner > > > > Because it's a horrible point of lock contention under heavily > > concurrent directory traversals...

[PATCH 03/11] vfs: Use dlock list for superblock's inode list

2023-12-05 Thread Dave Chinner
unlock 0.67% __raw_callee_save___pv_queued_spin_unlock Signed-off-by: Waiman Long Signed-off-by: Dave Chinner --- block/bdev.c | 24 fs/drop_caches.c | 9 - fs/gfs2/ops_fstype.c | 21 +++-- fs/inode.c

[PATCH 10/11] list_bl: don't use bit locks for PREEMPT_RT or lockdep

2023-12-05 Thread Dave Chinner
From: Dave Chinner hash-bl nests spinlocks inside the bit locks. This causes problems for CONFIG_PREEMPT_RT which converts spin locks to sleeping locks, and we're not allowed to sleep while holding a spinning lock. Further, lockdep does not support bit locks, so we lose lockdep coverage

[PATCH 08/11] vfs: inode cache conversion to hash-bl

2023-12-05 Thread Dave Chinner
From: Dave Chinner Scalability of the global inode_hash_lock really sucks for filesystems that use the vfs inode cache (i.e. everything but XFS). Profiles of a 32-way concurrent sharded directory walk (no contended directories) on a couple of different filesystems. All numbers from a 6.7-rc4

[PATCH 01/11] lib/dlock-list: Distributed and lock-protected lists

2023-12-05 Thread Dave Chinner
From: Waiman Long Linked lists are used everywhere in the Linux kernel. However, if many threads are trying to add or delete entries into the same linked list, it can create a performance bottleneck. This patch introduces new list APIs that provide a set of distributed lists (one per CPU), each
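The idea of a set of distributed lists, each protected by its own lock, can be sketched in userspace as follows. This is not the kernel's dlock-list API: every name here (dlist, dshard, dnode, dlist_add and so on) is hypothetical, pthread mutexes stand in for kernel spinlocks, and a caller-supplied shard index stands in for "the current CPU". It also assumes each node is removed by a single owner, so reading the node's home shard before taking the lock is safe.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define NR_SHARDS 4     /* assumed stand-in for "one list per CPU" */

struct dshard;

struct dnode {
    struct dnode *prev, *next;
    struct dshard *home;        /* shard the node is on, NULL if unlinked */
};

struct dshard {
    pthread_mutex_t lock;       /* protects only this shard's list */
    struct dnode head;          /* sentinel of a circular list */
};

struct dlist {
    struct dshard shards[NR_SHARDS];
};

void dlist_init(struct dlist *dl)
{
    for (int i = 0; i < NR_SHARDS; i++) {
        struct dshard *s = &dl->shards[i];
        pthread_mutex_init(&s->lock, NULL);
        s->head.prev = s->head.next = &s->head;
        s->head.home = s;
    }
}

/* Add a node to the shard picked by the caller; other shards stay unlocked. */
void dlist_add(struct dlist *dl, struct dnode *n, unsigned int shard)
{
    struct dshard *s = &dl->shards[shard % NR_SHARDS];

    pthread_mutex_lock(&s->lock);
    n->next = s->head.next;
    n->prev = &s->head;
    s->head.next->prev = n;
    s->head.next = n;
    n->home = s;
    pthread_mutex_unlock(&s->lock);
}

/* Remove a node, taking only the lock of the shard it lives on. */
void dlist_del(struct dnode *n)
{
    struct dshard *s = n->home;

    assert(s != NULL);
    pthread_mutex_lock(&s->lock);
    n->prev->next = n->next;
    n->next->prev = n->prev;
    n->prev = n->next = NULL;
    n->home = NULL;
    pthread_mutex_unlock(&s->lock);
}

/* Count all nodes by walking the shards one at a time. */
size_t dlist_count(struct dlist *dl)
{
    size_t total = 0;

    for (int i = 0; i < NR_SHARDS; i++) {
        struct dshard *s = &dl->shards[i];
        pthread_mutex_lock(&s->lock);
        for (struct dnode *n = s->head.next; n != &s->head; n = n->next)
            total++;
        pthread_mutex_unlock(&s->lock);
    }
    return total;
}
```

Adds and deletes on different shards never contend on the same lock, which is the property the patch description is after. The cost is that a full traversal has to visit every shard.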

[PATCH 04/11] lib/dlock-list: Make sibling CPUs share the same linked list

2023-12-05 Thread Dave Chinner
From: Waiman Long The dlock list needs one list for each of the CPUs available. However, for sibling CPUs, they are sharing the L2 and probably L1 caches too. As a result, there is not much to gain in terms of avoiding cacheline contention while increasing the cacheline footprint of the L1/L2

[PATCH 11/11] hlist-bl: introduced nested locking for dm-snap

2023-12-05 Thread Dave Chinner
From: Dave Chinner Testing with lockdep enabled threw this warning from generic/081 in fstests: [ 2369.724151] [ 2369.725805] WARNING: possible recursive locking detected [ 2369.727125] 6.7.0-rc2-dgc+ #1952 Not tainted [ 2369.728647

[PATCH 09/11] hash-bl: explicitly initialise hash-bl heads

2023-12-05 Thread Dave Chinner
From: Dave Chinner Because we are going to change how the structure is laid out to support RTPREEMPT and LOCKDEP, just assuming that the hash table is allocated as zeroed memory is no longer sufficient to initialise a hash-bl table. Signed-off-by: Dave Chinner --- fs/dcache.c | 21

[PATCH 05/11] selinux: use dlist for isec inode list

2023-12-05 Thread Dave Chinner
From: Dave Chinner Because it's a horrible point of lock contention under heavily concurrent directory traversals... - 12.14% d_instantiate - 12.06% security_d_instantiate - 12.13% selinux_d_instantiate - 12.16% inode_doinit_with_dentry - 15.45

[PATCH 0/11] vfs: inode cache scalability improvements

2023-12-05 Thread Dave Chinner
We all know that the global inode_hash_lock and the per-fs global sb->s_inode_list_lock locks are contention points in filesystem workloads that stream inodes through memory, so it's about time we addressed these limitations. The first part of the patchset addresses the sb->s_inode_list_lock. This

[PATCH 02/11] vfs: Remove unnecessary list_for_each_entry_safe() variants

2023-12-05 Thread Dave Chinner
From: Jan Kara evict_inodes() and invalidate_inodes() use list_for_each_entry_safe() to iterate sb->s_inodes list. However, since we use i_lru list entry for our local temporary list of inodes to destroy, the inode is guaranteed to stay in sb->s_inodes list while we hold sb->s_inode_list_lock.

[PATCH 07/11] hlist-bl: add hlist_bl_fake()

2023-12-05 Thread Dave Chinner
From: Dave Chinner In preparation for switching the VFS inode cache over to the hlist_bl lists, we need to be able to fake a list node that looks like it is hashed for correct operation of filesystems that don't directly use the VFS inode cache. Signed-off-by: Dave Chinner --- include/linux

[PATCH 06/11] vfs: factor out inode hash head calculation

2023-12-05 Thread Dave Chinner
From: Dave Chinner In preparation for changing the inode hash table implementation. Signed-off-by: Dave Chinner --- fs/inode.c | 44 +--- 1 file changed, 25 insertions(+), 19 deletions(-) diff --git a/fs/inode.c b/fs/inode.c index 3426691fa305

Re: [PATCH v9 0/3] [PATCH v9 0/3] Introduce provisioning primitives

2023-11-20 Thread Dave Chinner
On Mon, Nov 13, 2023 at 01:26:51PM -0800, Sarthak Kukreti wrote: > On Fri, Nov 10, 2023 at 4:56 PM Dave Chinner wrote: > > > > On Thu, Nov 09, 2023 at 05:01:35PM -0800, Sarthak Kukreti wrote: > > > Hi, > > > > > > This patch series is version 9 of the p

Re: [PATCH v9 0/3] [PATCH v9 0/3] Introduce provisioning primitives

2023-11-10 Thread Dave Chinner
e() operations through XFS? Cheers, Dave. -- Dave Chinner da...@fromorbit.com