Re: Question about the "EXPERIMENTAL" tag for dax in XFS

2021-02-26 Thread Dave Chinner
On Fri, Feb 26, 2021 at 12:59:53PM -0800, Dan Williams wrote: > On Fri, Feb 26, 2021 at 12:51 PM Dave Chinner wrote: > > > > On Fri, Feb 26, 2021 at 11:24:53AM -0800, Dan Williams wrote: > > > On Fri, Feb 26, 2021 at 11:05 AM Darrick J. Wong > > > wrote: >

Re: Question about the "EXPERIMENTAL" tag for dax in XFS

2021-02-26 Thread Dave Chinner
On Fri, Feb 26, 2021 at 12:59:53PM -0800, Dan Williams wrote: > On Fri, Feb 26, 2021 at 12:51 PM Dave Chinner wrote: > > > > On Fri, Feb 26, 2021 at 11:24:53AM -0800, Dan Williams wrote: > > > On Fri, Feb 26, 2021 at 11:05 AM Darrick J. Wong > > > wrote: > &

Re: Question about the "EXPERIMENTAL" tag for dax in XFS

2021-02-26 Thread Dave Chinner
On Fri, Feb 26, 2021 at 12:59:53PM -0800, Dan Williams wrote: > On Fri, Feb 26, 2021 at 12:51 PM Dave Chinner wrote: > > > > On Fri, Feb 26, 2021 at 11:24:53AM -0800, Dan Williams wrote: > > > On Fri, Feb 26, 2021 at 11:05 AM Darrick J. Wong > > > wrote: > &

Re: Question about the "EXPERIMENTAL" tag for dax in XFS

2021-02-26 Thread Dave Chinner
Then when userspace tries to access the mapped DAX pages we get a new page fault. In processing the fault, the filesystem will try to get direct access to the pmem from the block device. This will get an ENODEV error from the block device because the backing store (pmem) has been unplugged and is no longer there... AFAICT, as long as pmem removal invalidates all the active ptes that point at the pmem being removed, the filesystem doesn't need to care about device removal at all, DAX or no DAX... Cheers, Dave. -- Dave Chinner da...@fromorbit.com
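
A minimal userspace sketch of the end result described above: once device removal has invalidated the ptes backing a DAX mapping, the next access faults and the fault can no longer be satisfied, so the process sees SIGBUS. The file path, mapping size and message are illustrative assumptions, not taken from the thread.

#include <fcntl.h>
#include <signal.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

static void on_sigbus(int sig)
{
    /* Access to a mapping whose backing store has gone away is reported
     * to the application as SIGBUS. */
    (void)sig;
    write(STDERR_FILENO, "SIGBUS: backing store is gone\n", 30);
    _exit(1);
}

int main(int argc, char **argv)
{
    struct sigaction sa = { .sa_handler = on_sigbus };
    char *map;
    int fd;

    if (argc < 2) {
        fprintf(stderr, "usage: %s <file on a DAX filesystem>\n", argv[0]);
        return 1;
    }
    sigaction(SIGBUS, &sa, NULL);

    fd = open(argv[1], O_RDWR);
    if (fd < 0)
        return 1;
    map = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (map == MAP_FAILED)
        return 1;

    /* If the pmem behind this mapping is unplugged and its ptes are
     * invalidated, this access takes a new page fault that cannot be
     * satisfied, and the process gets SIGBUS. */
    map[0] = 1;
    return 0;
}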

Re: page->index limitation on 32bit system?

2021-02-21 Thread Dave Chinner
fix it in mainline that I know of. > As I said, some vendors have tried to fix it in their NAS products, > but I don't know where to find that patch any more. It's not supportable from a disaster recovery perspective. I recently saw a 14TB filesystem with billions of hardlinks in it require 240GB of RAM to run xfs_repair. We just can't support large filesystems on 32 bit systems, and it has nothing to do with simple stuff like page cache index sizes... Cheers, Dave. -- Dave Chinner da...@fromorbit.com

Re: page->index limitation on 32bit system?

2021-02-21 Thread Dave Chinner
offset for such systems to 16TB so sparse files can't be larger than what the kernel supports. See xfs_sb_validate_fsb_count() call and the file offset checks against MAX_LFS_FILESIZE in xfs_fs_fill_super()... FWIW, XFS has been doing this for roughly 20 years now - >16TB on 32 bit machines w
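
The 16TB figure comes straight from the 32 bit page cache index; a quick sketch of the arithmetic (assuming 4KiB pages, not part of the original mail):

#include <stdint.h>
#include <stdio.h>

int main(void)
{
    /* On a 32 bit kernel the page cache index (an unsigned long) is 32
     * bits wide, so a file can have at most 2^32 cacheable pages. */
    uint64_t max_pages = 1ULL << 32;
    uint64_t page_size = 4096;            /* assuming 4KiB pages */
    uint64_t max_bytes = max_pages * page_size;

    /* 2^32 pages * 4KiB/page = 2^44 bytes = 16TiB */
    printf("max file size: %llu bytes (%llu TiB)\n",
           (unsigned long long)max_bytes,
           (unsigned long long)(max_bytes >> 40));
    return 0;
}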

Re: [RFC PATCH v5 0/4] add simple copy support

2021-02-21 Thread Dave Chinner
n care about cross-device XCOPY at this point? Cheers, Dave. -- Dave Chinner da...@fromorbit.com

Re: [PATCH 1/6] fs: Add flag to file_system_type to indicate content is generated

2021-02-14 Thread Dave Chinner
On Fri, Feb 12, 2021 at 03:54:48PM -0800, Darrick J. Wong wrote: > On Sat, Feb 13, 2021 at 10:27:26AM +1100, Dave Chinner wrote: > > On Fri, Feb 12, 2021 at 03:07:39PM -0800, Ian Lance Taylor wrote: > > > On Fri, Feb 12, 2021 at 3:03 PM Dave Chinner wrote: > > > >

Re: [PATCH 1/6] fs: Add flag to file_system_type to indicate content is generated

2021-02-12 Thread Dave Chinner
On Fri, Feb 12, 2021 at 03:07:39PM -0800, Ian Lance Taylor wrote: > On Fri, Feb 12, 2021 at 3:03 PM Dave Chinner wrote: > > > > On Fri, Feb 12, 2021 at 04:45:41PM +0100, Greg KH wrote: > > > On Fri, Feb 12, 2021 at 07:33:57AM -0800, Ian Lance Taylor wrote: > > >

Re: [PATCH 1/6] fs: Add flag to file_system_type to indicate content is generated

2021-02-12 Thread Dave Chinner
ly breaking? What changed in > > the kernel that caused this? Procfs has been around for a _very_ long > > time :) > > That would be because of (v5.3): > > 5dae222a5ff0 vfs: allow copy_file_range to copy across devices > > The intention of this change (series) was to

Re: [PATCH 1/6] fs: Add flag to file_system_type to indicate content is generated

2021-02-12 Thread Dave Chinner
It is not intended as a copy mechanism for copying data from one random file descriptor to another. The use of it as a general file copy mechanism in the Go system library is incorrect and wrong. It is a userspace bug. Userspace has done the wrong thing, userspace needs to be fixed. -Dave. -- Dave Chinner da...@fromorbit.com
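
A hedged sketch of what "fixing userspace" can look like: try copy_file_range(2), and fall back to a plain read/write loop when it fails or copies nothing (as it does for "zero length" procfs/sysfs files). Illustrative only, not the actual Go fix.

#define _GNU_SOURCE
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Copy fd_in to fd_out, preferring the in-kernel copy path. */
static int copy_fd(int fd_in, int fd_out)
{
    char buf[65536];
    off_t copied = 0;
    ssize_t n;

    while ((n = copy_file_range(fd_in, NULL, fd_out, NULL, 1 << 30, 0)) > 0)
        copied += n;

    /* Success only if the kernel copy worked and actually moved data.
     * Virtual files report a zero size, so copy_file_range() moves
     * nothing for them; older kernels may also fail with EXDEV/ENOSYS.
     * In both cases fall back to an ordinary read/write loop. */
    if (n == 0 && copied > 0)
        return 0;

    while ((n = read(fd_in, buf, sizeof(buf))) > 0)
        if (write(fd_out, buf, (size_t)n) != n)
            return -1;
    return n < 0 ? -1 : 0;
}

int main(int argc, char **argv)
{
    int in, out;

    if (argc < 3)
        return 2;
    in = open(argv[1], O_RDONLY);
    out = open(argv[2], O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (in < 0 || out < 0)
        return 1;
    return copy_fd(in, out) ? 1 : 0;
}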

Re: rcu: INFO: rcu_sched self-detected stall on CPU: Workqueue: xfs-conv/md0 xfs_end_io

2021-02-08 Thread Dave Chinner
back. It's likely to be too much work for a bound workqueue, too, especially when you consider that the workqueue completion code will merge sequential ioends into one ioend, hence making the IO completion loop counts bigger and latency problems worse rather than better... Cheers, Dave. -- Dave Chinner da...@fromorbit.com

Re: [PATCH 00/18] new API for FS_IOC_[GS]ETFLAGS/FS_IOC_FS[GS]ETXATTR

2021-02-07 Thread Dave Chinner
to list the requested attributes of all directories and files in the tree... So, yeah, we do indeed do thousands of these fsxattr based operations a second, sometimes tens of thousands a second or more, and sometimes they are issued in bulk in performance critical paths for container build/deployment operations Cheers, Dave. -- Dave Chinner da...@fromorbit.com
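
For reference, the per-file query those tools issue is a single FS_IOC_FSGETXATTR ioctl; a minimal sketch (error handling trimmed, not taken from the thread):

#include <fcntl.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <linux/fs.h>
#include <unistd.h>

/* Fetch the fsxattr state (xflags, project id, extent size hints) for one
 * file; a tool walking a tree issues one of these per inode. */
int main(int argc, char **argv)
{
    struct fsxattr fsx;
    int fd = open(argc > 1 ? argv[1] : ".", O_RDONLY);

    if (fd < 0 || ioctl(fd, FS_IOC_FSGETXATTR, &fsx) < 0) {
        perror("FS_IOC_FSGETXATTR");
        return 1;
    }
    printf("xflags 0x%x projid %u extsize %u\n",
           fsx.fsx_xflags, fsx.fsx_projid, fsx.fsx_extsize);
    close(fd);
    return 0;
}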

Re: Unexpected reflink/subvol snapshot behaviour

2021-02-01 Thread Dave Chinner
On Mon, Feb 01, 2021 at 06:14:21PM -0800, Darrick J. Wong wrote: > On Fri, Jan 22, 2021 at 09:20:51AM +1100, Dave Chinner wrote: > > Hi btrfs-gurus, > > > > I'm running a simple reflink/snapshot/COW scalability test at the > > moment. It is just a loop that d

Re: Unexpected reflink/subvol snapshot behaviour

2021-02-01 Thread Dave Chinner
On Fri, Jan 29, 2021 at 06:25:50PM -0500, Zygo Blaxell wrote: > On Mon, Jan 25, 2021 at 09:36:55AM +1100, Dave Chinner wrote: > > On Sat, Jan 23, 2021 at 04:42:33PM +0800, Qu Wenruo wrote: > > > > > > > > > On 2021/1/22 6:20 AM, Dave Chinner wrote: > >

Re: [PATCH] fs: generic_copy_file_checks: Do not adjust count based on file size

2021-01-26 Thread Dave Chinner
mechanisms. Of course, with these special zero length files that contain ephemeral data, userspace can't actually tell that they contain data using stat(). So as far as userspace is concerned, copy_file_range() correctly returned zero bytes copied from a zero byte long file and there's nothing more to do. This zero length file behaviour is, fundamentally, a kernel filesystem implementation bug, not a copy_file_range() bug. Cheers, Dave. -- Dave Chinner da...@fromorbit.com
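
The "userspace can't tell from stat()" point is easy to demonstrate; a small sketch (the /proc/version path is just an example):

#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void)
{
    struct stat st;
    char buf[256];
    ssize_t n;
    int fd = open("/proc/version", O_RDONLY);

    if (fd < 0)
        return 1;

    /* stat() reports a zero length for the procfs file... */
    fstat(fd, &st);
    printf("st_size = %lld\n", (long long)st.st_size);

    /* ...yet read() returns real data, generated at read time. */
    n = read(fd, buf, sizeof(buf) - 1);
    if (n > 0) {
        buf[n] = '\0';
        printf("read %zd bytes: %s", n, buf);
    }
    close(fd);
    return 0;
}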

Re: [BUG] copy_file_range with sysfs file as input

2021-01-26 Thread Dave Chinner
On Tue, Jan 26, 2021 at 11:50:50AM +0800, Nicolas Boichat wrote: > On Tue, Jan 26, 2021 at 9:34 AM Dave Chinner wrote: > > > > On Mon, Jan 25, 2021 at 03:54:31PM +0800, Nicolas Boichat wrote: > > > Hi copy_file_range experts, > > > > > > We hit this in

Re: [BUG] copy_file_range with sysfs file as input

2021-01-26 Thread Dave Chinner
't check the file size and just attempts to read unconditionally from the file. Hence it happily returns non-existent stale data from busted filesystem implementations that allow data to be read from beyond EOF... Cheers, Dave. -- Dave Chinner da...@fromorbit.com

Re: Unexpected reflink/subvol snapshot behaviour

2021-01-24 Thread Dave Chinner
On Sat, Jan 23, 2021 at 04:42:33PM +0800, Qu Wenruo wrote: > > > On 2021/1/22 6:20 AM, Dave Chinner wrote: > > Hi btrfs-gurus, > > > > I'm running a simple reflink/snapshot/COW scalability test at the > > moment. It is just a loop that does "fio overwri

Re: Unexpected reflink/subvol snapshot behaviour

2021-01-24 Thread Dave Chinner
On Sat, Jan 23, 2021 at 07:19:03PM -0500, Zygo Blaxell wrote: > On Fri, Jan 22, 2021 at 09:20:51AM +1100, Dave Chinner wrote: > > Hi btrfs-gurus, > > > > I'm running a simple reflink/snapshot/COW scalability test at the > > moment. It is just a loop that d

Unexpected reflink/subvol snapshot behaviour

2021-01-21 Thread Dave Chinner
workload, I suspect the issues I note above are btrfs issues, not expected behaviour. I'm not sure what the expected scalability of btrfs file clones and snapshots are though, so I'm interested to hear if these results are expected or not. Cheers, Dave. -- Dave Chinner da...@fromorbit.com JOBS=4 IODEPTH=4 IOCOUNT=$((1 / $JOBS)) FILESIZE=4g cat >$fio_config <

Re: Expense of read_iter

2021-01-19 Thread Dave Chinner
and so provide the same benefit to all the filesystems that use it. Cheers, Dave. -- Dave Chinner da...@fromorbit.com

Re: [PATCH] xfs: Wake CIL push waiters more reliably

2021-01-13 Thread Dave Chinner
On Fri, Jan 08, 2021 at 11:56:57AM -0500, Brian Foster wrote: > On Fri, Jan 08, 2021 at 08:54:44AM +1100, Dave Chinner wrote: > > e.g. we run the first transaction into the CIL, it steals the space > > needed for the cil checkpoint headers for the transaction. Then if > > the

Re: [PATCH] xfs: Wake CIL push waiters more reliably

2021-01-13 Thread Dave Chinner
On Mon, Jan 11, 2021 at 11:38:48AM -0500, Brian Foster wrote: > On Fri, Jan 08, 2021 at 11:56:57AM -0500, Brian Foster wrote: > > On Fri, Jan 08, 2021 at 08:54:44AM +1100, Dave Chinner wrote: > > > On Mon, Jan 04, 2021 at 11:23:53AM -0500, Brian Foster wrote: > > > >

Re: [f2fs-dev] [PATCH v2 04/12] fat: only specify I_DIRTY_TIME when needed in fat_update_time()

2021-01-11 Thread Dave Chinner
ourse, it will do if you crash or even just unmount/mount a filesystem that doesn't persist it. Cheers, Dave. -- Dave Chinner da...@fromorbit.com

Re: [PATCH] mm: vmscan: support complete shrinker reclaim

2021-01-10 Thread Dave Chinner
.com/ and that should also allow the work skipped on each memcg to be accrued and accounted across multiple calls to the shrinkers for the same memcg. Hence as memory pressure within the memcg goes up, the repeated calls to direct reclaim within that memcg will result in all of the freeable items in each cache eventually being freed... Cheers, Dave. -- Dave Chinner da...@fromorbit.com

Re: [RFC PATCH] fs: block_dev: compute nr_vecs hint for improving writeback bvecs allocation

2021-01-08 Thread Dave Chinner
On Fri, Jan 08, 2021 at 03:59:22PM +0800, Ming Lei wrote: > On Thu, Jan 07, 2021 at 09:21:11AM +1100, Dave Chinner wrote: > > On Wed, Jan 06, 2021 at 04:45:48PM +0800, Ming Lei wrote: > > > On Tue, Jan 05, 2021 at 07:39:38PM +0100, Christoph Hellwig wrote: > > > > A

Re: [PATCH] xfs: Wake CIL push waiters more reliably

2021-01-07 Thread Dave Chinner
On Sun, Jan 03, 2021 at 05:03:33PM +0100, Donald Buczek wrote: > On 02.01.21 23:44, Dave Chinner wrote: > > On Sat, Jan 02, 2021 at 08:12:56PM +0100, Donald Buczek wrote: > > > On 31.12.20 22:59, Dave Chinner wrote: > > > > On Thu, Dec 31, 2020 at 12:48:5

Re: [PATCH] xfs: Wake CIL push waiters more reliably

2021-01-07 Thread Dave Chinner
On Mon, Jan 04, 2021 at 11:23:53AM -0500, Brian Foster wrote: > On Thu, Dec 31, 2020 at 09:16:11AM +1100, Dave Chinner wrote: > > On Wed, Dec 30, 2020 at 12:56:27AM +0100, Donald Buczek wrote: > > > If the value goes below the limit while some threads are > > > already

Re: [RFC PATCH] fs: block_dev: compute nr_vecs hint for improving writeback bvecs allocation

2021-01-06 Thread Dave Chinner
rything we need to determine whether we should do a large or small bio vec allocation in the iomap writeback path... Cheers, Dave. -- Dave Chinner da...@fromorbit.com

Re: [PATCH] xfs: Wake CIL push waiters more reliably

2021-01-02 Thread Dave Chinner
On Sat, Jan 02, 2021 at 08:12:56PM +0100, Donald Buczek wrote: > On 31.12.20 22:59, Dave Chinner wrote: > > On Thu, Dec 31, 2020 at 12:48:56PM +0100, Donald Buczek wrote: > > > On 30.12.20 23:16, Dave Chinner wrote: > > One could argue that, but one should al

Re: [xfs] db962cd266: Assertion_failed

2021-01-01 Thread Dave Chinner
lifts of the context setting up into xfs_trans_alloc() back into the patchset before adding the current->journal functionality patch. Also, you need to test XFS code with CONFIG_XFS_DEBUG=y so that asserts are actually built into the code and exercised, because this ASSERT should have fired on the first rolling transaction that the kernel executes... Cheers, Dave. -- Dave Chinner da...@fromorbit.com

Re: [PATCH] xfs: Wake CIL push waiters more reliably

2020-12-31 Thread Dave Chinner
On Thu, Dec 31, 2020 at 12:48:56PM +0100, Donald Buczek wrote: > On 30.12.20 23:16, Dave Chinner wrote: > > On Wed, Dec 30, 2020 at 12:56:27AM +0100, Donald Buczek wrote: > > > Threads, which committed items to the CIL, wait in the > > > xc_push_wait waitqueue when use

Re: [PATCH] xfs: Wake CIL push waiters more reliably

2020-12-30 Thread Dave Chinner
> wake_up_all(&cil->xc_push_wait); That just smells wrong to me. It *might* be correct, but this condition should pair with the sleep condition, as space used by a CIL context should never actually decrease Cheers, Dave. -- Dave Chinner da...@fromorbit.com

Re: v5.10.1 xfs deadlock

2020-12-18 Thread Dave Chinner
is > related to that, because the md block devices itself are > responsive (`xxd /dev/md0` ) My bet is that the OOT driver/hardware had dropped a log IO on the floor - XFS is waiting for the CIL push to complete, and I'm betting that is stuck waiting for iclog IO completion while writing the CIL to the journal. The sysrq output will tell us if this is the case, so that's the first place to look. Cheers, Dave. -- Dave Chinner da...@fromorbit.com

Re: [v2 PATCH 7/9] mm: vmscan: don't need allocate shrinker->nr_deferred for memcg aware shrinkers

2020-12-17 Thread Dave Chinner
inspection. But I'm > not a VFS expert so I'm not quite sure. Uh, if you have a shrinker racing to register and unregister, you've got a major bug in your object initialisation/teardown code. i.e. calling register/unregister at the same time for the same shrinker is a bug, pure and simple. Cheers, Dave. -- Dave Chinner da...@fromorbit.com

Re: [Linux-cachefs] [PATCH v13 4/4] xfs: use current->journal_info to avoid transaction reservation recursion

2020-12-17 Thread Dave Chinner
reservation recursion is used by XFS only, we can > move the check into xfs_vm_writepage(s), per Dave. > > Cc: Darrick J. Wong > Cc: Matthew Wilcox (Oracle) > Cc: Christoph Hellwig > Cc: Dave Chinner > Cc: Michal Hocko > Cc: David Howells > Cc: Jeff Layton > Sig

Re: [Linux-cachefs] [PATCH v13 3/4] xfs: refactor the usage around xfs_trans_context_{set, clear}

2020-12-17 Thread Dave Chinner
On Thu, Dec 17, 2020 at 03:06:27PM -0800, Darrick J. Wong wrote: > On Fri, Dec 18, 2020 at 09:15:09AM +1100, Dave Chinner wrote: > > The obvious solution: we've moved the saved process state to a > > different context, so it is no longer needed for the current > > t

Re: [Linux-cachefs] [PATCH v13 3/4] xfs: refactor the usage around xfs_trans_context_{set, clear}

2020-12-17 Thread Dave Chinner
if (tp->t_pflags) memalloc_nofs_restore(tp->t_pflags); } and the problem is solved. The NOFS state will follow the active transaction and not be reset until the entire transaction chain is completed. In the next patch you can go and introduce current->journal_info into just the wrapper functions, maintaining the same overall logic. -Dave. -- Dave Chinner da...@fromorbit.com

Re: [Linux-cachefs] [PATCH v13 1/4] mm: Add become_kswapd and restore_kswapd

2020-12-16 Thread Dave Chinner
n kswapd -- the only time we reach this code is when we're > exiting and the task_struct is about to be destroyed anyway. > > Cc: Dave Chinner > Acked-by: Michal Hocko > Reviewed-by: Darrick J. Wong > Reviewed-by: Christoph Hellwig > Signed-off-by: Matthew Wilcox (O

Re: [RFC PATCH v3 4/9] mm, fsdax: Refactor memory-failure handler for dax mapping

2020-12-16 Thread Dave Chinner
way. So, AFAICT, the dax_lock() stuff is only necessary when the filesystem can't be used to resolve the owner of physical page that went bad Cheers, Dave. -- Dave Chinner da...@fromorbit.com

Re: [v2 PATCH 6/9] mm: vmscan: use per memcg nr_deferred of shrinker

2020-12-15 Thread Dave Chinner
On Tue, Dec 15, 2020 at 02:27:18PM -0800, Yang Shi wrote: > On Mon, Dec 14, 2020 at 6:46 PM Dave Chinner wrote: > > > > On Mon, Dec 14, 2020 at 02:37:19PM -0800, Yang Shi wrote: > > > Use per memcg's nr_deferred for memcg aware shrinkers. The shrinker's >

Re: [RFC PATCH v3 8/9] md: Implement ->corrupted_range()

2020-12-15 Thread Dave Chinner
Combine that with the proposed "watch_sb()" syscall for reporting such errors in a generic manner to interested listeners, and we've got a fairly solid generic path for reporting data loss events to userspace for an appropriate user-defined action to be taken... Cheers, Dave. -- Dave Chinner da...@fromorbit.com

Re: [RFC PATCH v2 0/6] fsdax: introduce fs query to support reflink

2020-12-15 Thread Dave Chinner
u still have a user data recovery process to perform after this... > And how does it help in dealing with page faults upon poisoned > dax page? It doesn't. If the page is poisoned, the same behaviour will occur as does now. This is simply error reporting infrastructure, not error handling. Future work might change how we correct the faults found in the storage, but I think the user visible behaviour is going to be "kill apps mapping corrupted data" for a long time yet Cheers, Dave. -- Dave Chinner da...@fromorbit.com

Re: [PATCH 1/2] iomap: Separate out generic_write_sync() from iomap_dio_complete()

2020-12-15 Thread Dave Chinner
AP_DIO_NEED_SYNC)) > - ret = generic_write_sync(iocb, ret); > + ret = generic_write_sync(dio->iocb, ret); > > kfree(dio); > > return ret; > } > -EXPORT_SYMBOL_GPL(iomap_dio_complete); > + NACK. If you don't want iomap_dio_comple

Re: [v2 PATCH 2/9] mm: memcontrol: use shrinker_rwsem to protect shrinker_maps allocation

2020-12-15 Thread Dave Chinner
On Tue, Dec 15, 2020 at 02:53:48PM +0100, Johannes Weiner wrote: > On Tue, Dec 15, 2020 at 01:09:57PM +1100, Dave Chinner wrote: > > On Mon, Dec 14, 2020 at 02:37:15PM -0800, Yang Shi wrote: > > > Since memcg_shrinker_map_size just can be changed under holding

Re: [v2 PATCH 7/9] mm: vmscan: don't need allocate shrinker->nr_deferred for memcg aware shrinkers

2020-12-14 Thread Dave Chinner
return; > > kfree(shrinker->nr_deferred); > shrinker->nr_deferred = NULL; e.g. then this function can simply do:

{
        if (shrinker->flags & SHRINKER_MEMCG_AWARE)
                return unregister_memcg_shrinker(shrinker);
        kfree(shrinker->nr_deferred);
        shrinker->nr_deferred = NULL;
}

Cheers, Dave. -- Dave Chinner da...@fromorbit.com

Re: [v2 PATCH 8/9] mm: memcontrol: reparent nr_deferred when memcg offline

2020-12-14 Thread Dave Chinner
acd..693a41e89969 100644 > --- a/mm/vmscan.c > +++ b/mm/vmscan.c > @@ -201,7 +201,7 @@ DECLARE_RWSEM(shrinker_rwsem); > #define SHRINKER_REGISTERING ((struct shrinker *)~0UL) > > static DEFINE_IDR(shrinker_idr); > -static int shrinker_nr_max; > +int shrinker_nr_max; Then we don't need to make yet another variable global... Cheers, Dave. -- Dave Chinner da...@fromorbit.com

Re: [v2 PATCH 9/9] mm: vmscan: shrink deferred objects proportional to priority

2020-12-14 Thread Dave Chinner
ile it may help your specific corner case, it's likely to significantly change the reclaim balance of slab caches, especially under GFP_NOFS intensive workloads where we can only defer the work to kswapd. Hence I think this is still a problematic approach as it doesn't address the reason why deferred counts are increasing out of control in the first place Cheers, Dave. -- Dave Chinner da...@fromorbit.com

Re: [v2 PATCH 6/9] mm: vmscan: use per memcg nr_deferred of shrinker

2020-12-14 Thread Dave Chinner
r will do that for static functions automatically if it makes sense. Ok, so you only do the memcg nr_deferred thing if NUMA_AWARE && sc->memcg is true. so

static long shrink_slab_set_nr_deferred_memcg(...)
{
        int nid = sc->nid;

        deferred = rcu_dereference_protected(memcg->nodeinfo[nid]->shrinker_deferred, true);
        return atomic_long_add_return(nr, &deferred->nr_deferred[id]);
}

static long shrink_slab_set_nr_deferred(...)
{
        int nid = sc->nid;

        if (!(shrinker->flags & SHRINKER_NUMA_AWARE))
                nid = 0;
        else if (sc->memcg)
                return shrink_slab_set_nr_deferred_memcg(, nid);

        return atomic_long_add_return(nr, &shrinker->nr_deferred[nid]);
}

And now there's no duplicated code. Cheers, Dave. -- Dave Chinner da...@fromorbit.com

Re: [v2 PATCH 5/9] mm: memcontrol: add per memcg shrinker nr_deferred

2020-12-14 Thread Dave Chinner
nd nr_deferred pointers to the correct offset in the allocated range. Then this patch is really only changes to the size of the chunk being allocated, setting up the pointers and copying the relevant data from the old to new. Cheers, Dave. -- Dave Chinner da...@fromorbit.com

Re: [v2 PATCH 2/9] mm: memcontrol: use shrinker_rwsem to protect shrinker_maps allocation

2020-12-14 Thread Dave Chinner
is a good idea. This couples the shrinker infrastructure to internal details of how cgroups are initialised and managed. Sure, certain operations might be done in certain shrinker lock contexts, but that doesn't mean we should share global locks across otherwise independent subsystems Chee

Re: [v2 PATCH 3/9] mm: vmscan: guarantee shrinker_slab_memcg() sees valid shrinker_maps for online memcg

2020-12-14 Thread Dave Chinner
up that the barriers enforce. IOWs, these memory barriers belong inside the cgroup code to guarantee anything that sees an online cgroup will always see the fully initialised cgroup structures. They do not belong in the shrinker infrastructure... Cheers, Dave. -- Dave Chinner da...@fromorbit.com

Re: [PATCH v1 4/6] block/psi: remove PSI annotations from direct IO

2020-12-14 Thread Dave Chinner
On Tue, Dec 15, 2020 at 01:03:45AM +, Pavel Begunkov wrote: > On 15/12/2020 00:56, Dave Chinner wrote: > > On Tue, Dec 15, 2020 at 12:20:23AM +, Pavel Begunkov wrote: > >> As reported, we must not do pressure stall information accounting for > >> direct IO, beca

Re: [Linux-cachefs] [PATCH v12 3/4] xfs: refactor the usage around xfs_trans_context_{set, clear}

2020-12-14 Thread Dave Chinner
On Tue, Dec 15, 2020 at 08:42:08AM +0800, Yafang Shao wrote: > On Tue, Dec 15, 2020 at 5:08 AM Dave Chinner wrote: > > On Sun, Dec 13, 2020 at 05:09:02PM +0800, Yafang Shao wrote: > > > On Thu, Dec 10, 2020 at 3:52 AM Darrick J. Wong > > > wrote: > > > > O

Re: [PATCH v1 5/6] bio: add a helper calculating nr segments to alloc

2020-12-14 Thread Dave Chinner
On Tue, Dec 15, 2020 at 12:00:23PM +1100, Dave Chinner wrote: > On Tue, Dec 15, 2020 at 12:20:24AM +, Pavel Begunkov wrote: > > A preparation patch. It adds a simple helper which abstracts out number > > of segments we're allocating for a bio from iov_iter_npages(). >

Re: [PATCH v1 6/6] block/iomap: don't copy bvec for direct IO

2020-12-14 Thread Dave Chinner
io_iov_vecs_to_alloc(struct iov_iter *iter, int max_segs) > { > + /* reuse iter->bvec */ > + if (iov_iter_is_bvec(iter)) > + return 0; > return iov_iter_npages(iter, max_segs); Ah, I'm a blind idiot... :/ Cheers, Dave. -- Dave Chinner da...@fromorbit.com

Re: [PATCH v1 5/6] bio: add a helper calculating nr segments to alloc

2020-12-14 Thread Dave Chinner
de this specific patch, so it's not clear what it's actually needed for... Cheers, Dave. -- Dave Chinner da...@fromorbit.com

Re: [PATCH v1 4/6] block/psi: remove PSI annotations from direct IO

2020-12-14 Thread Dave Chinner
for paging IO */ > + bio_clear_flag(bio, BIO_WORKINGSET); Why only do this for the old direct IO path? Why isn't this necessary for the iomap DIO path? Cheers, Dave. -- Dave Chinner da...@fromorbit.com

Re: [Linux-cachefs] [PATCH v12 3/4] xfs: refactor the usage around xfs_trans_context_{set, clear}

2020-12-14 Thread Dave Chinner
> > This patch is based on Darrick's work to fix the issue in xfs/141 in the > > > earlier version. [1] > > > > > > 1. https://lore.kernel.org/linux-xfs/20201104001649.GN7123@magnolia > > > > > > Cc: Darrick J. Wong >

Re: [Linux-cachefs] [PATCH v10 4/4] xfs: use current->journal_info to avoid transaction reservation recursion

2020-12-07 Thread Dave Chinner
trans_context_active > To check whether current is in fs transaction or not > - xfs_trans_context_swap > Transfer the transaction context when rolling a permanent transaction > > These two new helpers are introduced in xfs_trans.h. > > Cc: Darrick J. Wong > Cc: Matt

Re: [RFC PATCH v2 0/6] fsdax: introduce fs query to support reflink

2020-12-06 Thread Dave Chinner
On Wed, Dec 02, 2020 at 03:12:20PM +0800, Ruan Shiyang wrote: > Hi Dave, > > On 2020/11/30 6:47 AM, Dave Chinner wrote: > > On Mon, Nov 23, 2020 at 08:41:10AM +0800, Shiyang Ruan wrote: > > > > > > The call trace is like this: > > > memory_fail

Re: [Linux-cachefs] [PATCH v9 2/2] xfs: avoid transaction reservation recursion

2020-12-06 Thread Dave Chinner
& (PF_MEMALLOC|PF_KSWAPD)) == > PF_MEMALLOC)) > goto redirty; > > [2]. https://lore.kernel.org/linux-xfs/20201104001649.GN7123@magnolia/ > > Cc: Darrick J. Wong > Cc: Matthew Wilcox (Oracle) > Cc: Christoph Hellwig > Cc: Dave Chinner > Cc: M

Re: [Linux-cachefs] Problems doing DIO to netfs cache on XFS from Ceph

2020-12-03 Thread Dave Chinner
saction context is a bug in XFS. IOWs, we are waiting on a new version of this patchset to be posted: https://lore.kernel.org/linux-xfs/20201103131754.94949-1-laoar.s...@gmail.com/ so that we can get rid of this from iomap and check the transaction recursion case directly in the XFS code. Then your pr

Re: [PATCH V2] uapi: fix statx attribute value overlap for DAX & MOUNT_ROOT

2020-12-02 Thread Dave Chinner
On Wed, Dec 02, 2020 at 10:04:17PM +0100, Greg Kroah-Hartman wrote: > On Thu, Dec 03, 2020 at 07:40:45AM +1100, Dave Chinner wrote: > > On Wed, Dec 02, 2020 at 08:06:01PM +0100, Greg Kroah-Hartman wrote: > > > On Wed, Dec 02, 2020 at 06:41:43PM +0100, Miklos Szeredi wrote: >

Re: [PATCH V2] uapi: fix statx attribute value overlap for DAX & MOUNT_ROOT

2020-12-02 Thread Dave Chinner
orrect regressions in fixes before they get propagated to users. It also creates a clear demarcation between fixes and cc: stable for maintainers and developers: only patches with a cc: stable will be backported immediately to stable. Developers know what patches need urgent backports and, unlike developers, the automated fixes scan does not have the subject matter expertise or background to make that judgement Cheers, Dave. -- Dave Chinner da...@fromorbit.com

Re: [PATCH 2/2] statx: move STATX_ATTR_DAX attribute handling to filesystems

2020-12-01 Thread Dave Chinner
r that filesystem instance then, by definition, it does not support DAX and the bit should never be set. e.g. We don't talk about kernels that support reflink - what matters to userspace is whether the filesystem instance supports reflink. Think of the useless mess that xfs_info would be if it reported kernel capabilities instead of filesystem instance capabilities. i.e. we don't report that a filesystem supports reflink just because the kernel supports it - it reports whether the filesystem instance being queried supports reflink. And that also implies the kernel supports it, because the kernel has to support it to mount the filesystem... So, yeah, I think it really does need to be conditional on the filesystem instance being queried to be actually useful to users Cheers, Dave. -- Dave Chinner da...@fromorbit.com

Re: [RFC PATCH v2 0/6] fsdax: introduce fs query to support reflink

2020-11-29 Thread Dave Chinner
is cached then we can try to re-write it to disk to fix the bad data, otherwise we treat it like a writeback error and report it on the next write/fsync/close operation done on that file. This gets rid of the mf_recover_controller altogether and allows the interface to be used by any sort of block device for any sort of bottom-up reporting of media/device failures. Cheers, Dave. -- Dave Chinner da...@fromorbit.com

Re: [PATCH AUTOSEL 5.9 33/33] xfs: don't allow NOWAIT DIO across extent boundaries

2020-11-25 Thread Dave Chinner
On Wed, Nov 25, 2020 at 06:46:54PM -0500, Sasha Levin wrote: > On Thu, Nov 26, 2020 at 08:52:47AM +1100, Dave Chinner wrote: > > We've already had one XFS upstream kernel regression in this -rc > > cycle propagated to the stable kernels in 5.9.9 because the stable > > pr

Re: [PATCH AUTOSEL 5.9 33/33] xfs: don't allow NOWAIT DIO across extent boundaries

2020-11-25 Thread Dave Chinner
On Wed, Nov 25, 2020 at 10:35:50AM -0500, Sasha Levin wrote: > From: Dave Chinner > > [ Upstream commit 883a790a84401f6f55992887fd7263d808d4d05d ] > > Jens has reported a situation where partial direct IOs can be issued > and completed yet still return -EAGAIN. We don't

Re: [PATCH] fs/stat: set attributes_mask for STATX_ATTR_DAX

2020-11-23 Thread Dave Chinner
TX_ATTR_DAX in statx for either the attributes or attributes_mask field because the filesystem is not DAX capable. And given that we have filesystems with multiple block devices that can have different DAX capabilities, I think this statx() attr state (and mask) really has to come from the filesystem, not VFS... > Extra question: should we only set this in the attributes mask if > CONFIG_FS_DAX=y ? IMO, yes, because it will always be false on CONFIG_FS_DAX=n and so it may as well not be emitted as a supported bit in the mask. Cheers, Dave. -- Dave Chinner da...@fromorbit.com
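
On the userspace side the two statx fields are used like this; a sketch assuming headers new enough to define statx() and STATX_ATTR_DAX (glibc 2.28+, Linux 5.8+ uapi), not taken from the patch under discussion:

#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>

int main(int argc, char **argv)
{
    struct statx stx;

    if (statx(AT_FDCWD, argc > 1 ? argv[1] : ".", 0, STATX_BASIC_STATS, &stx)) {
        perror("statx");
        return 1;
    }

    /* attributes_mask says whether this filesystem instance can report
     * the flag at all; attributes says whether it is set on this file. */
    if (!(stx.stx_attributes_mask & STATX_ATTR_DAX))
        printf("filesystem instance does not report DAX state\n");
    else
        printf("S_DAX is %s on this file\n",
               (stx.stx_attributes & STATX_ATTR_DAX) ? "set" : "not set");
    return 0;
}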

Re: [PATCH 1/2] xfs: show the dax option in mount options.

2020-11-11 Thread Dave Chinner
On Wed, Nov 11, 2020 at 11:28:48AM +0100, Michal Suchánek wrote: > On Tue, Nov 10, 2020 at 08:08:23AM +1100, Dave Chinner wrote: > > On Mon, Nov 09, 2020 at 09:27:05PM +0100, Michal Suchánek wrote: > > > On Mon, Nov 09, 2020 at 11:24:19AM -0800, Darrick J. Wong wrote: > >

Re: [PATCH 1/2] xfs: show the dax option in mount options.

2020-11-09 Thread Dave Chinner
storing its data on a different filesystem that isn't mounted at install time, so the installer has no chance of detecting that the application is going to use DAX enabled storage. IOWs, the installer cannot make decisions based on DAX state on behalf of applications because it does not know what environment the application is going to be configured to run in. DAX can only be detected reliably by the application at runtime inside its production execution environment. Cheers, Dave. -- Dave Chinner da...@fromorbit.com

Re: [PATCH 00/34] fs: idmapped mounts

2020-10-29 Thread Dave Chinner
passing work off to worker threads, duplicating the current creds will capture this information and won't leave random landmines where stuff doesn't work as it should because the worker thread is unaware of the userns that it is supposed to be doing filesystem operations under... Cheers, Dave. -- Dave Chinner da...@fromorbit.com

Re: [PATCH] fs/dcache: optimize start_dir_add()

2020-10-26 Thread Dave Chinner
quire() so that people who have no clue what the hell smp_acquire__after_ctrl_dep() means or does have some hope of understanding of what objects the ordering semantics in the function actually apply to Cheers, Dave. -- Dave Chinner da...@fromorbit.com

Re: [RFC PATCH 0/2] Remove shrinker's nr_deferred

2020-09-30 Thread Dave Chinner
're going completely in the wrong direction. The problem that needs solving is integrating shrinker scanning control state with memcgs more tightly, not force every memcg aware shrinker to use list_lru for their subsystem shrinker implementations Cheers, Dave. -- Dave Chinner da...@fromorbit.com

Re: NVFS XFS metadata (was: [PATCH] pmem: export the symbols __copy_user_flushcache and __copy_from_user_flushcache)

2020-09-22 Thread Dave Chinner
On Tue, Sep 22, 2020 at 12:46:05PM -0400, Mikulas Patocka wrote: > Thanks for reviewing NVFS. Not a review - I've just had a cursory look and not looked any deeper after I'd noticed various red flags... > On Tue, 22 Sep 2020, Dave Chinner wrote: > > IOWs, extent based tre

Re: NVFS XFS metadata (was: [PATCH] pmem: export the symbols __copy_user_flushcache and __copy_from_user_flushcache)

2020-09-21 Thread Dave Chinner
ch... I can see how "almost in place" modification can be done by having two copies side by side and updating one while the other is the active copy and switching atomically between the two objects. That way a traditional soft-update algorithm would work because the exposure of the changes is via ordering the active copy switches. That would come at a cost, though, both in metadata footprint and CPU overhead. So, what have I missed about the way metadata is updated in the pmem that allows non-atomic updates to work reliably? Cheers, Dave. -- Dave Chinner da...@fromorbit.com
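
The "two copies side by side, switch the active one atomically" scheme described above can be sketched in a few lines of C11; purely illustrative (and ignoring the pmem flush/fence details), not NVFS code:

#include <stdatomic.h>
#include <stdio.h>

struct inode_rec {
    long size;
    long nblocks;
};

/* Two copies of the record live side by side; 'active' selects the one
 * readers see. Writers modify the inactive copy, then flip the selector,
 * so readers always observe a complete old or complete new version. */
static struct inode_rec slots[2];
static _Atomic int active;

static void update(long size, long nblocks)
{
    int next = 1 - atomic_load(&active);

    slots[next].size = size;          /* update the inactive copy */
    slots[next].nblocks = nblocks;
    /* On pmem, a cache line flush + fence would be needed here before
     * the switch is made visible. */
    atomic_store_explicit(&active, next, memory_order_release);
}

static struct inode_rec read_rec(void)
{
    return slots[atomic_load_explicit(&active, memory_order_acquire)];
}

int main(void)
{
    update(4096, 8);
    struct inode_rec r = read_rec();
    printf("size=%ld nblocks=%ld\n", r.size, r.nblocks);
    return 0;
}

The doubled metadata footprint and the extra ordering on every update are exactly the cost noted above.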

Re: More filesystem need this fix (xfs: use MMAPLOCK around filemap_map_pages())

2020-09-21 Thread Dave Chinner
On Thu, Sep 17, 2020 at 12:47:10AM -0700, Hugh Dickins wrote: > On Thu, 17 Sep 2020, Dave Chinner wrote: > > On Wed, Sep 16, 2020 at 07:04:46PM -0700, Hugh Dickins wrote: > > > On Thu, 17 Sep 2020, Dave Chinner wrote: > > > >

Re: [RFC PATCH 0/2] Remove shrinker's nr_deferred

2020-09-20 Thread Dave Chinner
On Thu, Sep 17, 2020 at 05:12:08PM -0700, Yang Shi wrote: > On Wed, Sep 16, 2020 at 7:37 PM Dave Chinner wrote: > > On Wed, Sep 16, 2020 at 11:58:21AM -0700, Yang Shi wrote: > > It clamps the worst case freeing to half the cache, and that is > > exactly what you are seeing

Re: [RFC PATCH] locking/percpu-rwsem: use this_cpu_{inc|dec}() for read_count

2020-09-20 Thread Dave Chinner
e running millions of IOPS through the AIO subsystem, then the cost of doing millions of extra atomic ops every second is going to be noticeable... Cheers, Dave. -- Dave Chinner da...@fromorbit.com

Re: the "read" syscall sees partial effects of the "write" syscall

2020-09-20 Thread Dave Chinner
thread. There are quite a few custom enterprise apps around that rely on this POSIX behaviour, especially stuff that has come from different Unixes that actually provided Posix compliant behaviour. IOWs, from an upstream POV, POSIX atomic write behaviour doesn't matter very much. From an enter

Re: More filesystem need this fix (xfs: use MMAPLOCK around filemap_map_pages())

2020-09-16 Thread Dave Chinner
On Wed, Sep 16, 2020 at 07:04:46PM -0700, Hugh Dickins wrote: > On Thu, 17 Sep 2020, Dave Chinner wrote: > > > > So > > > > P0 p1 > > > > hole punch starts > > takes XFS_MMAPLOCK_EXCL > > truncate_pagec

Re: [RFC PATCH 0/2] Remove shrinker's nr_deferred

2020-09-16 Thread Dave Chinner
031234618.15403-1-da...@fromorbit.com/ Unfortunately, none of the MM developers showed any interest in these patches, so when I found a different solution to the XFS problem it got dropped on the ground. > So why do we have to still keep it around? Because we need a feedback mechanism to allow us to maintain control of the size of filesystem caches that grow via GFP_NOFS allocations. Cheers, Dave. -- Dave Chinner da...@fromorbit.com

Re: More filesystem need this fix (xfs: use MMAPLOCK around filemap_map_pages())

2020-09-16 Thread Dave Chinner
On Wed, Sep 16, 2020 at 05:58:51PM +0200, Jan Kara wrote: > On Sat 12-09-20 09:19:11, Amir Goldstein wrote: > > On Tue, Jun 23, 2020 at 8:21 AM Dave Chinner wrote: > > > > > > From: Dave Chinner > > > > > > The page faultround path ->map_pages i

Re: Support for I/O to a bitbucket

2020-09-06 Thread Dave Chinner
ct. Or if it's a stupid idea, > someone can point out why. I think it's pretty straight forward to do it in the iomap layer... Cheers, Dave. -- Dave Chinner da...@fromorbit.com

Re: [PATCH v2] fs: Handle I_DONTCACHE in iput_final() instead of generic_drop_inode()

2020-09-06 Thread Dave Chinner
format > - Add Fixes tag in commit message > > fs/inode.c | 4 +++- > include/linux/fs.h | 3 +-- > 2 files changed, 4 insertions(+), 3 deletions(-) Looks good. Reviewed-by: Dave Chinner -- Dave Chinner da...@fromorbit.com

Re: [PATCH] fs: Handle I_DONTCACHE in iput_final() instead of generic_drop_inode()

2020-09-03 Thread Dave Chinner
the statement. i.e.

        if (!drop && !(inode->i_state & I_DONTCACHE) && (sb->s_flags & SB_ACTIVE)) {

Which gives a clear indication that they are all at the same precedence and separate logic statements... Otherwise the change looks good. Probably best to resend with the fixes tag :) Cheers, Dave. -- Dave Chinner da...@fromorbit.com
