Re: [Cluster-devel] [PATCH 07/11] vfs: add nowait parameter for file_accessed()

2023-09-10 Thread Dave Chinner
On Fri, Sep 08, 2023 at 01:29:55AM +0100, Pavel Begunkov wrote: > On 9/3/23 23:30, Dave Chinner wrote: > > On Wed, Aug 30, 2023 at 02:11:31PM +0800, Hao Xu wrote: > > > On 8/29/23 19:53, Matthew Wilcox wrote: > > > > On Tue, Aug 29, 2023 at 03:46:13PM +0800, Hao Xu

Re: [Cluster-devel] [PATCH 02/11] xfs: add NOWAIT semantics for readdir

2023-09-03 Thread Dave Chinner
On Sun, Aug 27, 2023 at 09:28:26PM +0800, Hao Xu wrote: > From: Hao Xu > > Implement NOWAIT semantics for readdir. Return EAGAIN error to the > caller if it would block, like failing to get locks, or going to > do IO. > > Co-developed-by: Dave Chinner Not really. "C

Re: [Cluster-devel] [PATCH 07/11] vfs: add nowait parameter for file_accessed()

2023-09-03 Thread Dave Chinner
> > Hi Matthew, > The previous discussion shows this does cause issues in real > producations: > https://lore.kernel.org/io-uring/2785f009-2ebb-028d-8250-d5f3a3051...@gmail.com/#:~:text=fwiw%2C%20we%27ve%20just%20recently%20had%20similar%20problems%20with%20io_uring%20read/write > Then separate it out into it's own patch set so we can have a discussion on the merits of requiring using noatime, relatime or lazytime for really latency sensitive IO applications. Changing code is not always the right solution... -Dave. -- Dave Chinner da...@fromorbit.com

Re: [Cluster-devel] [PATCH RFC v5 00/29] io_uring getdents

2023-08-25 Thread Dave Chinner
tarted with and don't try to solve every little blocking problem that might exist in the VFS and filesystems... -Dave -- Dave Chinner da...@fromorbit.com

Re: [Cluster-devel] [PATCH 25/29] xfs: support nowait for xfs_buf_item_init()

2023-08-25 Thread Dave Chinner
ven worse than that - once we have committed intents, the whole chain of intent processing must be run to completionr. Hence we can't tolerate backing out of that defered processing chain half way through because we might have to block. Until we can roll back partial dirty transactions and partially completed defered intent chains at any random point of completion, XFS_TRANS_NOWAIT will not work. -Dave. -- Dave Chinner da...@fromorbit.com

Re: [Cluster-devel] [PATCH 28/29] xfs: support nowait semantics for xc_ctx_lock in xlog_cil_commit()

2023-08-25 Thread Dave Chinner
tem shutdown as we cannot back out from failure with dirty log items gracefully at this point. -Dave. -- Dave Chinner da...@fromorbit.com

Re: [Cluster-devel] [PATCH 26/29] xfs: return -EAGAIN when nowait meets sync in transaction commit

2023-08-25 Thread Dave Chinner
is point with shutting down the filesystem. This points to XFS_TRANS_NOWAIT being completely broken, too, because we don't call xfs_trans_set_sync() until just before we commit the transaction. At this point, it is -too late- for nowait+sync to be handled gracefully, and it will *always* go bad. IOWs,

Re: [Cluster-devel] [PATCH 24/29] xfs: support nowait for xfs_buf_read_map()

2023-08-25 Thread Dave Chinner
to my first comments that XBF_TRYLOCK cannot simpy be replaced with XBF_NOWAIT semantics... -Dave. -- Dave Chinner da...@fromorbit.com

Re: [Cluster-devel] [PATCH 02/29] xfs: rename XBF_TRYLOCK to XBF_NOWAIT

2023-08-25 Thread Dave Chinner
XBF_INCORE(1u << 29)/* lookup only, return if found in cache */ > -#define XBF_TRYLOCK (1u << 30)/* lock requested, but do not wait */ > +#define XBF_NOWAIT(1u << 30)/* mem/lock requested, but do not wait */ That's now a really poor comment. It doesn't describe the semantics or constraints that NOWAIT might imply. -Dave. -- Dave Chinner da...@fromorbit.com

Re: [Cluster-devel] [PATCH v4 46/48] mm: shrinker: make memcg slab shrink lockless

2023-08-07 Thread Dave Chinner
ropping the refcount to zero and freeing occuring in a different context... > + /* > + * We have already exited the read-side of rcu critical section > + * before calling do_shrink_slab(), the shrinker_info may be > + * released in expand_one_shrinker_info(), so reacquire the > + * shrinker_info. > + */ > + index++; > + goto again; With that, what makes the use of shrinker_info in xchg_nr_deferred_memcg() in do_shrink_slab() coherent and valid? -Dave. -- Dave Chinner da...@fromorbit.com

Re: [Cluster-devel] [PATCH v4 45/48] mm: shrinker: make global slab shrink lockless

2023-08-07 Thread Dave Chinner
ker_put(shrinker); > + wait_for_completion(>done); > + } Needs a comment explaining why we need to wait here... > + > down_write(_rwsem); > if (shrinker->flags & SHRINKER_REGISTERED) { > - list_del(>list); > + /* > + * Lookups on the shrinker are over and will fail in the future, > + * so we can now remove it from the lists and free it. > + */ rather than here after the wait has been done and provided the guarantee that no shrinker is running or will run again... -Dave. -- Dave Chinner da...@fromorbit.com

Re: [Cluster-devel] [PATCH v4 44/48] mm: shrinker: add a secondary array for shrinker_info::{map, nr_deferred}

2023-08-07 Thread Dave Chinner
> unsigned long ret, freed = 0; > - int i; > + int offset, index = 0; > > if (!mem_cgroup_online(memcg)) > return 0; > @@ -419,56 +470,63 @@ static unsigned long shrink_slab_memcg(gfp_t gfp_mask, > int nid, > if (unlikely(!inf

Re: [Cluster-devel] [PATCH v4 45/48] mm: shrinker: make global slab shrink lockless

2023-08-07 Thread Dave Chinner
lowing cases. > This commit uses the refcount+RCU method [5] proposed by Dave Chinner > to re-implement the lockless global slab shrink. The memcg slab shrink is > handled in the subsequent patch. > --- > include/linux/shrinker.h | 17 ++ >

Re: [Cluster-devel] [PATCH v3 28/49] dm zoned: dynamically allocate the dm-zoned-meta shrinker

2023-07-27 Thread Dave Chinner
obviously correct" that what we have now. > So not adding that super simple > helper is not exactly the best choice in my opinion. Each to their own - I much prefer the existing style/API over having to go look up a helper function every time I want to check some random shrinker has been set up correctly -Dave. -- Dave Chinner da...@fromorbit.com

Re: [Cluster-devel] [PATCH v2 03/47] mm: shrinker: add infrastructure for dynamically allocating shrinker

2023-07-26 Thread Dave Chinner
shrinker); up_write(_rwsem); if (debugfs_entry) shrinker_debugfs_remove(debugfs_entry, debugfs_id); kfree(shrinker->nr_deferred); kfree(shrinker); } EXPORT_SYMBOL_GPL(shrinker_free); -- Dave Chinner da...@fromorbit.com

Re: [Cluster-devel] [PATCH v5 6/8] xfs: switch to multigrain timestamps

2023-07-18 Thread Dave Chinner
> + WARN_ON_ONCE(!(flags & XFS_ICHGTIME_CHG)); Make that an ASSERT(flags & XFS_ICHGTIME_CHG), please. There's no need to verify this at runtime on production kernels. -Dave. -- Dave Chinner da...@fromorbit.com

Re: [Cluster-devel] [PATCH v7 19/20] fs: iomap: use bio_add_folio_nofail where possible

2023-05-31 Thread Dave Chinner
On Thu, Jun 01, 2023 at 06:20:21AM +0200, Christoph Hellwig wrote: > On Thu, Jun 01, 2023 at 08:36:59AM +1000, Dave Chinner wrote: > > We lose adjacent page merging with this change. > > This is only used for adding the first folio to a brand new bio, > so there is nothing

Re: [Cluster-devel] [PATCH v7 19/20] fs: iomap: use bio_add_folio_nofail where possible

2023-05-31 Thread Dave Chinner
h more performant when it only has to do a single large DMA instead of (potentially) hundreds of single page DMAs for the same amount of data... What performance regression testing has been done on this change? -Dave. -- Dave Chinner da...@fromorbit.com

Re: [Cluster-devel] [PATCH 16/17] block: use iomap for writes to block devices

2023-05-23 Thread Dave Chinner
t; - iomap->addr = iomap->offset; > + if (WARN_ON_ONCE(iomap->offset >= isize)) { > + iomap->type = IOMAP_HOLE; > + iomap->addr = IOMAP_NULL_ADDR; > + } else { > + iomap->type = IOMAP_MAPPED; > + iomap->addr = iomap->offset; > + } I think Christoph's code is correct. IMO, any attempt to read beyond the end of the device should throw out a warning and return an error, not silently return zeros. If readahead is trying to read beyond the end of the device, then it really seems to me like the problem here is readahead, not the iomap code detecting the OOB read request Cheers, Dave. -- Dave Chinner da...@fromorbit.com

Re: [Cluster-devel] [PATCH v2 00/23] fs-verity support for XFS

2023-04-11 Thread Dave Chinner
o do for directory names) except for the xattrs that hold the encryption keys themselves. That means the merkle tree blocks should get encrypted without any extra work needing to be done anywhere. This will simply require the fscrypt keys to be held in a private internal xattr namespace that isn't encrypted, but that's realtively trivial to do... Cheers, Dave. -- Dave Chinner da...@fromorbit.com

Re: [Cluster-devel] [PATCH v2 21/23] xfs: handle merkle tree block size != fs blocksize != PAGE_SIZE

2023-04-05 Thread Dave Chinner
On Wed, Apr 05, 2023 at 10:54:06PM +, Eric Biggers wrote: > On Thu, Apr 06, 2023 at 08:26:46AM +1000, Dave Chinner wrote: > > > We could certainly think about moving to a design where fs/verity/ asks > > > the > > > filesystem to just *read* a Mer

Re: [Cluster-devel] [PATCH v2 21/23] xfs: handle merkle tree block size != fs blocksize != PAGE_SIZE

2023-04-05 Thread Dave Chinner
de doesn't work with data structures that span multiple disjoint pages... Another problem is that the xfs-buf might be backed by heap memory (e.g. 4kB fs block size on 64kB PAGE_SIZE) and so it cannot be treated like a page cache page by the fsverity merkle tree code. We most definitely do not want to be passing pages containing heap memory to functions expecting to be passed lru-resident page cache pages That said, xfs-bufs do have a stable method of addressing the data in the buffers, and all the XFS code uses this to access and manipulate data directly in the buffers. That is, xfs_buf_offset() returns a mapped kaddr that points to the contiguous memory region containing the metadata in the buffer. If the xfs_buf spans multiple pages, it will return a kaddr pointing into the contiguous vmapped memory address that maps the entire buffer data range. If it is heap memory, it simply returns a pointer into that heap memory. If it's a single page, then it returns the kaddr for the data within the page. If you move all the assumptions about how the merkle tree data is managed out of fsverity and require the fielsystems to do the mapping to kaddrs and reference counting to guarantee life times, then the need for multiple different methods for reading merkle tree data go away... Cheers, Dave. -- Dave Chinner da...@fromorbit.com

Re: [Cluster-devel] [PATCH v2 21/23] xfs: handle merkle tree block size != fs blocksize != PAGE_SIZE

2023-04-05 Thread Dave Chinner
ey's new drop callout. fsverity doesn't need to care what the buffer is made from, how it is cached, what it's life cycle is, etc. The caching mechanism and reference counting is entirely controlled by the filesystem callout implementations, and fsverity only needs to deal with memory buffers that are guaranteed to live for the entire walk of the merkle tree Cheers, Dave. -- Dave Chinner da...@fromorbit.com

Re: [Cluster-devel] [PATCH v2 19/23] xfs: disable direct read path for fs-verity sealed files

2023-04-05 Thread Dave Chinner
lly want the same memory buffer based interface for the merkle tree reading so that the merkle tree code can read directly from the xfs-buf rather than requiring us to copy it out of the xfsbuf into temporary pages... Cheers, Dave. -- Dave Chinner da...@fromorbit.com

Re: [Cluster-devel] [PATCH v2 19/23] xfs: disable direct read path for fs-verity sealed files

2023-04-05 Thread Dave Chinner
more -- I suppose we shouldn't just go > breaking directio reads from a verity file if we can help it. Is there > a way to ask fsverity to perform its validation against some arbitrary > memory buffer that happens to be fs-block aligned? The memory buffer doesn't even need to be fs-block aligned - it just needs to be a pointer to memory the kernel can read... We also need fsverity to be able to handle being passed mapped kernel memory rather than pages/folios for the merkle tree interfaces. That way we can just pass it the mapped buffer memory straight from the xfs-buf and we don't have to do the whacky "copy from xattr xfs_bufs into pages so fsverity can take temporary reference counts on what it thinks are page cache pages" as it walks the merkle tree. -Dave. -- Dave Chinner da...@fromorbit.com

Re: [Cluster-devel] [PATCH v2 16/23] xfs: add inode on-disk VERITY flag

2023-04-04 Thread Dave Chinner
I think they should get it automatically now that it has been defined for FS_IOC_FSGETXATTR and added to the generic fileattr flag fill functions in fs/ioctl.c. > I'd also like all ways of getting the verity flag to continue to be mentioned > in > Documentation/filesystems/fsverity.rst. The existing methods (FS_IOC_GETFLAGS > and statx) are already mentioned there. *nod* -Dave. -- Dave Chinner da...@fromorbit.com

Re: [Cluster-devel] [PATCH v2 06/23] fsverity: add drop_page() callout

2023-04-04 Thread Dave Chinner
p_page(page); else put_page(page); } And then you don't need to add the functions to each of the filesystems nor make an indirect call just to run put_page(). Cheers, Dave. -- Dave Chinner da...@fromorbit.com

Re: [Cluster-devel] [RFC v6 08/10] iomap/xfs: Eliminate the iomap_valid handler

2023-01-18 Thread Dave Chinner
with DIO write vs buffered read, nor DIO write vs mmap IO. It's not totally clear to me whether we need ->iomap_valid checks in the buffered read paths to avoid the completion races with DIO writes, but there are windows there where cached iomaps could be considered stale Chee

Re: [Cluster-devel] [RFC v6 04/10] iomap: Add iomap_get_folio helper

2023-01-11 Thread Dave Chinner
c error handling from a single caller? If there's half a dozen cases that need this sort of handling, then maybe it's the right thing to do. But for a single calling context that only needs to add a null return check in one specific case? There's absolutely no need to make generic infrastructure violate layering abstractions to handle that... -Dave. -- Dave Chinner da...@fromorbit.com

Re: [Cluster-devel] [RFC v6 08/10] iomap/xfs: Eliminate the iomap_valid handler

2023-01-09 Thread Dave Chinner
On Mon, Jan 09, 2023 at 07:45:27PM +0100, Andreas Gruenbacher wrote: > On Sun, Jan 8, 2023 at 10:59 PM Dave Chinner wrote: > > On Sun, Jan 08, 2023 at 08:40:32PM +0100, Andreas Gruenbacher wrote: > > > Eliminate the ->iomap_valid() handler by switching to a ->g

Re: [Cluster-devel] [RFC v6 08/10] iomap/xfs: Eliminate the iomap_valid handler

2023-01-08 Thread Dave Chinner
the iomap core needs to be able to formally verify the iomap is valid at any point in time, not just at the point in time a folio in the page cache has been locked... -Dave. -- Dave Chinner da...@fromorbit.com

Re: [Cluster-devel] [RFC v6 04/10] iomap: Add iomap_get_folio helper

2023-01-08 Thread Dave Chinner
ing a big hole in it by requiring filesystems to actually allocation page cache folios on behalf of the iomap core. Given that I recently got major push-back for fixing an XFS-only bug by walking the page cache directly instead of abstracting it via the iomap core, punching an even bigger hole in the abstraction layer to fix a GFS2-only problem is just as bad -Dave. -- Dave Chinner da...@fromorbit.com

Re: [Cluster-devel] [RFC v2 0/3] Turn iomap_page_ops into iomap_folio_ops

2022-12-05 Thread Dave Chinner
On Fri, Dec 02, 2022 at 02:54:00AM +0100, Andreas Gruenbacher wrote: > On Thu, Dec 1, 2022 at 10:30 PM Dave Chinner wrote: > > On Thu, Dec 01, 2022 at 07:09:54PM +0100, Andreas Gruenbacher wrote: > > > Hi again, > > > > > > [Same thing, but with t

Re: [Cluster-devel] [RFC v2 0/3] Turn iomap_page_ops into iomap_folio_ops

2022-12-01 Thread Dave Chinner
s. To avoid all this, can we simple move the ->page_done() callout in the error path and iomap_write_end() to before we unlock the folio? You've already done that for pagecache_isize_extended(), and I can't see anything obvious in the gfs2 ->page_done callout that would cause issues if it is called with a locked dirty folio... Cheers, Dave. -- Dave Chinner da...@fromorbit.com

Re: [Cluster-devel] [syzbot] WARNING in iomap_read_inline_data

2022-11-29 Thread Dave Chinner
end trace ]--- > > > --- > This report is generated by a bot. It may contain errors. > See https://goo.gl/tpsmEJ for more information about syzbot. > syzbot engineers can be reached at syzkal...@googlegroups.com. > > syzbot will keep track of this issue. See: > https://goo.gl/tpsmEJ#status for how to communicate with syzbot. > syzbot can test patches for this issue, for details see: > https://goo.gl/tpsmEJ#testing-patches > -- Dave Chinner da...@fromorbit.com

Re: [Cluster-devel] [PATCH 04/23] page-writeback: Convert write_cache_pages() to use filemap_get_folios_tag()

2022-11-03 Thread Dave Chinner
On Thu, Nov 03, 2022 at 07:45:01PM -0700, Darrick J. Wong wrote: > On Fri, Nov 04, 2022 at 11:32:35AM +1100, Dave Chinner wrote: > > On Thu, Nov 03, 2022 at 03:28:05PM -0700, Vishal Moola wrote: > > > On Wed, Oct 19, 2022 at 08:01:52AM +1100, Dave Chinner wrote: > > > &

Re: [Cluster-devel] [PATCH 04/23] page-writeback: Convert write_cache_pages() to use filemap_get_folios_tag()

2022-11-03 Thread Dave Chinner
On Thu, Nov 03, 2022 at 03:28:05PM -0700, Vishal Moola wrote: > On Wed, Oct 19, 2022 at 08:01:52AM +1100, Dave Chinner wrote: > > On Thu, Sep 01, 2022 at 03:01:19PM -0700, Vishal Moola (Oracle) wrote: > > > Converted function to use folios throughout. This is in preparation for

Re: [Cluster-devel] [PATCH v4 00/23] Convert to filemap_get_folios_tag()

2022-11-03 Thread Dave Chinner
On Thu, Nov 03, 2022 at 09:38:48AM -0700, Vishal Moola wrote: > On Thu, Nov 3, 2022 at 12:08 AM Dave Chinner wrote: > > > > On Wed, Nov 02, 2022 at 09:10:08AM -0700, Vishal Moola (Oracle) wrote: > > > This patch series replaces find_get_pages_range_tag() with >

Re: [Cluster-devel] [PATCH v4 00/23] Convert to filemap_get_folios_tag()

2022-11-03 Thread Dave Chinner
e other changes (afs, ceph, cifs, gfs2) would be appreciated. Same question as last time: have you tested this with multipage folios enabled? If you haven't tested XFS, then I'm guessing the answer is no, and you haven't fixed the bug I pointed out in the write_cache_pages() implementation -Dave. -- Dave Chinner da...@fromorbit.com

Re: [Cluster-devel] [PATCH 00/23] Convert to filemap_get_folios_tag()

2022-10-18 Thread Dave Chinner
o ensure that it correctly handles multi-page folios. And the only way you can do that fully at this point in time is via testing XFS or AFS... Cheers, Dave. -- Dave Chinner da...@fromorbit.com

Re: [Cluster-devel] [PATCH 04/23] page-writeback: Convert write_cache_pages() to use filemap_get_folios_tag()

2022-10-18 Thread Dave Chinner
error = (*writepage)(page, wbc, data); > + error = writepage(>page, wbc, data); Yet, IIUC, this treats all folios as if they are single page folios. i.e. it passes the head page of a multi-page folio to a callback that will treat it as a single PAGE_SIZE page, because that's all the writepage callbacks are currently expected to be passed... So won't this break writeback of dirty multipage folios? -Dave. -- Dave Chinner da...@fromorbit.com

Re: [Cluster-devel] remove iomap_writepage v2

2022-07-28 Thread Dave Chinner
, btrfs and XFS), ->writepage no longer plays any part in memory reclaim on their systems. So why should we try to maintain the fiction that ->writepage is required functionality in a filesystem when it clearly isn't? Cheers, Dave. -- Dave Chinner da...@fromorbit.com

Re: [Cluster-devel] [PATCH] iomap: fix incomplete async dio reads when using IOMAP_DIO_PARTIAL

2022-03-02 Thread Dave Chinner
On Wed, Mar 02, 2022 at 02:03:28PM +0100, Andreas Gruenbacher wrote: > On Wed, Mar 2, 2022 at 11:17 AM Filipe Manana wrote: > > On Tue, Mar 01, 2022 at 09:38:30AM +1100, Dave Chinner wrote: > > > On Mon, Feb 28, 2022 at 02:32:03PM +, fdman...@kernel.org wrote: > >

Re: [Cluster-devel] [PATCH] iomap: fix incomplete async dio reads when using IOMAP_DIO_PARTIAL

2022-02-28 Thread Dave Chinner
ed the EOF boundary. Do this even > if the read has IOCB_NOWAIT set, as it's the only way to avoid providing > an unexpected result to an application. That's highly specific and ultimately will be fragile, IMO. I'd much prefer that *_iomap_begin_write() implementations simply follow IOMAP_NOWAIT requirements to ensure that any DIO that needs multiple mappings if punted to a context that can block... Cheers, Dave. -- Dave Chinner da...@fromorbit.com

Re: [Cluster-devel] [REPORT] kernel BUG at fs/ext4/inode.c:2620 - page_buffers()

2022-02-24 Thread Dave Chinner
On Wed, Feb 23, 2022 at 10:50:09PM -0500, Theodore Ts'o wrote: > On Thu, Feb 24, 2022 at 12:48:42PM +1100, Dave Chinner wrote: > > > Fair enough; on the other hand, we could also view this as making ext4 > > > more robust against buggy code in other subsystems, and while oth

Re: [Cluster-devel] [REPORT] kernel BUG at fs/ext4/inode.c:2620 - page_buffers()

2022-02-23 Thread Dave Chinner
operly first informing the file system in a context where it can > block and potentially do I/O to do things like allocate blocks. I'm not sure that replacing the BUG() with a warning is good enough - it's still indicative of an application doing something dangerous that could result in silent data corruption and/or other problems. Cheers, Dave. -- Dave Chinner da...@fromorbit.com

Re: [Cluster-devel] [PATCH v2.1 11/30] iomap: add the new iomap_iter model

2021-08-16 Thread Dave Chinner
ng > Signed-off-by: Darrick J. Wong Looks like a straight translation of Christoph's original. Seems fine to me as a simepl step towards preserving the git history. Reviewed-by: Dave Chinner -- Dave Chinner da...@fromorbit.com

Re: [Cluster-devel] [PATCH 31/30] iomap: move iomap iteration code to iter.c

2021-08-16 Thread Dave Chinner
/fs/iomap/apply.c b/fs/iomap/iter.c > similarity index 100% > rename from fs/iomap/apply.c > rename to fs/iomap/iter.c LGTM, Reviewed-by: Dave Chinner -- Dave Chinner da...@fromorbit.com

Re: [Cluster-devel] [PATCH v2.1 24/30] iomap: remove iomap_apply

2021-08-16 Thread Dave Chinner
J. Wong > Signed-off-by: Darrick J. Wong > --- > fs/iomap/apply.c | 91 > - > fs/iomap/trace.h | 40 -- > include/linux/iomap.h | 10 - > 3 files changed, 141 deletions(-) Looks good. Re

Re: [Cluster-devel] [PATCH v2.1 19/30] iomap: switch iomap_bmap to use iomap_iter

2021-08-16 Thread Dave Chinner
ewed-by: Darrick J. Wong > Signed-off-by: Darrick J. Wong > --- > fs/iomap/fiemap.c | 31 +-- > 1 file changed, 13 insertions(+), 18 deletions(-) Looks good. Reviewed-by: Dave Chinner -- Dave Chinner da...@fromorbit.com

Re: [Cluster-devel] [PATCH 11/30] iomap: add the new iomap_iter model

2021-08-09 Thread Dave Chinner
the older pre-disaggregation fs/iomap.c without having to take the tree back in time to find those files... Cheers, Dave. -- Dave Chinner da...@fromorbit.com

Re: [Cluster-devel] [PATCH 08/27] iomap: add the new iomap_iter model

2021-07-19 Thread Dave Chinner
_iterate() is fine as the function name - there's no need for abbreviation here because it's not an overly long name. This will makes it clearly different to the struct iomap_iter that is passed to it and it will also make grep, cscope and other code searching tools much more precise... Cheers, Dave. -- Dave Chinner da...@fromorbit.com

Re: [Cluster-devel] [PATCH 20/27] fsdax: switch dax_iomap_rw to use iomap_iter

2021-07-19 Thread Dave Chinner
y changing when both are used in the same function. Would it be better to avoid any possible confusion simply by using "iomi" for all iomap_iter variables throughout the patchset from the start? That way nobody is going to confuse iov_iter with iomap_iter iteration variables and code that

Re: [Cluster-devel] RFC: iomap write invalidation

2020-07-14 Thread Dave Chinner
cratering. Hopefully this will only be a rare event. So, to hoist myself on my own petard: correctness first, performance second. Acked-by: Dave Chinner Cheers, Dave. -- Dave Chinner da...@fromorbit.com

Re: [Cluster-devel] always fall back to buffered I/O after invalidation failures, was: Re: [PATCH 2/6] iomap: IOMAP_DIO_RWF_NO_STALE_PAGECACHE return if page invalidation fails

2020-07-09 Thread Dave Chinner
On Thu, Jul 09, 2020 at 10:10:38AM -0700, Darrick J. Wong wrote: > On Thu, Jul 09, 2020 at 06:05:19PM +0100, Matthew Wilcox wrote: > > On Thu, Jul 09, 2020 at 09:09:26AM -0700, Darrick J. Wong wrote: > > > On Thu, Jul 09, 2020 at 12:25:27PM +1000,

Re: [Cluster-devel] always fall back to buffered I/O after invalidation failures, was: Re: [PATCH 2/6] iomap: IOMAP_DIO_RWF_NO_STALE_PAGECACHE return if page invalidation fails

2020-07-08 Thread Dave Chinner
On Wed, Jul 08, 2020 at 02:54:37PM +0100, Matthew Wilcox wrote: > On Wed, Jul 08, 2020 at 04:51:27PM +1000, Dave Chinner wrote: > > On Tue, Jul 07, 2020 at 03:00:30PM +0200, Christoph Hellwig wrote: > > > On Tue, Jul 07, 2020 at 01:57:05PM +0100, Matthew Wilcox wrote: > > &

Re: [Cluster-devel] always fall back to buffered I/O after invalidation failures, was: Re: [PATCH 2/6] iomap: IOMAP_DIO_RWF_NO_STALE_PAGECACHE return if page invalidation fails

2020-07-08 Thread Dave Chinner
ed by unpredicably switching between direct IO and buffered IO (e.g. suddening DIO writes serialise -all IO-) will cause unacceptible performance regressions for many applications and be -very difficult to diagnose- in production systems. IOWs, we need to let the individual filesystems decide how they want to use the page cache for direct IO. Just because we have new direct IO infrastructure (i.e. iomap) it does not mean we can just make wholesale changes to the direct IO path behaviour... Cheers, Dave. -- Dave Chinner da...@fromorbit.com

Re: [Cluster-devel] [PATCH v6 17/19] iomap: Restructure iomap_readpages_actor

2020-02-18 Thread Dave Chinner
On Tue, Feb 18, 2020 at 10:04:15PM -0800, Matthew Wilcox wrote: > On Wed, Feb 19, 2020 at 02:29:00PM +1100, Dave Chinner wrote: > > On Mon, Feb 17, 2020 at 10:46:11AM -0800, Matthew Wilcox wrote: > > > @@ -418,6 +412,15 @@ iomap_readpages_actor(struct inode *inode, loff_t > &

Re: [Cluster-devel] [PATCH v6 00/19] Change readahead API

2020-02-18 Thread Dave Chinner
On Tue, Feb 18, 2020 at 07:48:32PM -0800, Matthew Wilcox wrote: > On Wed, Feb 19, 2020 at 02:45:25PM +1100, Dave Chinner wrote: > > On Wed, Feb 19, 2020 at 08:26:52AM +1100, Dave Chinner wrote: > > > On Tue, Feb 18, 2020 at 05:42:30AM -0800, Matthew Wilcox wrote: > > > &

Re: [Cluster-devel] [PATCH v6 00/19] Change readahead API

2020-02-18 Thread Dave Chinner
On Wed, Feb 19, 2020 at 08:26:52AM +1100, Dave Chinner wrote: > On Tue, Feb 18, 2020 at 05:42:30AM -0800, Matthew Wilcox wrote: > > On Tue, Feb 18, 2020 at 03:56:33PM +1100, Dave Chinner wrote: > > > Latest version in your git tree: > > > > > > $ ▶ glo -n 5 w

Re: [Cluster-devel] [PATCH v6 18/19] iomap: Convert from readpages to readahead

2020-02-18 Thread Dave Chinner
if (!ctx->cur_page_in_bio) unlock_page(ctx->cur_page); put_page(ctx->cur_page); ctx->cur_page = NULL; readahead_next(ctx->rac); } Cheers, Dave. -- Dave Chinner da...@fromorbit.com

Re: [Cluster-devel] [PATCH v6 17/19] iomap: Restructure iomap_readpages_actor

2020-02-18 Thread Dave Chinner
ctx->cur_page = NULL; > + } > } > > return done; > @@ -451,11 +454,7 @@ iomap_readpages(struct address_space *mapping, struct > list_head *pages, > done: > if (ctx.bio) > submit_bio(ctx.bio); > - if (ctx.cur_page) { > -

Re: [Cluster-devel] [PATCH v6 16/19] fuse: Convert from readpages to readahead

2020-02-18 Thread Dave Chinner
id of the get_page() call in fuse_readpages_fill(). > > Signed-off-by: Matthew Wilcox (Oracle) Looks OK. Reviewed-by: Dave Chinner -- Dave Chinner da...@fromorbit.com

Re: [Cluster-devel] [PATCH v6 14/19] ext4: Convert from readpages to readahead

2020-02-18 Thread Dave Chinner
ext4. I'll re-introduce the patch and see if it falls over again. Cheers, Dave. -- Dave Chinner da...@fromorbit.com

Re: [Cluster-devel] [PATCH v6 13/19] erofs: Convert compressed files from readpages to readahead

2020-02-18 Thread Dave Chinner
file changed, 9 insertions(+), 20 deletions(-) Looks fine. Reviewed-by: Dave Chinner -- Dave Chinner da...@fromorbit.com

Re: [Cluster-devel] [PATCH v6 12/19] erofs: Convert uncompressed files from readpages to readahead

2020-02-18 Thread Dave Chinner
--- > fs/erofs/zdata.c | 2 +- > include/trace/events/erofs.h | 6 +++--- > 3 files changed, 18 insertions(+), 29 deletions(-) Looks fine from the perspective of page iteration and error handling. Reviewed-by: Dave Chinner -- Dave Chinner da...@fromorbit.com

Re: [Cluster-devel] [PATCH v6 11/19] btrfs: Convert from readpages to readahead

2020-02-18 Thread Dave Chinner
On Tue, Feb 18, 2020 at 01:12:28PM -0800, Matthew Wilcox wrote: > On Tue, Feb 18, 2020 at 05:57:58PM +1100, Dave Chinner wrote: > > On Mon, Feb 17, 2020 at 10:45:59AM -0800, Matthew Wilcox wrote: > > > From: "Matthew Wilcox (Oracle)" &g

Re: [Cluster-devel] [PATCH v6 09/19] mm: Add page_cache_readahead_limit

2020-02-18 Thread Dave Chinner
On Tue, Feb 18, 2020 at 11:54:04AM -0800, Matthew Wilcox wrote: > On Tue, Feb 18, 2020 at 05:31:10PM +1100, Dave Chinner wrote: > > On Mon, Feb 17, 2020 at 10:45:56AM -0800, Matthew Wilcox wrote: > > > From: "Matthew Wilcox (Oracle)" > > > > &g

Re: [Cluster-devel] [PATCH v6 08/19] mm: Add readahead address space operation

2020-02-18 Thread Dave Chinner
On Tue, Feb 18, 2020 at 08:10:04AM -0800, Matthew Wilcox wrote: > On Tue, Feb 18, 2020 at 05:21:47PM +1100, Dave Chinner wrote: > > On Mon, Feb 17, 2020 at 10:45:54AM -0800, Matthew Wilcox wrote: > > > From: "Matthew Wilcox (Oracle)" > > > > > > Th

Re: [Cluster-devel] [PATCH v6 07/19] mm: Put readahead pages in cache earlier

2020-02-18 Thread Dave Chinner
On Tue, Feb 18, 2020 at 07:42:22AM -0800, Matthew Wilcox wrote: > On Tue, Feb 18, 2020 at 05:14:59PM +1100, Dave Chinner wrote: > > On Mon, Feb 17, 2020 at 10:45:52AM -0800, Matthew Wilcox wrote: > > > From: "Matthew Wilcox (Oracle)" > > > > > >

Re: [Cluster-devel] [PATCH v6 04/19] mm: Rearrange readahead loop

2020-02-18 Thread Dave Chinner
On Tue, Feb 18, 2020 at 05:57:36AM -0800, Matthew Wilcox wrote: > On Tue, Feb 18, 2020 at 04:08:24PM +1100, Dave Chinner wrote: > > On Mon, Feb 17, 2020 at 10:45:45AM -0800, Matthew Wilcox wrote: > > > From: "Matthew Wilcox (Oracle)" > > > > > > Mov

Re: [Cluster-devel] [PATCH v6 03/19] mm: Use readahead_control to pass arguments

2020-02-18 Thread Dave Chinner
On Tue, Feb 18, 2020 at 05:56:18AM -0800, Matthew Wilcox wrote: > On Tue, Feb 18, 2020 at 04:03:00PM +1100, Dave Chinner wrote: > > On Mon, Feb 17, 2020 at 10:45:44AM -0800, Matthew Wilcox wrote: > > > +static void read_pages(struct readahead_control *rac, struct list_

Re: [Cluster-devel] [PATCH v6 00/19] Change readahead API

2020-02-18 Thread Dave Chinner
On Tue, Feb 18, 2020 at 05:42:30AM -0800, Matthew Wilcox wrote: > On Tue, Feb 18, 2020 at 03:56:33PM +1100, Dave Chinner wrote: > > Latest version in your git tree: > > > > $ ▶ glo -n 5 willy/readahead > > 4be497096c04 mm: Use memalloc_nofs_save in readahead path >

Re: [Cluster-devel] [PATCH v6 11/19] btrfs: Convert from readpages to readahead

2020-02-17 Thread Dave Chinner
; + break; > + } > + > + return batch; > +} Seems a bit big for an inline function. > + > +#define readahead_for_each_batch(rac, array, size, nr) > \ > + for (; (nr = readahead_page_batch(rac, array, size)); \ > + readahead_next(rac)) I had to go look at the caller to work out what "size" refered to here. This is complex enough that it needs proper API documentation. Cheers, Dave. -- Dave Chinner da...@fromorbit.com

Re: [Cluster-devel] [PATCH v6 10/19] fs: Convert mpage_readpages to mpage_readahead

2020-02-17 Thread Dave Chinner
ed, 73 insertions(+), 126 deletions(-) That's actually pretty simple changeover. Nothing really scary there. :) Reviewed-by: Dave Chinner -- Dave Chinner da...@fromorbit.com

Re: [Cluster-devel] [PATCH v6 09/19] mm: Add page_cache_readahead_limit

2020-02-17 Thread Dave Chinner
st certainly not the function you want to call. Use page_cache_async_readahead or page_cache_sync_readahead() instead." Cheers, Dave. -- Dave Chinner da...@fromorbit.com

Re: [Cluster-devel] [PATCH v6 08/19] mm: Add readahead address space operation

2020-02-17 Thread Dave Chinner
ahead.c > index 9e430daae42f..975ff5e387be 100644 > --- a/mm/readahead.c > +++ b/mm/readahead.c > @@ -121,7 +121,13 @@ static void read_pages(struct readahead_control *rac, > struct list_head *pages) > > blk_start_plug(); > > - if (aops->readpages) { > + if (aops->readahead) { > + aops->readahead(rac); > + readahead_for_each(rac, page) { > + unlock_page(page); > + put_page(page); > + } This needs a comment to explain the unwinding that needs to be done here. I'm not going to remember in a year's time that this is just for the pages that weren't submitted by ->readahead Cheers, Dave. -- Dave Chinner da...@fromorbit.com

Re: [Cluster-devel] [PATCH v6 07/19] mm: Put readahead pages in cache earlier

2020-02-17 Thread Dave Chinner
_list) { > + page->index = offset; > + list_add(>lru, _pool); > + } else if (add_to_page_cache_lru(page, mapping, offset, > + gfp_mask) < 0) { > + put_page(page); > + goto read; > + } Ok, so that's why you put read code at the end of the loop. To turn the code into spaghetti :/ How much does this simplify down when we get rid of ->readpages and can restructure the loop? This really seems like you're trying to flatten two nested loops into one by the use of goto Cheers, Dave. -- Dave Chinner da...@fromorbit.com

Re: [Cluster-devel] [PATCH v6 06/19] mm: rename readahead loop variable to 'i'

2020-02-17 Thread Dave Chinner
-off-by: Matthew Wilcox (Oracle) > --- > mm/readahead.c | 6 +++--- > 1 file changed, 3 insertions(+), 3 deletions(-) Looks fine. Reviewed-by: Dave Chinner -- Dave Chinner da...@fromorbit.com

Re: [Cluster-devel] [PATCH v6 05/19] mm: Remove 'page_offset' from readahead loop

2020-02-17 Thread Dave Chinner
t; > Signed-off-by: Matthew Wilcox (Oracle) > --- > mm/readahead.c | 10 ++ > 1 file changed, 6 insertions(+), 4 deletions(-) Looks ok, but having the readahead dispatch out of line from the case that triggers it makes it hard to follow. Cheers, Dave. -- Dave Chinner da...@fromorbit.com

Re: [Cluster-devel] [PATCH v6 04/19] mm: Rearrange readahead loop

2020-02-17 Thread Dave Chinner
Also, why? This adds a goto from branched code that continues, then adds a continue so the unbranched code doesn't execute the code the goto jumps to. In absence of any explanation, this isn't an improvement and doesn't make any sense... -- Dave Chinner da...@fromorbit.com

Re: [Cluster-devel] [PATCH v6 03/19] mm: Use readahead_control to pass arguments

2020-02-17 Thread Dave Chinner
the next >* batch. >*/ > - if (nr_pages) > - read_pages(mapping, filp, _pool, nr_pages, > - gfp_mask); > - nr_pages = 0; > + if (readahead_count()) > + read_pages(, _pool, gfp_mask); > + rac._nr_pages = 0; Hmmm. Wondering ig it make sense to move the gfp_mask to the readahead control structure - if we have to pass the gfp_mask down all the way along side the rac, then I think it makes sense to do that... Cheers, Dave. -- Dave Chinner da...@fromorbit.com

Re: [Cluster-devel] [PATCH v6 00/19] Change readahead API

2020-02-17 Thread Dave Chinner
e2/0xfa [2.479776] ret_from_fork+0x1f/0x30 [2.480737] ---[ end trace e77079de9b22dc6a ]--- I just dropped the ext4 conversion from my local tree so I can boot the machine and test XFS. Might have some more info when that crashes and burns... Cheers, Dave. -- Dave Chinner da...@fromorbit.com

Re: [Cluster-devel] [PATCH v6 02/19] mm: Ignore return value of ->readpages

2020-02-17 Thread Dave Chinner
Reviewed-by: Christoph Hellwig > --- > mm/readahead.c | 8 ++-- > 1 file changed, 2 insertions(+), 6 deletions(-) Simple enough. Reviewed-by: Dave Chinner -- Dave Chinner da...@fromorbit.com

Re: [Cluster-devel] [PATCH v6 01/19] mm: Return void from various readahead functions

2020-02-17 Thread Dave Chinner
nd we don't need to worry that a present page in the readahead > window causes us to return a smaller nr_pages than we ought to have. > > Signed-off-by: Matthew Wilcox (Oracle) Looks good. Reviewed-by: Dave Chinner -- Dave Chinner da...@fromorbit.com

Re: [Cluster-devel] [PATCH v5 04/13] mm: Add readahead address space operation

2020-02-11 Thread Dave Chinner
On Tue, Feb 11, 2020 at 04:54:13AM -0800, Matthew Wilcox wrote: > On Tue, Feb 11, 2020 at 03:52:30PM +1100, Dave Chinner wrote: > > > +struct readahead_control { > > > + struct file *file; > > > + struct address_space *mapping; > > > +/* private

Re: [Cluster-devel] [PATCH v5 04/13] mm: Add readahead address space operation

2020-02-10 Thread Dave Chinner
r_pages(page); > + rac->start += rac->batch_count; There's no mention of large page support in the patch description and I don't recall this sort of large page batching in previous iterations. This seems like new functionality to me, not directly related to the initial ->readahead API change? What have I missed? Cheers, Dave. -- Dave Chinner da...@fromorbit.com

Re: [Cluster-devel] RFC: hold i_rwsem until aio completes

2020-02-03 Thread Dave Chinner
On Mon, Feb 03, 2020 at 06:46:41PM +0100, Christoph Hellwig wrote: > On Sat, Jan 18, 2020 at 08:28:38PM +1100, Dave Chinner wrote: > > I think it's pretty gross, actually. It makes the same mistake made > > with locking in the old direct IO code - it encodes specific lock > >

Re: [Cluster-devel] [PATCH 04/12] mm: Add readahead address space operation

2020-01-28 Thread Dave Chinner
s = 2, left = 1, this looks up the page at index 2, which is the one we issued IO on, not the one we "left behind" which is at index 3. Cheers, Dave. -- Dave Chinner da...@fromorbit.com

Re: [Cluster-devel] RFC: hold i_rwsem until aio completes

2020-01-18 Thread Dave Chinner
ystem wants/needs unlock on IO competion semantics, and it's completely filesystem IO-lock implementation agnostic. And for filesystems that use the inode i_rwsem, we can just provide simple helper functions for their read/write unlock methods. Cheers, Dave. -- Dave Chinner da...@fromorbit.com

Re: [Cluster-devel] Interest in DAX for OCFS2 and/or GFS2?

2019-10-17 Thread Dave Chinner
tion. GFS2 is already partially ported to use the iomap infrastructure, though more work is needed to provide the iomap functionality DAX requires. OCFS2 would require a lot more work on this front Cheers, Dave. -- Dave Chinner dchin...@redhat.com

Re: [Cluster-devel] [Q] gfs2: mmap write vs. punch_hole consistency

2019-09-06 Thread Dave Chinner
On Fri, Sep 06, 2019 at 11:48:31PM +0200, Andreas Gruenbacher wrote: > On Fri, Sep 6, 2019 at 11:28 PM Dave Chinner wrote: > > On Fri, Sep 06, 2019 at 10:52:41PM +0200, Andreas Gruenbacher wrote: > > > Hi, > > > > > > I've just fixed a mmap write v

Re: [Cluster-devel] [Q] gfs2: mmap write vs. punch_hole consistency

2019-09-06 Thread Dave Chinner
); See xfs_flush_unmap_range(), which is run under XFS_MMAPLOCK_EXCL to serialise against incoming page faults... Cheers, Dave. -- Dave Chinner da...@fromorbit.com

Re: [Cluster-devel] RFC: use the iomap writepage path in gfs2

2019-07-07 Thread Dave Chinner
al conversion - what needs to be done to iomap and gfs2 to support the journalled data path so there's a single data IO path? Cheers, Dave. -- Dave Chinner da...@fromorbit.com

Re: [Cluster-devel] [PATCH] fs: Move mark_inode_dirty out of __generic_write_end

2019-06-19 Thread Dave Chinner
the same time some other operation is changing the size of the file, and that means this code no longer does what you are intending it to do because the inode->i_size is no longer constant across these operations... Hence I think adding code that depends on i_rwsem to be held to function correctly is the wrong direction to be taking the iomap infrastructure. Cheers, Dave. -- Dave Chinner da...@fromorbit.com

Re: [Cluster-devel] [PATCH v7 0/5] iomap and gfs2 fixes

2019-04-30 Thread Dave Chinner
ts for the iomap and XFS branches - IMO, there's no really need to have a complete new tree for it... Cheers, Dave. -- Dave Chinner da...@fromorbit.com

Re: [Cluster-devel] gfs2 iomap dealock, IOMAP_F_UNBALANCED

2019-03-21 Thread Dave Chinner
p granularity, rather than a per-syscall granularity. i.e. if we do write(2GB), we want more than one balancing call during that syscall, so it woul dbe up to the filesystem to a) limit the size of write mappings to something smaller (e.g. 1024 pages) so that there are still frequent balancing calls for large writes. Cheers, Dave. -- Dave Chinner da...@fromorbit.com

Re: [Cluster-devel] [PATCH v4 06/11] iomap: Add write_{begin, end} iomap operations

2018-05-14 Thread Dave Chinner
l bufferhead free iomap IO path Let's see what Christoph thinks first, though Cheers, Dave. -- Dave Chinner da...@fromorbit.com

Re: [Cluster-devel] [PATCH] fs: gfs2: Adding new return type vm_fault_t

2018-04-16 Thread Dave Chinner
hole series (all the mm changes and all the | required fs changes) sent out for review prior to the merge window? We're not asking for a description of what you are doing - we are asking why the normal processes for proposing and merging such a change is not being followed, and how you plan to rect

Re: [Cluster-devel] [PATCH] gfs2: Fsync parent directories

2018-02-20 Thread Dave Chinner
is made stable, so must be the directory modification done during file creation. This has nothing to do with POSIX or what the "linux standard" is - this is testing whether the implementation of strictly ordered metadata journalling is correct or not. If gfs2 does not have strictly ordered metadata journalling, then it probably shouldn't run these tests Cheers, Dave. -- Dave Chinner dchin...@redhat.com

  1   2   >