On Fri, Sep 08, 2023 at 01:29:55AM +0100, Pavel Begunkov wrote:
> On 9/3/23 23:30, Dave Chinner wrote:
> > On Wed, Aug 30, 2023 at 02:11:31PM +0800, Hao Xu wrote:
> > > On 8/29/23 19:53, Matthew Wilcox wrote:
> > > > On Tue, Aug 29, 2023 at 03:46:13PM +0800, Hao Xu
On Sun, Aug 27, 2023 at 09:28:26PM +0800, Hao Xu wrote:
> From: Hao Xu
>
> Implement NOWAIT semantics for readdir. Return the EAGAIN error to the
> caller if it would block, e.g. on failing to get locks or needing to
> do IO.
>
> Co-developed-by: Dave Chinner
Not really.
>
> Hi Matthew,
> The previous discussion shows this does cause issues in real
> producations:
> https://lore.kernel.org/io-uring/2785f009-2ebb-028d-8250-d5f3a3051...@gmail.com/#:~:text=fwiw%2C%20we%27ve%20just%20recently%20had%20similar%20problems%20with%20io_uring%20read/write
>
Then separate it out into its own patch set so we can have a
discussion on the merits of requiring using noatime, relatime or
lazytime for really latency sensitive IO applications. Changing code
is not always the right solution...
-Dave.
--
Dave Chinner
da...@fromorbit.com
tarted with and don't try to solve every little
blocking problem that might exist in the VFS and filesystems...
-Dave
--
Dave Chinner
da...@fromorbit.com
ven worse than that - once we have committed intents, the
whole chain of intent processing must be run to completion. Hence
we can't tolerate backing out of that deferred processing chain half
way through because we might have to block.
Until we can roll back partial dirty transactions and partially
completed deferred intent chains at any random point of completion,
XFS_TRANS_NOWAIT will not work.
-Dave.
--
Dave Chinner
da...@fromorbit.com
tem shutdown as we cannot back out from failure
with dirty log items gracefully at this point.
-Dave.
--
Dave Chinner
da...@fromorbit.com
is point with shutting down the filesystem.
This points to XFS_TRANS_NOWAIT being completely broken, too,
because we don't call xfs_trans_set_sync() until just before we
commit the transaction. At this point, it is -too late- for
nowait+sync to be handled gracefully, and it will *always* go bad.
IOWs,
to my first comments that XBF_TRYLOCK cannot simply
be replaced with XBF_NOWAIT semantics...
-Dave.
--
Dave Chinner
da...@fromorbit.com
> #define XBF_INCORE	(1u << 29) /* lookup only, return if found in cache */
> -#define XBF_TRYLOCK	(1u << 30) /* lock requested, but do not wait */
> +#define XBF_NOWAIT	(1u << 30) /* mem/lock requested, but do not wait */
That's now a really poor comment. It doesn't describe the semantics
or constraints that NOWAIT might imply.
-Dave.
--
Dave Chinner
da...@fromorbit.com
ropping
the refcount to zero and freeing occurring in a different context...
> + /*
> + * We have already exited the read-side of rcu critical section
> + * before calling do_shrink_slab(), the shrinker_info may be
> + * released in expand_one_shrinker_info(), so reacquire the
> + * shrinker_info.
> + */
> + index++;
> + goto again;
With that, what makes the use of shrinker_info in
xchg_nr_deferred_memcg() in do_shrink_slab() coherent and valid?
-Dave.
--
Dave Chinner
da...@fromorbit.com
ker_put(shrinker);
> + wait_for_completion(&shrinker->done);
> + }
Needs a comment explaining why we need to wait here...
> +
> down_write(&shrinker_rwsem);
> if (shrinker->flags & SHRINKER_REGISTERED) {
> - list_del(&shrinker->list);
> + /*
> + * Lookups on the shrinker are over and will fail in the future,
> + * so we can now remove it from the lists and free it.
> + */
rather than here after the wait has been done and provided the
guarantee that no shrinker is running or will run again...
-Dave.
--
Dave Chinner
da...@fromorbit.com
> unsigned long ret, freed = 0;
> - int i;
> + int offset, index = 0;
>
> if (!mem_cgroup_online(memcg))
> return 0;
> @@ -419,56 +470,63 @@ static unsigned long shrink_slab_memcg(gfp_t gfp_mask, int nid,
> if (unlikely(!inf
lowing cases.
> This commit uses the refcount+RCU method [5] proposed by Dave Chinner
> to re-implement the lockless global slab shrink. The memcg slab shrink is
> handled in the subsequent patch.
> ---
> include/linux/shrinker.h | 17 ++
>
obviously correct" than what we have
now.
> So not adding that super simple
> helper is not exactly the best choice in my opinion.
Each to their own - I much prefer the existing style/API over having
to go look up a helper function every time I want to check some
random shrinker has been set up correctly
-Dave.
--
Dave Chinner
da...@fromorbit.com
shrinker);
up_write(&shrinker_rwsem);
if (debugfs_entry)
shrinker_debugfs_remove(debugfs_entry, debugfs_id);
kfree(shrinker->nr_deferred);
kfree(shrinker);
}
EXPORT_SYMBOL_GPL(shrinker_free);
--
Dave Chinner
da...@fromorbit.com
> + WARN_ON_ONCE(!(flags & XFS_ICHGTIME_CHG));
Make that an ASSERT(flags & XFS_ICHGTIME_CHG), please. There's no
need to verify this at runtime on production kernels.
-Dave.
--
Dave Chinner
da...@fromorbit.com
On Thu, Jun 01, 2023 at 06:20:21AM +0200, Christoph Hellwig wrote:
> On Thu, Jun 01, 2023 at 08:36:59AM +1000, Dave Chinner wrote:
> > We lose adjacent page merging with this change.
>
> This is only used for adding the first folio to a brand new bio,
> so there is nothing
h more performant when it only
has to do a single large DMA instead of (potentially) hundreds of
single page DMAs for the same amount of data...
What performance regression testing has been done on this change?
-Dave.
--
Dave Chinner
da...@fromorbit.com
> - iomap->addr = iomap->offset;
> + if (WARN_ON_ONCE(iomap->offset >= isize)) {
> + iomap->type = IOMAP_HOLE;
> + iomap->addr = IOMAP_NULL_ADDR;
> + } else {
> + iomap->type = IOMAP_MAPPED;
> + iomap->addr = iomap->offset;
> + }
I think Christoph's code is correct. IMO, any attempt to read beyond
the end of the device should throw out a warning and return an
error, not silently return zeros.
If readahead is trying to read beyond the end of the device, then it
really seems to me like the problem here is readahead, not the iomap
code detecting the OOB read request
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
o do for directory
names) except for the xattrs that hold the encryption keys
themselves. That means the merkle tree blocks should get encrypted
without any extra work needing to be done anywhere. This will
simply require the fscrypt keys to be held in a private internal
xattr namespace that isn't encrypted, but that's relatively trivial
to do...
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
On Wed, Apr 05, 2023 at 10:54:06PM +, Eric Biggers wrote:
> On Thu, Apr 06, 2023 at 08:26:46AM +1000, Dave Chinner wrote:
> > > We could certainly think about moving to a design where fs/verity/ asks
> > > the
> > > filesystem to just *read* a Mer
de doesn't work with data
structures that span multiple disjoint pages...
Another problem is that the xfs-buf might be backed by heap memory
(e.g. 4kB fs block size on 64kB PAGE_SIZE) and so it cannot be
treated like a page cache page by the fsverity merkle tree code. We
most definitely do not want to be passing pages containing heap
memory to functions expecting to be passed lru-resident page cache
pages
That said, xfs-bufs do have a stable method of addressing the data
in the buffers, and all the XFS code uses this to access and
manipulate data directly in the buffers.
That is, xfs_buf_offset() returns a mapped kaddr that points to the
contiguous memory region containing the metadata in the buffer. If
the xfs_buf spans multiple pages, it will return a kaddr pointing
into the contiguous vmapped memory address that maps the entire
buffer data range. If it is heap memory, it simply returns a pointer
into that heap memory. If it's a single page, then it returns the
kaddr for the data within the page.
If you move all the assumptions about how the merkle tree data is
managed out of fsverity and require the filesystems to do the
mapping to kaddrs and reference counting to guarantee life times,
then the need for multiple different methods for reading merkle tree
data go away...
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
ey's new drop callout.
fsverity doesn't need to care what the buffer is made from, how it
is cached, what it's life cycle is, etc. The caching mechanism and
reference counting is entirely controlled by the filesystem callout
implementations, and fsverity only needs to deal with memory buffers
that are guaranteed to live for the entire walk of the merkle
tree
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
lly want the same memory buffer based
interface for the merkle tree reading so that the merkle tree code
can read directly from the xfs-buf rather than requiring us to copy
it out of the xfsbuf into temporary pages...
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
more -- I suppose we shouldn't just go
> breaking directio reads from a verity file if we can help it. Is there
> a way to ask fsverity to perform its validation against some arbitrary
> memory buffer that happens to be fs-block aligned?
The memory buffer doesn't even need to be fs-block aligned - it just
needs to be a pointer to memory the kernel can read...
We also need fsverity to be able to handle being passed mapped
kernel memory rather than pages/folios for the merkle tree
interfaces. That way we can just pass it the mapped buffer memory
straight from the xfs-buf and we don't have to do the whacky "copy
from xattr xfs_bufs into pages so fsverity can take temporary
reference counts on what it thinks are page cache pages" as it walks
the merkle tree.
-Dave.
--
Dave Chinner
da...@fromorbit.com
I think they should get it automatically now that it has been
defined for FS_IOC_FSGETXATTR and added to the generic fileattr flag
fill functions in fs/ioctl.c.
> I'd also like all ways of getting the verity flag to continue to be mentioned
> in
> Documentation/filesystems/fsverity.rst. The existing methods (FS_IOC_GETFLAGS
> and statx) are already mentioned there.
*nod*
-Dave.
--
Dave Chinner
da...@fromorbit.com
p_page(page);
else
put_page(page);
}
And then you don't need to add the functions to each of the
filesystems nor make an indirect call just to run put_page().
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
with DIO write vs buffered
read, nor DIO write vs mmap IO. It's not totally clear to me
whether we need ->iomap_valid checks in the buffered read paths
to avoid the completion races with DIO writes, but there are windows
there where cached iomaps could be considered stale
Chee
c error handling from a
single caller?
If there's half a dozen cases that need this sort of handling, then
maybe it's the right thing to do. But for a single calling context
that only needs to add a null return check in one specific case?
There's absolutely no need to make generic infrastructure violate
layering abstractions to handle that...
-Dave.
--
Dave Chinner
da...@fromorbit.com
On Mon, Jan 09, 2023 at 07:45:27PM +0100, Andreas Gruenbacher wrote:
> On Sun, Jan 8, 2023 at 10:59 PM Dave Chinner wrote:
> > On Sun, Jan 08, 2023 at 08:40:32PM +0100, Andreas Gruenbacher wrote:
> > > Eliminate the ->iomap_valid() handler by switching to a ->g
the iomap core needs to be able to formally verify
the iomap is valid at any point in time, not just at the point in
time a folio in the page cache has been locked...
-Dave.
--
Dave Chinner
da...@fromorbit.com
ing a big hole in it by requiring filesystems
to actually allocate page cache folios on behalf of the iomap
core.
Given that I recently got major push-back for fixing an XFS-only bug
by walking the page cache directly instead of abstracting it via the
iomap core, punching an even bigger hole in the abstraction layer to
fix a GFS2-only problem is just as bad
-Dave.
--
Dave Chinner
da...@fromorbit.com
On Fri, Dec 02, 2022 at 02:54:00AM +0100, Andreas Gruenbacher wrote:
> On Thu, Dec 1, 2022 at 10:30 PM Dave Chinner wrote:
> > On Thu, Dec 01, 2022 at 07:09:54PM +0100, Andreas Gruenbacher wrote:
> > > Hi again,
> > >
> > > [Same thing, but with t
s.
To avoid all this, can we simply move the ->page_done() callout in
the error path and iomap_write_end() to before we unlock the folio?
You've already done that for pagecache_isize_extended(), and I can't
see anything obvious in the gfs2 ->page_done callout that
would cause issues if it is called with a locked dirty folio...
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
end trace ]---
--
Dave Chinner
da...@fromorbit.com
On Thu, Nov 03, 2022 at 07:45:01PM -0700, Darrick J. Wong wrote:
> On Fri, Nov 04, 2022 at 11:32:35AM +1100, Dave Chinner wrote:
> > On Thu, Nov 03, 2022 at 03:28:05PM -0700, Vishal Moola wrote:
> > > On Wed, Oct 19, 2022 at 08:01:52AM +1100, Dave Chinner wrote:
> > > &
On Thu, Nov 03, 2022 at 03:28:05PM -0700, Vishal Moola wrote:
> On Wed, Oct 19, 2022 at 08:01:52AM +1100, Dave Chinner wrote:
> > On Thu, Sep 01, 2022 at 03:01:19PM -0700, Vishal Moola (Oracle) wrote:
> > > Converted function to use folios throughout. This is in preparation for
On Thu, Nov 03, 2022 at 09:38:48AM -0700, Vishal Moola wrote:
> On Thu, Nov 3, 2022 at 12:08 AM Dave Chinner wrote:
> >
> > On Wed, Nov 02, 2022 at 09:10:08AM -0700, Vishal Moola (Oracle) wrote:
> > > This patch series replaces find_get_pages_range_tag() with
>
e other changes (afs, ceph, cifs, gfs2) would be appreciated.
Same question as last time: have you tested this with multipage
folios enabled? If you haven't tested XFS, then I'm guessing the
answer is no, and you haven't fixed the bug I pointed out in
the write_cache_pages() implementation
-Dave.
--
Dave Chinner
da...@fromorbit.com
o ensure that it correctly handles multi-page folios. And
the only way you can do that fully at this point in time is via
testing XFS or AFS...
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
error = (*writepage)(page, wbc, data);
> + error = writepage(&folio->page, wbc, data);
Yet, IIUC, this treats all folios as if they are single page folios.
i.e. it passes the head page of a multi-page folio to a callback
that will treat it as a single PAGE_SIZE page, because that's all
the writepage callbacks are currently expected to be passed...
So won't this break writeback of dirty multipage folios?
-Dave.
--
Dave Chinner
da...@fromorbit.com
, btrfs and XFS), ->writepage no longer plays any part in memory
reclaim on their systems.
So why should we try to maintain the fiction that ->writepage is
required functionality in a filesystem when it clearly isn't?
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
On Wed, Mar 02, 2022 at 02:03:28PM +0100, Andreas Gruenbacher wrote:
> On Wed, Mar 2, 2022 at 11:17 AM Filipe Manana wrote:
> > On Tue, Mar 01, 2022 at 09:38:30AM +1100, Dave Chinner wrote:
> > > On Mon, Feb 28, 2022 at 02:32:03PM +, fdman...@kernel.org wrote:
> >
ed the EOF boundary. Do this even
> if the read has IOCB_NOWAIT set, as it's the only way to avoid providing
> an unexpected result to an application.
That's highly specific and ultimately will be fragile, IMO. I'd much
prefer that *_iomap_begin_write() implementations simply follow
IOMAP_NOWAIT requirements to ensure that any DIO that needs multiple
mappings is punted to a context that can block...
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
On Wed, Feb 23, 2022 at 10:50:09PM -0500, Theodore Ts'o wrote:
> On Thu, Feb 24, 2022 at 12:48:42PM +1100, Dave Chinner wrote:
> > > Fair enough; on the other hand, we could also view this as making ext4
> > > more robust against buggy code in other subsystems, and while oth
operly first informing the file system in a context where it can
> block and potentially do I/O to do things like allocate blocks.
I'm not sure that replacing the BUG() with a warning is good enough
- it's still indicative of an application doing something dangerous
that could result in silent data corruption and/or other problems.
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
ng
> Signed-off-by: Darrick J. Wong
Looks like a straight translation of Christoph's original. Seems
fine to me as a simple step towards preserving the git history.
Reviewed-by: Dave Chinner
--
Dave Chinner
da...@fromorbit.com
/fs/iomap/apply.c b/fs/iomap/iter.c
> similarity index 100%
> rename from fs/iomap/apply.c
> rename to fs/iomap/iter.c
LGTM,
Reviewed-by: Dave Chinner
--
Dave Chinner
da...@fromorbit.com
J. Wong
> Signed-off-by: Darrick J. Wong
> ---
> fs/iomap/apply.c | 91 -
> fs/iomap/trace.h | 40 --
> include/linux/iomap.h | 10 -
> 3 files changed, 141 deletions(-)
Looks good.
Re
ewed-by: Darrick J. Wong
> Signed-off-by: Darrick J. Wong
> ---
> fs/iomap/fiemap.c | 31 +--
> 1 file changed, 13 insertions(+), 18 deletions(-)
Looks good.
Reviewed-by: Dave Chinner
--
Dave Chinner
da...@fromorbit.com
the older pre-disaggregation
fs/iomap.c without having to take the tree back in time to find
those files...
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
_iterate() is fine as the function name - there's
no need for abbreviation here because it's not an overly long name.
This will make it clearly distinct from the struct iomap_iter that
is passed to it and it will also make grep, cscope and other
code searching tools much more precise...
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
y changing when both are used in the same function.
Would it be better to avoid any possible confusion simply by using
"iomi" for all iomap_iter variables throughout the patchset from the
start? That way nobody is going to confuse iov_iter with iomap_iter
iteration variables and code that
cratering. Hopefully this will only be a rare event.
So, to hoist myself on my own petard: correctness first, performance
second.
Acked-by: Dave Chinner
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
On Thu, Jul 09, 2020 at 10:10:38AM -0700, Darrick J. Wong wrote:
> On Thu, Jul 09, 2020 at 06:05:19PM +0100, Matthew Wilcox wrote:
> > On Thu, Jul 09, 2020 at 09:09:26AM -0700, Darrick J. Wong wrote:
> > > On Thu, Jul 09, 2020 at 12:25:27PM +1000,
On Wed, Jul 08, 2020 at 02:54:37PM +0100, Matthew Wilcox wrote:
> On Wed, Jul 08, 2020 at 04:51:27PM +1000, Dave Chinner wrote:
> > On Tue, Jul 07, 2020 at 03:00:30PM +0200, Christoph Hellwig wrote:
> > > On Tue, Jul 07, 2020 at 01:57:05PM +0100, Matthew Wilcox wrote:
> > &
ed by unpredictably
switching between direct IO and buffered IO (e.g. suddenly DIO
writes serialise -all IO-) will cause unacceptable performance
regressions for many applications and be -very difficult to
diagnose- in production systems.
IOWs, we need to let the individual filesystems decide how they want
to use the page cache for direct IO. Just because we have new direct
IO infrastructure (i.e. iomap) it does not mean we can just make
wholesale changes to the direct IO path behaviour...
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
On Tue, Feb 18, 2020 at 10:04:15PM -0800, Matthew Wilcox wrote:
> On Wed, Feb 19, 2020 at 02:29:00PM +1100, Dave Chinner wrote:
> > On Mon, Feb 17, 2020 at 10:46:11AM -0800, Matthew Wilcox wrote:
> > > @@ -418,6 +412,15 @@ iomap_readpages_actor(struct inode *inode, loff_t
> &
On Tue, Feb 18, 2020 at 07:48:32PM -0800, Matthew Wilcox wrote:
> On Wed, Feb 19, 2020 at 02:45:25PM +1100, Dave Chinner wrote:
> > On Wed, Feb 19, 2020 at 08:26:52AM +1100, Dave Chinner wrote:
> > > On Tue, Feb 18, 2020 at 05:42:30AM -0800, Matthew Wilcox wrote:
> > > &
On Wed, Feb 19, 2020 at 08:26:52AM +1100, Dave Chinner wrote:
> On Tue, Feb 18, 2020 at 05:42:30AM -0800, Matthew Wilcox wrote:
> > On Tue, Feb 18, 2020 at 03:56:33PM +1100, Dave Chinner wrote:
> > > Latest version in your git tree:
> > >
> > > $ ▶ glo -n 5 w
if (!ctx->cur_page_in_bio)
unlock_page(ctx->cur_page);
put_page(ctx->cur_page);
ctx->cur_page = NULL;
readahead_next(ctx->rac);
}
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
ctx->cur_page = NULL;
> + }
> }
>
> return done;
> @@ -451,11 +454,7 @@ iomap_readpages(struct address_space *mapping, struct list_head *pages,
> done:
> if (ctx.bio)
> submit_bio(ctx.bio);
> - if (ctx.cur_page) {
> -
id of the get_page() call in fuse_readpages_fill().
>
> Signed-off-by: Matthew Wilcox (Oracle)
Looks OK.
Reviewed-by: Dave Chinner
--
Dave Chinner
da...@fromorbit.com
ext4.
I'll re-introduce the patch and see if it falls over again.
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
file changed, 9 insertions(+), 20 deletions(-)
Looks fine.
Reviewed-by: Dave Chinner
--
Dave Chinner
da...@fromorbit.com
---
> fs/erofs/zdata.c | 2 +-
> include/trace/events/erofs.h | 6 +++---
> 3 files changed, 18 insertions(+), 29 deletions(-)
Looks fine from the perspective of page iteration and error handling.
Reviewed-by: Dave Chinner
--
Dave Chinner
da...@fromorbit.com
On Tue, Feb 18, 2020 at 01:12:28PM -0800, Matthew Wilcox wrote:
> On Tue, Feb 18, 2020 at 05:57:58PM +1100, Dave Chinner wrote:
> > On Mon, Feb 17, 2020 at 10:45:59AM -0800, Matthew Wilcox wrote:
> > > From: "Matthew Wilcox (Oracle)"
&g
On Tue, Feb 18, 2020 at 11:54:04AM -0800, Matthew Wilcox wrote:
> On Tue, Feb 18, 2020 at 05:31:10PM +1100, Dave Chinner wrote:
> > On Mon, Feb 17, 2020 at 10:45:56AM -0800, Matthew Wilcox wrote:
> > > From: "Matthew Wilcox (Oracle)"
> > >
> &g
On Tue, Feb 18, 2020 at 08:10:04AM -0800, Matthew Wilcox wrote:
> On Tue, Feb 18, 2020 at 05:21:47PM +1100, Dave Chinner wrote:
> > On Mon, Feb 17, 2020 at 10:45:54AM -0800, Matthew Wilcox wrote:
> > > From: "Matthew Wilcox (Oracle)"
> > >
> > > Th
On Tue, Feb 18, 2020 at 07:42:22AM -0800, Matthew Wilcox wrote:
> On Tue, Feb 18, 2020 at 05:14:59PM +1100, Dave Chinner wrote:
> > On Mon, Feb 17, 2020 at 10:45:52AM -0800, Matthew Wilcox wrote:
> > > From: "Matthew Wilcox (Oracle)"
> > >
> > >
On Tue, Feb 18, 2020 at 05:57:36AM -0800, Matthew Wilcox wrote:
> On Tue, Feb 18, 2020 at 04:08:24PM +1100, Dave Chinner wrote:
> > On Mon, Feb 17, 2020 at 10:45:45AM -0800, Matthew Wilcox wrote:
> > > From: "Matthew Wilcox (Oracle)"
> > >
> > > Mov
On Tue, Feb 18, 2020 at 05:56:18AM -0800, Matthew Wilcox wrote:
> On Tue, Feb 18, 2020 at 04:03:00PM +1100, Dave Chinner wrote:
> > On Mon, Feb 17, 2020 at 10:45:44AM -0800, Matthew Wilcox wrote:
> > > +static void read_pages(struct readahead_control *rac, struct list_
On Tue, Feb 18, 2020 at 05:42:30AM -0800, Matthew Wilcox wrote:
> On Tue, Feb 18, 2020 at 03:56:33PM +1100, Dave Chinner wrote:
> > Latest version in your git tree:
> >
> > $ ▶ glo -n 5 willy/readahead
> > 4be497096c04 mm: Use memalloc_nofs_save in readahead path
>
> + break;
> + }
> +
> + return batch;
> +}
Seems a bit big for an inline function.
> +
> +#define readahead_for_each_batch(rac, array, size, nr) \
> + for (; (nr = readahead_page_batch(rac, array, size)); \
> + readahead_next(rac))
I had to go look at the caller to work out what "size" referred to
here.
This is complex enough that it needs proper API documentation.
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
ed, 73 insertions(+), 126 deletions(-)
That's actually pretty simple changeover. Nothing really scary
there. :)
Reviewed-by: Dave Chinner
--
Dave Chinner
da...@fromorbit.com
st certainly not the function you want to call.
Use page_cache_async_readahead() or page_cache_sync_readahead()
instead."
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
ahead.c
> index 9e430daae42f..975ff5e387be 100644
> --- a/mm/readahead.c
> +++ b/mm/readahead.c
> @@ -121,7 +121,13 @@ static void read_pages(struct readahead_control *rac, struct list_head *pages)
>
> blk_start_plug(&plug);
>
> - if (aops->readpages) {
> + if (aops->readahead) {
> + aops->readahead(rac);
> + readahead_for_each(rac, page) {
> + unlock_page(page);
> + put_page(page);
> + }
This needs a comment to explain the unwinding that needs to be done
here. I'm not going to remember in a year's time that this is just
for the pages that weren't submitted by ->readahead
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
_list) {
> + page->index = offset;
> + list_add(&page->lru, &page_pool);
> + } else if (add_to_page_cache_lru(page, mapping, offset,
> + gfp_mask) < 0) {
> + put_page(page);
> + goto read;
> + }
Ok, so that's why you put read code at the end of the loop. To turn
the code into spaghetti :/
How much does this simplify down when we get rid of ->readpages and
can restructure the loop? This really seems like you're trying to
flatten two nested loops into one by the use of goto
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
-off-by: Matthew Wilcox (Oracle)
> ---
> mm/readahead.c | 6 +++---
> 1 file changed, 3 insertions(+), 3 deletions(-)
Looks fine.
Reviewed-by: Dave Chinner
--
Dave Chinner
da...@fromorbit.com
> Signed-off-by: Matthew Wilcox (Oracle)
> ---
> mm/readahead.c | 10 ++
> 1 file changed, 6 insertions(+), 4 deletions(-)
Looks ok, but having the readahead dispatch out of line from the
case that triggers it makes it hard to follow.
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
Also, why? This adds a goto from branched code that continues, then
adds a continue so the unbranched code doesn't execute the code the
goto jumps to. In absence of any explanation, this isn't an
improvement and doesn't make any sense...
--
Dave Chinner
da...@fromorbit.com
the next
>* batch.
>*/
> - if (nr_pages)
> - read_pages(mapping, filp, &page_pool, nr_pages,
> - gfp_mask);
> - nr_pages = 0;
> + if (readahead_count(&rac))
> + read_pages(&rac, &page_pool, gfp_mask);
> + rac._nr_pages = 0;
Hmmm. Wondering if it makes sense to move the gfp_mask to the readahead
control structure - if we have to pass the gfp_mask down all the
way alongside the rac, then I think it makes sense to do that...
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
e2/0xfa
[2.479776] ret_from_fork+0x1f/0x30
[2.480737] ---[ end trace e77079de9b22dc6a ]---
I just dropped the ext4 conversion from my local tree so I can boot
the machine and test XFS. Might have some more info when that
crashes and burns...
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
Reviewed-by: Christoph Hellwig
> ---
> mm/readahead.c | 8 ++--
> 1 file changed, 2 insertions(+), 6 deletions(-)
Simple enough.
Reviewed-by: Dave Chinner
--
Dave Chinner
da...@fromorbit.com
nd we don't need to worry that a present page in the readahead
> window causes us to return a smaller nr_pages than we ought to have.
>
> Signed-off-by: Matthew Wilcox (Oracle)
Looks good.
Reviewed-by: Dave Chinner
--
Dave Chinner
da...@fromorbit.com
On Tue, Feb 11, 2020 at 04:54:13AM -0800, Matthew Wilcox wrote:
> On Tue, Feb 11, 2020 at 03:52:30PM +1100, Dave Chinner wrote:
> > > +struct readahead_control {
> > > + struct file *file;
> > > + struct address_space *mapping;
> > > +/* private
r_pages(page);
> + rac->start += rac->batch_count;
There's no mention of large page support in the patch description
and I don't recall this sort of large page batching in previous
iterations.
This seems like new functionality to me, not directly related to
the initial ->readahead API change? What have I missed?
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
On Mon, Feb 03, 2020 at 06:46:41PM +0100, Christoph Hellwig wrote:
> On Sat, Jan 18, 2020 at 08:28:38PM +1100, Dave Chinner wrote:
> > I think it's pretty gross, actually. It makes the same mistake made
> > with locking in the old direct IO code - it encodes specific lock
> >
s = 2, left = 1, this looks up the
page at index 2, which is the one we issued IO on, not the one we
"left behind" which is at index 3.
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
ystem wants/needs unlock on IO completion
semantics, and it's completely filesystem IO-lock implementation
agnostic. And for filesystems that use the inode i_rwsem, we can
just provide simple helper functions for their read/write unlock
methods.
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
tion. GFS2 is already partially ported to
use the iomap infrastructure, though more work is needed to provide
the iomap functionality DAX requires. OCFS2 would require a lot more
work on this front
Cheers,
Dave.
--
Dave Chinner
dchin...@redhat.com
On Fri, Sep 06, 2019 at 11:48:31PM +0200, Andreas Gruenbacher wrote:
> On Fri, Sep 6, 2019 at 11:28 PM Dave Chinner wrote:
> > On Fri, Sep 06, 2019 at 10:52:41PM +0200, Andreas Gruenbacher wrote:
> > > Hi,
> > >
> > > I've just fixed a mmap write v
);
See xfs_flush_unmap_range(), which is run under XFS_MMAPLOCK_EXCL
to serialise against incoming page faults...
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
al conversion - what needs to be done to iomap and gfs2
to support the journalled data path so there's a single data IO
path?
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
the same time some other operation is
changing the size of the file, and that means this code no longer
does what you are intending it to do because the inode->i_size is no
longer constant across these operations...
Hence I think adding code that depends on i_rwsem to be held to
function correctly is the wrong direction to be taking the iomap
infrastructure.
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
ts for the iomap
and XFS branches - IMO, there's no really need to have a complete
new tree for it...
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
p granularity, rather than a per-syscall granularity.
i.e. if we do write(2GB), we want more than one balancing call
during that syscall, so it would be up to the filesystem to a) limit
the size of write mappings to something smaller (e.g. 1024 pages)
so that there are still frequent balancing calls for large writes.
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
l
bufferhead free iomap IO path
Let's see what Christoph thinks first, though
Cheers,
Dave.
--
Dave Chinner
da...@fromorbit.com
hole series (all the mm changes and all the
| required fs changes) sent out for review prior to the merge window?
We're not asking for a description of what you are doing - we are
asking why the normal processes for proposing and merging such a
change is not being followed, and how you plan to rect
is made stable, so must
be the directory modification done during file creation.
This has nothing to do with POSIX or what the "linux standard" is -
this is testing whether the implementation of strictly ordered
metadata journalling is correct or not. If gfs2 does not have
strictly ordered metadata journalling, then it probably shouldn't
run these tests
Cheers,
Dave.
--
Dave Chinner
dchin...@redhat.com