On Wed, May 09, 2018 at 09:47:58AM +0200, Christoph Hellwig wrote:
> +/**
> + * __bio_try_merge_page - try adding data to an existing bvec
> + * @bio: destination bio
> + * @page: page to add
> + * @len: length of the range to add
> + * @off: offset into @page
> + *
> + * Try adding the data descri
On Wed, May 09, 2018 at 09:47:59AM +0200, Christoph Hellwig wrote:
> }
> EXPORT_SYMBOL(generic_write_end);
>
> +
> /*
Spurious?
On Wed, May 09, 2018 at 09:48:03AM +0200, Christoph Hellwig wrote:
> It counts the number of pages acted on, so name it nr_pages to make that
> obvious.
>
> Signed-off-by: Christoph Hellwig
Yes!
Also, it can't return an error, so how about changing it to unsigned int?
And deleting the error che
On Wed, May 09, 2018 at 09:48:04AM +0200, Christoph Hellwig wrote:
> That way file systems don't have to go spotting for non-contiguous pages
> and work around them. It also kicks off I/O earlier, allowing it to
> finish earlier and reduce latency.
Makes sense.
> + /*
> +
On Thu, Oct 18, 2018 at 09:18:12PM +0800, Ming Lei wrote:
> Hi,
>
> Filesystems may allocate io buffer from slab, and use this buffer to
> submit bio. This way may break storage drivers if they have special
> requirements on DMA alignment.
Before we go down this road, could we have a discussion ab
On Thu, Oct 18, 2018 at 04:05:51PM +0200, Christoph Hellwig wrote:
> On Thu, Oct 18, 2018 at 07:03:42AM -0700, Matthew Wilcox wrote:
> > Before we go down this road, could we have a discussion about what
> > hardware actually requires this? Storage has this weird assumption that
&
On Thu, Oct 18, 2018 at 04:42:07PM +0200, Christoph Hellwig wrote:
> This all seems quite complicated.
>
> I think the interface we'd want is more one that has a little
> cache of a single page in the queue, and a little bitmap which
> sub-page size blocks of it are used.
>
> Something like (pseu
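The pseudocode is truncated, but the idea — one cached page per queue plus a small bitmap recording which sub-page-size blocks of it are in use — can be sketched in userspace (all names here are hypothetical, not kernel code):

```python
# Toy sketch of a per-queue sub-page allocator: one cached page,
# carved into 512-byte blocks tracked by a small bitmap.
PAGE_SIZE = 4096
BLOCK_SIZE = 512
NBLOCKS = PAGE_SIZE // BLOCK_SIZE  # 8 blocks -> an 8-bit bitmap

class SubPageCache:
    def __init__(self):
        self.page = bytearray(PAGE_SIZE)  # stands in for the cached page
        self.bitmap = 0                   # bit i set => block i in use

    def alloc_block(self):
        """Return the offset of a free block, or None if the page is full."""
        for i in range(NBLOCKS):
            if not (self.bitmap & (1 << i)):
                self.bitmap |= 1 << i
                return i * BLOCK_SIZE
        return None

    def free_block(self, offset):
        self.bitmap &= ~(1 << (offset // BLOCK_SIZE))

cache = SubPageCache()
a = cache.alloc_block()   # first free block, offset 0
b = cache.alloc_block()   # next block, offset 512
cache.free_block(a)
c = cache.alloc_block()   # freed block is reused, offset 0
```

A real version would of course need locking and a fallback allocation once the cached page fills up; this only shows the bitmap bookkeeping.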
On Tue, Feb 19, 2019 at 02:20:26PM +0100, Jan Kara wrote:
> Thanks for information. Yeah, that makes somewhat more sense. Can you ever
> see the failure if you disable CONFIG_TRANSPARENT_HUGEPAGE? Because your
> findings still seem to indicate that there's some problem with page
> migration and Alph
[adding linux-nvme and linux-block for opinions on the critical-page-first
idea in the second and third paragraphs below]
On Wed, Feb 20, 2019 at 07:07:29AM -0700, William Kucharski wrote:
> > On Feb 20, 2019, at 6:44 AM, Matthew Wilcox wrote:
> > That interface would need to ha
On Wed, Feb 20, 2019 at 09:39:22AM -0700, Keith Busch wrote:
> On Wed, Feb 20, 2019 at 06:43:46AM -0800, Matthew Wilcox wrote:
> > What NVMe doesn't have is a way for the host to tell the controller
> > "Here's a 2MB sized I/O; bytes 40960 to 45056 are most importa
On Tue, Feb 26, 2019 at 02:02:14PM +1100, Dave Chinner wrote:
> > Or what is the exact size of sub-page IO in xfs most of time? For
>
> Determined by mkfs parameters. Any power of 2 between 512 bytes and
> 64kB needs to be supported. e.g:
>
> # mkfs.xfs -s size=512 -b size=1k -i size=2k -n size=8
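That constraint — any power of two from 512 bytes to 64kB — is easy to encode (the helper name is mine, not from xfsprogs):

```python
# mkfs.xfs sector/block sizes: any power of two from 512 bytes to 64kB.
def valid_sector_size(n: int) -> bool:
    # (n & (n - 1)) == 0 is the classic power-of-two test
    return 512 <= n <= 64 * 1024 and (n & (n - 1)) == 0

assert valid_sector_size(512)
assert valid_sector_size(4096)
assert not valid_sector_size(3072)        # not a power of two
assert not valid_sector_size(128 * 1024)  # larger than 64kB
```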
On Tue, Feb 26, 2019 at 07:12:49PM +0800, Ming Lei wrote:
> On Tue, Feb 26, 2019 at 6:07 PM Vlastimil Babka wrote:
> > On 2/26/19 10:33 AM, Ming Lei wrote:
> > > On Tue, Feb 26, 2019 at 03:58:26PM +1100, Dave Chinner wrote:
> > >> On Mon, Feb 25, 2019 at 07:27:37
On Tue, Feb 26, 2019 at 08:35:46PM +0800, Ming Lei wrote:
> On Tue, Feb 26, 2019 at 04:12:09AM -0800, Matthew Wilcox wrote:
> > On Tue, Feb 26, 2019 at 07:12:49PM +0800, Ming Lei wrote:
> > > The buffer needs to be device block size aligned for dio, and now the
> > >
On Tue, Feb 26, 2019 at 09:42:48PM +0800, Ming Lei wrote:
> On Tue, Feb 26, 2019 at 05:02:30AM -0800, Matthew Wilcox wrote:
> > Wait, we're imposing a ridiculous amount of complexity on XFS for no
> > reason at all? We should just change this to 512-byte alignment. Tying
>
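The alignment rule being argued over reduces to simple modular arithmetic; a toy check (function name is mine) of the usual direct-I/O constraint that buffer address, file offset and length must all be multiples of the block size:

```python
def dio_ok(buf_addr: int, offset: int, length: int, bsize: int = 512) -> bool:
    """True if a direct-I/O request is aligned: the memory address,
    the file offset and the length are all multiples of bsize."""
    return (buf_addr % bsize == 0 and
            offset % bsize == 0 and
            length % bsize == 0)

assert dio_ok(0x1000, 0, 4096)       # page-aligned buffer, 512-aligned I/O
assert not dio_ok(0x1001, 0, 4096)   # misaligned buffer address
assert not dio_ok(0x1000, 0, 100)    # length not a multiple of 512
```

The debate above is precisely about what `bsize` has to be for the memory address: the device's logical block size, or just 512 bytes.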
On Tue, Feb 26, 2019 at 08:14:33AM -0800, Darrick J. Wong wrote:
> On Tue, Feb 26, 2019 at 06:04:40AM -0800, Matthew Wilcox wrote:
> > On Tue, Feb 26, 2019 at 09:42:48PM +0800, Ming Lei wrote:
> > > On Tue, Feb 26, 2019 at 05:02:30AM -0800, Matthew Wilcox wrote:
> > &g
On Fri, Feb 22, 2019 at 09:45:05AM -0700, Keith Busch wrote:
> On Thu, Feb 21, 2019 at 09:51:12PM -0500, Martin K. Petersen wrote:
> >
> > Keith,
> >
> > > With respect to fs block sizes, one thing making discards suck is that
> > > many high capacity SSDs' physical page sizes are larger than the
Replace the IDR with the XArray. Includes converting the lookup from
being protected by a spinlock to being protected by RCU.
Signed-off-by: Matthew Wilcox
---
block/genhd.c | 42 --
1 file changed, 16 insertions(+), 26 deletions(-)
diff --git a/block
Use xa_insert_irq() to do the allocation before grabbing the other
locks. This user appears to be able to race, so use xa_cmpxchg() to
handle the race effectively.
Signed-off-by: Matthew Wilcox
---
block/blk-ioc.c | 23 +--
include/linux/iocontext.h | 6 +++---
2
Signed-off-by: Matthew Wilcox
---
block/bsg.c | 20 ++--
1 file changed, 10 insertions(+), 10 deletions(-)
diff --git a/block/bsg.c b/block/bsg.c
index f306853c6b08..e24420a21383 100644
--- a/block/bsg.c
+++ b/block/bsg.c
@@ -16,9 +16,9 @@
#include
#include
#include
Convert brd_pages from a radix tree to an XArray. Simpler and smaller
code; in particular another user of radix_tree_preload is eliminated.
Signed-off-by: Matthew Wilcox
---
drivers/block/brd.c | 93 ++---
1 file changed, 28 insertions(+), 65 deletions
ry().
- idr_replace() has no exact equivalent. Some users relied on its exact
semantics of only storing if the entry was non-NULL, but all users of
idr_replace() were able to use xa_store() or xa_cmpxchg().
- The family of radix tree gang lookup functions have been replaced with
xa_extract().
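As a toy model of those semantics (a Python dict standing in for the XArray; this mirrors the documented behaviour of xa_store()/xa_cmpxchg(), not the kernel implementation):

```python
class ToyXArray:
    def __init__(self):
        self.slots = {}   # index -> entry; a missing key models NULL

    def store(self, index, entry):
        """xa_store(): unconditionally replace, returning the old entry."""
        old = self.slots.get(index)
        if entry is None:
            self.slots.pop(index, None)
        else:
            self.slots[index] = entry
        return old

    def cmpxchg(self, index, old, new):
        """xa_cmpxchg(): install new only if the current entry equals old;
        always return the entry that was there before."""
        cur = self.slots.get(index)
        if cur == old:
            if new is None:
                self.slots.pop(index, None)
            else:
                self.slots[index] = new
        return cur

xa = ToyXArray()
xa.store(1, "a")
assert xa.cmpxchg(1, "a", "b") == "a"   # matched: replaced with "b"
assert xa.cmpxchg(1, "a", "c") == "b"   # stale 'old': entry left as "b"
```

This is why xa_cmpxchg() can stand in for racy idr_replace() callers: the caller learns atomically whether its expected old value was still in place.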
We can now take and release the blkg_array lock within blkg_destroy()
instead of forcing the caller to hold it across the call.
Signed-off-by: Matthew Wilcox
---
block/blk-cgroup.c | 9 +
1 file changed, 1 insertion(+), 8 deletions(-)
diff --git a/block/blk-cgroup.c b/block/blk
t fail, so we can remove the error checks.
Signed-off-by: Matthew Wilcox
---
block/bfq-cgroup.c | 4 +--
block/blk-cgroup.c | 69 --
include/linux/blk-cgroup.h | 5 ++-
3 files changed, 33 insertions(+), 45 deletions(-)
diff --git a/block/bfq-
Use the XArray's iterator instead of this hlist.
Signed-off-by: Matthew Wilcox
---
block/blk-ioc.c | 15 ++-
include/linux/iocontext.h | 16 +---
2 files changed, 11 insertions(+), 20 deletions(-)
diff --git a/block/blk-ioc.c b/block/blk-ioc.c
Signed-off-by: Matthew Wilcox
---
drivers/block/drbd/drbd_int.h | 4 +-
drivers/block/drbd/drbd_main.c | 25 +++--
drivers/block/drbd/drbd_nl.c | 35 +-
drivers/block/drbd/drbd_receiver.c | 29 ---
drivers/block/drbd/drbd_state.c| 59
We can iterate over all blkcgs using the XArray iterator instead of
maintaining a separate hlist. This removes a nasty locking inversion
in blkcg_destroy_blkgs().
Signed-off-by: Matthew Wilcox
---
block/bfq-cgroup.c | 3 ++-
block/blk-cgroup.c | 38
Signed-off-by: Matthew Wilcox
---
drivers/block/nbd.c | 145 ++--
1 file changed, 59 insertions(+), 86 deletions(-)
diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c
index 90ba9f4c03f3..6e64884973dd 100644
--- a/drivers/block/nbd.c
+++ b/drivers
Signed-off-by: Matthew Wilcox
---
drivers/block/drbd/drbd_debugfs.c | 16 -
drivers/block/drbd/drbd_int.h | 6 ++--
drivers/block/drbd/drbd_main.c | 56 ++
drivers/block/drbd/drbd_nl.c | 42 +++---
drivers/block/drbd
Signed-off-by: Matthew Wilcox
---
drivers/block/zram/zram_drv.c | 40 +--
1 file changed, 15 insertions(+), 25 deletions(-)
diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
index e7a5f1d1c314..f7e53a681637 100644
--- a/drivers/block/zram
By changing the locking we could remove the slightly awkward dance in
null_insert_page(), but I'll leave that for someone who's more familiar
with the driver.
Signed-off-by: Matthew Wilcox
---
drivers/block/null_blk.h | 4 +-
drivers/block/null_blk_m
Signed-off-by: Matthew Wilcox
---
drivers/block/loop.c | 88
1 file changed, 31 insertions(+), 57 deletions(-)
diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index 1e6edd568214..d1a0f689788d 100644
--- a/drivers/block/loop.c
+++ b/drivers
On Thu, Apr 11, 2019 at 02:52:08PM +0200, Vlastimil Babka wrote:
> In the session I hope to resolve the question whether this is indeed the
> right thing to do for all kmalloc() users, without an explicit alignment
> requests, and if it's worth the potentially worse
> performance/fragmentation it w
On Thu, Apr 11, 2019 at 09:06:54AM -0600, Jens Axboe wrote:
> In continuation of the fsync barrier patch from the other day, I
> reworked that patch to turn it into a general primitive instead. This
> means that any command can be flagged with IOSQE_IO_DRAIN, which will
> insert a sequence point in
On Thu, Apr 11, 2019 at 06:28:19AM -0700, Matthew Wilcox wrote:
> On Thu, Apr 11, 2019 at 02:52:08PM +0200, Vlastimil Babka wrote:
> > In the session I hope to resolve the question whether this is indeed the
> > right thing to do for all kmalloc() users, without an explicit alignme
--end quoted text---
If Reviewed-by dominates Suggested-by, then
Reviewed-by: Matthew Wilcox
e buffers. That
> will be sufficient for this case, and help other callers detect
> these errors properly as well.
>
> With that, we don't need to twiddle it in ext2.
>
> Suggested-by: Jan Kara
> Signed-off-by: Jeff Layton
> Reviewed-by: Christoph Hellwig
> Reviewed-by: Jan Kara
Reviewed-by: Matthew Wilcox
On Tue, May 09, 2017 at 11:49:16AM -0400, Jeff Layton wrote:
> +++ b/lib/errseq.c
> @@ -0,0 +1,199 @@
> +#include
> +#include
> +#include
> +#include
> +
> +/*
> + * An errseq_t is a way of recording errors in one place, and allowing any
> + * number of "subscribers" to tell whether it has chan
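The comment is cut off, but the mechanism it describes can be modelled compactly. This is a deliberately simplified model — the real errseq_t packs the error and counter into a single integer and only advances the counter once the current value has been seen — but the subscriber semantics are the same:

```python
class ErrSeq:
    """Record the most recent error plus a counter, so any number of
    "subscribers" can later ask: has an error happened since I last looked?"""
    def __init__(self):
        self.err = 0
        self.seq = 0

    def set(self, err):
        self.err = err
        self.seq += 1          # simplified: every new error bumps the counter

    def sample(self):
        """A subscriber remembers this cursor."""
        return self.seq

    def check(self, since):
        """Return the error if one was recorded after 'since', else 0."""
        return self.err if self.seq != since else 0

es = ErrSeq()
cursor = es.sample()
assert es.check(cursor) == 0     # nothing new yet
es.set(-5)                       # e.g. -EIO during writeback
assert es.check(cursor) == -5    # this subscriber sees it
cursor = es.sample()
assert es.check(cursor) == 0     # already consumed at this cursor
```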
On Mon, Jun 26, 2017 at 08:09:59PM +0800, Ming Lei wrote:
> bio_for_each_segment_all(bvec, bio, i) {
> - org_vec = bio_orig->bi_io_vec + i + start;
> -
> - if (bvec->bv_page == org_vec->bv_page)
> - continue;
> + orig_vec = bio_iter_iove
On Mon, Nov 12, 2018 at 04:28:40AM -0800, Greg Kroah-Hartman wrote:
> On Mon, Nov 12, 2018 at 10:56:21AM +0100, Paolo Valente wrote:
> > From: Angelo Ruocco
> >
> > The kernfs pseudo file system doesn't export any function to only find
> > a node by name, without also getting a reference on it.
>
On Tue, Jan 08, 2019 at 09:56:29AM -0700, Jens Axboe wrote:
> After some arm twisting from Christoph, I finally caved and divorced the
> aio-poll patches from aio/libaio itself. The io_uring interface itself
> is useful and efficient, and after rebasing all the new goodies on top
> of that, there w
On Mon, Dec 04, 2017 at 02:16:19PM +0900, Byungchul Park wrote:
> For now, wait_for_completion() / complete() works with lockdep, add
> lock_page() / unlock_page() and its family to lockdep support.
>
> Changes from v1
> - Move lockdep_map_cross outside of page_ext to make it flexible
> - Preven
On Tue, Dec 05, 2017 at 03:19:46PM +0900, Byungchul Park wrote:
> On 12/5/2017 2:46 PM, Byungchul Park wrote:
> > On 12/5/2017 2:30 PM, Matthew Wilcox wrote:
> > > On Mon, Dec 04, 2017 at 02:16:19PM +0900, Byungchul Park wrote:
> > > > For now, wait_for_completion() /
On Tue, Dec 12, 2017 at 08:03:43AM -0500, Theodore Ts'o wrote:
> On Tue, Dec 12, 2017 at 02:20:32PM +0900, Byungchul Park wrote:
> > The *problem* is false positives, since locks and waiters in
> > kernel are not classified properly, at the moment, which is just
> > a fact that is not related to cr
On Fri, Dec 29, 2017 at 04:28:51PM +0900, Byungchul Park wrote:
> On Thu, Dec 28, 2017 at 10:51:46PM -0500, Theodore Ts'o wrote:
> > On Fri, Dec 29, 2017 at 10:47:36AM +0900, Byungchul Park wrote:
> > >
> > >(1) The best way: To classify all waiters correctly.
> >
> > It's really not all wait
On Sat, Dec 30, 2017 at 10:40:41AM -0500, Theodore Ts'o wrote:
> On Fri, Dec 29, 2017 at 10:16:24PM -0800, Matthew Wilcox wrote:
> > > The problems come from wrong classification. Waiters either classified
> > > well or invalidated properly won't bitrot.
> >
&g
On Sat, Dec 30, 2017 at 06:00:57PM -0500, Theodore Ts'o wrote:
> On Sat, Dec 30, 2017 at 05:40:28PM -0500, Theodore Ts'o wrote:
> > On Sat, Dec 30, 2017 at 12:44:17PM -0800, Matthew Wilcox wrote:
> > >
> > > I'm not sure I agree with this part. Wha
I see the improvements that Facebook have been making to the nbd driver,
and I think that's a wonderful thing. Maybe the outcome of this topic
is simply: "Shut up, Matthew, this is good enough".
It's clear that there's an appetite for userspace block devices; not for
swap devices or the root dev
On Wed, Jan 17, 2018 at 10:49:24AM +0800, Ming Lei wrote:
> Userfaultfd might be another choice:
>
> 1) map the block LBA space into a range of process vm space
That would limit the size of a block device to ~200TB (with my laptop's
CPU). That's probably OK for most users, but I suspect there ar
On Mon, Jan 22, 2018 at 08:28:54PM -0700, Jens Axboe wrote:
> On 1/22/18 8:18 PM, Goldwyn Rodrigues wrote:
> >> that their application was "already broken". I'd hate for a kernel
> >> upgrade to break them.
> >>
> >> I do wish we could make the change, and maybe we can. But it probably
> >> needs s
From: Jeff Layton [mailto:jlay...@poochiereds.net]
> On Thu, 2017-06-29 at 10:11 -0700, Darrick J. Wong wrote:
> > On Thu, Jun 29, 2017 at 09:19:48AM -0400, jlay...@kernel.org wrote:
> > > +Handling errors during writeback
> > > +
> > > +Most applications that utiliz
On Tue, Oct 03, 2017 at 10:05:11PM +0200, Luis R. Rodriguez wrote:
> On Wed, Oct 04, 2017 at 03:33:01AM +0800, Ming Lei wrote:
> > On Tue, Oct 03, 2017 at 11:53:08AM -0700, Luis R. Rodriguez wrote:
> > > INFO: task kworker/u8:8:1320 blocked for more than 10 seconds.
> > > Tainted: G
On Thu, Mar 15, 2018 at 09:56:46AM -0700, Joe Perches wrote:
> I have a patchset that creates a vsprintf extension for
> print_vma_addr and removes all the uses similar to the
> print_symbol() removal.
>
> This now avoids any possible printk interleaving.
>
> Unfortunately, without some #ifdef in
On Mon, Mar 26, 2018 at 04:16:26PM -0700, Omar Sandoval wrote:
> Even after the previous patch to drop lo_ctl_mutex while calling
> vfs_getattr(), there are other cases where we can end up sleeping for a
> long time while holding lo_ctl_mutex. Let's avoid the uninterruptible
> sleep from the ioctls
On Wed, Mar 28, 2018 at 03:18:52PM +0200, David Sterba wrote:
> On Mon, Mar 26, 2018 at 05:04:21PM -0700, Matthew Wilcox wrote:
> > On Mon, Mar 26, 2018 at 04:16:26PM -0700, Omar Sandoval wrote:
> > > Even after the previous patch to drop lo_ctl_mutex while calling
> > &g
Please explain:
commit 6a15674d1e90917f1723a814e2e8c949000440f7
Author: Bart Van Assche
Date: Thu Nov 9 10:49:54 2017 -0800
block: Introduce blk_get_request_flags()
A side effect of this patch is that the GFP mask that is passed to
several allocation functions in the legacy b
On Sun, Apr 08, 2018 at 04:40:59PM +, Bart Van Assche wrote:
> __GFP_KSWAPD_RECLAIM wasn't stripped off on purpose for non-atomic
> allocations. That was an oversight.
OK, good.
> Do you perhaps want me to prepare a patch that makes blk_get_request() again
> respect the full gfp mask passed
On Mon, Apr 09, 2018 at 01:26:50AM -0700, Christoph Hellwig wrote:
> On Mon, Apr 09, 2018 at 08:53:49AM +0200, Hannes Reinecke wrote:
> > Why don't you fold the 'flags' argument into the 'gfp_flags', and drop
> > the 'flags' argument completely?
> > Looks a bit pointless to me, having two arguments
On Mon, Apr 09, 2018 at 05:39:15PM +0200, Christoph Hellwig wrote:
> Same numerical value (for now at least), but a much better documentation
> of intent.
> @@ -499,7 +499,7 @@ int sg_scsi_ioctl(struct request_queue *q, struct gendisk
> *disk, fmode_t mode,
> break;
> }
>
>
On Mon, Apr 09, 2018 at 05:39:16PM +0200, Christoph Hellwig wrote:
> blk_get_request is used for pass-through style I/O and thus doesn't need
> GFP_NOIO.
Obviously GFP_KERNEL is a big improvement over GFP_NOIO! But can we take
it all the way to GFP_USER, if this is always done in the ioctl path
(
On Wed, Apr 18, 2018 at 12:34:03PM +0200, Christoph Hellwig wrote:
> s/blk/block/ for block patches.
I think this is something we should put in MAINTAINERS. Eventually
some tooling can pull it out, but I don't think this is something
that people can reasonably be expected to know.
diff --git a/
On Thu, Apr 19, 2018 at 10:38:25AM -0400, Jerome Glisse wrote:
> Oh can i get one more small slot for fs ? I want to ask if they are
> any people against having a callback everytime a struct file is added
> to a task_struct and also having a secondary array so that special
> file like device file c
On Thu, Apr 19, 2018 at 03:31:08PM -0400, Jerome Glisse wrote:
> > > Basicly i want a callback in __fd_install(), do_dup2(), dup_fd() and
> > > add void * *private_data; to struct fdtable (also a default array to
> > > struct files_struct). The callback would be part of struct
> > > file_operation
On Thu, Apr 19, 2018 at 04:15:02PM -0400, Jerome Glisse wrote:
> On Thu, Apr 19, 2018 at 12:56:37PM -0700, Matthew Wilcox wrote:
> > > Well scratch that whole idea, i would need to add a new array to task
> > > struct which make it a lot less appealing. Hence a better solutio
I hate renting unnecessary cars, and the various transportation companies
offer a better deal if multiple people book at once.
I'm scheduled to arrive on Sunday at 3:18pm local time if anyone wants to
share transport. Does anyone have a wiki we can use to coordinate this?
On Thu, May 03, 2018 at 12:05:14PM -0400, Jeff Layton wrote:
> On Thu, 2018-05-03 at 16:42 +0200, Jan Kara wrote:
> > On Wed 25-04-18 17:07:48, Fabiano Rosas wrote:
> > > I'm looking into an issue where removing a virtio disk via sysfs while
> > > another
> > > process is issuing write() calls res
On Thu, May 03, 2018 at 11:21:14AM -0700, adam.manzana...@wdc.com wrote:
> If we want to avoid bloating struct kiocb, I suggest we turn the private
> field
> into a union of the private and ki_ioprio field. It seems like the users of
> the private field all use it at a point where we can yank th
On Thu, May 03, 2018 at 02:24:58PM -0600, Jens Axboe wrote:
> On 5/3/18 2:15 PM, Adam Manzanares wrote:
> > On 5/3/18 11:33 AM, Matthew Wilcox wrote:
> >> Or we could just make ki_hint a u8 or u16 ... seems unlikely we'll need
> >> 32 bits of ki_hint. (currently de
From: Tejun Heo [mailto:hte...@gmail.com] On Behalf Of Tejun Heo
> Ah, yeah, great to see the silly implementation being replaced by the
> radix tree. ida_pre_get() looks suspicious tho. idr_preload()
> immediately being followed by idr_preload_end() probably is broken.
> Maybe what we need is movi
From: Andrew Morton [mailto:a...@linux-foundation.org]
> On Thu, 8 Dec 2016 02:22:55 +0100 Rasmus Villemoes
> wrote:
> > TL;DR: these patches save 250 KB of memory, with more low-hanging
> > fruit ready to pick.
> >
> > While browsing through the lib/idr.c code, I noticed that the code at
> > the
From: Rasmus Villemoes [mailto:li...@rasmusvillemoes.dk]
> On Fri, Dec 16 2016, Matthew Wilcox wrote:
> > Thanks for your work on this; you've really put some effort into
> > proving your work has value. My motivation was purely aesthetic, but
> > you've got some g
From: Matthew Wilcox
> From: Rasmus Villemoes [mailto:li...@rasmusvillemoes.dk]
> > This sounds good. I think there may still be a lot of users that never
> > allocate more than a handful of IDAs, making a 128 byte allocation still
> > somewhat excessive. One thing I consi
From: Matthew Wilcox
> From: Matthew Wilcox
> > Heh, I was thinking about that too. The radix tree supports "exceptional
> > entries" which have the bottom bit set. On a 64-bit machine, we could use
> 62
> > of the bits in the radix tree root to store the ID
From: Rasmus Villemoes [mailto:li...@rasmusvillemoes.dk]
> Nice work! A few random comments/questions:
>
> - It does add some complexity, but I think a few comments would make it
> more digestable.
I'm open to adding some comments ... I need some time between writing the code
and writing the c
On Thu, Jan 26, 2017 at 02:57:44PM +0300, Kirill A. Shutemov wrote:
> This reverts commit 356e1c23292a4f63cfdf1daf0e0ddada51f32de8.
>
> After conversion of huge tmpfs to multi-order entries, we don't need
> this anymore.
Yay! Reviewed-by: Matthew Wilcox
On Thu, Jan 26, 2017 at 02:57:48PM +0300, Kirill A. Shutemov wrote:
> For filesystems that want to be write-notified (have mkwrite), we will
> encounter write-protection faults for huge PMDs in shared mappings.
>
> The easiest way to handle them is to clear the PMD and let it refault as
> writable.
On Thu, Jan 26, 2017 at 02:57:43PM +0300, Kirill A. Shutemov wrote:
> +++ b/include/linux/pagemap.h
> @@ -332,6 +332,15 @@ static inline struct page *grab_cache_page_nowait(struct
> address_space *mapping,
> mapping_gfp_mask(mapping));
> }
>
> +static inline struct page *f
On Thu, Jan 26, 2017 at 02:57:45PM +0300, Kirill A. Shutemov wrote:
> These flags are in use for filesystems with backing storage: PG_error,
> PG_writeback and PG_readahead.
Oh ;-) Then I amend my comment on patch 1 to be "patch 3 needs to go
ahead of patch 1" ;-)
> Signed-off-by: Kirill A. Shut
as
> writable.
>
> Signed-off-by: Kirill A. Shutemov
> Reviewed-by: Jan Kara
Reviewed-by: Matthew Wilcox
On Thu, Jan 26, 2017 at 02:57:46PM +0300, Kirill A. Shutemov wrote:
> Let's add FileHugePages and FilePmdMapped fields into meminfo and smaps.
> It indicates how many times we allocate and map file THP.
>
> Signed-off-by: Kirill A. Shutemov
Reviewed-by: Matthew Wilcox
think it's correct, but it still looks weird.
Reviewed-by: Matthew Wilcox
ps->readpage(file, page);
}
put_page(page);
} while (ret == AOP_TRUNCATED_PAGE);
But ... maybe it's OK to retry the huge page. I mean, not many
filesystems return AOP_TRUNCATED_PAGE, and they only do so rarely.
Anyway, I'm fine with the patch going in as-is. I just wanted to type out
my review notes.
Reviewed-by: Matthew Wilcox
On Thu, Jan 26, 2017 at 02:57:50PM +0300, Kirill A. Shutemov wrote:
> +++ b/mm/filemap.c
> @@ -1886,6 +1886,7 @@ static ssize_t do_generic_file_read(struct file *filp,
> loff_t *ppos,
> if (unlikely(page == NULL))
> goto no_cached_page;
>
utemov
Reviewed-by: Matthew Wilcox
On Thu, Jan 26, 2017 at 02:57:52PM +0300, Kirill A. Shutemov wrote:
> @@ -405,9 +405,14 @@ static int __filemap_fdatawait_range(struct
> address_space *mapping,
> if (page->index > end)
> continue;
>
> + page = compound_head
On Thu, Jan 26, 2017 at 02:57:53PM +0300, Kirill A. Shutemov wrote:
> Most page cache allocation happens via readahead (sync or async), so if
> we want to have significant number of huge pages in page cache we need
> to find a ways to allocate them from readahead.
>
> Unfortunately, huge pages doe
On Thu, Feb 09, 2017 at 05:23:31PM -0700, Andreas Dilger wrote:
> On Feb 9, 2017, at 4:34 PM, Matthew Wilcox wrote:
> > Well ... what if we made readahead 2 hugepages in size for inodes which
> > are using huge pages? That's only 8x our current readahead window, and
>
On Thu, Jan 26, 2017 at 02:57:55PM +0300, Kirill A. Shutemov wrote:
> We write back a whole huge page at a time. Let's adjust the iteration this way.
>
> Signed-off-by: Kirill A. Shutemov
I think a lot of the complexity in this patch is from pagevec_lookup_tag
giving you subpages rather than head pages...
ointers on x86-64 -- 'arr' is allocated with kmalloc() for
> huge pages.
>
> Signed-off-by: Kirill A. Shutemov
Reviewed-by: Matthew Wilcox
On Thu, Jan 26, 2017 at 02:57:54PM +0300, Kirill A. Shutemov wrote:
> Do not assume the length of a bio segment is never larger than PAGE_SIZE.
> With huge pages it's HPAGE_PMD_SIZE (2M on x86-64).
I don't think we even need hugepages for BRD to be buggy. I think there are
already places which allocate
On Thu, Jan 26, 2017 at 02:57:58PM +0300, Kirill A. Shutemov wrote:
> We want mmap(NULL) to return PMD-aligned address if the inode can have
> huge pages in page cache.
>
> Signed-off-by: Kirill A. Shutemov
Reviewed-by: Matthew Wilcox
On Thu, Jan 26, 2017 at 02:57:57PM +0300, Kirill A. Shutemov wrote:
> Slab pages can be compound, but we shouldn't treat them as THP for the
> purpose of the hpage_* helpers, otherwise it would lead to confusing results.
>
> For instance, ext4 uses slab pages for journal pages and we shouldn't
> confuse t
On Mon, Feb 13, 2017 at 06:33:42PM +0300, Kirill A. Shutemov wrote:
> No. pagecache_get_page() returns subpage. See description of the first
> patch.
Your description says:
> We also change interface for page-cache lookup function:
>
> - functions that lookup for pages[1] would return subpages
On Mon, Feb 13, 2017 at 08:01:17AM -0800, Matthew Wilcox wrote:
> On Mon, Feb 13, 2017 at 06:33:42PM +0300, Kirill A. Shutemov wrote:
> > No. pagecache_get_page() returns subpage. See description of the first
> > patch.
Oh, I re-read patch 1 and it made sense now. I misse
On Thu, Jan 26, 2017 at 02:57:50PM +0300, Kirill A. Shutemov wrote:
> Most of the work happens on the head page. Only when we need to copy data
> to userspace do we find the relevant subpage.
>
> We are still limited by PAGE_SIZE per iteration. Lifting this limitation
> would require some more work.
Now that
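The head-page/subpage relationship being discussed is just index arithmetic over a naturally aligned compound page; a small sketch (helper name is mine):

```python
def find_subpage(head_index, order, file_index):
    """Given a compound (huge) page whose head sits at page-cache index
    head_index and which covers 2**order base pages, return which subpage
    holds file_index."""
    nr = 1 << order
    assert head_index % nr == 0                       # head is naturally aligned
    assert head_index <= file_index < head_index + nr # index falls inside the page
    return file_index - head_index                    # subpage offset within compound

# A PMD-sized page on x86-64: 2MB = 512 x 4kB subpages (order 9).
assert find_subpage(512, 9, 512) == 0   # the head page itself
assert find_subpage(512, 9, 515) == 3   # third subpage after the head
```

In the kernel the same computation is done on struct page pointers rather than indices, but the alignment invariant is what makes the subtraction valid.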
On Tue, Feb 28, 2017 at 05:36:05PM -0600, Goldwyn Rodrigues wrote:
> Find out if the write will trigger a wait due to writeback. If yes,
> return -EAGAIN.
>
> This introduces a new function filemap_range_has_page() which
> returns true if the file's mapping has a page within the range
> mentioned.
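The described semantics are easy to model over a set of cached page indices (toy code, not the mm implementation):

```python
def range_has_page(page_indices, start, end):
    """True if any cached page index falls in [start, end] (inclusive),
    modelling the described filemap_range_has_page() semantics."""
    return any(start <= idx <= end for idx in page_indices)

cached = {0, 1, 7}                         # hypothetical cached page indices
assert range_has_page(cached, 5, 10)       # page 7 is in range
assert not range_has_page(cached, 2, 6)    # a gap: would not have to wait
```

A RWF_NOWAIT-style write over a range for which this returns true is the case that would return -EAGAIN rather than block on writeback.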
On Thu, Mar 02, 2017 at 11:38:45AM +0100, Jan Kara wrote:
> On Wed 01-03-17 07:38:57, Christoph Hellwig wrote:
> > On Tue, Feb 28, 2017 at 07:46:06PM -0800, Matthew Wilcox wrote:
> > > But what's going to kick these pages out of cache? Shouldn't we rather
> >
On Wed, Mar 15, 2017 at 04:51:02PM -0500, Goldwyn Rodrigues wrote:
> This introduces a new function filemap_range_has_page() which
> returns true if the file's mapping has a page within the range
> mentioned.
I thought you were going to replace this patch with one that starts
writeback for these p
On Wed, Mar 15, 2017 at 04:51:02PM -0500, Goldwyn Rodrigues wrote:
> From: Goldwyn Rodrigues
>
> Find out if the write will trigger a wait due to writeback. If yes,
> return -EAGAIN.
>
> This introduces a new function filemap_range_has_page() which
> returns true if the file's mapping has a page
On Thu, Jul 25, 2019 at 11:23:23AM -0600, Logan Gunthorpe wrote:
> nvme_get_by_path() is analogous to blkdev_get_by_path() except it
> gets a struct nvme_ctrl from the path to its char dev (/dev/nvme0).
>
> The purpose of this function is to support NVMe-OF target passthru.
I can't find anywhere
On Thu, Jul 25, 2019 at 11:53:20AM -0600, Logan Gunthorpe wrote:
>
>
> On 2019-07-25 11:40 a.m., Greg Kroah-Hartman wrote:
> > On Thu, Jul 25, 2019 at 11:23:21AM -0600, Logan Gunthorpe wrote:
> >> cdev_get_by_path() attempts to retrieve a struct cdev from
> >> a path name. It is analogous to blkd