Re: [PATCH 01/33] block: add a lower-level bio_add_page interface

2018-05-09 Thread Matthew Wilcox
On Wed, May 09, 2018 at 09:47:58AM +0200, Christoph Hellwig wrote: > +/** > + * __bio_try_merge_page - try adding data to an existing bvec > + * @bio: destination bio > + * @page: page to add > + * @len: length of the range to add > + * @off: offset into @page > + * > + * Try adding the data descri

Re: [PATCH 02/33] fs: factor out a __generic_write_end helper

2018-05-09 Thread Matthew Wilcox
On Wed, May 09, 2018 at 09:47:59AM +0200, Christoph Hellwig wrote: > } > EXPORT_SYMBOL(generic_write_end); > > + > /* Spurious?

Re: [PATCH 06/33] mm: give the 'ret' variable a better name __do_page_cache_readahead

2018-05-09 Thread Matthew Wilcox
On Wed, May 09, 2018 at 09:48:03AM +0200, Christoph Hellwig wrote: > It counts the number of pages acted on, so name it nr_pages to make that > obvious. > > Signed-off-by: Christoph Hellwig Yes! Also, it can't return an error, so how about changing it to unsigned int? And deleting the error che

Re: [PATCH 07/33] mm: split ->readpages calls to avoid non-contiguous pages lists

2018-05-09 Thread Matthew Wilcox
On Wed, May 09, 2018 at 09:48:04AM +0200, Christoph Hellwig wrote: > That way file systems don't have to go spotting for non-contiguous pages > and work around them. It also kicks off I/O earlier, allowing it to > finish earlier and reduce latency. Makes sense. > + /* > +

Re: [PATCH 0/5] block: introduce helpers for allocating io buffer from slab

2018-10-18 Thread Matthew Wilcox
On Thu, Oct 18, 2018 at 09:18:12PM +0800, Ming Lei wrote: > Hi, > > Filesystems may allocate io buffer from slab, and use this buffer to > submit bio. This way may break storage drivers if they have special > requirement on DMA alignment. Before we go down this road, could we have a discussion ab

Re: [PATCH 0/5] block: introduce helpers for allocating io buffer from slab

2018-10-18 Thread Matthew Wilcox
On Thu, Oct 18, 2018 at 04:05:51PM +0200, Christoph Hellwig wrote: > On Thu, Oct 18, 2018 at 07:03:42AM -0700, Matthew Wilcox wrote: > > Before we go down this road, could we have a discussion about what > > hardware actually requires this? Storage has this weird assumption that

Re: [PATCH 4/5] block: introduce helpers for allocating IO buffers from slab

2018-10-18 Thread Matthew Wilcox
On Thu, Oct 18, 2018 at 04:42:07PM +0200, Christoph Hellwig wrote: > This all seems quite complicated. > > I think the interface we'd want is more one that has a little > cache of a single page in the queue, and a little bitmap which > sub-page size blocks of it are used. > > Something like (pseu

Re: ext4 corruption on alpha with 4.20.0-09062-gd8372ba8ce28

2019-02-19 Thread Matthew Wilcox
On Tue, Feb 19, 2019 at 02:20:26PM +0100, Jan Kara wrote: > Thanks for information. Yeah, that makes somewhat more sense. Can you ever > see the failure if you disable CONFIG_TRANSPARENT_HUGEPAGE? Because your > findings still seem to indicate that there' some problem with page > migration and Alph

Re: Read-only Mapping of Program Text using Large THP Pages

2019-02-20 Thread Matthew Wilcox
[adding linux-nvme and linux-block for opinions on the critical-page-first idea in the second and third paragraphs below] On Wed, Feb 20, 2019 at 07:07:29AM -0700, William Kucharski wrote: > > On Feb 20, 2019, at 6:44 AM, Matthew Wilcox wrote: > > That interface would need to ha

Re: Read-only Mapping of Program Text using Large THP Pages

2019-02-20 Thread Matthew Wilcox
On Wed, Feb 20, 2019 at 09:39:22AM -0700, Keith Busch wrote: > On Wed, Feb 20, 2019 at 06:43:46AM -0800, Matthew Wilcox wrote: > > What NVMe doesn't have is a way for the host to tell the controller > > "Here's a 2MB sized I/O; bytes 40960 to 45056 are most importa
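
The byte range quoted above (40960 to 45056 within a 2MB I/O) is what you get by rounding a faulting offset down to a page boundary. A minimal sketch of that arithmetic follows, assuming a 4096-byte page (the difference between the two quoted numbers). The names `byte_range`, `critical_range` and `critical_demo` are hypothetical and written only for this illustration; nothing here is an NVMe command or a kernel interface.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper type: a half-open byte range [start, end). */
struct byte_range {
	uint64_t start;
	uint64_t end;	/* exclusive */
};

/*
 * Return the page-aligned range that contains @offset.
 * Assumption: @page_size is a power of two (4096 in the demo below).
 */
static struct byte_range critical_range(uint64_t offset, uint64_t page_size)
{
	struct byte_range r;

	r.start = offset & ~(page_size - 1);	/* round down to page boundary */
	r.end = r.start + page_size;
	return r;
}

/* Self-check; returns 0 when the expectations hold. */
int critical_demo(void)
{
	/* Any offset inside the eleventh 4096-byte page (index 10) will do. */
	struct byte_range r = critical_range(41000, 4096);

	assert(r.start == 40960);
	assert(r.end == 45056);
	/* Index of that page within the larger I/O. */
	assert(r.start / 4096 == 10);
	return 0;
}
```

With these assumptions, a 2MB I/O spans 512 such pages, and the range in the message corresponds to the page at index 10.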

Re: [PATCH] xfs: allocate sector sized IO buffer via page_frag_alloc

2019-02-25 Thread Matthew Wilcox
On Tue, Feb 26, 2019 at 02:02:14PM +1100, Dave Chinner wrote: > > Or what is the exact size of sub-page IO in xfs most of time? For > > Determined by mkfs parameters. Any power of 2 between 512 bytes and > 64kB needs to be supported. e.g: > > # mkfs.xfs -s size=512 -b size=1k -i size=2k -n size=8

Re: [PATCH] xfs: allocate sector sized IO buffer via page_frag_alloc

2019-02-26 Thread Matthew Wilcox
On Tue, Feb 26, 2019 at 07:12:49PM +0800, Ming Lei wrote: > On Tue, Feb 26, 2019 at 6:07 PM Vlastimil Babka wrote: > > On 2/26/19 10:33 AM, Ming Lei wrote: > > > On Tue, Feb 26, 2019 at 03:58:26PM +1100, Dave Chinner wrote: > > >> On Mon, Feb 25, 2019 at 07:27:37

Re: [PATCH] xfs: allocate sector sized IO buffer via page_frag_alloc

2019-02-26 Thread Matthew Wilcox
On Tue, Feb 26, 2019 at 08:35:46PM +0800, Ming Lei wrote: > On Tue, Feb 26, 2019 at 04:12:09AM -0800, Matthew Wilcox wrote: > > On Tue, Feb 26, 2019 at 07:12:49PM +0800, Ming Lei wrote: > > > The buffer needs to be device block size aligned for dio, and now the > > >

Re: [PATCH] xfs: allocate sector sized IO buffer via page_frag_alloc

2019-02-26 Thread Matthew Wilcox
On Tue, Feb 26, 2019 at 09:42:48PM +0800, Ming Lei wrote: > On Tue, Feb 26, 2019 at 05:02:30AM -0800, Matthew Wilcox wrote: > > Wait, we're imposing a ridiculous amount of complexity on XFS for no > > reason at all? We should just change this to 512-byte alignment. Tying >

Re: [PATCH] xfs: allocate sector sized IO buffer via page_frag_alloc

2019-02-26 Thread Matthew Wilcox
On Tue, Feb 26, 2019 at 08:14:33AM -0800, Darrick J. Wong wrote: > On Tue, Feb 26, 2019 at 06:04:40AM -0800, Matthew Wilcox wrote: > > On Tue, Feb 26, 2019 at 09:42:48PM +0800, Ming Lei wrote: > > > On Tue, Feb 26, 2019 at 05:02:30AM -0800, Matthew Wilcox wrote:

Re: [LSF/MM TOPIC] More async operations for file systems - async discard?

2019-02-27 Thread Matthew Wilcox
On Fri, Feb 22, 2019 at 09:45:05AM -0700, Keith Busch wrote: > On Thu, Feb 21, 2019 at 09:51:12PM -0500, Martin K. Petersen wrote: > > > > Keith, > > > > > With respect to fs block sizes, one thing making discards suck is that > > > many high capacity SSDs' physical page sizes are larger than the

[PATCH 06/14] genhd: Convert to XArray

2019-03-18 Thread Matthew Wilcox
Replace the IDR with the XArray. Includes converting the lookup from being protected by a spinlock to being protected by RCU. Signed-off-by: Matthew Wilcox --- block/genhd.c | 42 -- 1 file changed, 16 insertions(+), 26 deletions(-) diff --git a/block

[PATCH 04/14] blk-ioc: Convert to XArray

2019-03-18 Thread Matthew Wilcox
Use xa_insert_irq() to do the allocation before grabbing the other locks. This user appears to be able to race, so use xa_cmpxchg() to handle the race effectively. Signed-off-by: Matthew Wilcox --- block/blk-ioc.c | 23 +-- include/linux/iocontext.h | 6 +++--- 2

[PATCH 07/14] bsg: Convert bsg_minor_idr to XArray

2019-03-18 Thread Matthew Wilcox
Signed-off-by: Matthew Wilcox --- block/bsg.c | 20 ++-- 1 file changed, 10 insertions(+), 10 deletions(-) diff --git a/block/bsg.c b/block/bsg.c index f306853c6b08..e24420a21383 100644 --- a/block/bsg.c +++ b/block/bsg.c @@ -16,9 +16,9 @@ #include #include #include

[PATCH 08/14] brd: Convert to XArray

2019-03-18 Thread Matthew Wilcox
Convert brd_pages from a radix tree to an XArray. Simpler and smaller code; in particular another user of radix_tree_preload is eliminated. Signed-off-by: Matthew Wilcox --- drivers/block/brd.c | 93 ++--- 1 file changed, 28 insertions(+), 65 deletions

[PATCH 00/14] Convert block layer & drivers to XArray

2019-03-18 Thread Matthew Wilcox
ry(). - idr_replace() has no exact equivalent. Some users relied on its exact semantics of only storing if the entry was non-NULL, but all users of idr_replace() were able to use xa_store() or xa_cmpxchg(). - The family of radix tree gang lookup functions have been replaced with xa_extract().
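
The distinction drawn above, between an unconditional store and "only storing if the entry was non-NULL", can be shown with a toy userspace model. This is a sketch under stated assumptions, not kernel code: `toy_array`, `toy_store`, `toy_cmpxchg`, `toy_replace` and `toy_demo` are hypothetical names over a fixed-size pointer array. They only mimic the semantics described in the cover letter, with no locking, RCU, memory allocation or error entries.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_SLOTS 8

/* Hypothetical toy container: a fixed array of pointers; NULL means empty. */
struct toy_array {
	void *slot[TOY_SLOTS];
};

/* Unconditional store: always writes @entry, returns the previous entry.
 * The caller must keep @index below TOY_SLOTS. */
static void *toy_store(struct toy_array *a, size_t index, void *entry)
{
	void *old = a->slot[index];

	a->slot[index] = entry;
	return old;
}

/* Compare-and-exchange: writes @entry only if the slot holds @expected.
 * Returns the entry that was found, so the caller can tell if it won. */
static void *toy_cmpxchg(struct toy_array *a, size_t index,
			 void *expected, void *entry)
{
	void *cur = a->slot[index];

	if (cur == expected)
		a->slot[index] = entry;
	return cur;
}

/* Replace-only-if-present, built on toy_cmpxchg.
 * Returns 0 on success, -1 if the slot was empty. */
static int toy_replace(struct toy_array *a, size_t index, void *entry)
{
	for (;;) {
		void *cur = a->slot[index];

		if (cur == NULL)
			return -1;
		if (toy_cmpxchg(a, index, cur, entry) == cur)
			return 0;
	}
}

/* Self-check; returns 0 when every expectation holds. */
int toy_demo(void)
{
	struct toy_array a = { { NULL } };
	int x = 1, y = 2;

	assert(toy_replace(&a, 3, &x) == -1);	/* empty slot: nothing stored */
	assert(a.slot[3] == NULL);
	assert(toy_store(&a, 3, &x) == NULL);	/* plain store always writes */
	assert(toy_replace(&a, 3, &y) == 0);	/* present: replaced */
	assert(a.slot[3] == &y);
	assert(toy_cmpxchg(&a, 3, &x, &x) == &y); /* mismatch leaves slot alone */
	assert(a.slot[3] == &y);
	return 0;
}
```

A caller that does not care whether the slot was populated can use the plain store; one that relied on the replace-only-if-present behaviour needs the compare-and-exchange form.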

[PATCH 03/14] blk-cgroup: Reduce scope of blkg_array lock

2019-03-18 Thread Matthew Wilcox
We can now take and release the blkg_array lock within blkg_destroy() instead of forcing the caller to hold it across the call. Signed-off-by: Matthew Wilcox --- block/blk-cgroup.c | 9 + 1 file changed, 1 insertion(+), 8 deletions(-) diff --git a/block/blk-cgroup.c b/block/blk

[PATCH 01/14] blk-cgroup: Convert to XArray

2019-03-18 Thread Matthew Wilcox
t fail, so we can remove the error checks. Signed-off-by: Matthew Wilcox --- block/bfq-cgroup.c | 4 +-- block/blk-cgroup.c | 69 -- include/linux/blk-cgroup.h | 5 ++- 3 files changed, 33 insertions(+), 45 deletions(-) diff --git a/block/bfq-

[PATCH 05/14] blk-ioc: Remove ioc's icq_list

2019-03-18 Thread Matthew Wilcox
Use the XArray's iterator instead of this hlist. Signed-off-by: Matthew Wilcox --- block/blk-ioc.c | 15 ++- include/linux/iocontext.h | 16 +--- 2 files changed, 11 insertions(+), 20 deletions(-) diff --git a/block/blk-ioc.c b/block/blk-ioc.c

[PATCH 14/14] drbd: Convert peer devices to XArray

2019-03-18 Thread Matthew Wilcox
Signed-off-by: Matthew Wilcox --- drivers/block/drbd/drbd_int.h | 4 +- drivers/block/drbd/drbd_main.c | 25 +++-- drivers/block/drbd/drbd_nl.c | 35 +- drivers/block/drbd/drbd_receiver.c | 29 --- drivers/block/drbd/drbd_state.c| 59

[PATCH 02/14] blk-cgroup: Remove blkg_list hlist

2019-03-18 Thread Matthew Wilcox
We can iterate over all blkcgs using the XArray iterator instead of maintaining a separate hlist. This removes a nasty locking inversion in blkcg_destroy_blkgs(). Signed-off-by: Matthew Wilcox --- block/bfq-cgroup.c | 3 ++- block/blk-cgroup.c | 38

[PATCH 11/14] nbd: Convert nbd_index_idr to XArray

2019-03-18 Thread Matthew Wilcox
Signed-off-by: Matthew Wilcox --- drivers/block/nbd.c | 145 ++-- 1 file changed, 59 insertions(+), 86 deletions(-) diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c index 90ba9f4c03f3..6e64884973dd 100644 --- a/drivers/block/nbd.c +++ b/drivers

[PATCH 13/14] drbd: Convert drbd devices to XArray

2019-03-18 Thread Matthew Wilcox
Signed-off-by: Matthew Wilcox --- drivers/block/drbd/drbd_debugfs.c | 16 - drivers/block/drbd/drbd_int.h | 6 ++-- drivers/block/drbd/drbd_main.c | 56 ++ drivers/block/drbd/drbd_nl.c | 42 +++--- drivers/block/drbd

[PATCH 12/14] zram: Convert zram_index_idr to XArray

2019-03-18 Thread Matthew Wilcox
Signed-off-by: Matthew Wilcox --- drivers/block/zram/zram_drv.c | 40 +-- 1 file changed, 15 insertions(+), 25 deletions(-) diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c index e7a5f1d1c314..f7e53a681637 100644 --- a/drivers/block/zram

[PATCH 09/14] null_blk: Convert to XArray

2019-03-18 Thread Matthew Wilcox
By changing the locking we could remove the slightly awkward dance in null_insert_page(), but I'll leave that for someone who's more familiar with the driver. Signed-off-by: Matthew Wilcox --- drivers/block/null_blk.h | 4 +- drivers/block/null_blk_m

[PATCH 10/14] loop: Convert loop_index_idr to XArray

2019-03-18 Thread Matthew Wilcox
Signed-off-by: Matthew Wilcox --- drivers/block/loop.c | 88 1 file changed, 31 insertions(+), 57 deletions(-) diff --git a/drivers/block/loop.c b/drivers/block/loop.c index 1e6edd568214..d1a0f689788d 100644 --- a/drivers/block/loop.c +++ b/drivers

Re: [LSF/MM TOPIC] guarantee natural alignment for kmalloc()?

2019-04-11 Thread Matthew Wilcox
On Thu, Apr 11, 2019 at 02:52:08PM +0200, Vlastimil Babka wrote: > In the session I hope to resolve the question whether this is indeed the > right thing to do for all kmalloc() users, without an explicit alignment > requests, and if it's worth the potentially worse > performance/fragmentation it w

Re: [PATCHSET 0/3] io_uring: add sync_file_range and drains

2019-04-11 Thread Matthew Wilcox
On Thu, Apr 11, 2019 at 09:06:54AM -0600, Jens Axboe wrote: > In continuation of the fsync barrier patch from the other day, I > reworked that patch to turn it into a general primitive instead. This > means that any command can be flagged with IOSQE_IO_DRAIN, which will > insert a sequence point in

Re: [LSF/MM TOPIC] guarantee natural alignment for kmalloc()?

2019-04-25 Thread Matthew Wilcox
On Thu, Apr 11, 2019 at 06:28:19AM -0700, Matthew Wilcox wrote: > On Thu, Apr 11, 2019 at 02:52:08PM +0200, Vlastimil Babka wrote: > > In the session I hope to resolve the question whether this is indeed the > > right thing to do for all kmalloc() users, without an explicit alignme

Re: simplify bio_for_each_segment_all

2019-04-30 Thread Matthew Wilcox
If Reviewed-by dominates Suggested-by, then Reviewed-by: Matthew Wilcox

Re: [PATCH v4 06/27] fs: check for writeback errors after syncing out buffers in generic_file_fsync

2017-05-10 Thread Matthew Wilcox
e buffers. That > will be sufficient for this case, and help other callers detect > these errors properly as well. > > With that, we don't need to twiddle it in ext2. > > Suggested-by: Jan Kara > Signed-off-by: Jeff Layton > Reviewed-by: Christoph Hellwig > Reviewed-by: Jan Kara Reviewed-by: Matthew Wilcox

Re: [PATCH v4 13/27] lib: add errseq_t type and infrastructure for handling it

2017-05-10 Thread Matthew Wilcox
On Tue, May 09, 2017 at 11:49:16AM -0400, Jeff Layton wrote: > +++ b/lib/errseq.c > @@ -0,0 +1,199 @@ > +#include > +#include > +#include > +#include > + > +/* > + * An errseq_t is a way of recording errors in one place, and allowing any > + * number of "subscribers" to tell whether it has chan

Re: [PATCH v2 16/51] block: bounce: avoid direct access to bvec table

2017-06-26 Thread Matthew Wilcox
On Mon, Jun 26, 2017 at 08:09:59PM +0800, Ming Lei wrote: > bio_for_each_segment_all(bvec, bio, i) { > - org_vec = bio_orig->bi_io_vec + i + start; > - > - if (bvec->bv_page == org_vec->bv_page) > - continue; > + orig_vec = bio_iter_iove

Re: [PATCH 01/12] kernfs: add function to find kernfs_node without increasing ref counter

2018-11-12 Thread Matthew Wilcox
On Mon, Nov 12, 2018 at 04:28:40AM -0800, Greg Kroah-Hartman wrote: > On Mon, Nov 12, 2018 at 10:56:21AM +0100, Paolo Valente wrote: > > From: Angelo Ruocco > > > > The kernfs pseudo file system doesn't export any function to only find > > a node by name, without also getting a reference on it. >

Re: [PATCHSET v1] io_uring IO interface

2019-01-09 Thread Matthew Wilcox
On Tue, Jan 08, 2019 at 09:56:29AM -0700, Jens Axboe wrote: > After some arm twisting from Christoph, I finally caved and divorced the > aio-poll patches from aio/libaio itself. The io_uring interface itself > is useful and efficient, and after rebasing all the new goodies on top > of that, there w

Re: [PATCH v2 0/4] lockdep/crossrelease: Apply crossrelease to page locks

2017-12-04 Thread Matthew Wilcox
On Mon, Dec 04, 2017 at 02:16:19PM +0900, Byungchul Park wrote: > For now, wait_for_completion() / complete() works with lockdep, add > lock_page() / unlock_page() and its family to lockdep support. > > Changes from v1 > - Move lockdep_map_cross outside of page_ext to make it flexible > - Preven

Re: [PATCH v2 0/4] lockdep/crossrelease: Apply crossrelease to page locks

2017-12-04 Thread Matthew Wilcox
On Tue, Dec 05, 2017 at 03:19:46PM +0900, Byungchul Park wrote: > On 12/5/2017 2:46 PM, Byungchul Park wrote: > > On 12/5/2017 2:30 PM, Matthew Wilcox wrote: > > > On Mon, Dec 04, 2017 at 02:16:19PM +0900, Byungchul Park wrote: > > > > For now, wait_for_completion() /

Re: [PATCH] locking/lockdep: Add CONFIG_LOCKDEP_AGGRESSIVE

2017-12-12 Thread Matthew Wilcox
On Tue, Dec 12, 2017 at 08:03:43AM -0500, Theodore Ts'o wrote: > On Tue, Dec 12, 2017 at 02:20:32PM +0900, Byungchul Park wrote: > > The *problem* is false positives, since locks and waiters in > > kernel are not classified properly, at the moment, which is just > > a fact that is not related to cr

Re: About the try to remove cross-release feature entirely by Ingo

2017-12-29 Thread Matthew Wilcox
On Fri, Dec 29, 2017 at 04:28:51PM +0900, Byungchul Park wrote: > On Thu, Dec 28, 2017 at 10:51:46PM -0500, Theodore Ts'o wrote: > > On Fri, Dec 29, 2017 at 10:47:36AM +0900, Byungchul Park wrote: > > > > > >(1) The best way: To classify all waiters correctly. > > > > It's really not all wait

Re: About the try to remove cross-release feature entirely by Ingo

2017-12-30 Thread Matthew Wilcox
On Sat, Dec 30, 2017 at 10:40:41AM -0500, Theodore Ts'o wrote: > On Fri, Dec 29, 2017 at 10:16:24PM -0800, Matthew Wilcox wrote: > > > The problems come from wrong classification. Waiters either classfied > > > well or invalidated properly won't bitrot.

Re: About the try to remove cross-release feature entirely by Ingo

2018-01-01 Thread Matthew Wilcox
On Sat, Dec 30, 2017 at 06:00:57PM -0500, Theodore Ts'o wrote: > On Sat, Dec 30, 2017 at 05:40:28PM -0500, Theodore Ts'o wrote: > > On Sat, Dec 30, 2017 at 12:44:17PM -0800, Matthew Wilcox wrote: > > > > > > I'm not sure I agree with this part. Wha

[LSF/MM TOPIC] A high-performance userspace block driver

2018-01-16 Thread Matthew Wilcox
I see the improvements that Facebook have been making to the nbd driver, and I think that's a wonderful thing. Maybe the outcome of this topic is simply: "Shut up, Matthew, this is good enough". It's clear that there's an appetite for userspace block devices; not for swap devices or the root dev

Re: [LSF/MM TOPIC] A high-performance userspace block driver

2018-01-17 Thread Matthew Wilcox
On Wed, Jan 17, 2018 at 10:49:24AM +0800, Ming Lei wrote: > Userfaultfd might be another choice: > > 1) map the block LBA space into a range of process vm space That would limit the size of a block device to ~200TB (with my laptop's CPU). That's probably OK for most users, but I suspect there ar

Re: [PATCH v5 1/2] Return bytes transferred for partial direct I/O

2018-01-22 Thread Matthew Wilcox
On Mon, Jan 22, 2018 at 08:28:54PM -0700, Jens Axboe wrote: > On 1/22/18 8:18 PM, Goldwyn Rodrigues wrote: > >> that their application was "already broken". I'd hate for a kernel > >> upgrade to break them. > >> > >> I do wish we could make the change, and maybe we can. But it probably > >> needs s

RE: [PATCH v8 12/18] Documentation: flesh out the section in vfs.txt on storing and reporting writeback errors

2017-06-29 Thread Matthew Wilcox
From: Jeff Layton [mailto:jlay...@poochiereds.net] > On Thu, 2017-06-29 at 10:11 -0700, Darrick J. Wong wrote: > > On Thu, Jun 29, 2017 at 09:19:48AM -0400, jlay...@kernel.org wrote: > > > +Handling errors during writeback > > > + > > > +Most applications that utiliz

Re: [RFC 0/5] fs: replace kthread freezing with filesystem freeze/thaw

2017-10-03 Thread Matthew Wilcox
On Tue, Oct 03, 2017 at 10:05:11PM +0200, Luis R. Rodriguez wrote: > On Wed, Oct 04, 2017 at 03:33:01AM +0800, Ming Lei wrote: > > On Tue, Oct 03, 2017 at 11:53:08AM -0700, Luis R. Rodriguez wrote: > > > INFO: task kworker/u8:8:1320 blocked for more than 10 seconds. > > > Tainted: G

Re: rfc: remove print_vma_addr ? (was Re: [PATCH 00/16] remove eight obsolete architectures)

2018-03-15 Thread Matthew Wilcox
On Thu, Mar 15, 2018 at 09:56:46AM -0700, Joe Perches wrote: > I have a patchset that creates a vsprintf extension for > print_vma_addr and removes all the uses similar to the > print_symbol() removal. > > This now avoids any possible printk interleaving. > > Unfortunately, without some #ifdef in

Re: [PATCH 2/2] loop: use interruptible lock in ioctls

2018-03-26 Thread Matthew Wilcox
On Mon, Mar 26, 2018 at 04:16:26PM -0700, Omar Sandoval wrote: > Even after the previous patch to drop lo_ctl_mutex while calling > vfs_getattr(), there are other cases where we can end up sleeping for a > long time while holding lo_ctl_mutex. Let's avoid the uninterruptible > sleep from the ioctls

Re: [PATCH 2/2] loop: use interruptible lock in ioctls

2018-03-28 Thread Matthew Wilcox
On Wed, Mar 28, 2018 at 03:18:52PM +0200, David Sterba wrote: > On Mon, Mar 26, 2018 at 05:04:21PM -0700, Matthew Wilcox wrote: > > On Mon, Mar 26, 2018 at 04:16:26PM -0700, Omar Sandoval wrote: > > > Even after the previous patch to drop lo_ctl_mutex while calling

Block layer use of __GFP flags

2018-04-07 Thread Matthew Wilcox
Please explain: commit 6a15674d1e90917f1723a814e2e8c949000440f7 Author: Bart Van Assche Date: Thu Nov 9 10:49:54 2017 -0800 block: Introduce blk_get_request_flags() A side effect of this patch is that the GFP mask that is passed to several allocation functions in the legacy b

Re: Block layer use of __GFP flags

2018-04-08 Thread Matthew Wilcox
On Sun, Apr 08, 2018 at 04:40:59PM +, Bart Van Assche wrote: > __GFP_KSWAPD_RECLAIM wasn't stripped off on purpose for non-atomic > allocations. That was an oversight. OK, good. > Do you perhaps want me to prepare a patch that makes blk_get_request() again > respect the full gfp mask passed

Re: Block layer use of __GFP flags

2018-04-09 Thread Matthew Wilcox
On Mon, Apr 09, 2018 at 01:26:50AM -0700, Christoph Hellwig wrote: > On Mon, Apr 09, 2018 at 08:53:49AM +0200, Hannes Reinecke wrote: > > Why don't you fold the 'flags' argument into the 'gfp_flags', and drop > > the 'flags' argument completely? > > Looks a bit pointless to me, having two arguments

Re: [PATCH 6/7] block: consistently use GFP_NOIO instead of __GFP_NORECLAIM

2018-04-09 Thread Matthew Wilcox
On Mon, Apr 09, 2018 at 05:39:15PM +0200, Christoph Hellwig wrote: > Same numerical value (for now at least), but a much better documentation > of intent. > @@ -499,7 +499,7 @@ int sg_scsi_ioctl(struct request_queue *q, struct gendisk > *disk, fmode_t mode, > break; > } > >

Re: [PATCH 7/7] block: use GFP_KERNEL for allocations from blk_get_request

2018-04-09 Thread Matthew Wilcox
On Mon, Apr 09, 2018 at 05:39:16PM +0200, Christoph Hellwig wrote: > blk_get_request is used for pass-through style I/O and thus doesn't need > GFP_NOIO. Obviously GFP_KERNEL is a big improvement over GFP_NOIO! But can we take it all the way to GFP_USER, if this is always done in the ioctl path (

Re: [PATCH 3/4] blk: add blk_queue_fua() helper function

2018-04-18 Thread Matthew Wilcox
On Wed, Apr 18, 2018 at 12:34:03PM +0200, Christoph Hellwig wrote: > s/blk/block/ for block patches. I think this is something we should put in MAINTAINERS. Eventually some tooling can pull it out, but I don't think this is something that people can reasonably be expected to know. diff --git a/

Re: [LSF/MM] schedule suggestion

2018-04-19 Thread Matthew Wilcox
On Thu, Apr 19, 2018 at 10:38:25AM -0400, Jerome Glisse wrote: > Oh can i get one more small slot for fs ? I want to ask if they are > any people against having a callback everytime a struct file is added > to a task_struct and also having a secondary array so that special > file like device file c

Re: [LSF/MM] schedule suggestion

2018-04-19 Thread Matthew Wilcox
On Thu, Apr 19, 2018 at 03:31:08PM -0400, Jerome Glisse wrote: > > > Basicly i want a callback in __fd_install(), do_dup2(), dup_fd() and > > > add void * *private_data; to struct fdtable (also a default array to > > > struct files_struct). The callback would be part of struct > > > file_operation

Re: [LSF/MM] schedule suggestion

2018-04-19 Thread Matthew Wilcox
On Thu, Apr 19, 2018 at 04:15:02PM -0400, Jerome Glisse wrote: > On Thu, Apr 19, 2018 at 12:56:37PM -0700, Matthew Wilcox wrote: > > > Well scratch that whole idea, i would need to add a new array to task > > > struct which make it a lot less appealing. Hence a better solutio

[LSF/MM] Ride sharing

2018-04-19 Thread Matthew Wilcox
I hate renting unnecessary cars, and the various transportation companies offer a better deal if multiple people book at once. I'm scheduled to arrive on Sunday at 3:18pm local time if anyone wants to share transport. Does anyone have a wiki we can use to coordinate this?

Re: write call hangs in kernel space after virtio hot-remove

2018-05-03 Thread Matthew Wilcox
On Thu, May 03, 2018 at 12:05:14PM -0400, Jeff Layton wrote: > On Thu, 2018-05-03 at 16:42 +0200, Jan Kara wrote: > > On Wed 25-04-18 17:07:48, Fabiano Rosas wrote: > > > I'm looking into an issue where removing a virtio disk via sysfs while > > > another > > > process is issuing write() calls res

Re: [PATCH v2] fs: Add aio iopriority support for block_dev

2018-05-03 Thread Matthew Wilcox
On Thu, May 03, 2018 at 11:21:14AM -0700, adam.manzana...@wdc.com wrote: > If we want to avoid bloating struct kiocb, I suggest we turn the private > field > into a union of the private and ki_ioprio field. It seems like the users of > the private field all use it at a point where we can yank th

Re: [PATCH v2] fs: Add aio iopriority support for block_dev

2018-05-03 Thread Matthew Wilcox
On Thu, May 03, 2018 at 02:24:58PM -0600, Jens Axboe wrote: > On 5/3/18 2:15 PM, Adam Manzanares wrote: > > On 5/3/18 11:33 AM, Matthew Wilcox wrote: > >> Or we could just make ki_hint a u8 or u16 ... seems unlikely we'll need > >> 32 bits of ki_hint. (currently de

RE: [RFC 00/10] implement alternative and much simpler id allocator

2016-12-12 Thread Matthew Wilcox
From: Tejun Heo [mailto:hte...@gmail.com] On Behalf Of Tejun Heo > Ah, yeah, great to see the silly implementation being replaced the > radix tree. ida_pre_get() looks suspicious tho. idr_preload() > immedicately being followed by idr_preload_end() probably is broken. > Maybe what we need is movi

RE: [RFC 00/10] implement alternative and much simpler id allocator

2016-12-16 Thread Matthew Wilcox
From: Andrew Morton [mailto:a...@linux-foundation.org] > On Thu, 8 Dec 2016 02:22:55 +0100 Rasmus Villemoes > wrote: > > TL;DR: these patches save 250 KB of memory, with more low-hanging > > fruit ready to pick. > > > > While browsing through the lib/idr.c code, I noticed that the code at > > the

RE: [RFC 00/10] implement alternative and much simpler id allocator

2016-12-16 Thread Matthew Wilcox
From: Rasmus Villemoes [mailto:li...@rasmusvillemoes.dk] > On Fri, Dec 16 2016, Matthew Wilcox wrote: > > Thanks for your work on this; you've really put some effort into > > proving your work has value. My motivation was purely aesthetic, but > > you've got some g

RE: [RFC 00/10] implement alternative and much simpler id allocator

2016-12-17 Thread Matthew Wilcox
From: Matthew Wilcox > From: Rasmus Villemoes [mailto:li...@rasmusvillemoes.dk] > > This sounds good. I think there may still be a lot of users that never > > allocate more than a handful of IDAs, making a 128 byte allocation still > > somewhat excessive. One thing I consi

RE: [RFC 00/10] implement alternative and much simpler id allocator

2016-12-17 Thread Matthew Wilcox
From: Matthew Wilcox > From: Matthew Wilcox > > Heh, I was thinking about that too. The radix tree supports "exceptional > > entries" which have the bottom bit set. On a 64-bit machine, we could use > 62 > > of the bits in the radix tree root to store the ID

RE: [RFC 00/10] implement alternative and much simpler id allocator

2016-12-23 Thread Matthew Wilcox
From: Rasmus Villemoes [mailto:li...@rasmusvillemoes.dk] > Nice work! A few random comments/questions: > > - It does add some complexity, but I think a few comments would make it > more digestable. I'm open to adding some comments ... I need some time between writing the code and writing the c

Re: [PATCHv6 02/37] Revert "radix-tree: implement radix_tree_maybe_preload_order()"

2017-01-26 Thread Matthew Wilcox
On Thu, Jan 26, 2017 at 02:57:44PM +0300, Kirill A. Shutemov wrote: > This reverts commit 356e1c23292a4f63cfdf1daf0e0ddada51f32de8. > > After conversion of huge tmpfs to multi-order entries, we don't need > this anymore. Yay! Reviewed-by: Matthew Wilcox

Re: [PATCHv6 06/37] thp: handle write-protection faults for file THP

2017-01-26 Thread Matthew Wilcox
On Thu, Jan 26, 2017 at 02:57:48PM +0300, Kirill A. Shutemov wrote: > For filesystems that wants to be write-notified (has mkwrite), we will > encount write-protection faults for huge PMDs in shared mappings. > > The easiest way to handle them is to clear the PMD and let it refault as > wriable.

Re: [PATCHv6 01/37] mm, shmem: swich huge tmpfs to multi-order radix-tree entries

2017-02-08 Thread Matthew Wilcox
On Thu, Jan 26, 2017 at 02:57:43PM +0300, Kirill A. Shutemov wrote: > +++ b/include/linux/pagemap.h > @@ -332,6 +332,15 @@ static inline struct page *grab_cache_page_nowait(struct > address_space *mapping, > mapping_gfp_mask(mapping)); > } > > +static inline struct page *f

Re: [PATCHv6 03/37] page-flags: relax page flag policy for few flags

2017-02-08 Thread Matthew Wilcox
On Thu, Jan 26, 2017 at 02:57:45PM +0300, Kirill A. Shutemov wrote: > These flags are in use for filesystems with backing storage: PG_error, > PG_writeback and PG_readahead. Oh ;-) Then I amend my comment on patch 1 to be "patch 3 needs to go ahead of patch 1" ;-) > Signed-off-by: Kirill A. Shut

Re: [PATCHv6 06/37] thp: handle write-protection faults for file THP

2017-02-09 Thread Matthew Wilcox
as > wriable. > > Signed-off-by: Kirill A. Shutemov > Reviewed-by: Jan Kara Reviewed-by: Matthew Wilcox

Re: [PATCHv6 04/37] mm, rmap: account file thp pages

2017-02-09 Thread Matthew Wilcox
On Thu, Jan 26, 2017 at 02:57:46PM +0300, Kirill A. Shutemov wrote: > Let's add FileHugePages and FilePmdMapped fields into meminfo and smaps. > It indicates how many times we allocate and map file THP. > > Signed-off-by: Kirill A. Shutemov Reviewed-by: Matthew Wilcox

Re: [PATCHv6 05/37] thp: try to free page's buffers before attempt split

2017-02-09 Thread Matthew Wilcox
think it's correct, but it still looks weird. Reviewed-by: Matthew Wilcox

Re: [PATCHv6 07/37] filemap: allocate huge page in page_cache_read(), if allowed

2017-02-09 Thread Matthew Wilcox
ps->readpage(file, page); } put_page(page); } while (ret == AOP_TRUNCATED_PAGE); But ... maybe it's OK to retry the huge page. I mean, not many filesystems return AOP_TRUNCATED_PAGE, and they only do so rarely. Anyway, I'm fine with the patch going in as-is. I just wanted to type out my review notes. Reviewed-by: Matthew Wilcox

Re: [PATCHv6 08/37] filemap: handle huge pages in do_generic_file_read()

2017-02-09 Thread Matthew Wilcox
On Thu, Jan 26, 2017 at 02:57:50PM +0300, Kirill A. Shutemov wrote: > +++ b/mm/filemap.c > @@ -1886,6 +1886,7 @@ static ssize_t do_generic_file_read(struct file *filp, > loff_t *ppos, > if (unlikely(page == NULL)) > goto no_cached_page; >

Re: [PATCHv6 09/37] filemap: allocate huge page in pagecache_get_page(), if allowed

2017-02-09 Thread Matthew Wilcox
utemov Reviewed-by: Matthew Wilcox

Re: [PATCHv6 10/37] filemap: handle huge pages in filemap_fdatawait_range()

2017-02-09 Thread Matthew Wilcox
On Thu, Jan 26, 2017 at 02:57:52PM +0300, Kirill A. Shutemov wrote: > @@ -405,9 +405,14 @@ static int __filemap_fdatawait_range(struct > address_space *mapping, > if (page->index > end) > continue; > > + page = compound_head

Re: [PATCHv6 11/37] HACK: readahead: alloc huge pages, if allowed

2017-02-09 Thread Matthew Wilcox
On Thu, Jan 26, 2017 at 02:57:53PM +0300, Kirill A. Shutemov wrote: > Most page cache allocation happens via readahead (sync or async), so if > we want to have significant number of huge pages in page cache we need > to find a ways to allocate them from readahead. > > Unfortunately, huge pages doe

Re: [PATCHv6 11/37] HACK: readahead: alloc huge pages, if allowed

2017-02-10 Thread Matthew Wilcox
On Thu, Feb 09, 2017 at 05:23:31PM -0700, Andreas Dilger wrote: > On Feb 9, 2017, at 4:34 PM, Matthew Wilcox wrote: > > Well ... what if we made readahead 2 hugepages in size for inodes which > > are using huge pages? That's only 8x our current readahead window, and >

Re: [PATCHv6 13/37] mm: make write_cache_pages() work on huge pages

2017-02-10 Thread Matthew Wilcox
On Thu, Jan 26, 2017 at 02:57:55PM +0300, Kirill A. Shutemov wrote: > We writeback whole huge page a time. Let's adjust iteration this way. > > Signed-off-by: Kirill A. Shutemov I think a lot of the complexity in this patch is from pagevec_lookup_tag giving you subpages rather than head pages...

Re: [PATCHv6 17/37] fs: make block_read_full_page() be able to read huge page

2017-02-10 Thread Matthew Wilcox
ointers on x86-64 -- 'arr' is allocated with kmalloc() for > huge pages. > > Signed-off-by: Kirill A. Shutemov Reviewed-by: Matthew Wilcox

Re: [PATCHv6 12/37] brd: make it handle huge pages

2017-02-10 Thread Matthew Wilcox
On Thu, Jan 26, 2017 at 02:57:54PM +0300, Kirill A. Shutemov wrote: > Do not assume length of bio segment is never larger than PAGE_SIZE. > With huge pages it's HPAGE_PMD_SIZE (2M on x86-64). I don't think we even need hugepages for BRD to be buggy. I think there are already places which allocate

Re: [PATCHv6 16/37] thp: make thp_get_unmapped_area() respect S_HUGE_MODE

2017-02-10 Thread Matthew Wilcox
On Thu, Jan 26, 2017 at 02:57:58PM +0300, Kirill A. Shutemov wrote: > We want mmap(NULL) to return PMD-aligned address if the inode can have > huge pages in page cache. > > Signed-off-by: Kirill A. Shutemov Reviewed-by: Matthew Wilcox

Re: [PATCHv6 15/37] thp: do not threat slab pages as huge in hpage_{nr_pages,size,mask}

2017-02-10 Thread Matthew Wilcox
On Thu, Jan 26, 2017 at 02:57:57PM +0300, Kirill A. Shutemov wrote: > Slab pages can be compound, but we shouldn't treat them as THP for > purpose of hpage_* helpers, otherwise it would lead to confusing results. > > For instance, ext4 uses slab pages for journal pages and we shouldn't > confuse t

Re: [PATCHv6 08/37] filemap: handle huge pages in do_generic_file_read()

2017-02-13 Thread Matthew Wilcox
On Mon, Feb 13, 2017 at 06:33:42PM +0300, Kirill A. Shutemov wrote: > No. pagecache_get_page() returns subpage. See description of the first > patch. Your description says: > We also change interface for page-cache lookup function: > > - functions that lookup for pages[1] would return subpages

Re: [PATCHv6 08/37] filemap: handle huge pages in do_generic_file_read()

2017-02-13 Thread Matthew Wilcox
On Mon, Feb 13, 2017 at 08:01:17AM -0800, Matthew Wilcox wrote: > On Mon, Feb 13, 2017 at 06:33:42PM +0300, Kirill A. Shutemov wrote: > > No. pagecache_get_page() returns subpage. See description of the first > > patch. Oh, I re-read patch 1 and it made sense now. I misse

Re: [PATCHv6 08/37] filemap: handle huge pages in do_generic_file_read()

2017-02-13 Thread Matthew Wilcox
On Thu, Jan 26, 2017 at 02:57:50PM +0300, Kirill A. Shutemov wrote: > Most of work happens on head page. Only when we need to do copy data to > userspace we find relevant subpage. > > We are still limited by PAGE_SIZE per iteration. Lifting this limitation > would require some more work. Now that

Re: [PATCH 3/8] nowait aio: return if direct write will trigger writeback

2017-02-28 Thread Matthew Wilcox
On Tue, Feb 28, 2017 at 05:36:05PM -0600, Goldwyn Rodrigues wrote: > Find out if the write will trigger a wait due to writeback. If yes, > return -EAGAIN. > > This introduces a new function filemap_range_has_page() which > returns true if the file's mapping has a page within the range > mentioned.
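The range check described in the quoted patch can be pictured with a small user-space model. This is only a sketch and not the kernel implementation: struct model_mapping, model_range_has_page() and MODEL_PAGE_SHIFT are hypothetical names invented here, the "mapping" is just a sorted array of cached page indices, and a 4 KiB page size is assumed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_PAGE_SHIFT 12	/* assumption: 4 KiB pages, illustration only */

/* Hypothetical stand-in for a mapping: cached page indices, ascending. */
struct model_mapping {
	const unsigned long *cached;
	size_t nr;
};

/*
 * Return true if any cached page overlaps the byte range [start, end],
 * both ends inclusive. A caller that must not block could return
 * -EAGAIN when this reports true.
 */
static bool model_range_has_page(const struct model_mapping *m,
				 unsigned long long start,
				 unsigned long long end)
{
	unsigned long first = (unsigned long)(start >> MODEL_PAGE_SHIFT);
	unsigned long last = (unsigned long)(end >> MODEL_PAGE_SHIFT);
	size_t i;

	assert(start <= end);

	for (i = 0; i < m->nr; i++) {
		if (m->cached[i] > last)
			break;		/* sorted: nothing further can match */
		if (m->cached[i] >= first)
			return true;	/* page index lies inside the range */
	}
	return false;
}
```

The model only converts the byte range to page indices and looks for a cached index between them. It says nothing about whether those pages are dirty or under writeback, which is the question raised later in this thread.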

Re: [PATCH 3/8] nowait aio: return if direct write will trigger writeback

2017-03-02 Thread Matthew Wilcox
On Thu, Mar 02, 2017 at 11:38:45AM +0100, Jan Kara wrote: > On Wed 01-03-17 07:38:57, Christoph Hellwig wrote: > > On Tue, Feb 28, 2017 at 07:46:06PM -0800, Matthew Wilcox wrote: > > > But what's going to kick these pages out of cache? Shouldn't we rather > >

Re: [PATCH 3/8] nowait aio: return if direct write will trigger writeback

2017-03-16 Thread Matthew Wilcox
On Wed, Mar 15, 2017 at 04:51:02PM -0500, Goldwyn Rodrigues wrote: > This introduces a new function filemap_range_has_page() which > returns true if the file's mapping has a page within the range > mentioned. I thought you were going to replace this patch with one that starts writeback for these p

Re: [PATCH 3/8] nowait aio: return if direct write will trigger writeback

2017-03-16 Thread Matthew Wilcox
On Wed, Mar 15, 2017 at 04:51:02PM -0500, Goldwyn Rodrigues wrote: > From: Goldwyn Rodrigues > > Find out if the write will trigger a wait due to writeback. If yes, > return -EAGAIN. > > This introduces a new function filemap_range_has_page() which > returns true if the file's mapping has a page

Re: [PATCH v6 04/16] nvme-core: introduce nvme_get_by_path()

2019-07-25 Thread Matthew Wilcox
On Thu, Jul 25, 2019 at 11:23:23AM -0600, Logan Gunthorpe wrote: > nvme_get_by_path() is analogous to blkdev_get_by_path() except it > gets a struct nvme_ctrl from the path to its char dev (/dev/nvme0). > > The purpose of this function is to support NVMe-OF target passthru. I can't find anywhere

Re: [PATCH v6 02/16] chardev: introduce cdev_get_by_path()

2019-07-25 Thread Matthew Wilcox
On Thu, Jul 25, 2019 at 11:53:20AM -0600, Logan Gunthorpe wrote: > > > On 2019-07-25 11:40 a.m., Greg Kroah-Hartman wrote: > > On Thu, Jul 25, 2019 at 11:23:21AM -0600, Logan Gunthorpe wrote: > >> cdev_get_by_path() attempts to retrieve a struct cdev from > >> a path name. It is analogous to blkd
