Re: [PATCH 0/6] mmc: block: command issue cleanups

2017-01-26 Thread Ulf Hansson
+Maxime On 26 January 2017 at 09:07, Ulf Hansson wrote: > On 24 January 2017 at 11:17, Linus Walleij wrote: >> The function mmc_blk_issue_rw_rq() is hopelessly convoluted and >> needs to be refactored so it can be understood by humans. >> >> In

Re: [dm-devel] split scsi passthrough fields out of struct request V2

2017-01-26 Thread Jens Axboe
On 01/26/2017 06:22 PM, Jens Axboe wrote: > On 01/26/2017 06:15 PM, Bart Van Assche wrote: >> On Thu, 2017-01-26 at 17:41 -0700, Jens Axboe wrote: >>> On 01/26/2017 05:38 PM, Bart Van Assche wrote: I see similar behavior with the blk-mq-sched branch of git://git.kernel.dk/linux-block.git

Re: [PATCH 0/4 RFC] BDI lifetime fix

2017-01-26 Thread Dan Williams
On Thu, Jan 26, 2017 at 9:45 AM, Jan Kara wrote: > Hello, > > this patch series attempts to solve the problems with the life time of a > backing_dev_info structure. Currently it lives inside request_queue structure > and thus it gets destroyed as soon as request queue goes away.

Re: [dm-devel] split scsi passthrough fields out of struct request V2

2017-01-26 Thread Jens Axboe
On 01/26/2017 06:15 PM, Bart Van Assche wrote: > On Thu, 2017-01-26 at 17:41 -0700, Jens Axboe wrote: >> On 01/26/2017 05:38 PM, Bart Van Assche wrote: >>> I see similar behavior with the blk-mq-sched branch of >>> git://git.kernel.dk/linux-block.git (git commit ID 0efe27068ecf): >>> booting

Re: [dm-devel] split scsi passthrough fields out of struct request V2

2017-01-26 Thread Bart Van Assche
On Thu, 2017-01-26 at 17:41 -0700, Jens Axboe wrote: > On 01/26/2017 05:38 PM, Bart Van Assche wrote: > > I see similar behavior with the blk-mq-sched branch of > > git://git.kernel.dk/linux-block.git (git commit ID 0efe27068ecf): > > booting happens much slower than usual and I/O hangs if I run

Re: [dm-devel] split scsi passthrough fields out of struct request V2

2017-01-26 Thread Jens Axboe
On 01/26/2017 05:38 PM, Bart Van Assche wrote: > On Thu, 2017-01-26 at 16:50 -0700, Jens Axboe wrote: >> Clearly we are missing some requests. How do I setup dm similarly to >> you? >> >> Does it reproduce without Christoph's patchset? > > Hello Jens, > > I see similar behavior with the

Re: [dm-devel] split scsi passthrough fields out of struct request V2

2017-01-26 Thread Bart Van Assche
On Thu, 2017-01-26 at 16:50 -0700, Jens Axboe wrote: > Clearly we are missing some requests. How do I setup dm similarly to > you? > > Does it reproduce without Christoph's patchset? Hello Jens, I see similar behavior with the blk-mq-sched branch of git://git.kernel.dk/linux-block.git (git

Re: [dm-devel] split scsi passthrough fields out of struct request V2

2017-01-26 Thread Jens Axboe
On 01/26/2017 04:50 PM, Jens Axboe wrote: > On 01/26/2017 04:47 PM, Bart Van Assche wrote: >> On Thu, 2017-01-26 at 16:26 -0700, Jens Axboe wrote: >>> What device is stuck? Is it running with an mq scheduler attached, or >>> with "none"? >>> >>> Would also be great to see the output of

Re: [dm-devel] split scsi passthrough fields out of struct request V2

2017-01-26 Thread Bart Van Assche
On Thu, 2017-01-26 at 16:26 -0700, Jens Axboe wrote: > What device is stuck? Is it running with an mq scheduler attached, or > with "none"? > > Would also be great to see the output of /sys/block/*/mq/*/tags and > sched_tags so we can see if they have anything pending. > > From a quick look at

Re: [PATCH 04/18] block: simplify blk_init_allocated_queue

2017-01-26 Thread Bart Van Assche
On Wed, 2017-01-25 at 18:25 +0100, Christoph Hellwig wrote: > Return an errno value instead of the passed in queue so that the callers > don't have to keep track of two queues, and move the assignment of the > request_fn and lock to the caller as passing them as argument doesn't > simplify

Re: [PATCH 03/18] block: fix elevator init check

2017-01-26 Thread Bart Van Assche
On Wed, 2017-01-25 at 18:25 +0100, Christoph Hellwig wrote: > We can't initialize the elevator fields for flushes as flushes share space > in struct request with the elevator data. But currently we can't > communicate that a request is a flush through blk_get_request as we > can only pass READ or

Re: [PATCH 02/18] md: cleanup bio op / flags handling in raid1_write_request

2017-01-26 Thread Bart Van Assche
On Wed, 2017-01-25 at 18:25 +0100, Christoph Hellwig wrote: > No need for the local variables, the bio is still live and we can just > assigned the bits we want directly. Make me wonder why we can't assign > all the bio flags to start with. I assume that you meant "assign" in the patch

Re: [PATCH 01/18] block: add a op_is_flush helper

2017-01-26 Thread Bart Van Assche
On Wed, 2017-01-25 at 18:25 +0100, Christoph Hellwig wrote: > This centralizes the checks for bios that need to go into the flush > state machine. Reviewed-by: Bart Van Assche

Re: [dm-devel] split scsi passthrough fields out of struct request V2

2017-01-26 Thread Jens Axboe
On 01/26/2017 02:47 PM, Bart Van Assche wrote: > (gdb) list *(blk_mq_sched_get_request+0x310) > 0x8132dcf0 is in blk_mq_sched_get_request (block/blk-mq-sched.c:136). > 131 rq->rq_flags |= RQF_QUEUED; > 132 } else > 133

Re: [dm-devel] split scsi passthrough fields out of struct request V2

2017-01-26 Thread Bart Van Assche
On Thu, 2017-01-26 at 14:12 -0700, Jens Axboe wrote: > On 01/26/2017 02:01 PM, Bart Van Assche wrote: > > On Thu, 2017-01-26 at 13:54 -0700, Jens Axboe wrote: > > > Your call path has blk_get_request() in it, I don't have > > > that in my tree. Is it passing in the right mask? > > > > Hello Jens,

blocked task timeout with high IO load across multiple luns (using btrfs)

2017-01-26 Thread Cheyenne Wills
Opened a bug -> https://bugzilla.kernel.org/show_bug.cgi?id=193331 tried multiple linux levels (4.4.21, 4.7, 4.9.5) (gentoo-sources). Have been testing using the 4.9.5-gentoo sources kernel. Might possibly be a btrfs issue, but in an IRC chat with one of the gentoo linux-kernel folks, they

Re: [PATCH 5/5] blk-mq-sched: change ->dispatch_requests() to ->dispatch_request()

2017-01-26 Thread Omar Sandoval
On Thu, Jan 26, 2017 at 01:59:23PM -0700, Jens Axboe wrote: > On 01/26/2017 01:54 PM, Omar Sandoval wrote: > > On Thu, Jan 26, 2017 at 12:48:18PM -0700, Jens Axboe wrote: > >> When we invoke dispatch_requests(), the scheduler empties everything > >> into the passed in list. This isn't always a

Re: [dm-devel] split scsi passthrough fields out of struct request V2

2017-01-26 Thread Bart Van Assche
On Thu, 2017-01-26 at 13:54 -0700, Jens Axboe wrote: > Your call path has blk_get_request() in it, I don't have > that in my tree. Is it passing in the right mask? Hello Jens, There is only one blk_get_request() call in drivers/md/dm-mpath.c and it looks as follows: clone =

Re: [PATCH 5/5] blk-mq-sched: change ->dispatch_requests() to ->dispatch_request()

2017-01-26 Thread Jens Axboe
On 01/26/2017 01:54 PM, Omar Sandoval wrote: > On Thu, Jan 26, 2017 at 12:48:18PM -0700, Jens Axboe wrote: >> When we invoke dispatch_requests(), the scheduler empties everything >> into the passed in list. This isn't always a good thing, since it >> means that we remove items that we could have

Re: [dm-devel] split scsi passthrough fields out of struct request V2

2017-01-26 Thread Jens Axboe
On 01/26/2017 01:47 PM, Bart Van Assche wrote: > On 01/26/2017 11:01 AM, Jens Axboe wrote: >> On 01/26/2017 11:59 AM, h...@lst.de wrote: >>> On Thu, Jan 26, 2017 at 11:57:36AM -0700, Jens Axboe wrote: It's against my for-4.11/block, which you were running under Christoph's patches. Maybe

Re: [PATCH 5/5] blk-mq-sched: change ->dispatch_requests() to ->dispatch_request()

2017-01-26 Thread Omar Sandoval
On Thu, Jan 26, 2017 at 12:48:18PM -0700, Jens Axboe wrote: > When we invoke dispatch_requests(), the scheduler empties everything > into the passed in list. This isn't always a good thing, since it > means that we remove items that we could have potentially merged > with. > > Change the function

Re: [dm-devel] split scsi passthrough fields out of struct request V2

2017-01-26 Thread Bart Van Assche
On 01/26/2017 11:01 AM, Jens Axboe wrote: > On 01/26/2017 11:59 AM, h...@lst.de wrote: >> On Thu, Jan 26, 2017 at 11:57:36AM -0700, Jens Axboe wrote: >>> It's against my for-4.11/block, which you were running under Christoph's >>> patches. Maybe he's using an older version? In any case, should be

Re: split scsi passthrough fields out of struct request V2

2017-01-26 Thread Jens Axboe
On 01/26/2017 11:59 AM, h...@lst.de wrote: > On Thu, Jan 26, 2017 at 11:57:36AM -0700, Jens Axboe wrote: >> It's against my for-4.11/block, which you were running under Christoph's >> patches. Maybe he's using an older version? In any case, should be >> pretty trivial for you to hand apply. Just

Re: [PATCH 3/4] block: Dynamically allocate and refcount backing_dev_info

2017-01-26 Thread Dan Williams
On Thu, Jan 26, 2017 at 9:45 AM, Jan Kara wrote: > Instead of storing backing_dev_info inside struct request_queue, > allocate it dynamically, reference count it, and free it when the last > reference is dropped. Currently only request_queue holds the reference > but in the

Re: [PATCH 4/5] blk-mq-sched: fix starvation for multiple hardware queues and shared tags

2017-01-26 Thread Omar Sandoval
On Thu, Jan 26, 2017 at 12:48:17PM -0700, Jens Axboe wrote: > If we have both multiple hardware queues and shared tag map between > devices, we need to ensure that we propagate the hardware queue > restart bit higher up. This is because we can get into a situation > where we don't have any IO

Re: [PATCH 4/5] blk-mq-sched: fix starvation for multiple hardware queues and shared tags

2017-01-26 Thread Jens Axboe
On 01/26/2017 01:25 PM, Omar Sandoval wrote: > On Thu, Jan 26, 2017 at 12:48:17PM -0700, Jens Axboe wrote: >> If we have both multiple hardware queues and shared tag map between >> devices, we need to ensure that we propagate the hardware queue >> restart bit higher up. This is because we can get

Re: [PATCH 1/5] blk-mq: improve scheduler queue sync/async running

2017-01-26 Thread Omar Sandoval
On Thu, Jan 26, 2017 at 12:48:14PM -0700, Jens Axboe wrote: > We'll use the same criteria for whether we need to run the queue sync > or async when we have a scheduler, as we do without one. Reviewed-by: Omar Sandoval > Signed-off-by: Jens Axboe > --- >

Re: [PATCH 2/5] blk-mq: fix potential race in queue restart and driver tag allocation

2017-01-26 Thread Omar Sandoval
On Thu, Jan 26, 2017 at 12:52:15PM -0700, Jens Axboe wrote: > I screwed this up when splitting up the patchset, that last break needs to > be removed as well, of course. Updated below: > > > From 9d68cf9232c06a793e305d10b6d655df4beae928 Mon Sep 17 00:00:00 2001 > From: Jens Axboe

Re: split scsi passthrough fields out of struct request V2

2017-01-26 Thread Jens Axboe
On 01/26/2017 11:29 AM, Bart Van Assche wrote: > On Wed, 2017-01-25 at 18:25 +0100, Christoph Hellwig wrote: >> Hi all, >> >> this series splits the support for SCSI passthrough commands from the >> main struct request used all over the block layer into a separate >> scsi_request structure that

Re: [PATCH 2/5] blk-mq: fix potential race in queue restart and driver tag allocation

2017-01-26 Thread Jens Axboe
On 01/26/2017 12:48 PM, Jens Axboe wrote: > Once we mark the queue as needing a restart, re-check if we can > get a driver tag. This fixes a theoretical issue where the needed > IO completes _after_ blk_mq_get_driver_tag() fails, but before we > manage to set the restart bit. > > Signed-off-by:

[PATCH 4/5] blk-mq-sched: fix starvation for multiple hardware queues and shared tags

2017-01-26 Thread Jens Axboe
If we have both multiple hardware queues and shared tag map between devices, we need to ensure that we propagate the hardware queue restart bit higher up. This is because we can get into a situation where we don't have any IO pending on a hardware queue, yet we fail getting a tag to start new IO.

[PATCH 1/5] blk-mq: improve scheduler queue sync/async running

2017-01-26 Thread Jens Axboe
We'll use the same criteria for whether we need to run the queue sync or async when we have a scheduler, as we do without one. Signed-off-by: Jens Axboe --- block/blk-mq.c | 6 -- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/block/blk-mq.c b/block/blk-mq.c

[PATCH 2/5] blk-mq: fix potential race in queue restart and driver tag allocation

2017-01-26 Thread Jens Axboe
Once we mark the queue as needing a restart, re-check if we can get a driver tag. This fixes a theoretical issue where the needed IO completes _after_ blk_mq_get_driver_tag() fails, but before we manage to set the restart bit. Signed-off-by: Jens Axboe --- block/blk-mq.c | 9
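The ordering described above (fail to get a tag, mark the queue for restart, then try once more) can be sketched as a small single-threaded user-space model. This is only an illustration, not the patch itself: struct hctx_model, model_get_driver_tag(), model_complete_one() and model_try_dispatch() are hypothetical names, and the "completion" is simulated by a callback that runs in the window between the failed allocation and the setting of the restart bit.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a hardware queue: a tag pool and a restart bit. */
struct hctx_model {
    int  free_tags;
    bool restart;
};

/* Take one tag if any is free (models a driver tag allocation attempt). */
static bool model_get_driver_tag(struct hctx_model *h)
{
    if (h->free_tags > 0) {
        h->free_tags--;
        return true;
    }
    return false;
}

/* Models an IO completing: it returns its tag to the pool. */
static void model_complete_one(struct hctx_model *h)
{
    h->free_tags++;
}

/*
 * Returns true if a tag was obtained. 'window' (may be NULL) runs after the
 * first attempt fails but before the restart bit is set, which is the
 * window the patch description talks about.
 */
static bool model_try_dispatch(struct hctx_model *h,
                               void (*window)(struct hctx_model *))
{
    if (model_get_driver_tag(h))
        return true;

    if (window)
        window(h);          /* a completion sneaks in, sees no restart bit */

    h->restart = true;      /* mark the queue as needing a restart */

    /* Re-check: without this, the tag freed in the window would go unused. */
    return model_get_driver_tag(h);
}
```

In this model, a dispatch attempt on an empty pool with model_complete_one as the window callback still succeeds, because the second allocation attempt picks up the tag that was freed before the restart bit became visible.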

Re: [PATCH 15/16] block: split scsi_request out of struct request

2017-01-26 Thread Jens Axboe
On 01/26/2017 12:37 PM, Christoph Hellwig wrote: > On Thu, Jan 26, 2017 at 11:12:51AM -0800, Bart Van Assche wrote: >> Where does the '* 3' come from? I think that deserves a comment. >> Additionally, this patch introduces a new warning when building with W=1: > > It's a magic factor copied from

Re: [PATCH 15/16] block: split scsi_request out of struct request

2017-01-26 Thread Christoph Hellwig
On Thu, Jan 26, 2017 at 11:12:51AM -0800, Bart Van Assche wrote: > Where does the '* 3' come from? I think that deserves a comment. > Additionally, this patch introduces a new warning when building with W=1: It's a magic factor copied from the old code :( That being said I really wonder if we

Re: [PATCH] queue stall with blk-mq-sched

2017-01-26 Thread Jens Axboe
On 01/26/2017 09:42 AM, Jens Axboe wrote: > On 01/26/2017 09:35 AM, Hannes Reinecke wrote: >> On 01/25/2017 11:27 PM, Jens Axboe wrote: >>> On 01/25/2017 10:42 AM, Jens Axboe wrote: On 01/25/2017 10:03 AM, Jens Axboe wrote: > On 01/25/2017 09:57 AM, Hannes Reinecke wrote: >> On

Re: [PATCH 15/16] block: split scsi_request out of struct request

2017-01-26 Thread Bart Van Assche
On 01/23/2017 07:29 AM, Christoph Hellwig wrote: > +int scsi_cmd_buf_len(struct request *rq) > +{ > + return scsi_req(rq)->cmd_len * 3; > +} > +EXPORT_SYMBOL(scsi_cmd_buf_len); Hello Christoph, Where does the '* 3' come from? I think that deserves a comment. Additionally, this patch

Re: split scsi passthrough fields out of struct request V2

2017-01-26 Thread h...@lst.de
On Thu, Jan 26, 2017 at 11:57:36AM -0700, Jens Axboe wrote: > It's against my for-4.11/block, which you were running under Christoph's > patches. Maybe he's using an older version? In any case, should be > pretty trivial for you to hand apply. Just ensure that .flags is set to > 0 for the common

Re: split scsi passthrough fields out of struct request V2

2017-01-26 Thread Jens Axboe
On 01/26/2017 11:52 AM, Bart Van Assche wrote: > On Thu, 2017-01-26 at 11:44 -0700, Jens Axboe wrote: >> I think this may be my bug - does the below help? > > Hello Jens, > > What tree has that patch been generated against? It does not apply > cleanly on top of Christoph's tree: > > $ git

Re: split scsi passthrough fields out of struct request V2

2017-01-26 Thread Bart Van Assche
On Thu, 2017-01-26 at 11:44 -0700, Jens Axboe wrote: > I think this may be my bug - does the below help? Hello Jens, What tree has that patch been generated against? It does not apply cleanly on top of Christoph's tree: $ git checkout hch-block-pc-refactor $ patch -p1 --dry-run -f -s <

Re: split scsi passthrough fields out of struct request V2

2017-01-26 Thread Bart Van Assche
On Wed, 2017-01-25 at 18:25 +0100, Christoph Hellwig wrote: > Hi all, > > this series splits the support for SCSI passthrough commands from the > main struct request used all over the block layer into a separate > scsi_request structure that drivers that want to support SCSI passthrough > need to

[PATCH 4/4] block: Make blk_get_backing_dev_info() safe without open bdev

2017-01-26 Thread Jan Kara
Currently blk_get_backing_dev_info() is not safe to be called when the block device is not open as bdev->bd_disk is NULL in that case. However inode_to_bdi() uses this function and may be called from flusher worker or other writeback related functions without bdev being open which leads to

[PATCH 3/4] block: Dynamically allocate and refcount backing_dev_info

2017-01-26 Thread Jan Kara
Instead of storing backing_dev_info inside struct request_queue, allocate it dynamically, reference count it, and free it when the last reference is dropped. Currently only request_queue holds the reference but in the following patch we add other users referencing backing_dev_info. Signed-off-by:
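A minimal user-space sketch of the lifetime rule described above: the object is allocated on its own, each holder takes a reference, and it is freed when the last reference is dropped. The names bdi_model, bdi_model_alloc(), bdi_model_get() and bdi_model_put() are hypothetical and not taken from the patch; the counter here is a plain int for brevity, whereas kernel code would need an atomic reference count.

```c
#include <stdlib.h>

/* Hypothetical stand-in for a dynamically allocated backing_dev_info. */
struct bdi_model {
    int refcount;   /* plain int: this sketch is single-threaded only */
};

/* Counts how many objects have been freed, so the demo can observe it. */
static int bdi_model_freed;

/* Allocate the object with one reference held by the caller. */
static struct bdi_model *bdi_model_alloc(void)
{
    struct bdi_model *bdi = calloc(1, sizeof(*bdi));

    if (bdi)
        bdi->refcount = 1;
    return bdi;
}

/* Take an additional reference (e.g. a second user of the object). */
static struct bdi_model *bdi_model_get(struct bdi_model *bdi)
{
    bdi->refcount++;
    return bdi;
}

/* Drop a reference; free the object when the last one goes away. */
static void bdi_model_put(struct bdi_model *bdi)
{
    if (--bdi->refcount == 0) {
        free(bdi);
        bdi_model_freed++;
    }
}
```

With this shape, the owner of the initial reference can drop it while another holder is still using the object, and the memory only goes away on the final put.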

[PATCH 0/4 RFC] BDI lifetime fix

2017-01-26 Thread Jan Kara
Hello, this patch series attempts to solve the problems with the life time of a backing_dev_info structure. Currently it lives inside request_queue structure and thus it gets destroyed as soon as request queue goes away. However the block device inode still stays around and thus inode_to_bdi()

[PATCH 1/4] block: Unhash block device inodes on gendisk destruction

2017-01-26 Thread Jan Kara
Currently, block device inodes stay around after corresponding gendisk has died until memory reclaim finds them and frees them. Since we will make block device inode pin the bdi, we want to free the block device inode as soon as the device goes away so that bdi does not stay around unnecessarily.

Re: [PATCH] queue stall with blk-mq-sched

2017-01-26 Thread Hannes Reinecke
On 01/25/2017 11:27 PM, Jens Axboe wrote: > On 01/25/2017 10:42 AM, Jens Axboe wrote: >> On 01/25/2017 10:03 AM, Jens Axboe wrote: >>> On 01/25/2017 09:57 AM, Hannes Reinecke wrote: On 01/25/2017 04:52 PM, Jens Axboe wrote: > On 01/25/2017 04:10 AM, Hannes Reinecke wrote: [ .. ]

Re: [PATCH] queue stall with blk-mq-sched

2017-01-26 Thread Jens Axboe
On 01/26/2017 09:35 AM, Hannes Reinecke wrote: > On 01/25/2017 11:27 PM, Jens Axboe wrote: >> On 01/25/2017 10:42 AM, Jens Axboe wrote: >>> On 01/25/2017 10:03 AM, Jens Axboe wrote: On 01/25/2017 09:57 AM, Hannes Reinecke wrote: > On 01/25/2017 04:52 PM, Jens Axboe wrote: >> On

Re: [RFC PATCH v2 0/2] block: fix backing_dev_info lifetime

2017-01-26 Thread Dan Williams
On Thu, Jan 26, 2017 at 5:17 AM, Christoph Hellwig wrote: > On Thu, Jan 26, 2017 at 11:06:53AM +0100, Jan Kara wrote: >> Yeah, so my patches (and I suspect yours as well), have a problem when the >> backing_device_info stays around because blkdev inode still exists, device >> gets

[PATCH] lightnvm: free properly on target creation error

2017-01-26 Thread Javier González
Fix a memory leak when target creation fails. More specifically, free the entire device structure given to the target (tgt_dev). Signed-off-by: Javier González --- drivers/lightnvm/core.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git

Re: [PATCHv6 06/37] thp: handle write-protection faults for file THP

2017-01-26 Thread Kirill A. Shutemov
On Thu, Jan 26, 2017 at 07:44:39AM -0800, Matthew Wilcox wrote: > On Thu, Jan 26, 2017 at 02:57:48PM +0300, Kirill A. Shutemov wrote: > > For filesystems that want to be write-notified (have mkwrite), we will > > encounter write-protection faults for huge PMDs in shared mappings. > > > > The

Re: [PATCHv6 06/37] thp: handle write-protection faults for file THP

2017-01-26 Thread Matthew Wilcox
On Thu, Jan 26, 2017 at 02:57:48PM +0300, Kirill A. Shutemov wrote: > For filesystems that want to be write-notified (have mkwrite), we will > encounter write-protection faults for huge PMDs in shared mappings. > > The easiest way to handle them is to clear the PMD and let it refault as > writable.

Re: [PATCHv6 02/37] Revert "radix-tree: implement radix_tree_maybe_preload_order()"

2017-01-26 Thread Matthew Wilcox
On Thu, Jan 26, 2017 at 02:57:44PM +0300, Kirill A. Shutemov wrote: > This reverts commit 356e1c23292a4f63cfdf1daf0e0ddada51f32de8. > > After conversion of huge tmpfs to multi-order entries, we don't need > this anymore. Yay! Reviewed-by: Matthew Wilcox

[PATCH 3/6] mmc: core: rename mmc_start_req() to *areq()

2017-01-26 Thread Linus Walleij
With the coexisting __mmc_start_request(), mmc_start_request() and __mmc_start_req() it is a bit confusing that mmc_start_req() actually does not start a normal request, but an asynchronous request. Rename it to mmc_start_areq() to make it explicit what the function is doing, also fix the

[PATCH 2/6] mmc: block: rename rqc and req

2017-01-26 Thread Linus Walleij
In the function mmc_blk_issue_rw_rq() the new request coming in from the block layer is called "rqc" and the old request that was potentially just returned back from the asynchronous mechanism is called "req". This is really confusing when trying to analyze and understand the code, it becomes a

[PATCH 6/6] mmc: queue: turn queue flags into bools

2017-01-26 Thread Linus Walleij
Instead of masking and setting two bits in the "flags" field for the mmc_queue, just use two bools named "suspended" and "new_request". The masking and setting would likely have race conditions anyways, it is better to use a simple member like this. Signed-off-by: Linus Walleij
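The change described above can be pictured with a small before/after sketch. The struct and macro names below (mmc_queue_flags_model, mmc_queue_bools_model, MODEL_QUEUE_*) are hypothetical stand-ins, not the definitions from the patch; only the member names "flags", "suspended" and "new_request" come from the description.

```c
#include <stdbool.h>

/* Before: two states packed as bits in one word (bit names illustrative). */
#define MODEL_QUEUE_SUSPENDED   (1U << 0)
#define MODEL_QUEUE_NEW_REQUEST (1U << 1)

struct mmc_queue_flags_model {
    unsigned int flags;
};

/* After: one plain bool per state, as the patch description proposes. */
struct mmc_queue_bools_model {
    bool suspended;
    bool new_request;
};

/* Bitmask version: every update is a read-modify-write of the shared word. */
static void flags_set_suspended(struct mmc_queue_flags_model *mq, bool on)
{
    if (on)
        mq->flags |= MODEL_QUEUE_SUSPENDED;
    else
        mq->flags &= ~MODEL_QUEUE_SUSPENDED;
}

/* Bool version: a single store that never touches the other state. */
static void bools_set_suspended(struct mmc_queue_bools_model *mq, bool on)
{
    mq->suspended = on;
}
```

The bitmask setter has to read and rewrite the word that also carries the other bit, which is the kind of update the description worries about; the bool setter is a store to its own member.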

[PATCH 5/6] mmc: block: rename mmc_active to areq

2017-01-26 Thread Linus Walleij
The mmc_active member of struct mmc_queue_req has a very confusing name: this is certainly not always "active", it is the asynchronous request associated with the mmc_queue_req but it is not guaranteed to be "active" in any sense, such as running on the host. Simply rename this member to

[PATCH 1/6] mmc: block: inline the command abort and start new goto:s

2017-01-26 Thread Linus Walleij
The goto statements sprinkled over the mmc_blk_issue_rw_rq() function have grown over the years and make the code pretty hard to read. Inline the calls such that: goto cmd_abort; -> mmc_blk_rw_cmd_abort(card, req); mmc_blk_rw_start_new(mq, card, rqc); return; goto start_new_req; ->

[PATCH 4/6] mmc: block: refactor mmc_blk_rw_try_restart()

2017-01-26 Thread Linus Walleij
The mmc_blk_rw_start_new() was named after the label inside mmc_blk_issue_rw_rq() but is really a confusing name for this function: what it does is to try to restart the latest issued command on the host and card of the current MMC queue. So rename it mmc_blk_rw_try_restart() that reflects what

Re: [PATCH 1/6] mmc: block: break out mmc_blk_rw_cmd_abort()

2017-01-26 Thread Linus Walleij
On Wed, Jan 25, 2017 at 10:23 AM, Mateusz Nowak wrote: > On 1/24/2017 11:17, Linus Walleij wrote: >> >> As a first step toward breaking apart the very complex function >> mmc_blk_issue_rw_rq() we break out the command abort code. >> This code assumes "ret" is != 0

Re: [PATCHSET v4] blk-mq-scheduling framework

2017-01-26 Thread Paolo Valente
> On 25 Jan 2017, at 17:13, Jens Axboe wrote: > > On 01/25/2017 01:46 AM, Paolo Valente wrote: >> >>> On 23 Jan 2017, at 18:42, Jens Axboe wrote: >>> >>> On 01/23/2017 10:04 AM, Paolo Valente wrote: > On 18 Jan

Re: [Nbd] [PATCH 4/4] nbd: add a nbd-control interface

2017-01-26 Thread Christoph Hellwig
On Thu, Jan 26, 2017 at 10:17:58AM +0100, Greg KH wrote: > Ok, but do you feel the "loop method" of using a char device node to > create/control these devices is a good model to follow for new devices > like nbd? Yes. We've done the same for NVMe over fabrics.

[PATCHv6 04/37] mm, rmap: account file thp pages

2017-01-26 Thread Kirill A. Shutemov
Let's add FileHugePages and FilePmdMapped fields into meminfo and smaps. It indicates how many times we allocate and map file THP. Signed-off-by: Kirill A. Shutemov --- drivers/base/node.c| 6 ++ fs/proc/meminfo.c | 4 fs/proc/task_mmu.c

[PATCHv6 22/37] mm, hugetlb: switch hugetlbfs to multi-order radix-tree entries

2017-01-26 Thread Kirill A. Shutemov
From: Naoya Horiguchi Currently, hugetlb pages are linked to page cache on the basis of hugepage offset (derived from vma_hugecache_offset()) for historical reasons, which doesn't match the generic usage of page cache and requires some routines to convert page offset

[PATCHv6 09/37] filemap: allocate huge page in pagecache_get_page(), if allowed

2017-01-26 Thread Kirill A. Shutemov
The write path allocates pages using pagecache_get_page(). We should be able to allocate huge pages there, if it's allowed. As usual, fall back to small pages if that fails. Signed-off-by: Kirill A. Shutemov --- mm/filemap.c | 17 +++-- 1 file changed, 15

[PATCHv6 25/37] ext4: make ext4_writepage() work on huge pages

2017-01-26 Thread Kirill A. Shutemov
Change ext4_writepage() and underlying ext4_bio_write_page(). It basically removes the assumption on page size and infers it from struct page instead. Signed-off-by: Kirill A. Shutemov --- fs/ext4/inode.c | 10 +- fs/ext4/page-io.c | 11 +-- 2 files

[PATCHv6 07/37] filemap: allocate huge page in page_cache_read(), if allowed

2017-01-26 Thread Kirill A. Shutemov
This patch adds basic functionality to put huge pages into page cache. At the moment we only put huge pages into radix-tree if the range covered by the huge page is empty. We ignore shadow entries for now, just remove them from the tree before inserting huge page. Later we can add logic to

[PATCHv6 10/37] filemap: handle huge pages in filemap_fdatawait_range()

2017-01-26 Thread Kirill A. Shutemov
We write back a whole huge page at a time. Signed-off-by: Kirill A. Shutemov --- mm/filemap.c | 5 + 1 file changed, 5 insertions(+) diff --git a/mm/filemap.c b/mm/filemap.c index 4e398d5e4134..f5cd654b3662 100644 --- a/mm/filemap.c +++ b/mm/filemap.c @@ -405,9

[PATCHv6 16/37] thp: make thp_get_unmapped_area() respect S_HUGE_MODE

2017-01-26 Thread Kirill A. Shutemov
We want mmap(NULL) to return PMD-aligned address if the inode can have huge pages in page cache. Signed-off-by: Kirill A. Shutemov --- mm/huge_memory.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/mm/huge_memory.c b/mm/huge_memory.c

[PATCHv6 26/37] ext4: handle huge pages in ext4_page_mkwrite()

2017-01-26 Thread Kirill A. Shutemov
Trivial: remove assumption on page size. Signed-off-by: Kirill A. Shutemov --- fs/ext4/inode.c | 13 +++-- 1 file changed, 7 insertions(+), 6 deletions(-) diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c index 8d1b5e63cb15..a25be1cf4506 100644 ---

[PATCHv6 15/37] thp: do not threat slab pages as huge in hpage_{nr_pages,size,mask}

2017-01-26 Thread Kirill A. Shutemov
Slab pages can be compound, but we shouldn't treat them as THP for the purpose of hpage_* helpers, otherwise it would lead to confusing results. For instance, ext4 uses slab pages for journal pages and we shouldn't confuse them with THPs. The easiest way is to exclude them in hpage_* helpers.

[PATCHv6 11/37] HACK: readahead: alloc huge pages, if allowed

2017-01-26 Thread Kirill A. Shutemov
Most page cache allocation happens via readahead (sync or async), so if we want to have a significant number of huge pages in page cache we need to find a way to allocate them from readahead. Unfortunately, huge pages don't fit into the current readahead design: 128 max readahead window, assumption

[PATCHv6 08/37] filemap: handle huge pages in do_generic_file_read()

2017-01-26 Thread Kirill A. Shutemov
Most of the work happens on the head page. Only when we need to copy data to userspace do we find the relevant subpage. We are still limited by PAGE_SIZE per iteration. Lifting this limitation would require some more work. Signed-off-by: Kirill A. Shutemov --- mm/filemap.c

[PATCHv6 17/37] fs: make block_read_full_page() be able to read huge page

2017-01-26 Thread Kirill A. Shutemov
The approach is straight-forward: for compound pages we read out whole huge page. For huge page we cannot have array of buffer head pointers on stack -- it's 4096 pointers on x86-64 -- 'arr' is allocated with kmalloc() for huge pages. Signed-off-by: Kirill A. Shutemov

[PATCHv6 20/37] truncate: make truncate_inode_pages_range() aware about huge pages

2017-01-26 Thread Kirill A. Shutemov
As with shmem_undo_range(), truncate_inode_pages_range() removes huge pages if they are fully within the range. Partial truncation of a huge page zeroes out that part of the THP. Unlike with shmem, it doesn't prevent us from having holes in the middle of a huge page: we can still skip writeback of untouched buffers. With

[PATCHv6 23/37] mm: account huge pages to dirty, writaback, reclaimable, etc.

2017-01-26 Thread Kirill A. Shutemov
We need to account huge pages according to their size to get background writeback to work properly. Signed-off-by: Kirill A. Shutemov --- fs/fs-writeback.c | 10 +++--- include/linux/backing-dev.h | 10 ++ include/linux/memcontrol.h | 22 ++---

[PATCHv6 29/37] ext4: handle huge pages in ext4_da_write_end()

2017-01-26 Thread Kirill A. Shutemov
Call ext4_da_should_update_i_disksize() for head page with offset relative to head page. Signed-off-by: Kirill A. Shutemov --- fs/ext4/inode.c | 7 +++ 1 file changed, 3 insertions(+), 4 deletions(-) diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c index

[PATCHv6 35/37] ext4: reserve larger jounral transaction for huge pages

2017-01-26 Thread Kirill A. Shutemov
If huge pages are enabled, in the worst case with 2048 blocks underlying a page, each possibly in a different block group, we have much more metadata to commit. Let's update estimates accordingly. I was not able to trigger a bad situation without the patch as it's hard to construct very fragmented

[PATCHv6 19/37] fs: make block_page_mkwrite() aware about huge pages

2017-01-26 Thread Kirill A. Shutemov
Adjust check on whether part of the page is beyond file size and apply compound_head() and page_mapping() where appropriate. Signed-off-by: Kirill A. Shutemov --- fs/buffer.c | 10 +- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/fs/buffer.c

[PATCHv6 37/37] ext4, vfs: add huge= mount option

2017-01-26 Thread Kirill A. Shutemov
The same four values as in tmpfs case. Encryption code is not yet ready to handle huge pages, so we disable huge pages support if the inode has EXT4_INODE_ENCRYPT. Signed-off-by: Kirill A. Shutemov --- fs/ext4/ext4.h | 5 + fs/ext4/inode.c | 32

[PATCHv6 13/37] mm: make write_cache_pages() work on huge pages

2017-01-26 Thread Kirill A. Shutemov
We write back a whole huge page at a time. Let's adjust iteration this way. Signed-off-by: Kirill A. Shutemov --- include/linux/mm.h | 1 + include/linux/pagemap.h | 1 + mm/page-writeback.c | 17 - 3 files changed, 14 insertions(+), 5

[PATCHv6 36/37] mm, fs, ext4: expand use of page_mapping() and page_to_pgoff()

2017-01-26 Thread Kirill A. Shutemov
With huge pages in page cache we see tail pages in more code paths. This patch replaces direct access to struct page fields with macros which can handle tail pages properly. Signed-off-by: Kirill A. Shutemov --- fs/buffer.c | 2 +- fs/ext4/inode.c |

[PATCHv6 33/37] ext4: fix SEEK_DATA/SEEK_HOLE for huge pages

2017-01-26 Thread Kirill A. Shutemov
ext4_find_unwritten_pgoff() needs a few tweaks to work with huge pages. Mostly trivial page_mapping()/page_to_pgoff() and adjustment to how we find relevant block. Signed-off-by: Kirill A. Shutemov --- fs/ext4/file.c | 18 ++ 1 file changed, 14

[PATCHv6 01/37] mm, shmem: swich huge tmpfs to multi-order radix-tree entries

2017-01-26 Thread Kirill A. Shutemov
We would need to use multi-order radix-tree entries for ext4 and other filesystems to have a coherent view on tags (dirty/towrite) in the tree. This patch converts the huge tmpfs implementation to multi-order entries, so we will be able to use the same code path for all filesystems. We also change

[PATCHv6 28/37] ext4: make ext4_block_write_begin() aware about huge pages

2017-01-26 Thread Kirill A. Shutemov
It simply matches changes to __block_write_begin_int(). Signed-off-by: Kirill A. Shutemov --- fs/ext4/inode.c | 35 +-- 1 file changed, 21 insertions(+), 14 deletions(-) diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c index

[PATCHv6 14/37] thp: introduce hpage_size() and hpage_mask()

2017-01-26 Thread Kirill A. Shutemov
Introduce new helpers which return size/mask of the page: HPAGE_PMD_SIZE/HPAGE_PMD_MASK if the page is PageTransHuge() and PAGE_SIZE/PAGE_MASK otherwise. Signed-off-by: Kirill A. Shutemov --- include/linux/huge_mm.h | 16 1 file changed, 16
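As a rough illustration of what the two helpers compute, here is a user-space model. It is a sketch under assumptions, not the code from the patch: struct page_model and the *_model function names are hypothetical, the trans_huge field stands in for the PageTransHuge() test, and the shift values assume x86-64 (4 KiB base pages, 2 MiB PMD-sized huge pages).

```c
#include <stdbool.h>

/* User-space stand-ins for the kernel constants; x86-64 values assumed. */
#define PAGE_SHIFT      12
#define PAGE_SIZE       (1UL << PAGE_SHIFT)
#define PAGE_MASK       (~(PAGE_SIZE - 1))
#define HPAGE_PMD_SHIFT 21
#define HPAGE_PMD_SIZE  (1UL << HPAGE_PMD_SHIFT)
#define HPAGE_PMD_MASK  (~(HPAGE_PMD_SIZE - 1))

/* Hypothetical model of struct page: only the property the helpers test. */
struct page_model {
    bool trans_huge;    /* stands in for PageTransHuge(page) */
};

/* Size of the page: huge size for a transparent huge page, else base size. */
static inline unsigned long hpage_size_model(const struct page_model *page)
{
    return page->trans_huge ? HPAGE_PMD_SIZE : PAGE_SIZE;
}

/* Mask that rounds an address down to the start of the (huge) page. */
static inline unsigned long hpage_mask_model(const struct page_model *page)
{
    return page->trans_huge ? HPAGE_PMD_MASK : PAGE_MASK;
}
```

Callers that previously hard-coded PAGE_SIZE or PAGE_MASK can then ask the page itself, so the same code path works for both small and huge pages.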

[PATCHv6 30/37] ext4: make ext4_da_page_release_reservation() aware about huge pages

2017-01-26 Thread Kirill A. Shutemov
For huge pages 'stop' must be within HPAGE_PMD_SIZE. Let's use hpage_size() in the BUG_ON(). We also need to change how we calculate lblk for cluster deallocation. Signed-off-by: Kirill A. Shutemov --- fs/ext4/inode.c | 5 +++-- 1 file changed, 3 insertions(+),

[PATCHv6 32/37] ext4: make EXT4_IOC_MOVE_EXT work with huge pages

2017-01-26 Thread Kirill A. Shutemov
Adjust how we find relevant block within page and how we clear the required part of the page. Signed-off-by: Kirill A. Shutemov --- fs/ext4/move_extent.c | 12 +--- 1 file changed, 9 insertions(+), 3 deletions(-) diff --git a/fs/ext4/move_extent.c

[PATCHv6 34/37] ext4: make fallocate() operations work with huge pages

2017-01-26 Thread Kirill A. Shutemov
__ext4_block_zero_page_range() is adjusted to calculate the starting iblock correctly for huge pages. ext4_{collapse,insert}_range() requires page cache invalidation. We need the invalidation to be aligned to the huge page border if huge pages are possible in page cache. Signed-off-by: Kirill A. Shutemov
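
The alignment requirement mentioned above can be illustrated with a small sketch. This is not the patch's code: the helper names `hp_round_down()` and `hp_round_up()` are hypothetical, and the 2 MiB huge page size is an assumed x86-64 value. The idea is only that a byte range is widened outward so it never starts or ends in the middle of a huge page.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed huge page size: 2 MiB (PMD-sized page on x86-64). */
#define HPAGE_PMD_SIZE (UINT64_C(1) << 21)

/* Hypothetical helper: round a file offset down to a huge page border. */
static inline uint64_t hp_round_down(uint64_t off)
{
    return off & ~(HPAGE_PMD_SIZE - 1);
}

/* Hypothetical helper: round a file offset up to a huge page border. */
static inline uint64_t hp_round_up(uint64_t off)
{
    /* Guard against wrap-around when adding the rounding bias. */
    assert(off <= UINT64_MAX - (HPAGE_PMD_SIZE - 1));
    return (off + HPAGE_PMD_SIZE - 1) & ~(HPAGE_PMD_SIZE - 1);
}
```

An invalidation over [start, end) would then cover [hp_round_down(start), hp_round_up(end)); whether the real patch rounds both ends or only the start is not visible in the truncated message.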

[PATCHv6 06/37] thp: handle write-protection faults for file THP

2017-01-26 Thread Kirill A. Shutemov
For filesystems that want to be write-notified (have mkwrite), we will encounter write-protection faults for huge PMDs in shared mappings. The easiest way to handle them is to clear the PMD and let it refault as writable. Signed-off-by: Kirill A. Shutemov

[PATCHv6 00/37] ext4: support of huge pages

2017-01-26 Thread Kirill A. Shutemov
Here's a respin of my huge ext4 patchset on top of v4.10-rc5 + my recent patchset that fixes rmap-related THP bugs. That patchset also brings the page_mkclean() changes required for huge-ext4. Please review and consider applying. I don't see any xfstests regressions with huge pages enabled. Patch with

[PATCHv6 18/37] fs: make block_write_{begin,end}() be able to handle huge pages

2017-01-26 Thread Kirill A. Shutemov
It's more or less straightforward. Most changes are around getting the offset/len within the page right and zeroing out the desired part of the page. Signed-off-by: Kirill A. Shutemov --- fs/buffer.c | 70 +++-- 1 file

[PATCH 3/3] lightnvm: Add CRC read error

2017-01-26 Thread Javier González
Let the host differentiate between a read error and a CRC check error on the device side. Signed-off-by: Javier González --- include/linux/lightnvm.h | 1 + 1 file changed, 1 insertion(+) diff --git a/include/linux/lightnvm.h b/include/linux/lightnvm.h index

[PATCH 1/3] lightnvm: submit erases using the I/O path

2017-01-26 Thread Javier González
Until now erases have been submitted as synchronous commands through a dedicated erase function. In order to allow targets to implement asynchronous erases, refactor the erase path so that it uses the normal async I/O submission path. If a target requires sync I/O, it can implement it internally.

Re: [RFC PATCH v2 0/2] block: fix backing_dev_info lifetime

2017-01-26 Thread Jan Kara
On Wed 25-01-17 13:43:58, Dan Williams wrote: > On Mon, Jan 23, 2017 at 1:17 PM, Thiago Jung Bauermann > wrote: > > Hello Dan, > > > > Am Freitag, 6. Januar 2017, 17:02:51 BRST schrieb Dan Williams: > >> v1 of these changes [1] was a one line change to

Re: [Nbd] [PATCH 4/4] nbd: add a nbd-control interface

2017-01-26 Thread Christoph Hellwig
On Wed, Jan 25, 2017 at 03:36:20PM -0600, Eric Blake wrote: > How do you get an fd to existing nbd block device? Your intent is to > use an ioctl to request creating/opening a new nbd device that no one > else is using; opening an existing device in order to send that ioctl > may have negative

Re: [PATCH 0/6] mmc: block: command issue cleanups

2017-01-26 Thread Ulf Hansson
On 24 January 2017 at 11:17, Linus Walleij wrote: > The function mmc_blk_issue_rw_rq() is hopelessly convoluted and > needs to be refactored so it can be understood by humans. > > In the process I found some weird magic return values passed > around for no good reason. >