[PATCH v4 1/2] blk-mq: Export queue state through /sys/kernel/debug/block/*/state

2017-03-31 Thread Bart Van Assche
Make it possible to check whether or not a block layer queue has been stopped. Make it possible to start and to run a blk-mq queue from user space. Signed-off-by: Bart Van Assche Cc: Omar Sandoval Cc: Hannes Reinecke --- Changes

[PATCH 0/3] Avoid that scsi-mq queue processing stalls

2017-03-31 Thread Bart Van Assche
Hello Jens, The three patches in this patch series fix the queue lockup I reported a few days ago on the linux-block mailing list. Please consider these patches for kernel v4.11. Thanks, Bart. Bart Van Assche (3): blk-mq: Introduce blk_mq_ops.restart_queues scsi: Add scsi_restart_queues()

[PATCH 3/3] scsi: Ensure that scsi_run_queue() runs all hardware queues

2017-03-31 Thread Bart Van Assche
commit 52d7f1b5c2f3 ("blk-mq: Avoid that requeueing starts stopped queues") removed the blk_mq_stop_hw_queue() call from scsi_queue_rq() for the BLK_MQ_RQ_QUEUE_BUSY case. blk_mq_start_stopped_hw_queues() only runs queues that had been stopped. Hence change the blk_mq_start_stopped_hw_queues()

[PATCH 1/3] blk-mq: Introduce blk_mq_ops.restart_queues

2017-03-31 Thread Bart Van Assche
If a tag set is shared among multiple request queues, leave it to the block driver to restart queues. Hence remove QUEUE_FLAG_RESTART and introduce blk_mq_ops.restart_queues. Remove blk_mq_sched_mark_restart_queue() because this function has no callers. Signed-off-by: Bart Van Assche

[PATCH 2/3] scsi: Add scsi_restart_queues()

2017-03-31 Thread Bart Van Assche
If multiple SCSI devices are associated with a SCSI host, this patch avoids that a queue can get stuck when scsi_queue_rq() returns "busy". Signed-off-by: Bart Van Assche Cc: Martin K. Petersen Cc: James Bottomley

Re: [PATCH 6/8] bio-integrity: add bio_integrity_setup helper

2017-03-31 Thread kbuild test robot
Hi Dmitry, [auto build test ERROR on linus/master] [also build test ERROR on v4.11-rc4] [cannot apply to block/for-next next-20170331] [if your patch is applied to the wrong git tree, please drop us a note to help improve the system] url: https://github.com/0day-ci/linux/commits/Dmitry

Re: [PATCH 7/8] T10: Move opencoded constants to common header

2017-03-31 Thread kbuild test robot
Hi Dmitry, [auto build test ERROR on linus/master] [also build test ERROR on v4.11-rc4] [cannot apply to block/for-next next-20170331] [if your patch is applied to the wrong git tree, please drop us a note to help improve the system] url: https://github.com/0day-ci/linux/commits/Dmitry

Re: [PATCH v3] blk-mq: remap queues when adding/removing hardware queues

2017-03-31 Thread Keith Busch
On Fri, Mar 31, 2017 at 01:48:35PM -0700, Omar Sandoval wrote: > From: Omar Sandoval > > blk_mq_update_nr_hw_queues() used to remap hardware queues, which is the > behavior that drivers expect. However, commit 4e68a011428a changed > blk_mq_queue_reinit() to not remap queues for

[PATCH v3] blk-mq: remap queues when adding/removing hardware queues

2017-03-31 Thread Omar Sandoval
From: Omar Sandoval blk_mq_update_nr_hw_queues() used to remap hardware queues, which is the behavior that drivers expect. However, commit 4e68a011428a changed blk_mq_queue_reinit() to not remap queues for the case of CPU hotplugging, inadvertently making

Re: [PATCH v2] blk-mq: remap queues when adding/removing hardware queues

2017-03-31 Thread Omar Sandoval
On Fri, Mar 31, 2017 at 01:43:41PM -0700, Omar Sandoval wrote: > @@ -2634,6 +2640,7 @@ void blk_mq_update_nr_hw_queues(struct blk_mq_tag_set > *set, int nr_hw_queues) > > list_for_each_entry(q, &set->tag_list, tag_set_list) > blk_mq_unfreeze_queue(q); > + Stupid whitespace

[PATCH v2] blk-mq: remap queues when adding/removing hardware queues

2017-03-31 Thread Omar Sandoval
From: Omar Sandoval blk_mq_update_nr_hw_queues() used to remap hardware queues, which is the behavior that drivers expect. However, commit 4e68a011428a changed blk_mq_queue_reinit() to not remap queues for the case of CPU hotplugging, inadvertently making

Re: [PATCH] blk-mq: remap queues when adding/removing hardware queues

2017-03-31 Thread Keith Busch
On Fri, Mar 31, 2017 at 01:30:15PM -0700, Omar Sandoval wrote: > On Fri, Mar 31, 2017 at 04:30:44PM -0400, Keith Busch wrote: > > On Fri, Mar 31, 2017 at 11:59:24AM -0700, Omar Sandoval wrote: > > > @@ -2629,11 +2639,12 @@ void blk_mq_update_nr_hw_queues(struct > > > blk_mq_tag_set *set, int

Re: [PATCH] blk-mq: remap queues when adding/removing hardware queues

2017-03-31 Thread Omar Sandoval
On Fri, Mar 31, 2017 at 04:30:44PM -0400, Keith Busch wrote: > On Fri, Mar 31, 2017 at 11:59:24AM -0700, Omar Sandoval wrote: > > @@ -2629,11 +2639,12 @@ void blk_mq_update_nr_hw_queues(struct > > blk_mq_tag_set *set, int nr_hw_queues) > > set->nr_hw_queues = nr_hw_queues; > >

[PATCH 04/25] sd: implement REQ_OP_WRITE_ZEROES

2017-03-31 Thread Christoph Hellwig
Signed-off-by: Christoph Hellwig --- drivers/scsi/sd.c | 31 ++- drivers/scsi/sd_zbc.c | 1 + 2 files changed, 27 insertions(+), 5 deletions(-) diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c index b853f91fb3da..d8d9c0bdd93c 100644 ---

[PATCH 06/25] dm io: discards don't take a payload

2017-03-31 Thread Christoph Hellwig
Fix up do_region to not allocate a bio_vec for discards. We got rid of the discard payload allocated by the caller years ago. Obviously this wasn't actually harmful given how long it's been there, but it's still good to avoid the pointless allocation. Signed-off-by: Christoph Hellwig

[PATCH 13/25] block_dev: use blkdev_issue_zerout for hole punches

2017-03-31 Thread Christoph Hellwig
This gets us support for non-discard efficient write of zeroes (e.g. NVMe) and prepare for removing the discard_zeroes_data flag. Also remove a pointless discard support check, which is done in blkdev_issue_discard already. Signed-off-by: Christoph Hellwig --- fs/block_dev.c | 10

[PATCH 19/25] rbd: remove the discard_zeroes_data flag

2017-03-31 Thread Christoph Hellwig
rbd only supports discarding on large alignments, so the zeroing code would always fall back to explicit writes of zeroes. Signed-off-by: Christoph Hellwig --- drivers/block/rbd.c | 1 - 1 file changed, 1 deletion(-) diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c index

[PATCH 14/25] sd: implement unmapping Write Zeroes

2017-03-31 Thread Christoph Hellwig
Try to use a write same with unmap bit variant if the device supports it and the caller allows for it. Signed-off-by: Christoph Hellwig --- drivers/scsi/sd.c | 9 + 1 file changed, 9 insertions(+) diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c index

[PATCH 11/25] block: add a REQ_UNMAP flag for REQ_OP_WRITE_ZEROES

2017-03-31 Thread Christoph Hellwig
If this flag is set, a logical-provisioning-capable device should release space for the zeroed blocks if possible; if it is not set, devices should keep the blocks anchored. Also remove an out-of-sync kerneldoc comment for a static function that would have become even more out of date with this

[PATCH 21/25] mmc: remove the discard_zeroes_data flag

2017-03-31 Thread Christoph Hellwig
mmc only supports discarding on large alignments, so the zeroing code would always fall back to explicit writes of zeroes. Signed-off-by: Christoph Hellwig --- drivers/mmc/core/queue.c | 2 -- 1 file changed, 2 deletions(-) diff --git a/drivers/mmc/core/queue.c

[PATCH 25/25] block: remove the discard_zeroes_data flag

2017-03-31 Thread Christoph Hellwig
Now that we use the proper REQ_OP_WRITE_ZEROES operation everywhere we can kill this hack. Signed-off-by: Christoph Hellwig --- Documentation/ABI/testing/sysfs-block | 10 ++- Documentation/block/queue-sysfs.txt | 5 block/blk-lib.c | 7 +

[PATCH 23/25] drbd: make intelligent use of blkdev_issue_zeroout

2017-03-31 Thread Christoph Hellwig
drbd always wants its discard wire operations to zero the blocks, so use blkdev_issue_zeroout with the BLKDEV_ZERO_UNMAP flag instead of reinventing it poorly. Signed-off-by: Christoph Hellwig --- drivers/block/drbd/drbd_debugfs.c | 3 -- drivers/block/drbd/drbd_int.h | 6

[PATCH 22/25] block: stop using discards for zeroing

2017-03-31 Thread Christoph Hellwig
Now that we have REQ_OP_WRITE_ZEROES implemented for all devices that support efficient zeroing of devices we can remove the call to blkdev_issue_discard. This means we only have two ways of zeroing left and can simplify the code. Signed-off-by: Christoph Hellwig --- block/blk-lib.c

[PATCH 12/25] block: add a new BLKDEV_ZERO_NOFALLBACK flag

2017-03-31 Thread Christoph Hellwig
This avoids fallbacks to explicit zeroing in (__)blkdev_issue_zeroout if the caller doesn't want them. Also clean up the convoluted check for the return condition that this new flag is added to. Signed-off-by: Christoph Hellwig --- block/blk-lib.c| 5 -

[PATCH 16/25] zram: implement REQ_OP_WRITE_ZEROES

2017-03-31 Thread Christoph Hellwig
Just the same as discard if the block size equals the system page size. Signed-off-by: Christoph Hellwig --- drivers/block/zram/zram_drv.c | 13 - 1 file changed, 8 insertions(+), 5 deletions(-) diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c

[PATCH 24/25] drbd: implement REQ_OP_WRITE_ZEROES

2017-03-31 Thread Christoph Hellwig
It seems like DRBD assumes its on-the-wire TRIM request always zeroes data. Use that fact to implement REQ_OP_WRITE_ZEROES. Signed-off-by: Christoph Hellwig --- drivers/block/drbd/drbd_main.c | 3 ++- drivers/block/drbd/drbd_nl.c | 2 ++ drivers/block/drbd/drbd_receiver.c

[PATCH 15/25] nvme: implement REQ_OP_WRITE_ZEROES

2017-03-31 Thread Christoph Hellwig
But not for the real NVMe Write Zeroes yet, just to get rid of the discard abuse for zeroing. Also rename the quirk flag to be a bit more self-explanatory. Signed-off-by: Christoph Hellwig --- drivers/nvme/host/core.c | 10 +- drivers/nvme/host/nvme.h | 6 +++---

[PATCH 07/25] dm: support REQ_OP_WRITE_ZEROES

2017-03-31 Thread Christoph Hellwig
Copy & paste from the REQ_OP_WRITE_SAME code. Signed-off-by: Christoph Hellwig --- drivers/md/dm-core.h | 1 + drivers/md/dm-io.c| 8 ++-- drivers/md/dm-linear.c| 1 + drivers/md/dm-mpath.c | 1 + drivers/md/dm-rq.c| 11

[PATCH 05/25] md: support REQ_OP_WRITE_ZEROES

2017-03-31 Thread Christoph Hellwig
Copy & paste from the REQ_OP_WRITE_SAME code. Signed-off-by: Christoph Hellwig --- drivers/md/linear.c| 1 + drivers/md/md.h| 7 +++ drivers/md/multipath.c | 1 + drivers/md/raid0.c | 2 ++ drivers/md/raid1.c | 4 +++- drivers/md/raid10.c| 1 +

[PATCH 10/25] block: add a flags argument to (__)blkdev_issue_zeroout

2017-03-31 Thread Christoph Hellwig
Turn the existing discard flag into a new BLKDEV_ZERO_UNMAP flag with similar semantics, but without referring to discard. Signed-off-by: Christoph Hellwig --- block/blk-lib.c| 31 ++- block/ioctl.c | 2 +-

[PATCH 09/25] block: stop using blkdev_issue_write_same for zeroing

2017-03-31 Thread Christoph Hellwig
We'll always use the WRITE ZEROES code for zeroing now. Signed-off-by: Christoph Hellwig --- block/blk-lib.c | 4 1 file changed, 4 deletions(-) diff --git a/block/blk-lib.c b/block/blk-lib.c index e5b853f2b8a2..2a8d638544a7 100644 --- a/block/blk-lib.c +++ b/block/blk-lib.c

[PATCH 08/25] dm kcopyd: switch to use REQ_OP_WRITE_ZEROES

2017-03-31 Thread Christoph Hellwig
It seems like the code currently passes whatever it was using for writes to WRITE SAME. Just switch it to WRITE ZEROES, although that doesn't need any payload. Untested, and confused by the code, maybe someone who understands it better than me can help.. Not-yet-signed-off-by: Christoph Hellwig

[PATCH 02/25] block: renumber REQ_OP_WRITE_ZEROES

2017-03-31 Thread Christoph Hellwig
Make life easy for implementations that need to send a data buffer to the device (e.g. SCSI) by numbering it as a data-out command. Signed-off-by: Christoph Hellwig --- include/linux/blk_types.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git

[PATCH 03/25] block: implement splitting of REQ_OP_WRITE_ZEROES bios

2017-03-31 Thread Christoph Hellwig
Copy and paste the REQ_OP_WRITE_SAME code to prepare for implementations that limit the write zeroes size. Signed-off-by: Christoph Hellwig --- block/blk-merge.c | 17 +++-- 1 file changed, 15 insertions(+), 2 deletions(-) diff --git a/block/blk-merge.c

[PATCH 01/25] sd: split sd_setup_discard_cmnd

2017-03-31 Thread Christoph Hellwig
Split sd_setup_discard_cmnd into one function per provisioning type. While this creates some very slight duplication of boilerplate code it keeps the code modular for additions of new provisioning types, and for reusing the write same functions for the upcoming scsi implementation of the Write

always use REQ_OP_WRITE_ZEROES for zeroing offload

2017-03-31 Thread Christoph Hellwig
This series makes REQ_OP_WRITE_ZEROES the only zeroing offload supported by the block layer, switches existing implementations of REQ_OP_DISCARD that correctly set discard_zeroes_data to it, removes incorrect discard_zeroes_data settings, and also switches WRITE SAME based zeroing in SCSI to this new

Re: [PATCH V2 04/16] block, bfq: modify the peak-rate estimator

2017-03-31 Thread Bart Van Assche
On Fri, 2017-03-31 at 14:47 +0200, Paolo Valente wrote: > -static bool bfq_update_peak_rate(struct bfq_data *bfqd, struct bfq_queue > *bfqq, > - bool compensate) > +static bool bfq_bfqq_is_slow(struct bfq_data *bfqd, struct bfq_queue *bfqq, > +

Re: [PATCH V2 11/16] block, bfq: reduce idling only in symmetric scenarios

2017-03-31 Thread Bart Van Assche
On Fri, 2017-03-31 at 14:47 +0200, Paolo Valente wrote: > + entity->weight_counter = kzalloc(sizeof(struct bfq_weight_counter), > + GFP_ATOMIC); > + entity->weight_counter->weight = entity->weight; GFP_ATOMIC allocations are more likely to fail

Re: [PATCH RFC 00/14] Add the BFQ I/O Scheduler to blk-mq

2017-03-31 Thread Paolo Valente
> Il giorno 06 mar 2017, alle ore 08:43, Markus Trippelsdorf > ha scritto: > > On 2017.03.04 at 17:01 +0100, Paolo Valente wrote: >> Hi, >> at last, here is my first patch series meant for merging. It adds BFQ >> to blk-mq. Don't worry, in this message I won't bore you

[PATCH V2 03/16] block, bfq: improve throughput boosting

2017-03-31 Thread Paolo Valente
The feedback-loop algorithm used by BFQ to compute queue (process) budgets is basically a set of three update rules, one for each of the main reasons why a queue may be expired. If many processes suddenly switch from sporadic I/O to greedy and sequential I/O, then these rules are quite slow to

[PATCH V2 00/16] Introduce the BFQ I/O scheduler

2017-03-31 Thread Paolo Valente
Hi, with respect to the previous submission [1], this new patch series: - contains all the changes suggested by Jens and Bart [1], apart from those for which I raised doubts that either have been acknowledged, or have not received a reply yet (I will of course apply also the latter changes

[PATCH V2 07/16] block, bfq: reduce I/O latency for soft real-time applications

2017-03-31 Thread Paolo Valente
To guarantee a low latency also to the I/O requests issued by soft real-time applications, this patch introduces a further heuristic, which weight-raises (in the sense explained in the previous patch) also the queues associated to applications deemed as soft real-time. To be deemed as soft

[PATCH V2 04/16] block, bfq: modify the peak-rate estimator

2017-03-31 Thread Paolo Valente
Unless the maximum budget B_max that BFQ can assign to a queue is set explicitly by the user, BFQ automatically updates B_max. In particular, BFQ dynamically sets B_max to the number of sectors that can be read, at the current estimated peak rate, during the maximum time, T_max, allowed before a

[PATCH V2 05/16] block, bfq: add more fairness with writes and slow processes

2017-03-31 Thread Paolo Valente
This patch deals with two sources of unfairness, which can also cause high latencies and throughput loss. The first source is related to write requests. Write requests tend to starve read requests, basically because, on one side, writes are slower than reads, whereas, on the other side, storage

[PATCH V2 08/16] block, bfq: preserve a low latency also with NCQ-capable drives

2017-03-31 Thread Paolo Valente
I/O schedulers typically allow NCQ-capable drives to prefetch I/O requests, as NCQ boosts the throughput exactly by prefetching and internally reordering requests. Unfortunately, as discussed in detail and shown experimentally in [1], this may cause fairness and latency guarantees to be violated.

[PATCH V2 09/16] block, bfq: reduce latency during request-pool saturation

2017-03-31 Thread Paolo Valente
This patch introduces a heuristic that reduces latency when the I/O-request pool is saturated. This goal is achieved by disabling device idling, for non-weight-raised queues, when there are weight-raised queues with pending or in-flight requests. In fact, as explained in more detail in the

[PATCH V2 10/16] block, bfq: add Early Queue Merge (EQM)

2017-03-31 Thread Paolo Valente
From: Arianna Avanzini A set of processes may happen to perform interleaved reads, i.e., read requests whose union would give rise to a sequential read pattern. There are two typical cases: first, processes reading fixed-size chunks of data at a fixed distance from

[PATCH V2 11/16] block, bfq: reduce idling only in symmetric scenarios

2017-03-31 Thread Paolo Valente
From: Arianna Avanzini A seeky queue (i.e., a queue containing random requests) is assigned a very small device-idling slice, for throughput reasons. Unfortunately, given the process associated with a seeky queue, this behavior causes the following problem: if the

[PATCH V2 12/16] block, bfq: boost the throughput on NCQ-capable flash-based devices

2017-03-31 Thread Paolo Valente
This patch boosts the throughput on NCQ-capable flash-based devices, while still preserving latency guarantees for interactive and soft real-time applications. The throughput is boosted by just not idling the device when the in-service queue remains empty, even if the queue is sync and has a

[PATCH V2 13/16] block, bfq: boost the throughput with random I/O on NCQ-capable HDDs

2017-03-31 Thread Paolo Valente
This patch is basically the counterpart, for NCQ-capable rotational devices, of the previous patch. Exactly as the previous patch does on flash-based devices and for any workload, this patch disables device idling on rotational devices, but only for random I/O. In fact, only with these queues

[PATCH V2 14/16] block, bfq: handle bursts of queue activations

2017-03-31 Thread Paolo Valente
From: Arianna Avanzini Many popular I/O-intensive services or applications spawn or reactivate many parallel threads/processes during short time intervals. Examples are systemd during boot or git grep. These services or applications benefit mostly from a high

[PATCH V2 15/16] block, bfq: remove all get and put of I/O contexts

2017-03-31 Thread Paolo Valente
When a bfq queue is set in service and when it is merged, a reference to the I/O context associated with the queue is taken. This reference is then released when the queue is deselected from service or split. More precisely, the release of the reference is postponed to when the scheduler lock is

Re: Outstanding MQ questions from MMC

2017-03-31 Thread Arnd Bergmann
On Thu, Mar 30, 2017 at 6:39 PM, Ulf Hansson wrote: > On 30 March 2017 at 14:42, Arnd Bergmann wrote: >> On Wed, Mar 29, 2017 at 5:09 AM, Linus Walleij >> wrote: >>> In MQ, I have simply locked the host on the first request and

Re: [PATCH 1/8] Guard bvec iteration logic

2017-03-31 Thread Ming Lei
On Thu, Mar 30, 2017 at 9:49 PM, Dmitry Monakhov wrote: > If someone tries to advance a bvec beyond its size we simply > dump a WARN_ONCE and continue to iterate beyond bvec array boundaries. > This simply means that we end up dereferencing/corrupting random memory >

Re: [PATCH 12/23] sd: handle REQ_UNMAP

2017-03-31 Thread h...@lst.de
On Thu, Mar 30, 2017 at 10:19:55PM -0400, Martin K. Petersen wrote: > > If you manually change the provisioning mode to WS10 on a device that > > must use WRITE SAME (16) to be able to address all blocks you're already > > screwed right now, and with this patch you can screw yourself through > >

Re: [PATCH 22/23] drbd: implement REQ_OP_WRITE_ZEROES

2017-03-31 Thread Christoph Hellwig
On Thu, Mar 30, 2017 at 07:15:50PM -0400, Mike Snitzer wrote: > I got pretty far along with implementing the DM thinp support for > WRITE_ZEROES in terms of thinp's DISCARD support (more of an > implementation detail.. or so I thought). > > But while discussing this effort with Jeff Moyer he