Make it possible to check whether or not a block layer queue has
been stopped. Make it possible to start and to run a blk-mq queue
from user space.
Signed-off-by: Bart Van Assche
Cc: Omar Sandoval
Cc: Hannes Reinecke
---
Changes
Hello Jens,
The three patches in this patch series fix the queue lockup I reported
a few days ago on the linux-block mailing list. Please consider these
patches for kernel v4.11.
Thanks,
Bart.
Bart Van Assche (3):
blk-mq: Introduce blk_mq_ops.restart_queues
scsi: Add scsi_restart_queues()
commit 52d7f1b5c2f3 ("blk-mq: Avoid that requeueing starts stopped
queues") removed the blk_mq_stop_hw_queue() call from scsi_queue_rq()
for the BLK_MQ_RQ_QUEUE_BUSY case. blk_mq_start_stopped_hw_queues()
only runs queues that had been stopped. Hence change the
blk_mq_start_stopped_hw_queues()
If a tag set is shared among multiple request queues, leave
it to the block driver to restart queues. Hence remove
QUEUE_FLAG_RESTART and introduce blk_mq_ops.restart_queues.
Remove blk_mq_sched_mark_restart_queue() because this
function has no callers.
Signed-off-by: Bart Van Assche
This patch avoids that a queue can get stuck when multiple SCSI
devices are associated with a SCSI host and scsi_queue_rq() returns
"busy".
Signed-off-by: Bart Van Assche
Cc: Martin K. Petersen
Cc: James Bottomley
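The failure mode this series addresses can be sketched as a small userspace toy model (plain C, not kernel code; `struct toy_queue` and `restart_shared_queues()` are illustrative stand-ins for the proposed blk_mq_ops.restart_queues callback):

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of the lockup: several queues share one tag set. A queue
 * that sees "busy" stops itself. When capacity frees up, restarting
 * only the queue that completed a request leaves its stopped siblings
 * stuck forever; a restart_queues-style callback must kick them all. */
#define NQUEUES 3

struct toy_queue { bool stopped; };

/* Illustrative stand-in for blk_mq_ops.restart_queues: restart every
 * stopped queue sharing the tag set, not just the current one. */
static void restart_shared_queues(struct toy_queue *queues, int n)
{
    for (int i = 0; i < n; i++)
        queues[i].stopped = false;
}

static bool demo(void)
{
    struct toy_queue q[NQUEUES] = { { true }, { false }, { true } };
    restart_shared_queues(q, NQUEUES);
    return !q[0].stopped && !q[1].stopped && !q[2].stopped;
}
```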
Hi Dmitry,
[auto build test ERROR on linus/master]
[also build test ERROR on v4.11-rc4]
[cannot apply to block/for-next next-20170331]
[if your patch is applied to the wrong git tree, please drop us a note to help
improve the system]
url:
https://github.com/0day-ci/linux/commits/Dmitry
On Fri, Mar 31, 2017 at 01:48:35PM -0700, Omar Sandoval wrote:
> From: Omar Sandoval
>
> blk_mq_update_nr_hw_queues() used to remap hardware queues, which is the
> behavior that drivers expect. However, commit 4e68a011428a changed
> blk_mq_queue_reinit() to not remap queues for
From: Omar Sandoval
blk_mq_update_nr_hw_queues() used to remap hardware queues, which is the
behavior that drivers expect. However, commit 4e68a011428a changed
blk_mq_queue_reinit() to not remap queues for the case of CPU
hotplugging, inadvertently making
On Fri, Mar 31, 2017 at 01:43:41PM -0700, Omar Sandoval wrote:
> @@ -2634,6 +2640,7 @@ void blk_mq_update_nr_hw_queues(struct blk_mq_tag_set
> *set, int nr_hw_queues)
>
> list_for_each_entry(q, &set->tag_list, tag_set_list)
> blk_mq_unfreeze_queue(q);
> +
Stupid whitespace
On Fri, Mar 31, 2017 at 01:30:15PM -0700, Omar Sandoval wrote:
> On Fri, Mar 31, 2017 at 04:30:44PM -0400, Keith Busch wrote:
> > On Fri, Mar 31, 2017 at 11:59:24AM -0700, Omar Sandoval wrote:
> > > @@ -2629,11 +2639,12 @@ void blk_mq_update_nr_hw_queues(struct
> > > blk_mq_tag_set *set, int
On Fri, Mar 31, 2017 at 04:30:44PM -0400, Keith Busch wrote:
> On Fri, Mar 31, 2017 at 11:59:24AM -0700, Omar Sandoval wrote:
> > @@ -2629,11 +2639,12 @@ void blk_mq_update_nr_hw_queues(struct
> > blk_mq_tag_set *set, int nr_hw_queues)
> > set->nr_hw_queues = nr_hw_queues;
> >
Signed-off-by: Christoph Hellwig
---
drivers/scsi/sd.c | 31 ++-
drivers/scsi/sd_zbc.c | 1 +
2 files changed, 27 insertions(+), 5 deletions(-)
diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index b853f91fb3da..d8d9c0bdd93c 100644
---
Fix up do_region to not allocate a bio_vec for discards. We've
got rid of the discard payload allocated by the caller years ago.
Obviously this wasn't actually harmful given how long it's been
there, but it's still good to avoid the pointless allocation.
Signed-off-by: Christoph Hellwig
This gets us support for non-discard efficient write of zeroes (e.g. NVMe)
and prepare for removing the discard_zeroes_data flag.
Also remove a pointless discard support check, which is done in
blkdev_issue_discard already.
Signed-off-by: Christoph Hellwig
---
fs/block_dev.c | 10
rbd only supports discarding on large alignments, so the zeroing code
would always fall back to explicit writing of zeroes.
Signed-off-by: Christoph Hellwig
---
drivers/block/rbd.c | 1 -
1 file changed, 1 deletion(-)
diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c
index
Try to use a write same with unmap bit variant if the device supports it
and the caller allows for it.
Signed-off-by: Christoph Hellwig
---
drivers/scsi/sd.c | 9 +
1 file changed, 9 insertions(+)
diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index
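The decision this patch adds can be sketched as a toy predicate (plain C, not sd.c itself; the flag name comes from this series, the boolean parameters are illustrative):

```c
#include <assert.h>
#include <stdbool.h>

/* Caller flag from this series: forbid unmapping while zeroing. */
#define BLKDEV_ZERO_NOUNMAP (1u << 0)

/* Illustrative model of the check: zero blocks with WRITE SAME plus
 * the UNMAP bit only when the device reports support for it *and*
 * the caller did not pass BLKDEV_ZERO_NOUNMAP. */
static bool use_write_same_unmap(bool dev_supports_ws_unmap,
                                 unsigned int flags)
{
    return dev_supports_ws_unmap && !(flags & BLKDEV_ZERO_NOUNMAP);
}
```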
If this flag is set, a logical provisioning capable device should
release space for the zeroed blocks if possible; if it is not set,
devices should keep the blocks anchored.
Also remove an out of sync kerneldoc comment for a static function
that would have become even more out of date with this
mmc only supports discarding on large alignments, so the zeroing code
would always fall back to explicit writing of zeroes.
Signed-off-by: Christoph Hellwig
---
drivers/mmc/core/queue.c | 2 --
1 file changed, 2 deletions(-)
diff --git a/drivers/mmc/core/queue.c
Now that we use the proper REQ_OP_WRITE_ZEROES operation everywhere we can
kill this hack.
Signed-off-by: Christoph Hellwig
---
Documentation/ABI/testing/sysfs-block | 10 ++-
Documentation/block/queue-sysfs.txt | 5
block/blk-lib.c | 7 +
drbd always wants its discard wire operations to zero the blocks, so
use blkdev_issue_zeroout with the BLKDEV_ZERO_UNMAP flag instead of
reinventing it poorly.
Signed-off-by: Christoph Hellwig
---
drivers/block/drbd/drbd_debugfs.c | 3 --
drivers/block/drbd/drbd_int.h | 6
Now that we have REQ_OP_WRITE_ZEROES implemented for all devices that
support efficient zeroing of devices we can remove the call to
blkdev_issue_discard. This means we only have two ways of zeroing left
and can simplify the code.
Signed-off-by: Christoph Hellwig
---
block/blk-lib.c
This avoids fallbacks to explicit zeroing in (__)blkdev_issue_zeroout if
the caller doesn't want them.
Also clean up the convoluted check for the return condition that this
new flag is added to.
Signed-off-by: Christoph Hellwig
---
block/blk-lib.c| 5 -
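The flag handling described above can be modeled in a short userspace sketch (plain C, not blk-lib.c; the flag name is from this series, `zeroout_method()` and the enum are illustrative):

```c
#include <assert.h>
#include <stdbool.h>

/* Caller flag from this series: no explicit-zeroing fallback. */
#define BLKDEV_ZERO_NOFALLBACK (1u << 1)

enum zero_method { ZERO_WRITE_ZEROES, ZERO_EXPLICIT, ZERO_EOPNOTSUPP };

/* Toy model of the decision in (__)blkdev_issue_zeroout: offload to
 * the device when it supports Write Zeroes; otherwise fall back to
 * writing zero pages, unless the caller forbade the fallback. */
static enum zero_method zeroout_method(bool dev_supports_write_zeroes,
                                       unsigned int flags)
{
    if (dev_supports_write_zeroes)
        return ZERO_WRITE_ZEROES;       /* offload to the device */
    if (flags & BLKDEV_ZERO_NOFALLBACK)
        return ZERO_EOPNOTSUPP;         /* caller forbids fallback */
    return ZERO_EXPLICIT;               /* write zero pages manually */
}
```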
Just the same as discard if the block size equals the system page size.
Signed-off-by: Christoph Hellwig
---
drivers/block/zram/zram_drv.c | 13 -
1 file changed, 8 insertions(+), 5 deletions(-)
diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
It seems like DRBD assumes its on-the-wire TRIM request always zeroes data.
Use that fact to implement REQ_OP_WRITE_ZEROES.
Signed-off-by: Christoph Hellwig
---
drivers/block/drbd/drbd_main.c | 3 ++-
drivers/block/drbd/drbd_nl.c | 2 ++
drivers/block/drbd/drbd_receiver.c
But not for the real NVMe Write Zeroes yet, just to get rid of the
discard abuse for zeroing. Also rename the quirk flag to be a bit
more self-explanatory.
Signed-off-by: Christoph Hellwig
---
drivers/nvme/host/core.c | 10 +-
drivers/nvme/host/nvme.h | 6 +++---
Copy & paste from the REQ_OP_WRITE_SAME code.
Signed-off-by: Christoph Hellwig
---
drivers/md/dm-core.h | 1 +
drivers/md/dm-io.c| 8 ++--
drivers/md/dm-linear.c| 1 +
drivers/md/dm-mpath.c | 1 +
drivers/md/dm-rq.c| 11
Copy & paste from the REQ_OP_WRITE_SAME code.
Signed-off-by: Christoph Hellwig
---
drivers/md/linear.c| 1 +
drivers/md/md.h| 7 +++
drivers/md/multipath.c | 1 +
drivers/md/raid0.c | 2 ++
drivers/md/raid1.c | 4 +++-
drivers/md/raid10.c| 1 +
Turn the existing discard flag into a new BLKDEV_ZERO_UNMAP flag with
similar semantics, but without referring to discard.
Signed-off-by: Christoph Hellwig
---
block/blk-lib.c| 31 ++-
block/ioctl.c | 2 +-
We'll always use the WRITE ZEROES code for zeroing now.
Signed-off-by: Christoph Hellwig
---
block/blk-lib.c | 4
1 file changed, 4 deletions(-)
diff --git a/block/blk-lib.c b/block/blk-lib.c
index e5b853f2b8a2..2a8d638544a7 100644
--- a/block/blk-lib.c
+++ b/block/blk-lib.c
It seems like the code currently passes whatever it was using for writes
to WRITE SAME. Just switch it to WRITE ZEROES, although that doesn't
need any payload.
Untested, and confused by the code; maybe someone who understands it
better than I do can help...
Not-yet-signed-off-by: Christoph Hellwig
Make life easy for implementations that need to send a data buffer
to the device (e.g. SCSI) by numbering it as a data out command.
Signed-off-by: Christoph Hellwig
---
include/linux/blk_types.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git
Copy and paste the REQ_OP_WRITE_SAME code to prepare for implementations
that limit the write zeroes size.
Signed-off-by: Christoph Hellwig
---
block/blk-merge.c | 17 +++--
1 file changed, 15 insertions(+), 2 deletions(-)
diff --git a/block/blk-merge.c
Split sd_setup_discard_cmnd into one function per provisioning type. While
this creates some very slight duplication of boilerplate code it keeps the
code modular for additions of new provisioning types, and for reusing the
write same functions for the upcoming scsi implementation of the Write
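The per-provisioning-type split described above can be sketched as a lookup (plain C, not sd.c; the SD_LBP_* names mirror sd.c's provisioning modes, but the mapping table and `sd_zeroout_cmd()` helper here are illustrative):

```c
#include <assert.h>
#include <string.h>

/* Illustrative provisioning modes, named after sd.c's constants. */
enum { SD_LBP_UNMAP, SD_LBP_WS16, SD_LBP_WS10, SD_LBP_ZERO };

/* Toy model: after the split, each provisioning type gets its own
 * setup function that builds one specific SCSI command. */
static const char *sd_zeroout_cmd(int provisioning_mode)
{
    switch (provisioning_mode) {
    case SD_LBP_UNMAP: return "UNMAP";
    case SD_LBP_WS16:  return "WRITE SAME(16) with UNMAP";
    case SD_LBP_WS10:  return "WRITE SAME(10) with UNMAP";
    case SD_LBP_ZERO:  return "WRITE SAME(10) with zero payload";
    default:           return "unsupported";
    }
}
```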
This series makes REQ_OP_WRITE_ZEROES the only zeroing offload
supported by the block layer, and switches existing implementations
of REQ_OP_DISCARD that correctly set discard_zeroes_data to it,
removes incorrect discard_zeroes_data, and also switches WRITE SAME
based zeroing in SCSI to this new
On Fri, 2017-03-31 at 14:47 +0200, Paolo Valente wrote:
> -static bool bfq_update_peak_rate(struct bfq_data *bfqd, struct bfq_queue
> *bfqq,
> - bool compensate)
> +static bool bfq_bfqq_is_slow(struct bfq_data *bfqd, struct bfq_queue *bfqq,
> +
On Fri, 2017-03-31 at 14:47 +0200, Paolo Valente wrote:
> + entity->weight_counter = kzalloc(sizeof(struct bfq_weight_counter),
> + GFP_ATOMIC);
> + entity->weight_counter->weight = entity->weight;
GFP_ATOMIC allocations are more likely to fail
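The quoted hunk dereferences the GFP_ATOMIC allocation without checking it. A minimal userspace sketch of the defensive pattern (the struct definitions and `add_weight_counter()` are hypothetical stand-ins for the BFQ code; calloc stands in for kzalloc):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the BFQ structures in the quoted hunk. */
struct bfq_weight_counter { int weight; };
struct bfq_entity { int weight; struct bfq_weight_counter *weight_counter; };

/* Atomic allocations fail more easily than sleeping ones, so the
 * result must be tested before it is dereferenced. */
static int add_weight_counter(struct bfq_entity *entity)
{
    entity->weight_counter = calloc(1, sizeof(*entity->weight_counter));
    if (!entity->weight_counter)
        return -1;                      /* -ENOMEM in kernel terms */
    entity->weight_counter->weight = entity->weight;
    return 0;
}

static bool demo(void)
{
    struct bfq_entity e = { .weight = 10, .weight_counter = NULL };
    if (add_weight_counter(&e) != 0)
        return false;
    bool ok = e.weight_counter->weight == 10;
    free(e.weight_counter);
    return ok;
}
```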
> Il giorno 06 mar 2017, alle ore 08:43, Markus Trippelsdorf
> ha scritto:
>
> On 2017.03.04 at 17:01 +0100, Paolo Valente wrote:
>> Hi,
>> at last, here is my first patch series meant for merging. It adds BFQ
>> to blk-mq. Don't worry, in this message I won't bore you
The feedback-loop algorithm used by BFQ to compute queue (process)
budgets is basically a set of three update rules, one for each of the
main reasons why a queue may be expired. If many processes suddenly
switch from sporadic I/O to greedy and sequential I/O, then these
rules are quite slow to
Hi,
with respect to the previous submission [1], these new patch series:
- contains all the changes suggested by Jens and Bart [1], apart from
those for which I raised doubts that either have been acknowledged,
or have not received a reply yet (I will of course apply also the
latter changes
To guarantee a low latency also to the I/O requests issued by soft
real-time applications, this patch introduces a further heuristic,
which weight-raises (in the sense explained in the previous patch)
also the queues associated to applications deemed as soft real-time.
To be deemed as soft
Unless the maximum budget B_max that BFQ can assign to a queue is set
explicitly by the user, BFQ automatically updates B_max. In
particular, BFQ dynamically sets B_max to the number of sectors that
can be read, at the current estimated peak rate, during the maximum
time, T_max, allowed before a
This patch deals with two sources of unfairness, which can also cause
high latencies and throughput loss. The first source is related to
write requests. Write requests tend to starve read requests, basically
because, on one side, writes are slower than reads, whereas, on the
other side, storage
I/O schedulers typically allow NCQ-capable drives to prefetch I/O
requests, as NCQ boosts the throughput exactly by prefetching and
internally reordering requests.
Unfortunately, as discussed in detail and shown experimentally in [1],
this may cause fairness and latency guarantees to be violated.
This patch introduces a heuristic that reduces latency when the
I/O-request pool is saturated. This goal is achieved by disabling
device idling, for non-weight-raised queues, when there are weight-
raised queues with pending or in-flight requests. In fact, as
explained in more detail in the
From: Arianna Avanzini
A set of processes may happen to perform interleaved reads, i.e.,
read requests whose union would give rise to a sequential read pattern.
There are two typical cases: first, processes reading fixed-size chunks
of data at a fixed distance from
From: Arianna Avanzini
A seeky queue (i.e., a queue containing random requests) is assigned a
very small device-idling slice, for throughput issues. Unfortunately,
given the process associated with a seeky queue, this behavior causes
the following problem: if the
This patch boosts the throughput on NCQ-capable flash-based devices,
while still preserving latency guarantees for interactive and soft
real-time applications. The throughput is boosted by just not idling
the device when the in-service queue remains empty, even if the queue
is sync and has a
This patch is basically the counterpart, for NCQ-capable rotational
devices, of the previous patch. Exactly as the previous patch does on
flash-based devices and for any workload, this patch disables device
idling on rotational devices, but only for random I/O. In fact, only
with these queues
From: Arianna Avanzini
Many popular I/O-intensive services or applications spawn or
reactivate many parallel threads/processes during short time
intervals. Examples are systemd during boot or git grep. These
services or applications benefit mostly from a high
When a bfq queue is set in service and when it is merged, a reference
to the I/O context associated with the queue is taken. This reference
is then released when the queue is deselected from service or
split. More precisely, the release of the reference is postponed to
when the scheduler lock is
On Thu, Mar 30, 2017 at 6:39 PM, Ulf Hansson wrote:
> On 30 March 2017 at 14:42, Arnd Bergmann wrote:
>> On Wed, Mar 29, 2017 at 5:09 AM, Linus Walleij
>> wrote:
>>> In MQ, I have simply locked the host on the first request and
On Thu, Mar 30, 2017 at 9:49 PM, Dmitry Monakhov wrote:
> If someone tries to advance a bvec beyond its size we simply
> dump a WARN_ONCE and continue to iterate beyond the bvec array boundaries.
> This simply means that we end up dereferencing/corrupting random memory
>
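The fix under discussion can be modeled with a clamped advance (plain C, not the kernel's struct bvec_iter; `struct toy_iter` and `toy_iter_advance()` are illustrative):

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified iterator model: an over-long advance must warn and
 * clamp instead of walking past the end of the bvec array. */
struct toy_iter { unsigned int size, done; };

static bool toy_iter_advance(struct toy_iter *it, unsigned int bytes)
{
    if (bytes > it->size - it->done) {
        it->done = it->size;   /* clamp: never index past the array */
        return false;          /* kernel code would WARN_ONCE here */
    }
    it->done += bytes;
    return true;
}

static bool demo(void)
{
    struct toy_iter it = { .size = 8, .done = 0 };
    return toy_iter_advance(&it, 4)       /* in bounds: advances */
        && !toy_iter_advance(&it, 100)    /* too far: clamps, fails */
        && it.done == 8;
}
```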
On Thu, Mar 30, 2017 at 10:19:55PM -0400, Martin K. Petersen wrote:
> > If you manually change the provisioning mode to WS10 on a device that
> > must use WRITE SAME (16) to be able to address all blocks you're already
> > screwed right now, and with this patch you can screw yourself through
> >
On Thu, Mar 30, 2017 at 07:15:50PM -0400, Mike Snitzer wrote:
> I got pretty far along with implementing the DM thinp support for
> WRITE_ZEROES in terms of thinp's DISCARD support (more of an
> implementation detail.. or so I thought).
>
> But while discussing this effort with Jeff Moyer he