Re: [PATCH 3/4] blk-mq: use hw tag for scheduling if hw tag space is big enough

2017-05-01 Thread Omar Sandoval
On Mon, May 01, 2017 at 03:06:16PM +, Bart Van Assche wrote: > On Sat, 2017-04-29 at 18:35 +0800, Ming Lei wrote: > > On Fri, Apr 28, 2017 at 06:09:40PM +, Bart Van Assche wrote: > > > On Fri, 2017-04-28 at 23:15 +0800, Ming Lei wrote: > > > > +static inline bool

[PATCH 03/13] blk: make the bioset rescue_workqueue optional.

2017-05-01 Thread NeilBrown
This patch converts bioset_create() and bioset_create_nobvec() to not create a workqueue so allocations will never trigger punt_bios_to_rescuer(). It also introduces bioset_create_rescued() and bioset_create_nobvec_rescued() which preserve the old behaviour. All callers of bioset_create() and

[PATCH 01/13] blk: remove bio_set arg from blk_queue_split()

2017-05-01 Thread NeilBrown
blk_queue_split() is always called with the last arg being q->bio_split, where 'q' is the first arg. Also blk_queue_split() sometimes uses the passed-in 'bs' and sometimes uses q->bio_split. This is inconsistent and unnecessary. Remove the last arg and always use q->bio_split inside

[PATCH 04/13] blk: use non-rescuing bioset for q->bio_split.

2017-05-01 Thread NeilBrown
A rescuing bioset is only useful if there might be bios from that same bioset on the bio_list_on_stack queue at a time when bio_alloc_bioset() is called. This never applies to q->bio_split. Allocations from q->bio_split are only ever made from blk_queue_split() which is only ever called early in

[PATCH 06/13] rbd: use bio_clone_fast() instead of bio_clone()

2017-05-01 Thread NeilBrown
bio_clone() makes a copy of the bi_io_vec, but rbd never changes that, so there is no need for a copy. bio_clone_fast() can be used instead, which avoids making the copy. This requires that we provide a bio_set. bio_clone() uses fs_bio_set, but it isn't, in general, safe to use the same bio_set

[PATCH 08/13] pktcdvd: use bio_clone_fast() instead of bio_clone()

2017-05-01 Thread NeilBrown
pktcdvd doesn't change the bi_io_vec of the clone bio, so it is more efficient to use bio_clone_fast(), and not clone the bi_io_vec. This requires providing a bio_set, and it is safest to provide a dedicated bio_set rather than sharing fs_bio_set, which filesystems use. This new bio_set,

[PATCH 11/13] bcache: use kmalloc to allocate bio in bch_data_verify()

2017-05-01 Thread NeilBrown
This function allocates a bio, then a collection of pages. It copes with failure. It currently uses a mempool() to allocate the bio, but alloc_page() to allocate the pages. These fail in different ways, so the usage is inconsistent. Change the bio_clone() to bio_clone_kmalloc() so that no pool

[PATCH 12/13] block: remove bio_clone() and all references.

2017-05-01 Thread NeilBrown
bio_clone() is no longer used. Only bio_clone_bioset() or bio_clone_fast(). This is for the best, as bio_clone() used fs_bio_set, and filesystems are unlikely to want to use bio_clone(). So remove bio_clone() and all references. This includes a fix to some incorrect documentation. Reviewed-by:

[PATCH 10/13] xen-blkfront: remove bio splitting.

2017-05-01 Thread NeilBrown
bios that are re-submitted will pass through blk_queue_split() when blk_queue_bio() is called, and this will split the bio if necessary. There is no longer any need to do this splitting in xen-blkfront. Acked-by: Roger Pau Monné Signed-off-by: NeilBrown ---

[PATCH 13/13] block: don't check for BIO_MAX_PAGES in blk_bio_segment_split()

2017-05-01 Thread NeilBrown
blk_bio_segment_split() makes sure bios have no more than BIO_MAX_PAGES entries in the bi_io_vec. This was done because bio_clone_bioset() (when given a mempool bioset) could not handle larger io_vecs. No driver uses bio_clone_bioset() any more, they all use bio_clone_fast() if anything, and

[PATCH 09/13] lightnvm/pblk-read: use bio_clone_fast()

2017-05-01 Thread NeilBrown
pblk_submit_read() uses bio_clone_bioset() but doesn't change the io_vec, so bio_clone_fast() is a better choice. It also uses fs_bio_set which is intended for filesystems. Using it in a device driver can deadlock. So allocate a new bioset, and use bio_clone_fast(). Signed-off-by: NeilBrown

[PATCH 05/13] block: Improvements to bounce-buffer handling

2017-05-01 Thread NeilBrown
Since commit 23688bf4f830 ("block: ensure to split after potentially bouncing a bio") blk_queue_bounce() is called *before* blk_queue_split(). This means that: 1/ the comments in blk_queue_split() about bounce buffers are irrelevant, and 2/ a very large bio (more than BIO_MAX_PAGES) will no

[PATCH 07/13] drbd: use bio_clone_fast() instead of bio_clone()

2017-05-01 Thread NeilBrown
drbd does not modify the bi_io_vec of the cloned bio, so there is no need to clone that part. So bio_clone_fast() is the better choice. For bio_clone_fast() we need to specify a bio_set. We could use fs_bio_set, which bio_clone() uses, or drbd_md_io_bio_set, which drbd uses for metadata, but it

[PATCH 02/13] blk: replace bioset_create_nobvec() with a flags arg to bioset_create()

2017-05-01 Thread NeilBrown
"flags" arguments are often seen as good API design as they allow easy extensibility. bioset_create_nobvec() is implemented internally as a variation in flags passed to __bioset_create(). To support future extension, make the internal structure part of the API. i.e. add a 'flags' argument to

[PATCH 00/13] block: assorted cleanup for bio splitting and cloning.

2017-05-01 Thread NeilBrown
This is a revision of my series of patches working towards removing the bioset work queues. This set is based on Linus' tree as of today (2nd May) plus the for-linus branch from Shaohua's md/raid tree. This series adds a fix for the new lightnvm/pblk-read code and discards

Re: [PATCH 02/11] blk: make the bioset rescue_workqueue optional.

2017-05-01 Thread NeilBrown
On Mon, May 01 2017, Jens Axboe wrote: > On 04/30/2017 11:00 PM, NeilBrown wrote: >> On Mon, Apr 24 2017, Christoph Hellwig wrote: >> >>> On Mon, Apr 24, 2017 at 11:51:01AM +1000, NeilBrown wrote: I was following the existing practice exemplified by bioset_create_nobvec(). >>> >>>

Re: [PATCH 6/6] blk-mq-debugfs: Add 'kick' operation

2017-05-01 Thread Jens Axboe
On 05/01/2017 06:19 PM, Omar Sandoval wrote: > On Thu, Apr 27, 2017 at 08:54:37AM -0700, Bart Van Assche wrote: >> Running a queue causes the block layer to examine the per-CPU and >> hw queues but not the requeue list. Hence add a 'kick' operation >> that also examines the requeue list. > > The

Re: [PATCH 6/6] blk-mq-debugfs: Add 'kick' operation

2017-05-01 Thread Omar Sandoval
On Thu, Apr 27, 2017 at 08:54:37AM -0700, Bart Van Assche wrote: > Running a queue causes the block layer to examine the per-CPU and > hw queues but not the requeue list. Hence add a 'kick' operation > that also examines the requeue list. The naming of these operations isn't super intuitive, but

Re: [PATCH 5/6] blk-mq-debugfs: Show busy requests

2017-05-01 Thread Omar Sandoval
On Thu, Apr 27, 2017 at 08:54:36AM -0700, Bart Van Assche wrote: > Requests that got stuck in a block driver are neither on > blk_mq_ctx.rq_list nor on any hw dispatch queue. Make these > visible in debugfs through the "busy" attribute. > > Signed-off-by: Bart Van Assche

Re: [PATCH 25/27] block: remove the discard_zeroes_data flag

2017-05-01 Thread Bart Van Assche
On Wed, 2017-04-05 at 19:21 +0200, Christoph Hellwig wrote: > Now that we use the proper REQ_OP_WRITE_ZEROES operation everywhere we can > kill this hack. > > Signed-off-by: Christoph Hellwig > Reviewed-by: Martin K. Petersen > Reviewed-by: Hannes

[PATCH v2 08/10] dm-linear: Add support for zoned block devices

2017-05-01 Thread damien . lemoal
From: Damien Le Moal Add support for zoned block devices by allowing host-managed zoned block device mapped targets, the remapping of REQ_OP_ZONE_RESET and the post processing (reply remapping) of REQ_OP_ZONE_REPORT. Signed-off-by: Damien Le Moal

[PATCH v2 09/10] dm-kcopyd: Add sequential write feature

2017-05-01 Thread damien . lemoal
From: Damien Le Moal When copying blocks to host-managed zoned block devices, writes must be sequential. dm_kcopyd_copy() does not, however, guarantee this as writes are issued in the completion order of reads, and reads may complete out of order despite being issued
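
The ordering constraint in this snippet can be illustrated with a small userspace model. This is only a sketch in Python, not the patch's implementation (the snippet is truncated before it describes one): the function name `sequential_write_order` and the use of plain block indices are assumptions made for illustration. Reads may complete in any order, but a block is released for writing only once every lower-numbered block has been released.

```python
from typing import Iterable, Iterator


def sequential_write_order(read_completions: Iterable[int]) -> Iterator[int]:
    """Yield block indices in strictly ascending order.

    `read_completions` lists block indices (0..n-1) in the order their
    reads finished. A completed block is held back until all blocks
    before it have been yielded, so writes are always sequential.
    """
    pending = set()   # reads that completed but cannot be written yet
    next_block = 0    # the only block that may be written next
    for block in read_completions:
        pending.add(block)
        # Drain every block that has now become writable in order.
        while next_block in pending:
            pending.remove(next_block)
            yield next_block
            next_block += 1


if __name__ == "__main__":
    # Reads finish out of order; writes still come out sequentially.
    print(list(sequential_write_order([2, 0, 1, 4, 3])))
```

The cost of this approach in the model is that a single slow read holds back every later block, which is the trade-off any in-order write scheme has to accept.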

[PATCH v2 07/10] dm-flakey: Add support for zoned block devices

2017-05-01 Thread damien . lemoal
From: Damien Le Moal With the development of file system support for zoned block devices (e.g. f2fs), having dm-flakey support for these devices is interesting to improve testing. This patch adds support for zoned block devices in dm-flakey, both host-aware and

[PATCH v2 04/10] dm: Fix REQ_OP_ZONE_RESET bio handling

2017-05-01 Thread damien . lemoal
From: Damien Le Moal The REQ_OP_ZONE_RESET bio has no payload and zero sectors. Its position is the only information used to indicate the zone to reset on the device. Due to its zero length, this bio is not cloned and sent to the target through the non-flush case in

[PATCH v2 03/10] dm-table: Check block devices zone model compatibility

2017-05-01 Thread damien . lemoal
From: Damien Le Moal When setting the dm device queue limits, several possibilities exist for zoned block devices: 1) The dm target driver may want to expose a different zone model (e.g. host-managed device emulation or regular block device on top of host-managed zoned

[PATCH v2 05/10] dm: Fix REQ_OP_ZONE_REPORT bio handling

2017-05-01 Thread damien . lemoal
From: Damien Le Moal A REQ_OP_ZONE_REPORT bio is not a medium access command. Its number of sectors indicates the maximum size allowed for the report reply size and not an amount of sectors accessed from the device. REQ_OP_ZONE_REPORT bios should thus not be split

[PATCH v2 01/10] dm-table: Introduce DM_TARGET_ZONED_HM feature

2017-05-01 Thread damien . lemoal
From: Damien Le Moal The target drivers currently available will not operate correctly if a table target maps onto a host-managed zoned block device. To avoid problems, this patch introduces the new feature flag DM_TARGET_ZONED_HM for a target driver to explicitly state

Re: [PATCH v2 cosmetic] Remove trailing newline in elevator switch error message

2017-05-01 Thread mar...@trippelsdorf.de
Trying to switch to a non-existing elevator currently results in garbled dmesg output, e.g.: # echo "foo" > /sys/block/sda/queue/scheduler elevator: type foo not found elevator: switch to foo failed (note the unintended line break.) Fix by stripping the trailing newline. Signed-off-by:
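
The line break in the message above comes from the newline that `echo` appends to the string written to the sysfs file. A minimal userspace sketch in Python, not the kernel code: the function name `switch_failed_message` and its `strip_newline` parameter are hypothetical, and only the message text is taken from the snippet.

```python
def switch_failed_message(raw: str, strip_newline: bool) -> str:
    """Format the 'switch failed' log line for a requested elevator name.

    `raw` is the string as written to the scheduler attribute, which
    includes the trailing newline added by `echo`.
    """
    name = raw.rstrip("\n") if strip_newline else raw
    return f"elevator: switch to {name} failed"


if __name__ == "__main__":
    written = "foo\n"  # what `echo "foo"` actually writes
    print(repr(switch_failed_message(written, strip_newline=False)))
    print(repr(switch_failed_message(written, strip_newline=True)))
```

Without stripping, the newline lands in the middle of the formatted message and splits it across two log lines; stripping it first keeps the message on one line.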

Re: [PATCH 3/4] blk-mq: use hw tag for scheduling if hw tag space is big enough

2017-05-01 Thread Bart Van Assche
On Sat, 2017-04-29 at 18:35 +0800, Ming Lei wrote: > On Fri, Apr 28, 2017 at 06:09:40PM +, Bart Van Assche wrote: > > On Fri, 2017-04-28 at 23:15 +0800, Ming Lei wrote: > > > +static inline bool blk_mq_sched_may_use_hw_tag(struct request_queue *q) > > > +{ > > > + if (q->tag_set->flags &

Re: [PATCH cosmetic] Remove trailing newline in elevator switch error message

2017-05-01 Thread Bart Van Assche
On Sat, 2017-04-29 at 07:38 +0200, Markus Trippelsdorf wrote: > Trying to switch to a non-existing elevator currently results in garbled > dmesg output, e.g.: > > # echo "foo" > /sys/block/sda/queue/scheduler > > elevator: type foo not found > elevator: switch to foo > failed > > (note the

Re: [PATCH] block: Remove elevator_change()

2017-05-01 Thread Jens Axboe
On 05/01/2017 09:58 AM, Bart Van Assche wrote: > Since commit 84253394927c ("remove the mg_disk driver") removed the > only caller of elevator_change(), also remove the elevator_change() > function itself. Nice, thanks Bart. I thought s390 dasd used it as well, but looks like that got killed a

[PATCH] block: Remove elevator_change()

2017-05-01 Thread Bart Van Assche
Since commit 84253394927c ("remove the mg_disk driver") removed the only caller of elevator_change(), also remove the elevator_change() function itself. Signed-off-by: Bart Van Assche Cc: Christoph Hellwig Cc: Markus Trippelsdorf

Re: [PATCH cosmetic] Remove trailing newline in elevator switch error message

2017-05-01 Thread Bart Van Assche
On Mon, 2017-05-01 at 17:49 +0200, mar...@trippelsdorf.de wrote: > On 2017.05.01 at 15:18 +, Bart Van Assche wrote: > > Your patch duplicates the code to remove trailing whitespace which is not > > very elegant. Please move the code that removes trailing whitespace from > > __elevator_change()

Re: [PATCH cosmetic] Remove trailing newline in elevator switch error message

2017-05-01 Thread mar...@trippelsdorf.de
On 2017.05.01 at 15:18 +, Bart Van Assche wrote: > On Sat, 2017-04-29 at 07:38 +0200, Markus Trippelsdorf wrote: > > Trying to switch to a non-existing elevator currently results in garbled > > dmesg output, e.g.: > > > > # echo "foo" > /sys/block/sda/queue/scheduler > > > > elevator: type

Re: [PATCH 02/11] blk: make the bioset rescue_workqueue optional.

2017-05-01 Thread Jens Axboe
On 04/30/2017 11:00 PM, NeilBrown wrote: > On Mon, Apr 24 2017, Christoph Hellwig wrote: > >> On Mon, Apr 24, 2017 at 11:51:01AM +1000, NeilBrown wrote: >>> >>> I was following the existing practice exemplified by >>> bioset_create_nobvec(). >> >> Which is pretty ugly to start with.. > > That is