Re: [PATCH 05/12] blk-mq: Introduce blk_mq_quiesce_queue()

2016-10-26 Thread Hannes Reinecke
On 10/27/2016 12:53 AM, Bart Van Assche wrote: > blk_mq_quiesce_queue() waits until ongoing .queue_rq() invocations > have finished. This function does *not* wait until all outstanding > requests have finished (this means invocation of request.end_io()). > The algorithm used by

Re: [PATCH 04/12] blk-mq: Move more code into blk_mq_direct_issue_request()

2016-10-26 Thread Hannes Reinecke
On 10/27/2016 12:52 AM, Bart Van Assche wrote: > Move the "hctx stopped" test and the insert request calls into > blk_mq_direct_issue_request(). Rename that function into > blk_mq_try_issue_directly() to reflect its new semantics. Pass > the hctx pointer to that function instead of looking it up a

Re: [PATCH 01/12] blk-mq: Do not invoke .queue_rq() for a stopped queue

2016-10-26 Thread Hannes Reinecke
On 10/27/2016 12:50 AM, Bart Van Assche wrote: > The meaning of the BLK_MQ_S_STOPPED flag is "do not call > .queue_rq()". Hence modify blk_mq_make_request() such that requests > are queued instead of issued if a queue has been stopped. > > Reported-by: Ming Lei >

Re: Block layer state diagrams

2016-10-26 Thread Hannes Reinecke
On 10/26/2016 10:21 PM, Bart Van Assche wrote: > Hello Jens, > > Some time ago I created the attached state diagrams for myself to avoid > that I would have to reread the entire block layer core source code if > it has been a while since I had a look at it. Do you think it would be > useful to

Re: [PATCH 05/12] blk-mq: Introduce blk_mq_quiesce_queue()

2016-10-26 Thread Bart Van Assche
On 10/26/16 18:30, Ming Lei wrote: On Thu, Oct 27, 2016 at 6:53 AM, Bart Van Assche wrote: blk_mq_quiesce_queue() waits until ongoing .queue_rq() invocations have finished. This function does *not* wait until all outstanding requests have finished (this means

Re: [PATCH 05/12] blk-mq: Introduce blk_mq_quiesce_queue()

2016-10-26 Thread Ming Lei
On Thu, Oct 27, 2016 at 6:53 AM, Bart Van Assche wrote: > blk_mq_quiesce_queue() waits until ongoing .queue_rq() invocations > have finished. This function does *not* wait until all outstanding > requests have finished (this means invocation of request.end_io()). > The

Re: [PATCH v4 0/12] Fix race conditions related to stopping block layer queues

2016-10-26 Thread Jens Axboe
On 10/26/2016 04:49 PM, Bart Van Assche wrote: Hello Jens, Multiple block drivers need the functionality to stop a request queue and to wait until all ongoing request_fn() / queue_rq() calls have finished without waiting until all outstanding requests have finished. Hence this patch series that

[PATCH 10/12] SRP transport, scsi-mq: Wait for .queue_rq() if necessary

2016-10-26 Thread Bart Van Assche
Ensure that if scsi-mq is enabled that scsi_wait_for_queuecommand() waits until ongoing shost->hostt->queuecommand() calls have finished. Signed-off-by: Bart Van Assche Reviewed-by: Christoph Hellwig Cc: James Bottomley Cc:

[PATCH 06/12] blk-mq: Add a kick_requeue_list argument to blk_mq_requeue_request()

2016-10-26 Thread Bart Van Assche
Most blk_mq_requeue_request() and blk_mq_add_to_requeue_list() calls are followed by kicking the requeue list. Hence add an argument to these two functions that allows the caller to kick the requeue list. This was proposed by Christoph Hellwig. Signed-off-by: Bart Van Assche Cc:

[PATCH 02/12] blk-mq: Introduce blk_mq_hctx_stopped()

2016-10-26 Thread Bart Van Assche
Multiple functions test the BLK_MQ_S_STOPPED bit, so introduce a helper function that performs this test. Signed-off-by: Bart Van Assche Cc: Christoph Hellwig Cc: Hannes Reinecke Cc: Sagi Grimberg Cc: Johannes Thumshirn

[PATCH 12/12] nvme: Fix a race condition related to stopping queues

2016-10-26 Thread Bart Van Assche
Avoid that nvme_queue_rq() is still running when nvme_stop_queues() returns. Signed-off-by: Bart Van Assche Reviewed-by: Sagi Grimberg Reviewed-by: Christoph Hellwig Cc: Keith Busch --- drivers/nvme/host/core.c

[PATCH 11/12] nvme: Use BLK_MQ_S_STOPPED instead of QUEUE_FLAG_STOPPED in blk-mq code

2016-10-26 Thread Bart Van Assche
Make nvme_requeue_req() check BLK_MQ_S_STOPPED instead of QUEUE_FLAG_STOPPED. Remove the QUEUE_FLAG_STOPPED manipulations that became superfluous because of this change. Change blk_queue_stopped() tests into blk_mq_queue_stopped(). This patch fixes a race condition: using

[PATCH 07/12] dm: Use BLK_MQ_S_STOPPED instead of QUEUE_FLAG_STOPPED in blk-mq code

2016-10-26 Thread Bart Van Assche
Instead of manipulating both QUEUE_FLAG_STOPPED and BLK_MQ_S_STOPPED in the dm start and stop queue functions, only manipulate the latter flag. Change blk_queue_stopped() tests into blk_mq_queue_stopped(). Signed-off-by: Bart Van Assche Reviewed-by: Christoph Hellwig

[PATCH 05/12] blk-mq: Introduce blk_mq_quiesce_queue()

2016-10-26 Thread Bart Van Assche
blk_mq_quiesce_queue() waits until ongoing .queue_rq() invocations have finished. This function does *not* wait until all outstanding requests have finished (this means invocation of request.end_io()). The algorithm used by blk_mq_quiesce_queue() is as follows: * Hold either an RCU read lock or an

[PATCH 04/12] blk-mq: Move more code into blk_mq_direct_issue_request()

2016-10-26 Thread Bart Van Assche
Move the "hctx stopped" test and the insert request calls into blk_mq_direct_issue_request(). Rename that function into blk_mq_try_issue_directly() to reflect its new semantics. Pass the hctx pointer to that function instead of looking it up a second time. These changes avoid that code has to be

[PATCH 03/12] blk-mq: Introduce blk_mq_queue_stopped()

2016-10-26 Thread Bart Van Assche
The function blk_queue_stopped() allows testing whether or not a traditional request queue has been stopped. Introduce a helper function that lets block drivers easily query whether or not one or more hardware contexts of a blk-mq queue have been stopped. Signed-off-by: Bart Van Assche

[PATCH 01/12] blk-mq: Do not invoke .queue_rq() for a stopped queue

2016-10-26 Thread Bart Van Assche
The meaning of the BLK_MQ_S_STOPPED flag is "do not call .queue_rq()". Hence modify blk_mq_make_request() such that requests are queued instead of issued if a queue has been stopped. Reported-by: Ming Lei Signed-off-by: Bart Van Assche

Re: [dm-devel] [PATCH 1/4] brd: handle misaligned discard

2016-10-26 Thread Bart Van Assche
On 10/26/2016 02:46 PM, Mikulas Patocka wrote: I don't like the idea of complicating the code by turning discards into writes. That's not what my patch series does. The only writes added by my patch series are those for the non-aligned head and tail of the range passed to

REQ_OP for zeroing, was Re: [dm-devel] [PATCH 1/4] brd: handle misaligned discard

2016-10-26 Thread Christoph Hellwig
On Wed, Oct 26, 2016 at 05:46:11PM -0400, Mikulas Patocka wrote: > I think the proper thing would be to move "discard_zeroes_data" flag into > the bio itself - there would be REQ_OP_DISCARD and REQ_OP_DISCARD_ZERO - > and if the device doesn't support REQ_OP_DISCARD_ZERO, it rejects the bio >

Re: [dm-devel] [PATCH 1/4] brd: handle misaligned discard

2016-10-26 Thread Mikulas Patocka
On Wed, 26 Oct 2016, Bart Van Assche wrote: > On 10/26/2016 01:26 PM, Mikulas Patocka wrote: > > The brd driver refuses misaligned discard requests with an error. However, > > this is suboptimal, misaligned requests could be handled by discarding a > > part of the request that is aligned on a

Re: Device or HBA level QD throttling creates randomness in sequetial workload

2016-10-26 Thread Omar Sandoval
On Tue, Oct 25, 2016 at 12:24:24AM +0530, Kashyap Desai wrote: > > -Original Message- > > From: Omar Sandoval [mailto:osan...@osandov.com] > > Sent: Monday, October 24, 2016 9:11 PM > > To: Kashyap Desai > > Cc: linux-s...@vger.kernel.org; linux-ker...@vger.kernel.org; linux- > >

[PATCH 1/8] block: add WRITE_BG

2016-10-26 Thread Jens Axboe
This adds a new request flag, REQ_BG, that callers can use to tell the block layer that this is background (non-urgent) IO. Signed-off-by: Jens Axboe --- include/linux/blk_types.h | 4 +++- include/linux/fs.h | 3 +++ 2 files changed, 6 insertions(+), 1 deletion(-) diff

[PATCH 7/8] blk-wbt: add general throttling mechanism

2016-10-26 Thread Jens Axboe
We can hook this up to the block layer, to help throttle buffered writes. wbt registers a few trace points that can be used to track what is happening in the system: wbt_lat: 259:0: latency 2446318 wbt_stat: 259:0: rmean=2446318, rmin=2446318, rmax=2446318, rsamples=1,

[PATCH 3/8] writeback: use WRITE_BG for kupdate and background writeback

2016-10-26 Thread Jens Axboe
If we're doing background type writes, then use the appropriate write command for that. Signed-off-by: Jens Axboe --- include/linux/writeback.h | 2 ++ 1 file changed, 2 insertions(+) diff --git a/include/linux/writeback.h b/include/linux/writeback.h index

[PATCH 4/8] writeback: track if we're sleeping on progress in balance_dirty_pages()

2016-10-26 Thread Jens Axboe
Note in the bdi_writeback structure whenever a task ends up sleeping waiting for progress. We can use that information in the lower layers to increase the priority of writes. Signed-off-by: Jens Axboe --- include/linux/backing-dev-defs.h | 2 ++ mm/backing-dev.c |

[PATCH 5/8] block: add code to track actual device queue depth

2016-10-26 Thread Jens Axboe
For blk-mq, ->nr_requests does track queue depth, at least at init time. But for the older queue paths, it's simply a soft setting. On top of that, it's generally larger than the hardware setting on purpose, to allow backup of requests for merging. Fill a hole in struct request with a

[PATCH 6/8] block: add scalable completion tracking of requests

2016-10-26 Thread Jens Axboe
For legacy block, we simply track them in the request queue. For blk-mq, we track them on a per-sw queue basis, which we can then sum up through the hardware queues and finally to a per device state. The stats are tracked in, roughly, 0.1s interval windows. Add sysfs files to display the stats.

[PATCH 8/8] block: hook up writeback throttling

2016-10-26 Thread Jens Axboe
Enable throttling of buffered writeback to make it a lot smoother, with way less impact on other system activity. Background writeback should be, by definition, background activity. The fact that we flush huge bundles of it at a time means that it potentially has heavy impacts on

[PATCHSET] block: buffered writeback throttling

2016-10-26 Thread Jens Axboe
Since the dawn of time, our background buffered writeback has sucked. When we do background buffered writeback, it should have little impact on foreground activity. That's the definition of background activity... But for as long as I can remember, heavy buffered writers have not behaved like that.

Re: [dm-devel] [PATCH 1/4] brd: handle misaligned discard

2016-10-26 Thread Bart Van Assche
On 10/26/2016 01:26 PM, Mikulas Patocka wrote: The brd driver refuses misaligned discard requests with an error. However, this is suboptimal, misaligned requests could be handled by discarding a part of the request that is aligned on a page boundary. This patch changes the code so that it

[PATCH 0/4] brd: support discard

2016-10-26 Thread Mikulas Patocka
On Tue, 25 Oct 2016, Jens Axboe wrote: > On 10/25/2016 08:37 AM, Mike Snitzer wrote: > > On Tue, Oct 25 2016 at 9:07P -0400, > > Christoph Hellwig wrote: > > > > > I think the right fix is to kill off the BLKFLSBUF special case in > > > brd. Yes, it breaks compatibility -

[PATCH 2/4] brd: extend rcu read sections

2016-10-26 Thread Mikulas Patocka
This patch extends rcu read sections, so that all manipulations of the page and its data are within read sections. This patch is a prerequisite for discarding pages using rcu. Note that the page pointer escapes the rcu section in the function brd_direct_access, however, direct access is not

[PATCH 1/4] brd: handle misaligned discard

2016-10-26 Thread Mikulas Patocka
The brd driver refuses misaligned discard requests with an error. However, this is suboptimal, misaligned requests could be handled by discarding a part of the request that is aligned on a page boundary. This patch changes the code so that it handles misaligned requests. Signed-off-by: Mikulas

Block layer state diagrams

2016-10-26 Thread Bart Van Assche
Hello Jens, Some time ago I created the attached state diagrams for myself to avoid that I would have to reread the entire block layer core source code if it has been a while since I had a look at it. Do you think it would be useful to add these diagrams somewhere in the Documentation

[PATCH 3/4] brd: implement discard

2016-10-26 Thread Mikulas Patocka
Implement page discard using rcu. Each page has a built-in rcu_head entry that can be used to free the page from rcu. Regarding the comment that "re-allocating the pages can result in writeback deadlocks under heavy load" - if the user is at risk of such deadlocks, he should mount the

[PATCH 4/4] brd: remove unused brd_zero_page

2016-10-26 Thread Mikulas Patocka
Remove the function brd_zero_page. This function was used to zero a page when a discard request came in. The discard request is used for performance or space optimization; it makes no sense to zero pages on a discard request, as it neither improves performance nor saves memory. Signed-off-by:

Re: [PATCH 00/14] introduce the BFQ-v0 I/O scheduler as an extra scheduler

2016-10-26 Thread Jens Axboe
On 10/26/2016 10:04 AM, Paolo Valente wrote: On 26 Oct 2016, at 17:32, Jens Axboe wrote: On 10/26/2016 09:29 AM, Christoph Hellwig wrote: On Wed, Oct 26, 2016 at 05:13:07PM +0200, Arnd Bergmann wrote: The question to ask first is whether to actually have

Re: [PATCH 00/14] introduce the BFQ-v0 I/O scheduler as an extra scheduler

2016-10-26 Thread Jens Axboe
On 10/26/2016 09:29 AM, Christoph Hellwig wrote: On Wed, Oct 26, 2016 at 05:13:07PM +0200, Arnd Bergmann wrote: The question to ask first is whether to actually have pluggable schedulers on blk-mq at all, or just have one that is meant to do the right thing in every case (and possibly can be

Re: [PATCH 00/14] introduce the BFQ-v0 I/O scheduler as an extra scheduler

2016-10-26 Thread Christoph Hellwig
On Wed, Oct 26, 2016 at 05:13:07PM +0200, Arnd Bergmann wrote: > The question to ask first is whether to actually have pluggable > schedulers on blk-mq at all, or just have one that is meant to > do the right thing in every case (and possibly can be bypassed > completely). That would be my

Re: [PATCH 00/14] introduce the BFQ-v0 I/O scheduler as an extra scheduler

2016-10-26 Thread Arnd Bergmann
On Wednesday, October 26, 2016 8:05:11 AM CEST Bart Van Assche wrote: > On 10/26/2016 04:34 AM, Jan Kara wrote: > > On Wed 26-10-16 03:19:03, Christoph Hellwig wrote: > >> Just as last time: > >> > >> big NAK for introducing giant new infrastructure like a new I/O scheduler > >> for the legacy

Re: [PATCH 5/6] iomap: implement direct I/O

2016-10-26 Thread Christoph Hellwig
On Wed, Oct 26, 2016 at 09:53:43AM -0400, Bob Peterson wrote: > It's unlikely, but bio_alloc can return NULL; shouldn't the code be > checking for that? No, a sleeping bio_alloc can not return NULL. If it did our I/O code would break down badly - take a look at the implementation, the

Re: [PATCH 5/6] iomap: implement direct I/O

2016-10-26 Thread Bob Peterson
- Original Message - | This adds a full fledged direct I/O implementation using the iomap | interface. Full fledged in this case means all features are supported: | AIO, vectored I/O, any iov_iter type including kernel pointers, bvecs | and pipes, support for hole filling and async

Re: [PATCH] block: flush: fix IO hang in case of flood fua req

2016-10-26 Thread Jens Axboe
On 10/26/2016 02:57 AM, Ming Lei wrote: This patch fixes one issue reported by Kent, which can be triggered in bcachefs over sata disk. Actually it is a generic issue in block flush vs. blk-tag. Looks good to me. Had to double check we don't get there for the mq path, but we have our own

Re: [PATCH 0/3] iopmem : A block device for PCIe memory

2016-10-26 Thread Dan Williams
On Wed, Oct 26, 2016 at 1:24 AM, Haggai Eran wrote: [..] >> I wonder if we could (ab)use a >> software-defined 'pasid' as the requester id for a peer-to-peer >> mapping that needs address translation. > Why would you need that? Isn't it enough to map the peer-to-peer >

Re: [PATCH 00/14] introduce the BFQ-v0 I/O scheduler as an extra scheduler

2016-10-26 Thread Christoph Hellwig
Just as last time: big NAK for introducing giant new infrastructure like a new I/O scheduler for the legacy request structure. Please direct your energy towards blk-mq instead. -- To unsubscribe from this list: send the line "unsubscribe linux-block" in the body of a message to

[PATCH 03/14] block, bfq: improve throughput boosting

2016-10-26 Thread Paolo Valente
The feedback-loop algorithm used by BFQ to compute queue (process) budgets is basically a set of three update rules, one for each of the main reasons why a queue may be expired. If many processes suddenly switch from sporadic I/O to greedy and sequential I/O, then these rules are quite slow to

[PATCH 06/14] block, bfq: improve responsiveness

2016-10-26 Thread Paolo Valente
This patch introduces a simple heuristic to load applications quickly, and to perform the I/O requested by interactive applications just as quickly. To this purpose, both a newly-created queue and a queue associated with an interactive application (we explain in a moment how BFQ decides whether

[PATCH 09/14] block, bfq: reduce latency during request-pool saturation

2016-10-26 Thread Paolo Valente
This patch introduces a heuristic that reduces latency when the I/O-request pool is saturated. This goal is achieved by disabling device idling, for non-weight-raised queues, when there are weight- raised queues with pending or in-flight requests. In fact, as explained in more detail in the

[PATCH 02/14] block, bfq: add full hierarchical scheduling and cgroups support

2016-10-26 Thread Paolo Valente
From: Arianna Avanzini Add complete support for full hierarchical scheduling, with a cgroups interface. Full hierarchical scheduling is implemented through the 'entity' abstraction: both bfq_queues, i.e., the internal BFQ queues associated with processes, and groups

[PATCH 00/14] introduce the BFQ-v0 I/O scheduler as an extra scheduler

2016-10-26 Thread Paolo Valente
Hi, this new patch series turns back to the initial approach, i.e., it adds BFQ as an extra scheduler, instead of replacing CFQ with BFQ. This patch series also contains all the improvements and bug fixes recommended by Tejun [5], plus new features of BFQ-v8r5. Details about old and new features

[PATCH 05/14] block, bfq: add more fairness with writes and slow processes

2016-10-26 Thread Paolo Valente
This patch deals with two sources of unfairness, which can also cause high latencies and throughput loss. The first source is related to write requests. Write requests tend to starve read requests, basically because, on one side, writes are slower than reads, whereas, on the other side, storage

[PATCH] block: flush: fix IO hang in case of flood fua req

2016-10-26 Thread Ming Lei
This patch fixes one issue reported by Kent, which can be triggered in bcachefs over sata disk. Actually it is a generic issue in block flush vs. blk-tag. Cc: Christoph Hellwig Reported-by: Kent Overstreet Signed-off-by: Ming Lei

Re: [PATCH 0/3] iopmem : A block device for PCIe memory

2016-10-26 Thread Haggai Eran
On 10/19/2016 6:51 AM, Dan Williams wrote: > On Tue, Oct 18, 2016 at 2:42 PM, Stephen Bates wrote: >> 1. Address Translation. Suggestions have been made that in certain >> architectures and topologies the dma_addr_t passed to the DMA master >> in a peer-2-peer transfer will

Re: [PATCH 5/6] iomap: implement direct I/O

2016-10-26 Thread Christoph Hellwig
On Tue, Oct 25, 2016 at 11:51:53AM -0800, Kent Overstreet wrote: > So - you're hitting inode locks on each call to iomap_begin()/iomap_end()? :/ Depends on your definition of inode locks. In XFS we have three inode locks: (1) the IOLOCK, which this patch series actually replaces entirely by

Re: [PATCH 5/6] iomap: implement direct I/O

2016-10-26 Thread Christoph Hellwig
On Tue, Oct 25, 2016 at 09:13:29AM -0800, Kent Overstreet wrote: > Also - what are you doing about the race between shooting down the range in > the > pagecache and dirty pages being readded? The existing direct IO code falls > back > to buffered IO for that, but your code doesn't appear to - I

Re: [PATCHv4 18/43] block: define BIO_MAX_PAGES to HPAGE_PMD_NR if huge page cache enabled

2016-10-26 Thread Christoph Hellwig
On Tue, Oct 25, 2016 at 10:13:13PM -0600, Andreas Dilger wrote: > Why wouldn't you have all the pool sizes in between? Definitely 1MB has > been too small already for high-bandwidth IO. I wouldn't mind BIOs up to > 4MB or larger since most high-end RAID hardware does best with 4MB IOs. I/O

Re: [PATCHv4 18/43] block: define BIO_MAX_PAGES to HPAGE_PMD_NR if huge page cache enabled

2016-10-26 Thread Christoph Hellwig
On Wed, Oct 26, 2016 at 03:30:05PM +0800, Ming Lei wrote: > I am preparing for the multipage bvec support[1], and once it is ready the > default 256 bvecs should be enough for normal cases. Yes, multipage bvecs are definitely the way to go to efficiently support I/O on huge pages.

Re: [PATCHv4 18/43] block: define BIO_MAX_PAGES to HPAGE_PMD_NR if huge page cache enabled

2016-10-26 Thread Ming Lei
On Wed, Oct 26, 2016 at 12:13 PM, Andreas Dilger wrote: > On Oct 25, 2016, at 6:54 AM, Kirill A. Shutemov wrote: >> >> On Tue, Oct 25, 2016 at 12:21:22AM -0700, Christoph Hellwig wrote: >>> On Tue, Oct 25, 2016 at 03:13:17AM +0300, Kirill A. Shutemov