On 10/27/2016 12:53 AM, Bart Van Assche wrote:
> blk_mq_quiesce_queue() waits until ongoing .queue_rq() invocations
> have finished. This function does *not* wait until all outstanding
> requests have finished (this means invocation of request.end_io()).
> The algorithm used by
On 10/27/2016 12:52 AM, Bart Van Assche wrote:
> Move the "hctx stopped" test and the insert request calls into
> blk_mq_direct_issue_request(). Rename that function into
> blk_mq_try_issue_directly() to reflect its new semantics. Pass
> the hctx pointer to that function instead of looking it up a
On 10/27/2016 12:50 AM, Bart Van Assche wrote:
> The meaning of the BLK_MQ_S_STOPPED flag is "do not call
> .queue_rq()". Hence modify blk_mq_make_request() such that requests
> are queued instead of issued if a queue has been stopped.
>
> Reported-by: Ming Lei
>
On 10/26/2016 10:21 PM, Bart Van Assche wrote:
> Hello Jens,
>
> Some time ago I created the attached state diagrams for myself to avoid
> that I would have to reread the entire block layer core source code if
> it has been a while since I had a look at it. Do you think it would be
> useful to
On 10/26/16 18:30, Ming Lei wrote:
On Thu, Oct 27, 2016 at 6:53 AM, Bart Van Assche
wrote:
blk_mq_quiesce_queue() waits until ongoing .queue_rq() invocations
have finished. This function does *not* wait until all outstanding
requests have finished (this means
On Thu, Oct 27, 2016 at 6:53 AM, Bart Van Assche
wrote:
> blk_mq_quiesce_queue() waits until ongoing .queue_rq() invocations
> have finished. This function does *not* wait until all outstanding
> requests have finished (this means invocation of request.end_io()).
> The
On 10/26/2016 04:49 PM, Bart Van Assche wrote:
Hello Jens,
Multiple block drivers need the functionality to stop a request queue
and to wait until all ongoing request_fn() / queue_rq() calls have
finished without waiting until all outstanding requests have finished.
Hence this patch series that
Ensure that, if scsi-mq is enabled, scsi_wait_for_queuecommand()
waits until ongoing shost->hostt->queuecommand() calls have finished.
Signed-off-by: Bart Van Assche
Reviewed-by: Christoph Hellwig
Cc: James Bottomley
Cc:
Most blk_mq_requeue_request() and blk_mq_add_to_requeue_list() calls
are followed by kicking the requeue list. Hence add an argument to
these two functions that lets the caller kick the requeue list. This
was proposed by Christoph Hellwig.
Signed-off-by: Bart Van Assche
Cc:
Multiple functions test the BLK_MQ_S_STOPPED bit, so introduce
a helper function that performs this test.
Signed-off-by: Bart Van Assche
Cc: Christoph Hellwig
Cc: Hannes Reinecke
Cc: Sagi Grimberg
Cc: Johannes Thumshirn
Avoid that nvme_queue_rq() is still running when nvme_stop_queues()
returns.
Signed-off-by: Bart Van Assche
Reviewed-by: Sagi Grimberg
Reviewed-by: Christoph Hellwig
Cc: Keith Busch
---
drivers/nvme/host/core.c
Make nvme_requeue_req() check BLK_MQ_S_STOPPED instead of
QUEUE_FLAG_STOPPED. Remove the QUEUE_FLAG_STOPPED manipulations
that became superfluous because of this change. Change
blk_queue_stopped() tests into blk_mq_queue_stopped().
This patch fixes a race condition: using
Instead of manipulating both QUEUE_FLAG_STOPPED and BLK_MQ_S_STOPPED
in the dm start and stop queue functions, only manipulate the latter
flag. Change blk_queue_stopped() tests into blk_mq_queue_stopped().
Signed-off-by: Bart Van Assche
Reviewed-by: Christoph Hellwig
blk_mq_quiesce_queue() waits until ongoing .queue_rq() invocations
have finished. This function does *not* wait until all outstanding
requests have finished (this means invocation of request.end_io()).
The algorithm used by blk_mq_quiesce_queue() is as follows:
* Hold either an RCU read lock or an
Move the "hctx stopped" test and the insert request calls into
blk_mq_direct_issue_request(). Rename that function into
blk_mq_try_issue_directly() to reflect its new semantics. Pass
the hctx pointer to that function instead of looking it up a
second time. These changes avoid that code has to be
The function blk_queue_stopped() allows testing whether or not a
traditional request queue has been stopped. Introduce a helper
function that lets block drivers easily query whether or not
one or more hardware contexts of a blk-mq queue have been stopped.
Signed-off-by: Bart Van Assche
The meaning of the BLK_MQ_S_STOPPED flag is "do not call
.queue_rq()". Hence modify blk_mq_make_request() such that requests
are queued instead of issued if a queue has been stopped.
Reported-by: Ming Lei
Signed-off-by: Bart Van Assche
On 10/26/2016 02:46 PM, Mikulas Patocka wrote:
I don't like the idea of complicating the code by turning discards into
writes.
That's not what my patch series does. The only writes added by my patch
series are those for the non-aligned head and tail of the range passed
to
On Wed, Oct 26, 2016 at 05:46:11PM -0400, Mikulas Patocka wrote:
> I think the proper thing would be to move "discard_zeroes_data" flag into
> the bio itself - there would be REQ_OP_DISCARD and REQ_OP_DISCARD_ZERO -
> and if the device doesn't support REQ_OP_DISCARD_ZERO, it rejects the bio
>
On Wed, 26 Oct 2016, Bart Van Assche wrote:
> On 10/26/2016 01:26 PM, Mikulas Patocka wrote:
> > The brd driver refuses misaligned discard requests with an error. However,
> > this is suboptimal, misaligned requests could be handled by discarding a
> > part of the request that is aligned on a
On Tue, Oct 25, 2016 at 12:24:24AM +0530, Kashyap Desai wrote:
> > -Original Message-
> > From: Omar Sandoval [mailto:osan...@osandov.com]
> > Sent: Monday, October 24, 2016 9:11 PM
> > To: Kashyap Desai
> > Cc: linux-s...@vger.kernel.org; linux-ker...@vger.kernel.org; linux-
> >
This adds a new request flag, REQ_BG, that callers can use to tell
the block layer that this is background (non-urgent) IO.
Signed-off-by: Jens Axboe
---
include/linux/blk_types.h | 4 +++-
include/linux/fs.h        | 3 +++
2 files changed, 6 insertions(+), 1 deletion(-)
diff
We can hook this up to the block layer, to help throttle buffered
writes.
wbt registers a few trace points that can be used to track what is
happening in the system:
wbt_lat: 259:0: latency 2446318
wbt_stat: 259:0: rmean=2446318, rmin=2446318, rmax=2446318, rsamples=1,
If we're doing background type writes, then use the appropriate
write command for that.
Signed-off-by: Jens Axboe
---
include/linux/writeback.h | 2 ++
1 file changed, 2 insertions(+)
diff --git a/include/linux/writeback.h b/include/linux/writeback.h
index
Note in the bdi_writeback structure whenever a task ends up sleeping
waiting for progress. We can use that information in the lower layers
to increase the priority of writes.
Signed-off-by: Jens Axboe
---
include/linux/backing-dev-defs.h | 2 ++
mm/backing-dev.c |
For blk-mq, ->nr_requests does track queue depth, at least at init
time. But for the older queue paths, it's simply a soft setting.
On top of that, it's generally larger than the hardware setting
on purpose, to allow backup of requests for merging.
Fill a hole in struct request with a
For legacy block, we simply track them in the request queue. For
blk-mq, we track them on a per-sw queue basis, which we can then
sum up through the hardware queues and finally to a per device
state.
The stats are tracked in, roughly, 0.1s interval windows.
Add sysfs files to display the stats.
Enable throttling of buffered writeback to make it a lot
smoother, with far less impact on other system activity.
Background writeback should be, by definition, background
activity. The fact that we flush huge bundles of it at the time
means that it potentially has heavy impacts on
Since the dawn of time, our background buffered writeback has sucked.
When we do background buffered writeback, it should have little impact
on foreground activity. That's the definition of background activity...
But for as long as I can remember, heavy buffered writers have not
behaved like that.
On 10/26/2016 01:26 PM, Mikulas Patocka wrote:
The brd driver refuses misaligned discard requests with an error. However,
this is suboptimal, misaligned requests could be handled by discarding a
part of the request that is aligned on a page boundary. This patch changes
the code so that it
On Tue, 25 Oct 2016, Jens Axboe wrote:
> On 10/25/2016 08:37 AM, Mike Snitzer wrote:
> > On Tue, Oct 25 2016 at 9:07P -0400,
> > Christoph Hellwig wrote:
> >
> > > I think the right fix is to kill off the BLKFLSBUF special case in
> > > brd. Yes, it break compatibility -
This patch extends rcu read sections, so that all manipulations of the
page and its data are within read sections.
This patch is a prerequisite for discarding pages using rcu.
Note that the page pointer escapes the rcu section in the function
brd_direct_access, however, direct access is not
The brd driver refuses misaligned discard requests with an error. However,
this is suboptimal; misaligned requests could be handled by discarding the
part of the request that is aligned on a page boundary. This patch changes
the code so that it handles misaligned requests.
Signed-off-by: Mikulas
Hello Jens,
Some time ago I created the attached state diagrams for myself so that
I would not have to reread the entire block layer core source code
whenever it has been a while since I last looked at it. Do you think it
would be useful to add these diagrams somewhere in the Documentation
Implement page discard using rcu. Each page has built-in entry rcu_head
that could be used to free the page from rcu.
Regarding the comment that "re-allocating the pages can result in
writeback deadlocks under heavy load" - if the user is at risk of such
deadlocks, he should mount the
Remove the function brd_zero_page. This function was used to zero a page
when the discard request came in.
The discard request is used for performance or space optimization, it
makes no sense to zero pages on discard request, as it neither improves
performance nor saves memory.
Signed-off-by:
On 10/26/2016 10:04 AM, Paolo Valente wrote:
On 26 Oct 2016, at 17:32, Jens Axboe wrote:
On 10/26/2016 09:29 AM, Christoph Hellwig wrote:
On Wed, Oct 26, 2016 at 05:13:07PM +0200, Arnd Bergmann wrote:
The question to ask first is whether to actually have
On 10/26/2016 09:29 AM, Christoph Hellwig wrote:
On Wed, Oct 26, 2016 at 05:13:07PM +0200, Arnd Bergmann wrote:
The question to ask first is whether to actually have pluggable
schedulers on blk-mq at all, or just have one that is meant to
do the right thing in every case (and possibly can be
On Wed, Oct 26, 2016 at 05:13:07PM +0200, Arnd Bergmann wrote:
> The question to ask first is whether to actually have pluggable
> schedulers on blk-mq at all, or just have one that is meant to
> do the right thing in every case (and possibly can be bypassed
> completely).
That would be my
On Wednesday, October 26, 2016 8:05:11 AM CEST Bart Van Assche wrote:
> On 10/26/2016 04:34 AM, Jan Kara wrote:
> > On Wed 26-10-16 03:19:03, Christoph Hellwig wrote:
> >> Just as last time:
> >>
> >> big NAK for introducing giant new infrastructure like a new I/O scheduler
> >> for the legacy
On Wed, Oct 26, 2016 at 09:53:43AM -0400, Bob Peterson wrote:
> It's unlikely, but bio_alloc can return NULL; shouldn't the code be
> checking for that?
No, a sleeping bio_alloc can not return NULL. If it did our I/O
code would break down badly - take a look at the implementation,
the
- Original Message -
| This adds a full fledged direct I/O implementation using the iomap
| interface. Full fledged in this case means all features are supported:
| AIO, vectored I/O, any iov_iter type including kernel pointers, bvecs
| and pipes, support for hole filling and async
On 10/26/2016 02:57 AM, Ming Lei wrote:
This patch fixes one issue reported by Kent, which can
be triggered in bcachefs over sata disk. Actually it
is a generic issue in block flush vs. blk-tag.
Looks good to me. Had to double check we don't get there for the mq
path, but we have our own
On Wed, Oct 26, 2016 at 1:24 AM, Haggai Eran wrote:
[..]
>> I wonder if we could (ab)use a
>> software-defined 'pasid' as the requester id for a peer-to-peer
>> mapping that needs address translation.
> Why would you need that? Isn't it enough to map the peer-to-peer
>
Just as last time:
big NAK for introducing giant new infrastructure like a new I/O scheduler
for the legacy request structure.
Please direct your energy towards blk-mq instead.
--
To unsubscribe from this list: send the line "unsubscribe linux-block" in
the body of a message to
The feedback-loop algorithm used by BFQ to compute queue (process)
budgets is basically a set of three update rules, one for each of the
main reasons why a queue may be expired. If many processes suddenly
switch from sporadic I/O to greedy and sequential I/O, then these
rules are quite slow to
This patch introduces a simple heuristic to load applications quickly,
and to perform the I/O requested by interactive applications just as
quickly. To this purpose, both a newly-created queue and a queue
associated with an interactive application (we explain in a moment how
BFQ decides whether
This patch introduces a heuristic that reduces latency when the
I/O-request pool is saturated. This goal is achieved by disabling
device idling, for non-weight-raised queues, when there are weight-
raised queues with pending or in-flight requests. In fact, as
explained in more detail in the
From: Arianna Avanzini
Add complete support for full hierarchical scheduling, with a cgroups
interface. Full hierarchical scheduling is implemented through the
'entity' abstraction: both bfq_queues, i.e., the internal BFQ queues
associated with processes, and groups
Hi,
this new patch series turns back to the initial approach, i.e., it
adds BFQ as an extra scheduler, instead of replacing CFQ with
BFQ. This patch series also contains all the improvements and bug
fixes recommended by Tejun [5], plus new features of BFQ-v8r5. Details
about old and new features
This patch deals with two sources of unfairness, which can also cause
high latencies and throughput loss. The first source is related to
write requests. Write requests tend to starve read requests, basically
because, on one side, writes are slower than reads, whereas, on the
other side, storage
This patch fixes one issue reported by Kent, which can
be triggered in bcachefs over sata disk. Actually it
is a generic issue in block flush vs. blk-tag.
Cc: Christoph Hellwig
Reported-by: Kent Overstreet
Signed-off-by: Ming Lei
On 10/19/2016 6:51 AM, Dan Williams wrote:
> On Tue, Oct 18, 2016 at 2:42 PM, Stephen Bates wrote:
>> 1. Address Translation. Suggestions have been made that in certain
>> architectures and topologies the dma_addr_t passed to the DMA master
>> in a peer-2-peer transfer will
On Tue, Oct 25, 2016 at 11:51:53AM -0800, Kent Overstreet wrote:
> So - you're hitting inode locks on each call to iomap_begin()/iomap_end()? :/
Depends on your defintion of inode locks. In XFS we have three inode
locks:
(1) the IOLOCK, which this patch series actually replaces entirely by
On Tue, Oct 25, 2016 at 09:13:29AM -0800, Kent Overstreet wrote:
> Also - what are you doing about the race between shooting down the range in
> the
> pagecache and dirty pages being readded? The existing direct IO code falls
> back
> to buffered IO for that, but your code doesn't appear to - I
On Tue, Oct 25, 2016 at 10:13:13PM -0600, Andreas Dilger wrote:
> Why wouldn't you have all the pool sizes in between? Definitely 1MB has
> been too small already for high-bandwidth IO. I wouldn't mind BIOs up to
> 4MB or larger since most high-end RAID hardware does best with 4MB IOs.
I/O
On Wed, Oct 26, 2016 at 03:30:05PM +0800, Ming Lei wrote:
> I am preparing for the multipage bvec support[1], and once it is ready the
> default 256 bvecs should be enough for normal cases.
Yes, multipage bvecs are definitively the way to go to efficiently
support I/O on huge pages.
On Wed, Oct 26, 2016 at 12:13 PM, Andreas Dilger wrote:
> On Oct 25, 2016, at 6:54 AM, Kirill A. Shutemov wrote:
>>
>> On Tue, Oct 25, 2016 at 12:21:22AM -0700, Christoph Hellwig wrote:
>>> On Tue, Oct 25, 2016 at 03:13:17AM +0300, Kirill A. Shutemov