On Tue, Dec 04, 2018 at 02:21:17PM -0700, Keith Busch wrote:
> On Tue, Dec 04, 2018 at 11:33:33AM -0800, James Smart wrote:
> > On 12/4/2018 9:48 AM, Keith Busch wrote:
> > > Once quiesced, the proposed iterator can handle the final termination
> > > of the request, perf
On Tue, Dec 04, 2018 at 11:33:33AM -0800, James Smart wrote:
>
>
> On 12/4/2018 9:48 AM, Keith Busch wrote:
> > On Tue, Dec 04, 2018 at 09:38:29AM -0800, Sagi Grimberg wrote:
> > > > > > Yes, I'm very much in favour of this, too.
> > > > >
On Tue, Dec 04, 2018 at 09:38:29AM -0800, Sagi Grimberg wrote:
>
> > > > Yes, I'm very much in favour of this, too.
> > > > We always have this IMO slightly weird notion of stopping the queue, set
> > > > some error flags in the driver, then _restarting_ the queue, just so
> > > > that the driver
On Mon, Dec 03, 2018 at 05:33:06PM -0800, Sagi Grimberg wrote:
> >>
> > Yes, I'm very much in favour of this, too.
> > We always have this IMO slightly weird notion of stopping the queue, set
> > some error flags in the driver, then _restarting_ the queue, just so
> > that the driver then sees
> Signed-off-by: Christoph Hellwig
Looks good.
Reviewed-by: Keith Busch
wed anyway since the head's current path could change,
and you end up polling the wrong request_queue. Not really harmful other
than some wasted CPU cycles, but might be worth thinking about if we
want to bring mpath polling back.
Reviewed-by: Keith Busch
which we already take it without IRQ disabling.
>
> Signed-off-by: Christoph Hellwig
Looks good.
Reviewed-by: Keith Busch
On Fri, Nov 30, 2018 at 01:36:09PM -0700, Jens Axboe wrote:
> On 11/30/18 1:26 PM, Keith Busch wrote:
> > A driver may wish to iterate every tagged request, not just ones that
> > satisfy blk_mq_request_started(). The intended use is so a driver may
> > terminate entered requests on quiesced queues.
have to deal with these conditions.
Signed-off-by: Keith Busch
---
drivers/nvme/host/core.c | 10 --
drivers/nvme/host/pci.c | 43 +++
2 files changed, 35 insertions(+), 18 deletions(-)
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 91
A driver may wish to iterate every tagged request, not just ones that
satisfy blk_mq_request_started(). The intended use is so a driver may
terminate entered requests on quiesced queues.
Signed-off-by: Keith Busch
---
block/blk-mq-tag.c | 41 +++--
block
On Fri, Nov 30, 2018 at 08:26:24AM -0700, Jens Axboe wrote:
> On 11/30/18 8:24 AM, Christoph Hellwig wrote:
> > Various fixlets all over, including throwing in a 'default y' for the
> > multipath code, given that we want people to actually enable it for full
> > functionality.
>
> Why enable it
On Fri, Nov 30, 2018 at 12:08:09AM -0800, Christoph Hellwig wrote:
> On Thu, Nov 29, 2018 at 01:36:32PM -0700, Keith Busch wrote:
> > On Thu, Nov 29, 2018 at 08:13:04PM +0100, Christoph Hellwig wrote:
> > > +
> > > + /* handle any remaining CQEs */
On Fri, Nov 30, 2018 at 12:00:13AM -0800, Christoph Hellwig wrote:
> On Thu, Nov 29, 2018 at 01:19:14PM -0700, Keith Busch wrote:
> > On Thu, Nov 29, 2018 at 08:12:58PM +0100, Christoph Hellwig wrote:
> > > +enum hctx_type {
> > > + HCTX_TYPE_DEFAULT, /* all
k. Since you're not using a lock anymore,
a further optimization can complete the CQE inline with moving the cq
head so that we don't go through queue twice.
That can be a follow on, though, this patch looks fine.
Reviewed-by: Keith Busch
On Thu, Nov 29, 2018 at 08:13:03PM +0100, Christoph Hellwig wrote:
> Pass the opcode for the delete SQ/CQ command as an argument instead of
> the somewhat confusing pass loop.
>
> Signed-off-by: Christoph Hellwig
Looks good.
Reviewed-by: Keith Busch
On Thu, Nov 29, 2018 at 08:13:04PM +0100, Christoph Hellwig wrote:
> This is the last place outside of nvme_irq that handles CQEs from
> interrupt context, and thus is in the way of removing the cq_lock for
> normal queues, and avoiding lockdep warnings on the poll queues, for
> which we already
On Thu, Nov 29, 2018 at 08:13:00PM +0100, Christoph Hellwig wrote:
> Use a bit flag to mark if the SQ was allocated from the CMB, and clean
> up the surrounding code a bit.
>
> Signed-off-by: Christoph Hellwig
Looks good.
Reviewed-by: Keith Busch
On Thu, Nov 29, 2018 at 08:12:59PM +0100, Christoph Hellwig wrote:
> This gets rid of all the messing with cq_vector and the ->polled field
> by using an atomic bitop to mark the queue enabled or not.
>
> Signed-off-by: Christoph Hellwig
Looks good.
Reviewed-by: Keith Busch
of any kind */
> +
> + HCTX_MAX_TYPES,
> };
Well, there goes my plan to use this with Weighted-Round-Robin NVMe IO
queues!
I'm not that sad about it, though.
Reviewed-by: Keith Busch
On Thu, Nov 29, 2018 at 10:06:20AM -0700, Jens Axboe wrote:
> On 11/29/18 10:04 AM, Christoph Hellwig wrote:
> > gcc never warns about unused static inline functions. Which makes a lot
> > of sense at least for headers..
>
> Not so much for non-headers :-)
You can #include .c files too! :)
Maybe longer term, if
a request_queue has an active reference, it might make sense to have
blk-mq's APIs consistently handle these sorts of things.
Reviewed-by: Keith Busch
> ---
> drivers/nvme/host/core.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a
On Mon, Nov 19, 2018 at 12:58:15AM -0800, Christoph Hellwig wrote:
> > index 5d83a162d03b..c1d5e4e36125 100644
> > --- a/drivers/scsi/scsi_lib.c
> > +++ b/drivers/scsi/scsi_lib.c
> > @@ -1635,8 +1635,11 @@ static blk_status_t scsi_mq_prep_fn(struct request
> > *req)
> >
> > static void
On Wed, Nov 14, 2018 at 12:43:22AM -0800, Christoph Hellwig wrote:
> static int nvme_poll_noirq(struct blk_mq_hw_ctx *hctx, unsigned int tag)
> {
> struct nvme_queue *nvmeq = hctx->driver_data;
> > u16 start, end;
> bool found;
> >
> > if (!nvme_cqe_pending(nvmeq))
> >
On Tue, Nov 13, 2018 at 08:42:28AM -0700, Jens Axboe wrote:
> If we're polling for IO on a device that doesn't use interrupts, then
> IO completion loop (and wake of task) is done by submitting task itself.
> If that is the case, then we don't need to enter the wake_up_process()
> function, we can
On Thu, Nov 08, 2018 at 07:10:58PM +0800, Ming Lei wrote:
> I guess the issue may depend on specific QEMU version, just tried the test
> over
> virtio-scsi/sata/usb-storage emulated via qemu-2.10.2-1.fc27, not observed
> this problem.
I actually didn't use virtio-scsi, but it really doesn't
On Thu, Nov 08, 2018 at 09:12:59AM +0800, Ming Lei wrote:
> Is it NVMe specific issue or common problem in other storage hardware? SCSI
> does call blk_update_request() and handles partial completion.
Not specific to NVMe.
An example using SG_IO dumping 2MB of unsanitized kernel memory:
On Thu, Nov 08, 2018 at 12:03:41AM +0800, Ming Lei wrote:
> On Wed, Nov 7, 2018 at 11:47 PM Keith Busch wrote:
> >
> > On Wed, Nov 07, 2018 at 11:44:59PM +0800, Ming Lei wrote:
> > > blk_update_request() may tell us how much progress made, :-)
> >
> >
On Wed, Nov 07, 2018 at 11:44:59PM +0800, Ming Lei wrote:
> blk_update_request() may tell us how much progress made, :-)
Except when it doesn't, which is 100% of the time for many block
drivers, including nvme.
On Wed, Nov 07, 2018 at 11:09:27PM +0800, Ming Lei wrote:
> On Wed, Nov 7, 2018 at 10:42 PM Keith Busch wrote:
> >
> > If the kernel allocates a bounce buffer for user read data, this memory
> > needs to be cleared before copying it to the user, otherwise it may leak
>
If the kernel allocates a bounce buffer for user read data, this memory
needs to be cleared before copying it to the user, otherwise it may leak
kernel memory to user space.
Signed-off-by: Keith Busch
---
block/bio.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/block/bio.c b/block/bio.c
On Mon, Oct 08, 2018 at 02:58:21PM +0800, Dongli Zhang wrote:
> I got the same result when emulating nvme with qemu: the VM has 12 cpu, while
> the num_queues of nvme is 8.
>
> # uname -r
> 4.14.1
> # ll /sys/block/nvme*n1/mq/*/cpu_list
> -r--r--r-- 1 root root 4096 Oct 8 14:30
> Signed-off-by: Hannes Reinecke
Thanks, looks great!
Reviewed-by: Keith Busch
but looking forward
to it in the future. Looks good.
Reviewed-by: Keith Busch
safe to directly access memory mapped by memremap()/devm_memremap_pages().
> Architectures where this is not safe will not be supported by memremap()
> and therefore will not support PCI P2P and have no support for CMB.
>
> Signed-off-by: Logan Gunthorpe
Looks good to me.
Reviewed-by: Keith Busch
_P2P flag which tells the core to
> set QUEUE_FLAG_PCI_P2P in the request queue.
>
> Signed-off-by: Logan Gunthorpe
> Reviewed-by: Sagi Grimberg
> Reviewed-by: Christoph Hellwig
Looks good to me as well.
Reviewed-by: Keith Busch
>
> Signed-off-by: Logan Gunthorpe
> Reviewed-by: Sagi Grimberg
The implementation looks fine, and this is why we have quirks ... but
CMB has existed since 2014! :)
Reviewed-by: Keith Busch
of reading it, which allows callbacks to make blocking
calls.
Fixes: f5e4d6357 ("blk-mq: sync the update nr_hw_queues with blk_mq_queue_tag_busy_iter")
Cc: Jianchao Wang
Signed-off-by: Keith Busch
---
v1 -> v2:
Leave the iterator interface as-is, making this change
On Tue, Sep 25, 2018 at 08:14:45AM -0700, Bart Van Assche wrote:
> On Tue, 2018-09-25 at 08:39 -0600, Keith Busch wrote:
> > On Tue, Sep 25, 2018 at 10:39:46AM +0800, jianchao.wang wrote:
> > > But the issue is the left part of blk_mq_timeout_work is moved out of
> > &g
On Tue, Sep 25, 2018 at 10:39:46AM +0800, jianchao.wang wrote:
> Hi Bart
>
> On 09/25/2018 10:20 AM, Bart Van Assche wrote:
> > On 9/24/18 7:11 PM, jianchao.wang wrote:
> >> Hi Keith
> >>
> >> On 09/25/2018 05:09 AM, Keith Busch wrote:
> >>>
reference if the queue is not frozen instead of a rcu read lock.
Fixes: f5e4d6357 ("blk-mq: sync the update nr_hw_queues with blk_mq_queue_tag_busy_iter")
Cc: Jianchao Wang
Signed-off-by: Keith Busch
---
block/blk-mq-tag.c | 30 +-
block/blk-mq-tag.h | 2
On Mon, Sep 24, 2018 at 12:51:07PM -0700, Bart Van Assche wrote:
> On Mon, 2018-09-24 at 13:13 -0600, Keith Busch wrote:
> > diff --git a/block/blk-mq.c b/block/blk-mq.c
> > index 85a1c1a59c72..28d128450621 100644
> > --- a/block/blk-mq.c
> > +++ b/block/blk-mq.c
>
On Mon, Sep 24, 2018 at 12:44:13PM -0600, Jens Axboe wrote:
> Hi,
>
> This commit introduced a rcu_read_lock() inside
> blk_mq_queue_tag_busy_iter() - this is problematic for the timout code,
> since we now end up holding the RCU read lock over the timeout code. As
> just one example, nvme ends
On Fri, Jul 27, 2018 at 05:14:18PM +, Bart Van Assche wrote:
> On Fri, 2018-07-27 at 11:04 -0600, Keith Busch wrote:
> > On Fri, Jul 27, 2018 at 04:59:34PM +, Bart Van Assche wrote:
> > > On Fri, 2018-07-27 at 10:57 -0600, Keith Busch wrote:
> > > > You skip t
On Fri, Jul 27, 2018 at 04:59:34PM +, Bart Van Assche wrote:
> On Fri, 2018-07-27 at 10:57 -0600, Keith Busch wrote:
> > You skip that code if the driver returns BLK_EH_DONT_RESET_TIMER.
>
> How about applying the following patch on top of this series?
That works for me if you
On Fri, Jul 27, 2018 at 04:51:05PM +, Bart Van Assche wrote:
> On Fri, 2018-07-27 at 10:46 -0600, Keith Busch wrote:
> > On Fri, Jul 27, 2018 at 09:20:42AM -0700, Bart Van Assche wrote:
> > > + ret = req->q->mq_ops->timeout(req, reserved);
> > > + /*
>
On Fri, Jul 27, 2018 at 09:20:42AM -0700, Bart Van Assche wrote:
> + ret = req->q->mq_ops->timeout(req, reserved);
> + /*
> + * BLK_EH_DONT_RESET_TIMER means that the block driver either
> + * completed the request or still owns the request and will
> + * continue processing
On Fri, Jul 27, 2018 at 09:20:40AM -0700, Bart Van Assche wrote:
> +void blk_mq_add_timer(struct request *req)
> +{
> + struct request_queue *q = req->q;
> +
> + if (!q->mq_ops)
> + lockdep_assert_held(q->queue_lock);
> +
> + /* blk-mq has its own handler, so we don't need
> support).
I'm afraid I won't be able to test any of this in the near future, but
the code looks good to me!
Acked-by: Keith Busch
On Wed, Jul 25, 2018 at 03:52:17PM +, Bart Van Assche wrote:
> On Mon, 2018-07-23 at 08:37 -0600, Keith Busch wrote:
> > diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
> > index 8932ae81a15a..2715cdaa669c 100644
> > --- a/drivers/scsi/scsi_error.c
&
On Wed, Jul 25, 2018 at 01:22:47PM +0200, Christoph Hellwig wrote:
> > + pmap = kmap_atomic(iv.bv_page) + iv.bv_offset;
> > + p = pmap;
>
> Maybe:
>
> pmap = p = kmap_atomic(iv.bv_page) + iv.bv_offset;
Max pointed out that even with
On Tue, Jul 24, 2018 at 04:33:41PM +0300, Max Gurtovoy wrote:
> +void t10_pi_prepare(struct request *rq, u8 protection_type)
> +{
> + const int tuple_sz = rq->q->integrity.tuple_size;
> + u32 ref_tag = t10_pi_ref_tag(rq);
> + struct bio *bio;
> +
> + if (protection_type ==
On Sun, Jul 22, 2018 at 12:49:57PM +0300, Max Gurtovoy wrote:
> +void blk_integrity_dif_prepare(struct request *rq, u8 protection_type,
> +u32 ref_tag)
> +{
> + const int tuple_sz = sizeof(struct t10_pi_tuple);
> + struct bio *bio;
> + struct t10_pi_tuple
On Thu, Jul 19, 2018 at 06:29:24PM +0200, h...@lst.de wrote:
> How do we get a fix into 4.18 at this part of the cycle? I think that
> is the most important priority right now.
Even if you were okay at this point to incorporate the concepts from
Bart's patch, it still looks like trouble for
On Thu, Jul 19, 2018 at 04:04:46PM +, Bart Van Assche wrote:
> On Thu, 2018-07-19 at 09:56 -0600, Keith Busch wrote:
> > Even some scsi drivers are susceptible to losing their requests with the
> > reverted behavior: take virtio-scsi for example, which returns RESET_TIMER
> &
On Thu, Jul 19, 2018 at 08:59:31AM -0600, Keith Busch wrote:
> > And we should never even hit the timeout handler for those as it
> > is rather pointless (although it looks we currently do..).
>
> I don't see why we'd expect to never hit timeout for at least some of
> thes
On Thu, Jul 19, 2018 at 03:19:04PM +0200, h...@lst.de wrote:
> On Wed, Jul 18, 2018 at 03:17:11PM -0600, Keith Busch wrote:
> > And there may be other drivers that don't want their completions
> > ignored, so breaking them again is also a step in the
On Wed, Jul 18, 2018 at 10:39:36PM +0200, h...@lst.de wrote:
> On Wed, Jul 18, 2018 at 09:56:50PM +0200, h...@lst.de wrote:
> > On Fri, Jul 13, 2018 at 05:58:08PM -0600, Keith Busch wrote:
> > > Of the two you mentioned, yours is preferable IMO. While I appreciate
> &g
On Wed, Jul 18, 2018 at 09:30:11PM +, Bart Van Assche wrote:
> On Wed, 2018-07-18 at 15:17 -0600, Keith Busch wrote:
> > There are not that many blk-mq drivers, so we can go through them all.
>
> I'm not sure that's the right approach. I think it is the responsibility of
&g
On Wed, Jul 18, 2018 at 08:58:48PM +, Bart Van Assche wrote:
> On Wed, 2018-07-18 at 14:53 -0600, Keith Busch wrote:
> > If scsi needs this behavior, why not just put that behavior in scsi? It
> > can set the state to complete and then everything can play ou
On Wed, Jul 18, 2018 at 09:56:50PM +0200, h...@lst.de wrote:
> On Fri, Jul 13, 2018 at 05:58:08PM -0600, Keith Busch wrote:
> > Of the two you mentioned, yours is preferable IMO. While I appreciate
> > Jianchao's detailed analysis, it's hard to take a proposal seriously
> >
On Wed, Jul 18, 2018 at 05:18:45PM +, Bart Van Assche wrote:
> On Wed, 2018-07-18 at 11:00 -0600, Keith Busch wrote:
> > -	cancel_work_sync(&q->timeout_work);
> > -
> > if (q->mq_ops) {
> > struct blk_mq_hw_ctx *hctx;
> >
Signed-off-by: Keith Busch
---
block/blk-core.c | 5 ++---
block/blk-mq.c | 45 -
block/blk-timeout.c| 16 ++--
include/linux/blk-mq.h | 2 ++
4 files changed, 34 insertions(+), 34 deletions(-)
diff --git a/block/blk-core.c b
On Fri, Jul 13, 2018 at 11:03:18PM +, Bart Van Assche wrote:
> How do you want to go forward from here? Do you prefer the approach of the
> patch I had posted (https://www.spinics.net/lists/linux-block/msg26489.html),
> Jianchao's approach (https://marc.info/?l=linux-block=152950093831738) or
On Fri, Jul 13, 2018 at 03:52:38PM +, Bart Van Assche wrote:
> No. What I'm saying is that a behavior change has been introduced in the block
> layer that was not documented in the patch description.
Did you read the changelog?
> I think that behavior change even can trigger a kernel crash.
On Thu, Jul 12, 2018 at 10:24:42PM +, Bart Van Assche wrote:
> Before commit 12f5b9314545 ("blk-mq: Remove generation seqeunce"), if a
> request completion was reported after request timeout processing had
> started, completion handling was skipped. The following code in
>
On Thu, Jul 12, 2018 at 06:16:12PM +, Bart Van Assche wrote:
> On Mon, 2018-05-21 at 17:11 -0600, Keith Busch wrote:
> > /*
> > -* We marked @rq->aborted_gstate and waited for RCU. If there were
> > -* completions that we lost to, they would have finishe
A device may have boundary restrictions where the number of sectors
between boundaries exceeds its max transfer size. In this case, we need
to cap the max size to the smaller of the two limits.
Reported-by: Jitendra Bhivare
Tested-by: Jitendra Bhivare
Cc:
Signed-off-by: Keith Busch
On Wed, Jun 20, 2018 at 11:08:05AM +0200, Christoph Hellwig wrote:
> On Mon, Jun 18, 2018 at 11:32:06AM -0600, Keith Busch wrote:
> > The default mapping of a cpu to a hardware context is often generally
> > applicable, however a user may know of a more appropriate mapping for
>
device is removed and
re-added, or if the driver re-runs the queue mapping.
Signed-off-by: Keith Busch
---
block/blk-mq-debugfs.c | 16 ++
block/blk-mq-debugfs.h | 11 +++
block/blk-mq-sysfs.c | 80 +-
block/blk-mq.c | 9
On Tue, Jun 12, 2018 at 04:41:54PM -0700, austin.bo...@dell.com wrote:
> It looks like the test is setting the Link Disable bit. But this is not
> a good simulation for hot-plug surprise removal testing or surprise link
> down (SLD) testing, if that is the intent. One reason is that Link
>
On Wed, Jun 06, 2018 at 01:42:15PM +0800, Yi Zhang wrote:
> Here is the output, and I can see "HotPlug+ Surprise+" on SltCap
Thanks. That looks like a perfectly capable port. I even have the same
switch in one of my machines, but the test doesn't trigger fatal
firmware-first errors.
Might need
On Tue, Jun 05, 2018 at 10:18:53AM -0600, Keith Busch wrote:
> On Wed, May 30, 2018 at 03:26:54AM -0400, Yi Zhang wrote:
> > Hi Keith
> > I found blktest block/019 also can lead my NVMe server hang with
> > 4.17.0-rc7, let me know if you need more info, thanks.
> &g
On Wed, May 30, 2018 at 03:26:54AM -0400, Yi Zhang wrote:
> Hi Keith
> I found blktest block/019 also can lead my NVMe server hang with 4.17.0-rc7,
> let me know if you need more info, thanks.
>
> Server: Dell R730xd
> NVMe SSD: 85:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd
On Tue, Jun 05, 2018 at 06:47:33AM +0200, Christoph Hellwig wrote:
> But let's assume we don't want to use the list due to this concern:
> we'd still have to read the log page, as per the NVMe spec the only
> think clearing a pending AEN is reading the associated log page.
> We'd then need to
The PCI sysfs interface may not be a dependable method for toggling the
PCI device state to trigger the timeouts. This patch goes directly to
the config space to make device failure occur.
Signed-off-by: Keith Busch
---
v1 -> v2:
Toggling only PCI Command Register BME bit, rather t
On Sat, May 26, 2018 at 12:27:34PM +0200, Christoph Hellwig wrote:
> Per section 5.2 we need to issue the corresponding log page to clear an
> AEN, so for a namespace data changed AEN we need to read the changed
> namespace list log. And once we read that log anyway we might as well
> use it to
On Tue, May 29, 2018 at 12:54:28PM -0700, Omar Sandoval wrote:
> What's the plan for this test? Do you have a v2 coming?
Sorry for the delay. I've been out on holiday, but I'm catching up
quickly and will send a v2 shortly.
Thanks,
Keith
On Wed, May 16, 2018 at 12:03:04PM +0800, Ming Lei wrote:
> +static void nvme_set_host_mem_end_io(struct request *rq, blk_status_t sts)
> +{
> + struct completion *waiting = rq->end_io_data;
> +
> + rq->end_io_data = NULL;
> + blk_mq_free_request(rq);
> +
> + /*
> + * complete
On Wed, May 23, 2018 at 08:34:48AM +0800, Ming Lei wrote:
> Let's consider the normal NVMe timeout code path:
>
> 1) one request is timed out;
>
> 2) controller is shutdown, this timed-out request is requeued from
> nvme_cancel_request(), but can't dispatch because queues are quiesced
>
> 3)
On Wed, May 23, 2018 at 02:19:40PM +0200, Christoph Hellwig wrote:
> -static void hctx_unlock(struct blk_mq_hw_ctx *hctx, int srcu_idx)
> - __releases(hctx->srcu)
> -{
> - if (!(hctx->flags & BLK_MQ_F_BLOCKING))
> - rcu_read_unlock();
> - else
> -
On Wed, May 23, 2018 at 08:02:31AM -0600, Keith Busch wrote:
> Looks like the cmpxchg is also needed if old_state is MQ_RQ_TIMED_OUT,
> otherwise its guaranteed to return 'true' and there's no point to the
> loop and 'if' check.
And I see v14 is already posted with that fix! :)
On Tue, May 22, 2018 at 09:25:15AM -0700, Bart Van Assche wrote:
> +static bool blk_mq_change_rq_state(struct request *rq,
> +enum mq_rq_state old_state,
> +enum mq_rq_state new_state)
> +{
> + union blk_generation_and_state
On Tue, May 22, 2018 at 08:36:27PM +, Bart Van Assche wrote:
>
> Have you noticed that if blk_mq_complete_request() encounters a request with
> state MQ_RQ_TIMED_OUT that it doesn't call __blk_mq_complete_request()? I
> think
> the code in blk_mq_complete_request() together with the above
On Tue, May 22, 2018 at 08:36:27PM +, Bart Van Assche wrote:
> Have you noticed that if blk_mq_complete_request() encounters a request with
> state MQ_RQ_TIMED_OUT that it doesn't call __blk_mq_complete_request()? I
> think
> the code in blk_mq_complete_request() together with the above code
On Tue, May 22, 2018 at 09:25:15AM -0700, Bart Van Assche wrote:
> @@ -848,13 +843,22 @@ static void blk_mq_rq_timed_out(struct request *req,
> bool reserved)
> case BLK_EH_RESET_TIMER:
> + blk_mq_add_timer(req);
> /*
> + * The loop is for the unlikely
On Tue, May 22, 2018 at 01:38:06PM -0600, Jens Axboe wrote:
> I guess it would be cleaner to actually do the transition, in
> blk_mq_rq_timed_out():
>
> case BLK_EH_HANDLED:
>
> if (blk_mq_change_rq_state(req,
On Tue, May 22, 2018 at 04:30:41PM +, Bart Van Assche wrote:
>
> Please have a look at v13 of the timeout handling rework patch that I posted.
> That patch should not introduce any new race conditions and should also handle
> the scenario fine in which blk_mq_complete_request() is called
On Tue, May 22, 2018 at 04:29:07PM +, Bart Van Assche wrote:
> Please have another look at the current code that handles request timeouts
> and completions. The current implementation guarantees that no double
> completions can occur but your patch removes essential aspects of that
>
On Tue, May 22, 2018 at 11:07:07PM +0800, Ming Lei wrote:
> > At approximately the same time, you're saying the driver that returned
> > EH_HANDLED may then call blk_mq_complete_request() in reference to the
> > *old* request that it returned EH_HANDLED for, incorrectly completing
>
> Because
On Tue, May 22, 2018 at 10:57:40PM +0800, Ming Lei wrote:
> OK, that still depends on driver's behaviour, even though it is true
> for NVMe, you still have to audit other drivers about this sync
> because it is required by your patch.
Okay, forget about nvme for a moment here. Please run this
On Tue, May 22, 2018 at 10:37:32PM +0800, Ming Lei wrote:
> On Tue, May 22, 2018 at 10:20 PM, Keith Busch
> <keith.bu...@linux.intel.com> wrote:
> > In the event the driver requests a normal completion, the timeout work
> > releasing the last reference doesn't do a se
On Tue, May 22, 2018 at 10:49:11AM +0800, Ming Lei wrote:
> On Mon, May 21, 2018 at 05:11:31PM -0600, Keith Busch wrote:
> > -static void blk_mq_terminate_expired(struct blk_mq_hw_ctx *hctx,
> > +static void blk_mq_check_expired(struct blk_mq_hw_ctx *hctx,
> > str
On Mon, May 21, 2018 at 11:29:06PM +, Bart Van Assche wrote:
> On Mon, 2018-05-21 at 17:11 -0600, Keith Busch wrote:
> > switch (ret) {
> > case BLK_EH_HANDLED:
> > - __blk_mq_complete_request(req);
> > - break;
>
On Mon, May 21, 2018 at 11:29:21PM +, Bart Van Assche wrote:
> Can you explain why the NVMe driver needs reference counting of requests but
> no other block driver needs this? Additionally, why is it that for all block
> drivers except NVMe the current block layer API is sufficient
>
On Tue, May 22, 2018 at 09:46:32AM +0800, Ming Lei wrote:
> It isn't a good practice to rely on timing for avoiding race, IMO.
Sure thing, and it's easy enough to avoid this one.
> OK, please fix other issues mentioned in the following comment together,
> then I will review further and see if
, the request's
reference is taken only when it appears that actual timeout handling
needs to be done.
Keith Busch (3):
blk-mq: Reference count request usage
blk-mq: Fix timeout and state order
blk-mq: Remove generation seqeunce
block/blk-core.c | 6 -
block/blk-mq-debugfs.c | 1
The block layer had been setting the state to in-flight prior to updating
the timer. This is the wrong order since the timeout handler could observe
the in-flight state with the older timeout, believing the request had
expired when in fact it is just getting started.
Signed-off-by: Keith Busch
being reallocated as a new sequence
while timeout handling is operating on it.
Signed-off-by: Keith Busch <keith.bu...@intel.com>
---
block/blk-core.c | 6 --
block/blk-mq-debugfs.c | 1 -
block/blk-mq.c | 259 ++---
block/bl
This patch adds a struct kref to struct request so that request users
can be sure they're operating on the same request without it changing
while they're processing it. The request's tag won't be released for
reuse until the last user is done with it.
Signed-off-by: Keith Busch <keith.bu...@intel.com>
On Tue, May 22, 2018 at 12:08:37AM +0800, Ming Lei wrote:
> Please take a look at blk_mq_complete_request(). Even with Bart's
> change, the request still won't be completed by driver. The request can
> only be completed by either driver or blk-mq, not both.
So you're saying blk-mq can't complete