Re: [PATCH 1/2] blk-mq: Export iterating all tagged requests

2018-12-04 Thread Keith Busch
On Tue, Dec 04, 2018 at 02:21:17PM -0700, Keith Busch wrote: > On Tue, Dec 04, 2018 at 11:33:33AM -0800, James Smart wrote: > > On 12/4/2018 9:48 AM, Keith Busch wrote: > > > Once quiesced, the proposed iterator can handle the final termination > > > of the request, perf

Re: [PATCH 1/2] blk-mq: Export iterating all tagged requests

2018-12-04 Thread Keith Busch
On Tue, Dec 04, 2018 at 11:33:33AM -0800, James Smart wrote: > > > On 12/4/2018 9:48 AM, Keith Busch wrote: > > On Tue, Dec 04, 2018 at 09:38:29AM -0800, Sagi Grimberg wrote: > > > > > > Yes, I'm very much in favour of this, too. > > > > >

Re: [PATCH 1/2] blk-mq: Export iterating all tagged requests

2018-12-04 Thread Keith Busch
On Tue, Dec 04, 2018 at 09:38:29AM -0800, Sagi Grimberg wrote: > > > > > Yes, I'm very much in favour of this, too. > > > > We always have this IMO slightly weird notion of stopping the queue, set > > > > some error flags in the driver, then _restarting_ the queue, just so > > > > that the driver

Re: [PATCH 1/2] blk-mq: Export iterating all tagged requests

2018-12-04 Thread Keith Busch
On Mon, Dec 03, 2018 at 05:33:06PM -0800, Sagi Grimberg wrote: > >> > > Yes, I'm very much in favour of this, too. > > We always have this IMO slightly weird notion of stopping the queue, set > > some error flags in the driver, then _restarting_ the queue, just so > > that the driver then sees

Re: [PATCH 04/13] nvme-pci: only allow polling with separate poll queues

2018-12-03 Thread Keith Busch
f-by: Christoph Hellwig Looks good. Reviewed-by: Keith Busch

Re: [PATCH 10/13] nvme-mpath: remove I/O polling support

2018-12-03 Thread Keith Busch
wed anyway since the head's current path could change, and you end up polling the wrong request_queue. Not really harmful other than some wasted CPU cycles, but might be worth thinking about if we want to bring mpath polling back. Reviewed-by: Keith Busch

Re: [PATCH 07/13] nvme-pci: don't poll from irq context when deleting queues

2018-12-03 Thread Keith Busch
h we already take it without IRQ disabling. > > Signed-off-by: Christoph Hellwig Looks good. Reviewed-by: Keith Busch

Re: [PATCH 1/2] blk-mq: Export iterating all tagged requests

2018-11-30 Thread Keith Busch
On Fri, Nov 30, 2018 at 01:36:09PM -0700, Jens Axboe wrote: > On 11/30/18 1:26 PM, Keith Busch wrote: > > A driver may wish to iterate every tagged request, not just ones that > > satisfy blk_mq_request_started(). The intended use is so a driver may > > terminate entered

[PATCH 2/2] nvme: Remove queue flushing hack

2018-11-30 Thread Keith Busch
ve to deal with these conditions. Signed-off-by: Keith Busch --- drivers/nvme/host/core.c | 10 -- drivers/nvme/host/pci.c | 43 +++ 2 files changed, 35 insertions(+), 18 deletions(-) diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c index 91

[PATCH 1/2] blk-mq: Export iterating all tagged requests

2018-11-30 Thread Keith Busch
A driver may wish to iterate every tagged request, not just ones that satisfy blk_mq_request_started(). The intended use is so a driver may terminate entered requests on quiesced queues. Signed-off-by: Keith Busch --- block/blk-mq-tag.c | 41 +++-- block
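The iteration change described above can be modeled in plain userspace C. This is only an illustrative sketch of the concept, not the blk-mq API: an iterator restricted to started requests misses requests that have entered a quiesced queue but never dispatched, while an all-tags iterator can terminate them too. All names here are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace model: a tag is either free, entered (allocated but not
 * dispatched), started, or cancelled. */
enum rq_state { RQ_FREE, RQ_ENTERED, RQ_STARTED, RQ_CANCELLED };

typedef void (*iter_fn)(enum rq_state *rq);

/* Started-only iterator: skips entered-but-not-started requests,
 * which is exactly what strands them on a quiesced queue. */
static void iter_started(enum rq_state *tags, size_t n, iter_fn fn)
{
        for (size_t i = 0; i < n; i++)
                if (tags[i] == RQ_STARTED)
                        fn(&tags[i]);
}

/* Proposed behavior: visit every allocated tag, started or not. */
static void iter_all(enum rq_state *tags, size_t n, iter_fn fn)
{
        for (size_t i = 0; i < n; i++)
                if (tags[i] != RQ_FREE)
                        fn(&tags[i]);
}

static void terminate(enum rq_state *rq)
{
        *rq = RQ_CANCELLED;
}
```

Running both iterators over the same tag mix shows the difference: only the all-tags walk reaches the entered request.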

Re: [GIT PULL] nvme fixes for 4.20

2018-11-30 Thread Keith Busch
On Fri, Nov 30, 2018 at 08:26:24AM -0700, Jens Axboe wrote: > On 11/30/18 8:24 AM, Christoph Hellwig wrote: > > Various fixlets all over, including throwing in a 'default y' for the > > multipath code, given that we want people to actually enable it for full > > functionality. > > Why enable it

Re: [PATCH 07/13] nvme-pci: don't poll from irq context when deleting queues

2018-11-30 Thread Keith Busch
On Fri, Nov 30, 2018 at 12:08:09AM -0800, Christoph Hellwig wrote: > On Thu, Nov 29, 2018 at 01:36:32PM -0700, Keith Busch wrote: > > On Thu, Nov 29, 2018 at 08:13:04PM +0100, Christoph Hellwig wrote: > > > + > > > + /* handle any remaining CQEs */

Re: [PATCH 01/13] block: move queues types to the block layer

2018-11-30 Thread Keith Busch
On Fri, Nov 30, 2018 at 12:00:13AM -0800, Christoph Hellwig wrote: > On Thu, Nov 29, 2018 at 01:19:14PM -0700, Keith Busch wrote: > > On Thu, Nov 29, 2018 at 08:12:58PM +0100, Christoph Hellwig wrote: > > > +enum hctx_type { > > > + HCTX_TYPE_DEFAULT, /* all

Re: [PATCH 08/13] nvme-pci: remove the CQ lock for interrupt driven queues

2018-11-29 Thread Keith Busch
k. Since you're not using a lock anymore, a further optimization can complete the CQE inline with moving the cq head so that we don't go through queue twice. That can be a follow on, though, this patch looks fine. Reviewed-by: Keith Busch

Re: [PATCH 06/13] nvme-pci: refactor nvme_disable_io_queues

2018-11-29 Thread Keith Busch
On Thu, Nov 29, 2018 at 08:13:03PM +0100, Christoph Hellwig wrote: > Pass the opcode for the delete SQ/CQ command as an argument instead of > the somewhat confusing pass loop. > > Signed-off-by: Christoph Hellwig Looks good. Reviewed-by: Keith Busch

Re: [PATCH 07/13] nvme-pci: don't poll from irq context when deleting queues

2018-11-29 Thread Keith Busch
On Thu, Nov 29, 2018 at 08:13:04PM +0100, Christoph Hellwig wrote: > This is the last place outside of nvme_irq that handles CQEs from > interrupt context, and thus is in the way of removing the cq_lock for > normal queues, and avoiding lockdep warnings on the poll queues, for > which we already

Re: [PATCH 03/13] nvme-pci: cleanup SQ allocation a bit

2018-11-29 Thread Keith Busch
On Thu, Nov 29, 2018 at 08:13:00PM +0100, Christoph Hellwig wrote: > Use a bit flag to mark if the SQ was allocated from the CMB, and clean > up the surrounding code a bit. > > Signed-off-by: Christoph Hellwig Looks good. Reviewed-by: Keith Busch

Re: [PATCH 02/13] nvme-pci: use atomic bitops to mark a queue enabled

2018-11-29 Thread Keith Busch
On Thu, Nov 29, 2018 at 08:12:59PM +0100, Christoph Hellwig wrote: > This gets rid of all the messing with cq_vector and the ->polled field > by using an atomic bitop to mark the queue enabled or not. > > Signed-off-by: Christoph Hellwig Looks good. Reviewed-by: Keith Busch

Re: [PATCH 01/13] block: move queues types to the block layer

2018-11-29 Thread Keith Busch
of any kind */ > + > + HCTX_MAX_TYPES, > }; Well, there goes my plan to use this with Weighted-Round-Robin NVMe IO queues! I'm not that sad about it, though. Reviewed-by: Keith Busch

Re: [PATCH 3/7] nvme: implement mq_ops->commit_rqs() hook

2018-11-29 Thread Keith Busch
On Thu, Nov 29, 2018 at 10:06:20AM -0700, Jens Axboe wrote: > On 11/29/18 10:04 AM, Christoph Hellwig wrote: > > gcc never warns about unused static inline functions. Which makes a lot > > of sense at least for headers.. > > Not so much for non-headers :-) You can #include .c files too! :)

Re: [PATCH] nvme: Fix PCIe surprise removal scenario

2018-11-26 Thread Keith Busch
be longer term, if a request_queue has an active reference, it might make sense to have blk-mq's APIs consistently handle these sorts of things. Reviewed-by: Keith Busch > --- > drivers/nvme/host/core.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a

Re: [PATCHv3 2/3] scsi: Do not rely on blk-mq for double completions

2018-11-19 Thread Keith Busch
On Mon, Nov 19, 2018 at 12:58:15AM -0800, Christoph Hellwig wrote: > > index 5d83a162d03b..c1d5e4e36125 100644 > > --- a/drivers/scsi/scsi_lib.c > > +++ b/drivers/scsi/scsi_lib.c > > @@ -1635,8 +1635,11 @@ static blk_status_t scsi_mq_prep_fn(struct request > > *req) > > > > static void

Re: [PATCH 1/6] nvme: don't disable local ints for polled queue

2018-11-14 Thread Keith Busch
On Wed, Nov 14, 2018 at 12:43:22AM -0800, Christoph Hellwig wrote: > static int nvme_poll_noirq(struct blk_mq_hw_ctx *hctx, unsigned int tag) > { > struct nvme_queue *nvmeq = hctx->driver_data; > > u16 start, end; > bool found; > > > > if (!nvme_cqe_pending(nvmeq)) > >

Re: [PATCH 06/11] block: add polled wakeup task helper

2018-11-13 Thread Keith Busch
On Tue, Nov 13, 2018 at 08:42:28AM -0700, Jens Axboe wrote: > If we're polling for IO on a device that doesn't use interrupts, then > IO completion loop (and wake of task) is done by submitting task itself. > If that is the case, then we don't need to enter the wake_up_process() > function, we can
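The optimization quoted above can be modeled in userspace C. This is a sketch of the reasoning only, with hypothetical names: when the task completing a polled IO is the polling task itself, it is already running, so the full wake_up_process() path can be skipped and only the task state needs resetting.

```c
#include <assert.h>
#include <stdbool.h>

struct task { bool running; };

static int slow_wakeups;        /* counts cross-task wake-ups in this model */

/* Stand-in for the expensive wake_up_process() path. */
static void wake_up_process_model(struct task *t)
{
        t->running = true;
        slow_wakeups++;
}

/* Stand-in for the proposed helper: if the waiter is the task doing
 * the completing, it is already on the CPU, so just fix its state. */
static void polled_io_wake(struct task *waiter, struct task *current_task)
{
        if (waiter == current_task)
                waiter->running = true;         /* cheap self-wake path */
        else
                wake_up_process_model(waiter);  /* normal wake-up */
}
```

A self-wake leaves the slow-path counter untouched; waking any other task still goes through the full path.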

Re: [PATCH] block: Clear kernel memory before copying to user

2018-11-08 Thread Keith Busch
On Thu, Nov 08, 2018 at 07:10:58PM +0800, Ming Lei wrote: > I guess the issue may depend on specific QEMU version, just tried the test > over > virtio-scsi/sata/usb-storage emulated via qemu-2.10.2-1.fc27, not observed > this problem. I actually didn't use virtio-scsi, but it really doesn't

Re: [PATCH] block: Clear kernel memory before copying to user

2018-11-07 Thread Keith Busch
On Thu, Nov 08, 2018 at 09:12:59AM +0800, Ming Lei wrote: > Is it NVMe specific issue or common problem in other storage hardware? SCSI > does call blk_update_request() and handles partial completion. Not specific to NVMe. An example using SG_IO dumping 2MB of unsanitized kernel memory:

Re: [PATCH] block: Clear kernel memory before copying to user

2018-11-07 Thread Keith Busch
On Thu, Nov 08, 2018 at 12:03:41AM +0800, Ming Lei wrote: > On Wed, Nov 7, 2018 at 11:47 PM Keith Busch wrote: > > > > On Wed, Nov 07, 2018 at 11:44:59PM +0800, Ming Lei wrote: > > > blk_update_request() may tell us how much progress made, :-) > > > >

Re: [PATCH] block: Clear kernel memory before copying to user

2018-11-07 Thread Keith Busch
On Wed, Nov 07, 2018 at 11:44:59PM +0800, Ming Lei wrote: > blk_update_request() may tell us how much progress made, :-) Except when it doesn't, which is 100% of the time for many block drivers, including nvme.

Re: [PATCH] block: Clear kernel memory before copying to user

2018-11-07 Thread Keith Busch
On Wed, Nov 07, 2018 at 11:09:27PM +0800, Ming Lei wrote: > On Wed, Nov 7, 2018 at 10:42 PM Keith Busch wrote: > > > > If the kernel allocates a bounce buffer for user read data, this memory > > needs to be cleared before copying it to the user, otherwise it may leak >

[PATCH] block: Clear kernel memory before copying to user

2018-11-07 Thread Keith Busch
If the kernel allocates a bounce buffer for user read data, this memory needs to be cleared before copying it to the user, otherwise it may leak kernel memory to user space. Signed-off-by: Keith Busch --- block/bio.c | 1 + 1 file changed, 1 insertion(+) diff --git a/block/bio.c b/block/bio.c
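The fix described in this patch can be sketched in userspace C. The function name and layout here are illustrative, not the actual block/bio.c change: if the device filled fewer bytes than the bounce buffer holds, the untouched tail must be zeroed before the buffer is copied back to user space, so stale kernel memory never leaks out.

```c
#include <assert.h>
#include <string.h>

/* Zero the unwritten tail of a bounce buffer, then copy the whole
 * buffer out, so the caller never sees leftover kernel memory. */
static void bounce_copy_out(unsigned char *user_dst,
                            unsigned char *bounce,
                            unsigned long buf_len,
                            unsigned long transferred)
{
        if (transferred < buf_len)
                memset(bounce + transferred, 0, buf_len - transferred);
        memcpy(user_dst, bounce, buf_len);
}
```

With a short transfer, the destination gets the real data followed by zeros rather than whatever the allocator left behind.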

Re: nvme-pci: number of queues off by one

2018-10-08 Thread Keith Busch
On Mon, Oct 08, 2018 at 02:58:21PM +0800, Dongli Zhang wrote: > I got the same result when emulating nvme with qemu: the VM has 12 cpu, while > the num_queues of nvme is 8. > > # uname -r > 4.14.1 > # ll /sys/block/nvme*n1/mq/*/cpu_list > -r--r--r-- 1 root root 4096 Oct 8 14:30

Re: [PATCH 2/5] nvme: register ns_id attributes as default sysfs groups

2018-09-28 Thread Keith Busch
e > Signed-off-by: Hannes Reinecke Thanks, looks great! Reviewed-by: Keith Busch

Re: [PATCH v8 13/13] nvmet: Optionally use PCI P2P memory

2018-09-27 Thread Keith Busch
but looking forward to it in the future. Looks good. Reviewed-by: Keith Busch

Re: [PATCH v8 09/13] nvme-pci: Use PCI p2pmem subsystem to manage the CMB

2018-09-27 Thread Keith Busch
afe to directly access memory mapped by memremap()/devm_memremap_pages(). > Architectures where this is not safe will not be supported by memremap() > and therefore will not be support PCI P2P and have no support for CMB. > > Signed-off-by: Logan Gunthorpe Looks good to me. Reviewed-by: Keith Busch

Re: [PATCH v8 10/13] nvme-pci: Add support for P2P memory in requests

2018-09-27 Thread Keith Busch
_P2P flag which tells the core to > set QUEUE_FLAG_PCI_P2P in the request queue. > > Signed-off-by: Logan Gunthorpe > Reviewed-by: Sagi Grimberg > Reviewed-by: Christoph Hellwig Looks good to me as well. Reviewed-by: Keith Busch

Re: [PATCH v8 11/13] nvme-pci: Add a quirk for a pseudo CMB

2018-09-27 Thread Keith Busch
> > Signed-off-by: Logan Gunthorpe > Reviewed-by: Sagi Grimberg The implementation looks fine, and this is why we have quirks ... but CMB has existed since 2014! :) Reviewed-by: Keith Busch

[PATCHv2] blk-mq: Allow blocking queue tag iter callbacks

2018-09-25 Thread Keith Busch
of reading it, which allows callbacks to make blocking calls. Fixes: f5e4d6357 ("blk-mq: sync the update nr_hw_queues with blk_mq_queue_tag_busy_iter") Cc: Jianchao Wang Signed-off-by: Keith Busch --- v1 -> v2: Leave the iterator interface as-is, making this chang

Re: [PATCH] blk-mq: Allow blocking queue tag iter callbacks

2018-09-25 Thread Keith Busch
On Tue, Sep 25, 2018 at 08:14:45AM -0700, Bart Van Assche wrote: > On Tue, 2018-09-25 at 08:39 -0600, Keith Busch wrote: > > On Tue, Sep 25, 2018 at 10:39:46AM +0800, jianchao.wang wrote: > > > But the issue is the left part of blk_mq_timeout_work is moved out of > > &g

Re: [PATCH] blk-mq: Allow blocking queue tag iter callbacks

2018-09-25 Thread Keith Busch
On Tue, Sep 25, 2018 at 10:39:46AM +0800, jianchao.wang wrote: > Hi Bart > > On 09/25/2018 10:20 AM, Bart Van Assche wrote: > > On 9/24/18 7:11 PM, jianchao.wang wrote: > >> Hi Keith > >> > >> On 09/25/2018 05:09 AM, Keith Busch wrote: > >>>

[PATCH] blk-mq: Allow blocking queue tag iter callbacks

2018-09-24 Thread Keith Busch
reference if the queue is not frozen instead of a rcu read lock. Fixes: f5e4d6357 ("blk-mq: sync the update nr_hw_queues with blk_mq_queue_tag_busy_iter") Cc: Jianchao Wang Signed-off-by: Keith Busch --- block/blk-mq-tag.c | 30 +- block/blk-mq-tag.h | 2

Re: Regression caused by f5bbbbe4d635

2018-09-24 Thread Keith Busch
On Mon, Sep 24, 2018 at 12:51:07PM -0700, Bart Van Assche wrote: > On Mon, 2018-09-24 at 13:13 -0600, Keith Busch wrote: > > diff --git a/block/blk-mq.c b/block/blk-mq.c > > index 85a1c1a59c72..28d128450621 100644 > > --- a/block/blk-mq.c > > +++ b/block/blk-mq.c >

Re: Regression caused by f5bbbbe4d635

2018-09-24 Thread Keith Busch
On Mon, Sep 24, 2018 at 12:44:13PM -0600, Jens Axboe wrote: > Hi, > > This commit introduced a rcu_read_lock() inside > blk_mq_queue_tag_busy_iter() - this is problematic for the timout code, > since we now end up holding the RCU read lock over the timeout code. As > just one example, nvme ends

Re: [PATCH 5/5] blk-mq: Rework blk-mq timeout handling again

2018-07-27 Thread Keith Busch
On Fri, Jul 27, 2018 at 05:14:18PM +, Bart Van Assche wrote: > On Fri, 2018-07-27 at 11:04 -0600, Keith Busch wrote: > > On Fri, Jul 27, 2018 at 04:59:34PM +, Bart Van Assche wrote: > > > On Fri, 2018-07-27 at 10:57 -0600, Keith Busch wrote: > > > > You skip t

Re: [PATCH 5/5] blk-mq: Rework blk-mq timeout handling again

2018-07-27 Thread Keith Busch
On Fri, Jul 27, 2018 at 04:59:34PM +, Bart Van Assche wrote: > On Fri, 2018-07-27 at 10:57 -0600, Keith Busch wrote: > > You skip that code if the driver returns BLK_EH_DONT_RESET_TIMER. > > How about applying the following patch on top of this series? That works for me if you

Re: [PATCH 5/5] blk-mq: Rework blk-mq timeout handling again

2018-07-27 Thread Keith Busch
On Fri, Jul 27, 2018 at 04:51:05PM +, Bart Van Assche wrote: > On Fri, 2018-07-27 at 10:46 -0600, Keith Busch wrote: > > On Fri, Jul 27, 2018 at 09:20:42AM -0700, Bart Van Assche wrote: > > > + ret = req->q->mq_ops->timeout(req, reserved); > > > + /* >

Re: [PATCH 5/5] blk-mq: Rework blk-mq timeout handling again

2018-07-27 Thread Keith Busch
On Fri, Jul 27, 2018 at 09:20:42AM -0700, Bart Van Assche wrote: > + ret = req->q->mq_ops->timeout(req, reserved); > + /* > + * BLK_EH_DONT_RESET_TIMER means that the block driver either > + * completed the request or still owns the request and will > + * continue processing

Re: [PATCH 3/5] block: Split blk_add_timer()

2018-07-27 Thread Keith Busch
On Fri, Jul 27, 2018 at 09:20:40AM -0700, Bart Van Assche wrote: > +void blk_mq_add_timer(struct request *req) > +{ > + struct request_queue *q = req->q; > + > + if (!q->mq_ops) > + lockdep_assert_held(q->queue_lock); > + > + /* blk-mq has its own handler, so we don't need

Re: [PATCH v4 3/3] nvme: use blk API to remap ref tags for IOs with metadata

2018-07-25 Thread Keith Busch
> support). I'm afraid I won't be able to test any of this in the near future, but the code looks good to me! Acked-by: Keith Busch

Re: [PATCHv2 2/2] scsi: set timed out out mq requests to complete

2018-07-25 Thread Keith Busch
On Wed, Jul 25, 2018 at 03:52:17PM +, Bart Van Assche wrote: > On Mon, 2018-07-23 at 08:37 -0600, Keith Busch wrote: > > diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c > > index 8932ae81a15a..2715cdaa669c 100644 > > --- a/drivers/scsi/scsi_error.c &

Re: [PATCH v3 2/3] block: move dif_prepare/dif_complete functions to block layer

2018-07-25 Thread Keith Busch
On Wed, Jul 25, 2018 at 01:22:47PM +0200, Christoph Hellwig wrote: > > + pmap = kmap_atomic(iv.bv_page) + iv.bv_offset; > > + p = pmap; > > Maybe: > > pmap = p = kmap_atomic(iv.bv_page) + iv.bv_offset; Max pointed out that even with

Re: [PATCH 2/3] block: move dif_prepare/dif_complete functions to block layer

2018-07-24 Thread Keith Busch
On Tue, Jul 24, 2018 at 04:33:41PM +0300, Max Gurtovoy wrote: > +void t10_pi_prepare(struct request *rq, u8 protection_type) > +{ > + const int tuple_sz = rq->q->integrity.tuple_size; > + u32 ref_tag = t10_pi_ref_tag(rq); > + struct bio *bio; > + > + if (protection_type ==

Re: [PATCH 1/2] block: move dif_prepare/dif_complete functions to block layer

2018-07-23 Thread Keith Busch
On Sun, Jul 22, 2018 at 12:49:57PM +0300, Max Gurtovoy wrote: > +void blk_integrity_dif_prepare(struct request *rq, u8 protection_type, > +u32 ref_tag) > +{ > + const int tuple_sz = sizeof(struct t10_pi_tuple); > + struct bio *bio; > + struct t10_pi_tuple

Re: [RFC PATCH 3/3] blk-mq: Remove generation seqeunce

2018-07-19 Thread Keith Busch
On Thu, Jul 19, 2018 at 06:29:24PM +0200, h...@lst.de wrote: > How do we get a fix into 4.18 at this part of the cycle? I think that > is the most important prirority right now. Even if you were okay at this point to incorporate the concepts from Bart's patch, it still looks like trouble for

Re: [RFC PATCH 3/3] blk-mq: Remove generation seqeunce

2018-07-19 Thread Keith Busch
On Thu, Jul 19, 2018 at 04:04:46PM +, Bart Van Assche wrote: > On Thu, 2018-07-19 at 09:56 -0600, Keith Busch wrote: > > Even some scsi drivers are susceptible to losing their requests with the > > reverted behavior: take virtio-scsi for example, which returns RESET_TIMER > &

Re: [RFC PATCH 3/3] blk-mq: Remove generation seqeunce

2018-07-19 Thread Keith Busch
On Thu, Jul 19, 2018 at 08:59:31AM -0600, Keith Busch wrote: > > And we should never even hit the timeout handler for those as it > > is rather pointless (although it looks we currently do..). > > I don't see why we'd expect to never hit timeout for at least some of > thes

Re: [RFC PATCH 3/3] blk-mq: Remove generation seqeunce

2018-07-19 Thread Keith Busch
On Thu, Jul 19, 2018 at 03:19:04PM +0200, h...@lst.de wrote: > On Wed, Jul 18, 2018 at 03:17:11PM -0600, Keith Busch wrote: > > And there may be other drivers that don't want their completions > > ignored, so breaking them again is also a step in the

Re: [RFC PATCH 3/3] blk-mq: Remove generation seqeunce

2018-07-18 Thread Keith Busch
On Wed, Jul 18, 2018 at 10:39:36PM +0200, h...@lst.de wrote: > On Wed, Jul 18, 2018 at 09:56:50PM +0200, h...@lst.de wrote: > > On Fri, Jul 13, 2018 at 05:58:08PM -0600, Keith Busch wrote: > > > Of the two you mentioned, yours is preferable IMO. While I appreciate > &g

Re: [RFC PATCH 3/3] blk-mq: Remove generation seqeunce

2018-07-18 Thread Keith Busch
On Wed, Jul 18, 2018 at 09:30:11PM +, Bart Van Assche wrote: > On Wed, 2018-07-18 at 15:17 -0600, Keith Busch wrote: > > There are not that many blk-mq drivers, so we can go through them all. > > I'm not sure that's the right approach. I think it is the responsibility of &g

Re: [RFC PATCH 3/3] blk-mq: Remove generation seqeunce

2018-07-18 Thread Keith Busch
On Wed, Jul 18, 2018 at 08:58:48PM +, Bart Van Assche wrote: > On Wed, 2018-07-18 at 14:53 -0600, Keith Busch wrote: > > If scsi needs this behavior, why not just put that behavior in scsi? It > > can set the state to complete and then everything can play ou

Re: [RFC PATCH 3/3] blk-mq: Remove generation seqeunce

2018-07-18 Thread Keith Busch
On Wed, Jul 18, 2018 at 09:56:50PM +0200, h...@lst.de wrote: > On Fri, Jul 13, 2018 at 05:58:08PM -0600, Keith Busch wrote: > > Of the two you mentioned, yours is preferable IMO. While I appreciate > > Jianchao's detailed analysis, it's hard to take a proposal seriously > >

Re: [RFC PATCH] blk-mq: move timeout handling from queue to tagset

2018-07-18 Thread Keith Busch
On Wed, Jul 18, 2018 at 05:18:45PM +, Bart Van Assche wrote: > On Wed, 2018-07-18 at 11:00 -0600, Keith Busch wrote: > > - cancel_work_sync(&q->timeout_work); > > - > > if (q->mq_ops) { > > struct blk_mq_hw_ctx *hctx; > >

[RFC PATCH] blk-mq: move timeout handling from queue to tagset

2018-07-18 Thread Keith Busch
-off-by: Keith Busch --- block/blk-core.c | 5 ++--- block/blk-mq.c | 45 - block/blk-timeout.c| 16 ++-- include/linux/blk-mq.h | 2 ++ 4 files changed, 34 insertions(+), 34 deletions(-) diff --git a/block/blk-core.c b

Re: [RFC PATCH 3/3] blk-mq: Remove generation seqeunce

2018-07-13 Thread Keith Busch
On Fri, Jul 13, 2018 at 11:03:18PM +, Bart Van Assche wrote: > How do you want to go forward from here? Do you prefer the approach of the > patch I had posted (https://www.spinics.net/lists/linux-block/msg26489.html), > Jianchao's approach (https://marc.info/?l=linux-block=152950093831738) or

Re: [RFC PATCH 3/3] blk-mq: Remove generation seqeunce

2018-07-13 Thread Keith Busch
On Fri, Jul 13, 2018 at 03:52:38PM +, Bart Van Assche wrote: > No. What I'm saying is that a behavior change has been introduced in the block > layer that was not documented in the patch description. Did you read the changelog? > I think that behavior change even can trigger a kernel crash.

Re: [RFC PATCH 3/3] blk-mq: Remove generation seqeunce

2018-07-13 Thread Keith Busch
On Thu, Jul 12, 2018 at 10:24:42PM +, Bart Van Assche wrote: > Before commit 12f5b9314545 ("blk-mq: Remove generation seqeunce"), if a > request completion was reported after request timeout processing had > started, completion handling was skipped. The following code in >

Re: [RFC PATCH 3/3] blk-mq: Remove generation seqeunce

2018-07-12 Thread Keith Busch
On Thu, Jul 12, 2018 at 06:16:12PM +, Bart Van Assche wrote: > On Mon, 2018-05-21 at 17:11 -0600, Keith Busch wrote: > > /* > > -* We marked @rq->aborted_gstate and waited for RCU. If there were > > -* completions that we lost to, they would have finishe

[PATCH] block: Fix transfer when chunk sectors exceeds max


2018-06-26 Thread Keith Busch
A device may have boundary restrictions where the number of sectors between boundaries exceeds its max transfer size. In this case, we need to cap the max size to the smaller of the two limits. Reported-by: Jitendra Bhivare Tested-by: Jitendra Bhivare Cc: Signed-off-by: Keith Busch
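The cap described above can be sketched as a small helper: a request may neither cross a chunk boundary nor exceed the device's max transfer size, so the usable length is the smaller of the two. This assumes a power-of-two chunk size, as the kernel's limits do; names are illustrative, not the exact kernel change.

```c
#include <assert.h>

/* Return how many sectors an IO starting at the given offset may
 * carry: the distance to the next chunk boundary, capped at the
 * device's max transfer size. */
static unsigned int max_io_sectors(unsigned long long offset,
                                   unsigned int chunk_sectors,
                                   unsigned int max_sectors)
{
        unsigned int to_boundary =
                chunk_sectors - (unsigned int)(offset & (chunk_sectors - 1));

        return to_boundary < max_sectors ? to_boundary : max_sectors;
}
```

Near a boundary the boundary wins; far from one, the transfer limit wins.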

Re: [RFC PATCH] blk-mq: User defined HCTX CPU mapping

2018-06-20 Thread Keith Busch
On Wed, Jun 20, 2018 at 11:08:05AM +0200, Christoph Hellwig wrote: > On Mon, Jun 18, 2018 at 11:32:06AM -0600, Keith Busch wrote: > > The default mapping of a cpu to a hardware context is often generally > > applicable, however a user may know of a more appropriate mapping for >

[RFC PATCH] blk-mq: User defined HCTX CPU mapping

2018-06-18 Thread Keith Busch
device is removed and re-added, or if the driver re-runs the queue mapping. Signed-off-by: Keith Busch --- block/blk-mq-debugfs.c | 16 ++ block/blk-mq-debugfs.h | 11 +++ block/blk-mq-sysfs.c | 80 +- block/blk-mq.c | 9

Re: blktests block/019 lead system hang

2018-06-13 Thread Keith Busch
On Tue, Jun 12, 2018 at 04:41:54PM -0700, austin.bo...@dell.com wrote: > It looks like the test is setting the Link Disable bit. But this is not > a good simulation for hot-plug surprise removal testing or surprise link > down (SLD) testing, if that is the intent. One reason is that Link >

Re: blktests block/019 lead system hang

2018-06-06 Thread Keith Busch
On Wed, Jun 06, 2018 at 01:42:15PM +0800, Yi Zhang wrote: > Here is the output, and I can see "HotPlug+ Surprise+" on SltCap Thanks. That looks like a perfectly capable port. I even have the same switch in one of my machines, but the test doesn't trigger fatal firmware-first errors. Might need

Re: blktests block/019 lead system hang

2018-06-05 Thread Keith Busch
On Tue, Jun 05, 2018 at 10:18:53AM -0600, Keith Busch wrote: > On Wed, May 30, 2018 at 03:26:54AM -0400, Yi Zhang wrote: > > Hi Keith > > I found blktest block/019 also can lead my NVMe server hang with > > 4.17.0-rc7, let me know if you need more info, thanks. > &g

Re: blktests block/019 lead system hang

2018-06-05 Thread Keith Busch
On Wed, May 30, 2018 at 03:26:54AM -0400, Yi Zhang wrote: > Hi Keith > I found blktest block/019 also can lead my NVMe server hang with 4.17.0-rc7, > let me know if you need more info, thanks. > > Server: Dell R730xd > NVMe SSD: 85:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd

Re: [PATCH 13/14] nvme: use the changed namespaces list log to clear ns data changed AENs

2018-06-05 Thread Keith Busch
On Tue, Jun 05, 2018 at 06:47:33AM +0200, Christoph Hellwig wrote: > But let's assume we don't want to use the list due to this concern: > we'd still have to read the log page, as per the NVMe spec the only > think clearing a pending AEN is reading the associated log page. > We'd then need to

[blktests PATCHv2] Fix block/011 to not use sysfs for device disabling

2018-06-04 Thread Keith Busch
The PCI sysfs interface may not be a dependable method for toggling the PCI device state to trigger the timeouts. This patch goes directly to the config space to make device failure occur. Signed-off-by: Keith Busch --- v1 -> v2: Toggling only PCI Command Register BME bit, rather t
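For reference, the BME bit the v2 toggles is Bus Master Enable, bit 2 of the standard PCI Command register; clearing it blocks the device from issuing DMA, which is what provokes the request timeouts the test wants. A minimal model of the mask arithmetic (the 0x4 value is from the PCI specification; the helper names are illustrative):

```c
#include <assert.h>

/* Bus Master Enable: bit 2 of the PCI Command register. */
#define PCI_COMMAND_MASTER 0x4

static unsigned short clear_bme(unsigned short cmd)
{
        return cmd & (unsigned short)~PCI_COMMAND_MASTER;
}

static unsigned short set_bme(unsigned short cmd)
{
        return (unsigned short)(cmd | PCI_COMMAND_MASTER);
}
```

Toggling only this one bit leaves the IO/memory-decode enables (bits 0 and 1) intact, unlike yanking the whole device out from under sysfs.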

Re: [PATCH 13/14] nvme: use the changed namespaces list log to clear ns data changed AENs

2018-06-04 Thread Keith Busch
On Sat, May 26, 2018 at 12:27:34PM +0200, Christoph Hellwig wrote: > Per section 5.2 we need to issue the corresponding log page to clear an > AEN, so for a namespace data changed AEN we need to read the changed > namespace list log. And once we read that log anyway we might as well > use it to

Re: [PATCH blktests] Fix block/011 to not use sysfs for device disabling

2018-06-04 Thread Keith Busch
On Tue, May 29, 2018 at 12:54:28PM -0700, Omar Sandoval wrote: > What's the plan for this test? Do you have a v2 coming? Sorry for the delay. I've been out on holiday, but I'm catching up quickly and will send a v2 shortly. Thanks, Keith

Re: [PATCH V6 02/11] nvme: pci: cover timeout for admin commands running in EH

2018-05-24 Thread Keith Busch
On Wed, May 16, 2018 at 12:03:04PM +0800, Ming Lei wrote: > +static void nvme_set_host_mem_end_io(struct request *rq, blk_status_t sts) > +{ > + struct completion *waiting = rq->end_io_data; > + > + rq->end_io_data = NULL; > + blk_mq_free_request(rq); > + > + /* > + * complete

Re: [RFC PATCH 3/3] blk-mq: Remove generation seqeunce

2018-05-23 Thread Keith Busch
On Wed, May 23, 2018 at 08:34:48AM +0800, Ming Lei wrote: > Let's consider the normal NVMe timeout code path: > > 1) one request is timed out; > > 2) controller is shutdown, this timed-out request is requeued from > nvme_cancel_request(), but can't dispatch because queues are quiesced > > 3)

Re: [PATCH 13/14] blk-mq: Remove generation seqeunce

2018-05-23 Thread Keith Busch
On Wed, May 23, 2018 at 02:19:40PM +0200, Christoph Hellwig wrote: > -static void hctx_unlock(struct blk_mq_hw_ctx *hctx, int srcu_idx) > - __releases(hctx->srcu) > -{ > - if (!(hctx->flags & BLK_MQ_F_BLOCKING)) > - rcu_read_unlock(); > - else > -

Re: [PATCH v13] blk-mq: Rework blk-mq timeout handling again

2018-05-23 Thread Keith Busch
On Wed, May 23, 2018 at 08:02:31AM -0600, Keith Busch wrote: > Looks like the cmpxchg is also needed if old_state is MQ_RQ_TIMED_OUT, > otherwise its guaranteed to return 'true' and there's no point to the > loop and 'if' check. And I see v14 is already posted with that fix! :)

Re: [PATCH v13] blk-mq: Rework blk-mq timeout handling again

2018-05-23 Thread Keith Busch
On Tue, May 22, 2018 at 09:25:15AM -0700, Bart Van Assche wrote: > +static bool blk_mq_change_rq_state(struct request *rq, > +enum mq_rq_state old_state, > +enum mq_rq_state new_state) > +{ > + union blk_generation_and_state

Re: [PATCH v13] blk-mq: Rework blk-mq timeout handling again

2018-05-22 Thread Keith Busch
On Tue, May 22, 2018 at 08:36:27PM +, Bart Van Assche wrote: > > Have you noticed that if blk_mq_complete_request() encounters a request with > state MQ_RQ_TIMED_OUT that it doesn't call __blk_mq_complete_request()? I > think > the code in blk_mq_complete_request() together with the above

Re: [PATCH v13] blk-mq: Rework blk-mq timeout handling again

2018-05-22 Thread Keith Busch
On Tue, May 22, 2018 at 08:36:27PM +, Bart Van Assche wrote: > Have you noticed that if blk_mq_complete_request() encounters a request with > state MQ_RQ_TIMED_OUT that it doesn't call __blk_mq_complete_request()? I > think > the code in blk_mq_complete_request() together with the above code

Re: [PATCH v13] blk-mq: Rework blk-mq timeout handling again

2018-05-22 Thread Keith Busch
On Tue, May 22, 2018 at 09:25:15AM -0700, Bart Van Assche wrote: > @@ -848,13 +843,22 @@ static void blk_mq_rq_timed_out(struct request *req, > bool reserved) > case BLK_EH_RESET_TIMER: > + blk_mq_add_timer(req); > /* > + * The loop is for the unlikely

Re: [PATCH v13] blk-mq: Rework blk-mq timeout handling again

2018-05-22 Thread Keith Busch
On Tue, May 22, 2018 at 01:38:06PM -0600, Jens Axboe wrote: > I guess it would be cleaner to actually do the transition, in > blk_mq_rq_timed_out(): > > case BLK_EH_HANDLED: > > if (blk_mq_change_rq_state(req,

Re: [RFC PATCH 0/3] blk-mq: Timeout rework

2018-05-22 Thread Keith Busch
On Tue, May 22, 2018 at 04:30:41PM +, Bart Van Assche wrote: > > Please have a look at v13 of the timeout handling rework patch that I posted. > That patch should not introduce any new race conditions and should also handle > the scenario fine in which blk_mq_complete_request() is called

Re: [RFC PATCH 3/3] blk-mq: Remove generation seqeunce

2018-05-22 Thread Keith Busch
On Tue, May 22, 2018 at 04:29:07PM +, Bart Van Assche wrote: > Please have another look at the current code that handles request timeouts > and completions. The current implementation guarantees that no double > completions can occur but your patch removes essential aspects of that >

Re: [RFC PATCH 3/3] blk-mq: Remove generation seqeunce

2018-05-22 Thread Keith Busch
On Tue, May 22, 2018 at 11:07:07PM +0800, Ming Lei wrote: > > At approximately the same time, you're saying the driver that returned > > EH_HANDLED may then call blk_mq_complete_request() in reference to the > > *old* request that it returned EH_HANDLED for, incorrectly completing > > Because

Re: [RFC PATCH 3/3] blk-mq: Remove generation seqeunce

2018-05-22 Thread Keith Busch
On Tue, May 22, 2018 at 10:57:40PM +0800, Ming Lei wrote: > OK, that still depends on driver's behaviour, even though it is true > for NVMe, you still have to audit other drivers about this sync > because it is required by your patch. Okay, forget about nvme for a moment here. Please run this

Re: [RFC PATCH 3/3] blk-mq: Remove generation seqeunce

2018-05-22 Thread Keith Busch
On Tue, May 22, 2018 at 10:37:32PM +0800, Ming Lei wrote: > On Tue, May 22, 2018 at 10:20 PM, Keith Busch > <keith.bu...@linux.intel.com> wrote: > > In the event the driver requests a normal completion, the timeout work > > releasing the last reference doesn't do a se

Re: [RFC PATCH 3/3] blk-mq: Remove generation seqeunce

2018-05-22 Thread Keith Busch
On Tue, May 22, 2018 at 10:49:11AM +0800, Ming Lei wrote: > On Mon, May 21, 2018 at 05:11:31PM -0600, Keith Busch wrote: > > -static void blk_mq_terminate_expired(struct blk_mq_hw_ctx *hctx, > > +static void blk_mq_check_expired(struct blk_mq_hw_ctx *hctx, > > str

Re: [RFC PATCH 3/3] blk-mq: Remove generation seqeunce

2018-05-22 Thread Keith Busch
On Mon, May 21, 2018 at 11:29:06PM +, Bart Van Assche wrote: > On Mon, 2018-05-21 at 17:11 -0600, Keith Busch wrote: > > switch (ret) { > > case BLK_EH_HANDLED: > > - __blk_mq_complete_request(req); > > - break; >

Re: [RFC PATCH 0/3] blk-mq: Timeout rework

2018-05-22 Thread Keith Busch
On Mon, May 21, 2018 at 11:29:21PM +, Bart Van Assche wrote: > Can you explain why the NVMe driver needs reference counting of requests but > no other block driver needs this? Additionally, why is it that for all block > drivers except NVMe the current block layer API is sufficient >

Re: [PATCH 3/6] nvme: Move all IO out of controller reset

2018-05-22 Thread Keith Busch
On Tue, May 22, 2018 at 09:46:32AM +0800, Ming Lei wrote: > It isn't a good practice to rely on timing for avoiding race, IMO. Sure thing, and it's easy enough to avoid this one. > OK, please fix other issues mentioned in the following comment together, > then I will review further and see if

[RFC PATCH 0/3] blk-mq: Timeout rework

2018-05-21 Thread Keith Busch
, the request's reference is taken only when it appears that actual timeout handling needs to be done. Keith Busch (3): blk-mq: Reference count request usage blk-mq: Fix timeout and state order blk-mq: Remove generation seqeunce block/blk-core.c | 6 - block/blk-mq-debugfs.c | 1

[RFC PATCH 2/3] blk-mq: Fix timeout and state order

2018-05-21 Thread Keith Busch
The block layer had been setting the state to in-flight prior to updating the timer. This is the wrong order since the timeout handler could observe the in-flight state with the older timeout, believing the request had expired when in fact it is just getting started. Signed-off-by: Keith Busch

[RFC PATCH 3/3] blk-mq: Remove generation seqeunce

2018-05-21 Thread Keith Busch
being reallocated as a new sequence while timeout handling is operating on it. Signed-off-by: Keith Busch <keith.bu...@intel.com> --- block/blk-core.c | 6 -- block/blk-mq-debugfs.c | 1 - block/blk-mq.c | 259 ++--- block/bl

[RFC PATCH 1/3] blk-mq: Reference count request usage

2018-05-21 Thread Keith Busch
This patch adds a struct kref to struct request so that request users can be sure they're operating on the same request without it changing while they're processing it. The request's tag won't be released for reuse until the last user is done with it. Signed-off-by: Keith Busch <keith

Re: [PATCH 1/6] nvme: Sync request queues on reset

2018-05-21 Thread Keith Busch
On Tue, May 22, 2018 at 12:08:37AM +0800, Ming Lei wrote: > Please take a look at blk_mq_complete_request(). Even with Bart's > change, the request still won't be completed by driver. The request can > only be completed by either driver or blk-mq, not both. So you're saying blk-mq can't complete
