Re: [PATCH 1/5] block: don't call blk_mq_delay_run_hw_queue() in case of BLK_STS_RESOURCE

2017-09-18 Thread Ming Lei
On Mon, Sep 18, 2017 at 03:18:16PM +, Bart Van Assche wrote: > On Sun, 2017-09-17 at 20:40 +0800, Ming Lei wrote: > > "if no request has completed before the delay has expired" can't be a > > reason to rerun the queue, because the queue can still be busy. > > That statement of you shows that

Re: [PATCH 00/10] nvme multipath support on top of nvme-4.13 branch

2017-09-18 Thread Anish Jhaveri
On Fri, Sep 15, 2017 at 08:07:01PM +0200, Christoph Hellwig wrote: > Hi Anish, > > I looked over the code a bit, and I'm rather confused by the newly > added commands. Which controller supports them? Also the NVMe > working group went down a very different way with the ALUA approach, > which

Re: [PATCH 00/10] nvme multipath support on top of nvme-4.13 branch

2017-09-18 Thread anish . jhaveri
On Wed, Sep 13, 2017 at 08:57:13AM +0200, Hannes Reinecke wrote: > In general I am _not_ in favour of this approach. > > This is essentially the same level of multipath support we had in the > old qlogic and lpfc drivers in 2.4/2.6 series, and it took us _years_ to > get rid of this. > Main

Re: [RFC PATCH] blk-mq: Fix lost request during timeout

2017-09-18 Thread Ming Lei
On Tue, Sep 19, 2017 at 7:08 AM, Keith Busch wrote: > On Mon, Sep 18, 2017 at 10:53:12PM +, Bart Van Assche wrote: >> On Mon, 2017-09-18 at 18:39 -0400, Keith Busch wrote: >> > The nvme driver's use of blk_mq_reinit_tagset only happens during >> > controller

[PATCH V5 09/10] SCSI: transport_spi: resume a quiesced device

2017-09-18 Thread Ming Lei
We have to preempt freeze queue in scsi_device_quiesce(), and unfreeze in scsi_device_resume(), so call scsi_device_resume() for the device which is quiesced by scsi_device_quiesce(). Tested-by: Cathy Avery Tested-by: Oleksandr Natalenko

[PATCH V5 10/10] SCSI: preempt freeze block queue when SCSI device is put into quiesce

2017-09-18 Thread Ming Lei
Simply quiescing SCSI device and waiting for completion of IO dispatched to SCSI queue isn't safe: it is easy to use up the request pool, because requests allocated before can't be dispatched once the device is put in QUIESCE. Then no request can be allocated for RQF_PREEMPT, and system may hang

[PATCH V5 07/10] block: introduce preempt version of blk_[freeze|unfreeze]_queue

2017-09-18 Thread Ming Lei
The two APIs are required to allow request allocation of RQF_PREEMPT when queue is preempt frozen. We have to guarantee that normal freeze and preempt freeze are run exclusive. Because for normal freezing, once blk_freeze_queue_wait() is returned, no request can enter queue any more. Another

[PATCH V5 08/10] block: allow to allocate req with RQF_PREEMPT when queue is preempt frozen

2017-09-18 Thread Ming Lei
RQF_PREEMPT is a bit special because the request is required to be dispatched to lld even when SCSI device is quiesced. So this patch introduces __blk_get_request() to allow block layer to allocate request when queue is preempt frozen, since we will preempt freeze queue before quiescing SCSI

[PATCH V5 05/10] block: rename .mq_freeze_wq and .mq_freeze_depth

2017-09-18 Thread Ming Lei
Both are used for legacy and blk-mq, so rename them to .freeze_wq and .freeze_depth to avoid confusing people. No functional change. Tested-by: Cathy Avery Tested-by: Oleksandr Natalenko Signed-off-by: Ming Lei ---

[PATCH V5 06/10] block: pass flags to blk_queue_enter()

2017-09-18 Thread Ming Lei
We need to pass PREEMPT flags to blk_queue_enter() for allocating request with RQF_PREEMPT in the following patch. Tested-by: Cathy Avery Tested-by: Oleksandr Natalenko Signed-off-by: Ming Lei --- block/blk-core.c | 10

[PATCH V5 01/10] blk-mq: only run hw queues for blk-mq

2017-09-18 Thread Ming Lei
This patch just makes it explicit. Tested-by: Cathy Avery Tested-by: Oleksandr Natalenko Reviewed-by: Johannes Thumshirn Signed-off-by: Ming Lei --- block/blk-mq.c | 3 ++- 1 file changed, 2

[PATCH V5 04/10] blk-mq: rename blk_mq_freeze_queue_wait as blk_freeze_queue_wait

2017-09-18 Thread Ming Lei
The only change on legacy is that blk_drain_queue() is run from blk_freeze_queue(), which is called in blk_cleanup_queue(). So this patch removes the explicit call of __blk_drain_queue() in blk_cleanup_queue(). Tested-by: Cathy Avery Tested-by: Oleksandr Natalenko

[PATCH V5 0/10] block/scsi: safe SCSI quiescing

2017-09-18 Thread Ming Lei
Hi, The current SCSI quiesce isn't safe and can easily trigger I/O deadlock. Once SCSI device is put into QUIESCE, no new request except for RQF_PREEMPT can be dispatched to SCSI successfully, and scsi_device_quiesce() just simply waits for completion of I/Os dispatched to SCSI stack. It isn't

[PATCH V5 02/10] block: tracking request allocation with q_usage_counter

2017-09-18 Thread Ming Lei
This usage is basically same with blk-mq, so that we can support to freeze legacy queue easily. Also 'wake_up_all(&q->mq_freeze_wq)' has to be moved into blk_set_queue_dying() since both legacy and blk-mq may wait on the wait queue of .mq_freeze_wq. Tested-by: Cathy Avery

[PATCH V5 03/10] blk-mq: rename blk_mq_[freeze|unfreeze]_queue

2017-09-18 Thread Ming Lei
We will support to freeze queue on block legacy path too. No functional change. Tested-by: Cathy Avery Tested-by: Oleksandr Natalenko Signed-off-by: Ming Lei --- block/bfq-iosched.c | 2 +- block/blk-cgroup.c | 8

Re: [RFC PATCH] blk-mq: Fix lost request during timeout

2017-09-18 Thread Keith Busch
On Mon, Sep 18, 2017 at 11:14:38PM +, Bart Van Assche wrote: > On Mon, 2017-09-18 at 19:08 -0400, Keith Busch wrote: > > On Mon, Sep 18, 2017 at 10:53:12PM +, Bart Van Assche wrote: > > > Are you sure that scenario can happen? The blk-mq core calls > > > test_and_set_bit() > > > for the

Re: [PATCH] block: move sanity checking ahead of bi_front/back_seg_size updating

2017-09-18 Thread jianchao.wang
On 09/19/2017 07:51 AM, Christoph Hellwig wrote: > On Sat, Sep 16, 2017 at 07:10:30AM +0800, Jianchao Wang wrote: >> If the bio_integrity_merge_rq() return false or nr_phys_segments exceeds >> the max_segments, the merging fails, but the bi_front/back_seg_size may >> have been modified. To avoid

Re: [PATCH v6 1/2] blktrace: Fix potentail deadlock between delete & sysfs ops

2017-09-18 Thread Christoph Hellwig
Taking a look at this it seems like using a lock in struct block_device isn't the right thing to do anyway - all the action is on fields in struct blk_trace, so having a lock inside that would make a lot more sense. It would also help to document what exactly we're actually protecting.

Re: [PATCH] block: move sanity checking ahead of bi_front/back_seg_size updating

2017-09-18 Thread Christoph Hellwig
On Sat, Sep 16, 2017 at 07:10:30AM +0800, Jianchao Wang wrote: > If the bio_integrity_merge_rq() return false or nr_phys_segments exceeds > the max_segments, the merging fails, but the bi_front/back_seg_size may > have been modified. To avoid it, move the sanity checking ahead. > > Signed-off-by:

Re: [PATCH v6 2/2] block_dev: Rename bd_fsfreeze_mutex

2017-09-18 Thread Christoph Hellwig
Don't rename it to a way too long name. Either add a separate mutex for your purpose (unless there is interaction between freezing and blktrace, which I doubt), or properly comment the usage.

[PATCH 9/9] nvme: implement multipath access to nvme subsystems

2017-09-18 Thread Christoph Hellwig
This patch adds initial multipath support to the nvme driver. For each namespace we create a new block device node, which can be used to access that namespace through any of the controllers that refer to it. Currently we will always send I/O to the first available path, this will be changed once

[PATCH 7/9] nvme: introduce a nvme_ns_ids structure

2017-09-18 Thread Christoph Hellwig
This allows us to manage the various unique namespace identifiers together instead of needing various variables and arguments. Signed-off-by: Christoph Hellwig --- drivers/nvme/host/core.c | 69 +++- drivers/nvme/host/nvme.h | 14

[PATCH 8/9] nvme: track shared namespaces

2017-09-18 Thread Christoph Hellwig
Introduce a new struct nvme_ns_head [1] that holds information about an actual namespace, unlike struct nvme_ns, which only holds the per-controller namespace information. For private namespaces there is a 1:1 relation of the two, but for shared namespaces this lets us discover all the paths to

[PATCH 2/9] block: move REQ_NOWAIT

2017-09-18 Thread Christoph Hellwig
This flag should be before the operation-specific REQ_NOUNMAP bit. Signed-off-by: Christoph Hellwig --- include/linux/blk_types.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h index

[PATCH 3/9] block: add REQ_DRV bit

2017-09-18 Thread Christoph Hellwig
Set aside a bit in the request/bio flags for driver use. Signed-off-by: Christoph Hellwig --- include/linux/blk_types.h | 5 + 1 file changed, 5 insertions(+) diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h index acc2f3cdc2fc..7ec2ed097a8a 100644 ---

nvme multipath support V2

2017-09-18 Thread Christoph Hellwig
Hi all, this series adds support for multipathing, that is accessing nvme namespaces through multiple controllers to the nvme core driver. It is a very thin and efficient implementation that relies on close cooperation with other bits of the nvme driver, and few small and simple block helpers.

[PATCH 1/9] nvme: allow timed-out ios to retry

2017-09-18 Thread Christoph Hellwig
From: James Smart Currently the nvme_req_needs_retry() applies several checks to see if a retry is allowed. One of those is whether the current time has exceeded the start time of the io plus the timeout length. This check, if an io times out, means there is never a retry

[PATCH 6/9] nvme: track subsystems

2017-09-18 Thread Christoph Hellwig
This adds a new nvme_subsystem structure so that we can track multiple controllers that belong to a single subsystem. For now we only use it to store the NQN, and to check that we don't have duplicate NQNs unless the involved subsystems support multiple controllers. Signed-off-by: Christoph

[PATCH 4/9] block: provide a direct_make_request helper

2017-09-18 Thread Christoph Hellwig
This helper allows reinserting a bio into a new queue without much overhead, but requires all queue limits to be the same for the upper and lower queues, and it does not provide any recursion preventions. Signed-off-by: Christoph Hellwig --- block/blk-core.c | 32

[PATCH 5/9] block: add a blk_steal_bios helper

2017-09-18 Thread Christoph Hellwig
This helper allows stealing the uncompleted bios from a request so that they can be reissued on another path. Signed-off-by: Christoph Hellwig Reviewed-by: Sagi Grimberg --- block/blk-core.c | 20 include/linux/blkdev.h | 2 ++

Re: [RFC PATCH] blk-mq: Fix lost request during timeout

2017-09-18 Thread Bart Van Assche
On Mon, 2017-09-18 at 19:08 -0400, Keith Busch wrote: > On Mon, Sep 18, 2017 at 10:53:12PM +, Bart Van Assche wrote: > > Are you sure that scenario can happen? The blk-mq core calls > > test_and_set_bit() > > for the REQ_ATOM_COMPLETE flag before any completion or timeout handler is > >

Re: [RFC PATCH] blk-mq: Fix lost request during timeout

2017-09-18 Thread Keith Busch
On Mon, Sep 18, 2017 at 10:53:12PM +, Bart Van Assche wrote: > On Mon, 2017-09-18 at 18:39 -0400, Keith Busch wrote: > > The nvme driver's use of blk_mq_reinit_tagset only happens during > > controller initialisation, but I'm seeing lost commands well after that > > during normal and stable

Re: [RFC PATCH] blk-mq: Fix lost request during timeout

2017-09-18 Thread Bart Van Assche
On Mon, 2017-09-18 at 18:39 -0400, Keith Busch wrote: > The nvme driver's use of blk_mq_reinit_tagset only happens during > controller initialisation, but I'm seeing lost commands well after that > during normal and stable running. > > The timing is pretty narrow to hit, but I'm pretty sure this

Re: [RFC PATCH] blk-mq: Fix lost request during timeout

2017-09-18 Thread Keith Busch
On Mon, Sep 18, 2017 at 10:07:58PM +, Bart Van Assche wrote: > On Mon, 2017-09-18 at 18:03 -0400, Keith Busch wrote: > > I think we've always known it's possible to lose a request during timeout > > handling, but just accepted that possibility. It seems to be causing > > problems, though,

Re: [RFC PATCH] blk-mq: Fix lost request during timeout

2017-09-18 Thread Bart Van Assche
On Mon, 2017-09-18 at 18:03 -0400, Keith Busch wrote: > I think we've always known it's possible to lose a request during timeout > handling, but just accepted that possibility. It seems to be causing > problems, though, leading to unnecessary error escalation and IO failures. > > The possibility

[RFC PATCH] blk-mq: Fix lost request during timeout

2017-09-18 Thread Keith Busch
I think we've always known it's possible to lose a request during timeout handling, but just accepted that possibility. It seems to be causing problems, though, leading to unnecessary error escalation and IO failures. The possibility arises when the block layer marks the request complete prior to

Re: [PATCH v6 0/2] blktrace: Fix deadlock problem

2017-09-18 Thread Steven Rostedt
Acked-by: Steven Rostedt (VMware) for the series. Jens, feel free to take this in your tree. -- Steve On Mon, 18 Sep 2017 14:53:49 -0400 Waiman Long wrote: > v6: > - Add a second patch to rename the bd_fsfreeze_mutex to >

[PATCH V3] block/ndb: add WQ_UNBOUND to the knbd-recv workqueue

2017-09-18 Thread Dan Melnic
Add WQ_UNBOUND to the knbd-recv workqueue so we're not bound to a single CPU that is selected at device creation time. Signed-off-by: Dan Melnic Reviewed-by: Josef Bacik --- drivers/block/nbd.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git

Re: [PATCH v2] fs: pass the write life time hint to the mapped filesystem

2017-09-18 Thread Michael Moy
Thanks, I see that now. What is the file hint using F_SET_FILE_RW_HINT used for? It seems that if both are set, the one set first gets used and, if only the file hint is set, it is not used at all. On 9/18/2017 9:49 AM, Christoph Hellwig wrote: On Mon, Sep 18, 2017 at 09:45:57AM -0600,

[PATCH V2] block/ndb: add WQ_UNBOUND to the knbd-recv workqueue

2017-09-18 Thread Dan Melnic
Add WQ_UNBOUND to the knbd-recv workqueue so we're not bound to a single CPU that is selected at device creation time. Signed-off-by: Dan Melnic --- drivers/block/nbd.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c

Re: [PATCH V2] block/ndb: add WQ_UNBOUND to the knbd-recv workqueue

2017-09-18 Thread Josef Bacik
On Mon, Sep 18, 2017 at 12:56:17PM -0700, Dan Melnic wrote: > Add WQ_UNBOUND to the knbd-recv workqueue so we're not bound > to a single CPU that is selected at device creation time. > > Signed-off-by: Dan Melnic > --- > drivers/block/nbd.c | 4 +++- > 1 file changed, 3

[PATCH v6 1/2] blktrace: Fix potentail deadlock between delete & sysfs ops

2017-09-18 Thread Waiman Long
The lockdep code had reported the following unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(s_active#228);
                               lock(&bdev->bd_mutex/1);
                               lock(s_active#228);
  lock(&bdev->bd_mutex);

 *** DEADLOCK

[PATCH v6 2/2] block_dev: Rename bd_fsfreeze_mutex

2017-09-18 Thread Waiman Long
As the bd_fsfreeze_mutex is used by the blktrace subsystem as well, it is now renamed to bd_fsfreeze_blktrace_mutex to better reflect its purpose. Signed-off-by: Waiman Long --- fs/block_dev.c | 14 +++--- fs/gfs2/ops_fstype.c| 6 +++---

[PATCH v6 0/2] blktrace: Fix deadlock problem

2017-09-18 Thread Waiman Long
v6: - Add a second patch to rename the bd_fsfreeze_mutex to bd_fsfreeze_blktrace_mutex. v5: - Overload the bd_fsfreeze_mutex in block_device structure for blktrace protection. v4: - Use blktrace_mutex in blk_trace_ioctl() as well. v3: - Use a global blktrace_mutex to

[PATCH] block/ndb: add WQ_UNBOUND to the knbd-recv workqueue

2017-09-18 Thread Dan Melnic
Add WQ_UNBOUND to the knbd-recv workqueue so we're not bound to a single CPU that is selected at device creation time Signed-off-by: Dan Melnic --- drivers/block/nbd.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c index

Re: [PATCH v5] blktrace: Fix potentail deadlock between delete & sysfs ops

2017-09-18 Thread Bart Van Assche
On Sat, 2017-09-16 at 19:37 -0700, Waiman Long wrote: > diff --git a/include/linux/fs.h b/include/linux/fs.h > index 339e737..330b572 100644 > --- a/include/linux/fs.h > +++ b/include/linux/fs.h > @@ -448,7 +448,7 @@ struct block_device { > > /* The counter of freeze processes */ >

[PATCH v2] fs: pass the write life time hint to the mapped filesystem

2017-09-18 Thread Michael Moy
The write hint needs to be copied to the mapped filesystem so it can be passed down to the nvme device driver. v2: fix tabs in the email Signed-off-by: Michael Moy --- mm/filemap.c | 10 ++ 1 file changed, 10 insertions(+) diff --git a/mm/filemap.c

Re: [PATCH v2] fs: pass the write life time hint to the mapped filesystem

2017-09-18 Thread Christoph Hellwig
On Mon, Sep 18, 2017 at 09:45:57AM -0600, Michael Moy wrote: > The write hint needs to be copied to the mapped filesystem > so it can be passed down to the nvme device driver. > > v2: fix tabs in the email If you want the write hint for buffered I/O you need to set it on the inode using

Re: [PATCH 1/5] block: don't call blk_mq_delay_run_hw_queue() in case of BLK_STS_RESOURCE

2017-09-18 Thread Bart Van Assche
On Sun, 2017-09-17 at 20:40 +0800, Ming Lei wrote: > "if no request has completed before the delay has expired" can't be a > reason to rerun the queue, because the queue can still be busy. That statement of you shows that there are important aspects of the SCSI core and dm-mpath driver that you

Re: compile nvme not find bi_disk

2017-09-18 Thread Jens Axboe
On 09/18/2017 07:47 AM, Tony Yang wrote: > Dear All > > I'm compiling nvme, but encountered the following error, how can I > solve it? Thanks > > CHK include/generated/compile.h > CC [M] drivers/nvme/host/core.o > drivers/nvme/host/core.c: In function ‘__nvme_submit_user_cmd’: >

compile nvme not find bi_disk

2017-09-18 Thread Tony Yang
Dear All I'm compiling nvme, but encountered the following error, how can I solve it? Thanks CHK include/generated/compile.h CC [M] drivers/nvme/host/core.o drivers/nvme/host/core.c: In function ‘__nvme_submit_user_cmd’: drivers/nvme/host/core.c:631: error: ‘struct bio’ has no member

Re: [PATCH] lightnvm: remove already calculated nr_chnls

2017-09-18 Thread Javier González
> On 17 Sep 2017, at 23.04, Rakesh Pandit wrote: > > Remove repeated calculation for number of channels while creating a > target device. > > Signed-off-by: Rakesh Pandit > --- > > This is also a trivial change I found while investigating/working on >