On Mon, Sep 18, 2017 at 03:18:16PM +, Bart Van Assche wrote:
> On Sun, 2017-09-17 at 20:40 +0800, Ming Lei wrote:
> > "if no request has completed before the delay has expired" can't be a
> > reason to rerun the queue, because the queue can still be busy.
>
> That statement of yours shows that
On Fri, Sep 15, 2017 at 08:07:01PM +0200, Christoph Hellwig wrote:
> Hi Anish,
>
> I looked over the code a bit, and I'm rather confused by the newly
> added commands. Which controller supports them? Also the NVMe
> working group went down a very different way with the ALUA approach,
> which
On Wed, Sep 13, 2017 at 08:57:13AM +0200, Hannes Reinecke wrote:
> In general I am _not_ in favour of this approach.
>
> This is essentially the same level of multipath support we had in the
> old qlogic and lpfc drivers in 2.4/2.6 series, and it took us _years_ to
> get rid of this.
> Main
On Tue, Sep 19, 2017 at 7:08 AM, Keith Busch wrote:
> On Mon, Sep 18, 2017 at 10:53:12PM +, Bart Van Assche wrote:
>> On Mon, 2017-09-18 at 18:39 -0400, Keith Busch wrote:
>> > The nvme driver's use of blk_mq_reinit_tagset only happens during
>> > controller
We have to preempt-freeze the queue in scsi_device_quiesce(),
and unfreeze it in scsi_device_resume(), so call scsi_device_resume()
for any device quiesced by scsi_device_quiesce().
Tested-by: Cathy Avery
Tested-by: Oleksandr Natalenko
Simply quiescing the SCSI device and waiting for completion of the I/O
dispatched to the SCSI queue isn't safe: it is easy to exhaust the
request pool, because requests allocated beforehand can't
be dispatched while the device is in QUIESCE. Then no request
can be allocated for RQF_PREEMPT, and the system may hang
The two APIs are required to allow RQF_PREEMPT request
allocation while the queue is preempt-frozen.
We have to guarantee that normal freeze and preempt freeze
run exclusively, because for normal freezing, once
blk_freeze_queue_wait() has returned, no request can enter
the queue any more.
Another
RQF_PREEMPT is a bit special because the request is required
to be dispatched to the LLD even when the SCSI device is quiesced.
So this patch introduces __blk_get_request() to allow the block
layer to allocate a request while the queue is preempt-frozen, since we
will preempt-freeze the queue before quiescing SCSI
Both are used for legacy and blk-mq, so rename them
to .freeze_wq and .freeze_depth to avoid confusing
people.
No functional change.
Tested-by: Cathy Avery
Tested-by: Oleksandr Natalenko
Signed-off-by: Ming Lei
---
We need to pass PREEMPT flags to blk_queue_enter()
for allocating a request with RQF_PREEMPT in the
following patch.
Tested-by: Cathy Avery
Tested-by: Oleksandr Natalenko
Signed-off-by: Ming Lei
---
block/blk-core.c | 10
This patch just makes it explicit.
Tested-by: Cathy Avery
Tested-by: Oleksandr Natalenko
Reviewed-by: Johannes Thumshirn
Signed-off-by: Ming Lei
---
block/blk-mq.c | 3 ++-
1 file changed, 2
The only change on legacy is that blk_drain_queue() is run
from blk_freeze_queue(), which is called in blk_cleanup_queue().
So this patch removes the explicit call of __blk_drain_queue() in
blk_cleanup_queue().
Tested-by: Cathy Avery
Tested-by: Oleksandr Natalenko
Hi,
The current SCSI quiesce isn't safe and easily triggers I/O deadlock.
Once the SCSI device is put into QUIESCE, no new request except for
RQF_PREEMPT can be dispatched to SCSI successfully, and
scsi_device_quiesce() simply waits for completion of the I/Os
dispatched to the SCSI stack. It isn't
This usage is basically the same as with blk-mq, so that we can
support freezing the legacy queue easily.
Also 'wake_up_all(&q->mq_freeze_wq)' has to be moved
into blk_set_queue_dying() since both legacy and blk-mq
may wait on the wait queue of .mq_freeze_wq.
Tested-by: Cathy Avery
We will support freezing the queue on the block legacy path too.
No functional change.
Tested-by: Cathy Avery
Tested-by: Oleksandr Natalenko
Signed-off-by: Ming Lei
---
block/bfq-iosched.c | 2 +-
block/blk-cgroup.c | 8
On Mon, Sep 18, 2017 at 11:14:38PM +, Bart Van Assche wrote:
> On Mon, 2017-09-18 at 19:08 -0400, Keith Busch wrote:
> > On Mon, Sep 18, 2017 at 10:53:12PM +, Bart Van Assche wrote:
> > > Are you sure that scenario can happen? The blk-mq core calls
> > > test_and_set_bit()
> > > for the
On 09/19/2017 07:51 AM, Christoph Hellwig wrote:
> On Sat, Sep 16, 2017 at 07:10:30AM +0800, Jianchao Wang wrote:
>> If bio_integrity_merge_rq() returns false or nr_phys_segments exceeds
>> the max_segments, the merging fails, but the bi_front/back_seg_size may
>> have been modified. To avoid
Taking a look at this it seems like using a lock in struct block_device
isn't the right thing to do anyway - all the action is on fields in
struct blk_trace, so having a lock inside that would make a lot more
sense.
It would also help to document what exactly we're actually protecting.
On Sat, Sep 16, 2017 at 07:10:30AM +0800, Jianchao Wang wrote:
> If bio_integrity_merge_rq() returns false or nr_phys_segments exceeds
> the max_segments, the merging fails, but the bi_front/back_seg_size may
> have been modified. To avoid it, move the sanity checking ahead.
>
> Signed-off-by:
Don't rename it to a way too long name. Either add a separate mutex
for your purpose (unless there is interaction between freezing and
blktrace, which I doubt), or properly comment the usage.
This patch adds initial multipath support to the nvme driver. For each
namespace we create a new block device node, which can be used to access
that namespace through any of the controllers that refer to it.
Currently we will always send I/O to the first available path, this will
be changed once
This allows us to manage the various unique namespace identifiers
together instead of needing various variables and arguments.
Signed-off-by: Christoph Hellwig
---
drivers/nvme/host/core.c | 69 +++-
drivers/nvme/host/nvme.h | 14
Introduce a new struct nvme_ns_head [1] that holds information about
an actual namespace, unlike struct nvme_ns, which only holds the
per-controller namespace information. For private namespaces there
is a 1:1 relation of the two, but for shared namespaces this lets us
discover all the paths to
This flag should be before the operation-specific REQ_NOUNMAP bit.
Signed-off-by: Christoph Hellwig
---
include/linux/blk_types.h | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
index
Set aside a bit in the request/bio flags for driver use.
Signed-off-by: Christoph Hellwig
---
include/linux/blk_types.h | 5 +
1 file changed, 5 insertions(+)
diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
index acc2f3cdc2fc..7ec2ed097a8a 100644
---
Hi all,
this series adds support for multipathing, that is accessing nvme
namespaces through multiple controllers to the nvme core driver.
It is a very thin and efficient implementation that relies on
close cooperation with other bits of the nvme driver, and a few small
and simple block helpers.
From: James Smart
Currently the nvme_req_needs_retry() applies several checks to see if
a retry is allowed. One of those is whether the current time has exceeded
the start time of the io plus the timeout length. This check, if an io
times out, means there is never a retry
This adds a new nvme_subsystem structure so that we can track multiple
controllers that belong to a single subsystem. For now we only use it
to store the NQN, and to check that we don't have duplicate NQNs unless
the involved subsystems support multiple controllers.
Signed-off-by: Christoph
This helper allows reinserting a bio into a new queue without much
overhead, but requires all queue limits to be the same for the upper
and lower queues, and it does not provide any recursion preventions.
Signed-off-by: Christoph Hellwig
---
block/blk-core.c | 32
This helper allows stealing the uncompleted bios from a request so
that they can be reissued on another path.
Signed-off-by: Christoph Hellwig
Reviewed-by: Sagi Grimberg
---
block/blk-core.c | 20
include/linux/blkdev.h | 2 ++
On Mon, 2017-09-18 at 19:08 -0400, Keith Busch wrote:
> On Mon, Sep 18, 2017 at 10:53:12PM +, Bart Van Assche wrote:
> > Are you sure that scenario can happen? The blk-mq core calls
> > test_and_set_bit()
> > for the REQ_ATOM_COMPLETE flag before any completion or timeout handler is
> >
On Mon, Sep 18, 2017 at 10:53:12PM +, Bart Van Assche wrote:
> On Mon, 2017-09-18 at 18:39 -0400, Keith Busch wrote:
> > The nvme driver's use of blk_mq_reinit_tagset only happens during
> > controller initialisation, but I'm seeing lost commands well after that
> > during normal and stable
On Mon, 2017-09-18 at 18:39 -0400, Keith Busch wrote:
> The nvme driver's use of blk_mq_reinit_tagset only happens during
> controller initialisation, but I'm seeing lost commands well after that
> during normal and stable running.
>
> The timing is pretty narrow to hit, but I'm pretty sure this
On Mon, Sep 18, 2017 at 10:07:58PM +, Bart Van Assche wrote:
> On Mon, 2017-09-18 at 18:03 -0400, Keith Busch wrote:
> > I think we've always known it's possible to lose a request during timeout
> > handling, but just accepted that possibility. It seems to be causing
> > problems, though,
On Mon, 2017-09-18 at 18:03 -0400, Keith Busch wrote:
> I think we've always known it's possible to lose a request during timeout
> handling, but just accepted that possibility. It seems to be causing
> problems, though, leading to unnecessary error escalation and IO failures.
>
> The possibility
I think we've always known it's possible to lose a request during timeout
handling, but just accepted that possibility. It seems to be causing
problems, though, leading to unnecessary error escalation and IO failures.
The possibility arises when the block layer marks the request complete
prior to
Acked-by: Steven Rostedt (VMware)
for the series.
Jens, feel free to take this in your tree.
-- Steve
On Mon, 18 Sep 2017 14:53:49 -0400
Waiman Long wrote:
> v6:
> - Add a second patch to rename the bd_fsfreeze_mutex to
>
Add WQ_UNBOUND to the knbd-recv workqueue so we're not bound
to a single CPU that is selected at device creation time.
Signed-off-by: Dan Melnic
Reviewed-by: Josef Bacik
---
drivers/block/nbd.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git
Thanks, I see that now.
What is the file hint using F_SET_FILE_RW_HINT used for?
It seems that if both are set, the one set first gets used
and, if only the file hint is set, it is not used at all.
On 9/18/2017 9:49 AM, Christoph Hellwig wrote:
On Mon, Sep 18, 2017 at 09:45:57AM -0600,
Add WQ_UNBOUND to the knbd-recv workqueue so we're not bound
to a single CPU that is selected at device creation time.
Signed-off-by: Dan Melnic
---
drivers/block/nbd.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c
On Mon, Sep 18, 2017 at 12:56:17PM -0700, Dan Melnic wrote:
> Add WQ_UNBOUND to the knbd-recv workqueue so we're not bound
> to a single CPU that is selected at device creation time.
>
> Signed-off-by: Dan Melnic
> ---
> drivers/block/nbd.c | 4 +++-
> 1 file changed, 3
The lockdep code had reported the following unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(s_active#228);
                               lock(&bdev->bd_mutex/1);
                               lock(s_active#228);
  lock(&bdev->bd_mutex);

 *** DEADLOCK ***
As the bd_fsfreeze_mutex is used by the blktrace subsystem as well,
it is now renamed to bd_fsfreeze_blktrace_mutex to better reflect
its purpose.
Signed-off-by: Waiman Long
---
fs/block_dev.c | 14 +++---
fs/gfs2/ops_fstype.c| 6 +++---
v6:
- Add a second patch to rename the bd_fsfreeze_mutex to
bd_fsfreeze_blktrace_mutex.
v5:
- Overload the bd_fsfreeze_mutex in block_device structure for
blktrace protection.
v4:
- Use blktrace_mutex in blk_trace_ioctl() as well.
v3:
- Use a global blktrace_mutex to
Add WQ_UNBOUND to the knbd-recv workqueue so we're not bound to a single
CPU that is selected at device creation time.
Signed-off-by: Dan Melnic
---
drivers/block/nbd.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c
index
On Sat, 2017-09-16 at 19:37 -0700, Waiman Long wrote:
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index 339e737..330b572 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -448,7 +448,7 @@ struct block_device {
>
> /* The counter of freeze processes */
>
The write hint needs to be copied to the mapped filesystem
so it can be passed down to the nvme device driver.
v2: fix tabs in the email
Signed-off-by: Michael Moy
---
mm/filemap.c | 10 ++
1 file changed, 10 insertions(+)
diff --git a/mm/filemap.c
On Mon, Sep 18, 2017 at 09:45:57AM -0600, Michael Moy wrote:
> The write hint needs to be copied to the mapped filesystem
> so it can be passed down to the nvme device driver.
>
> v2: fix tabs in the email
If you want the write hint for buffered I/O you need to set it on the
inode using
On Sun, 2017-09-17 at 20:40 +0800, Ming Lei wrote:
> "if no request has completed before the delay has expired" can't be a
> reason to rerun the queue, because the queue can still be busy.
That statement of yours shows that there are important aspects of the SCSI
core and dm-mpath driver that you
On 09/18/2017 07:47 AM, Tony Yang wrote:
> Dear All
>
> I'm compiling nvme, but encountered the following error, how can I
> solve it? Thanks
>
> CHK include/generated/compile.h
> CC [M] drivers/nvme/host/core.o
> drivers/nvme/host/core.c: In function ‘__nvme_submit_user_cmd’:
>
Dear All
I'm compiling nvme, but encountered the following error, how can I
solve it? Thanks
CHK include/generated/compile.h
CC [M] drivers/nvme/host/core.o
drivers/nvme/host/core.c: In function ‘__nvme_submit_user_cmd’:
drivers/nvme/host/core.c:631: error: ‘struct bio’ has no member
> On 17 Sep 2017, at 23.04, Rakesh Pandit wrote:
>
> Remove repeated calculation for number of channels while creating a
> target device.
>
> Signed-off-by: Rakesh Pandit
> ---
>
> This is also a trivial change I found while investigating/working on
>